Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation



Published: 2019-06-06


Give the robot fast feedback with a better-shaped reward function, and learning can be much faster.
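One common way to shape a reward is potential-based shaping, which adds `gamma * potential(next) - potential(current)` to the original reward. The notes do not say which shaping method the lecture had in mind, so the sketch below is only an illustration: the 1-D reaching task, the function names, and the choice of negative distance as the potential are all assumptions.

```python
def sparse_reward(pos, goal, tol=0.05):
    """Original task reward: 1 only when the agent is at the goal (hypothetical task)."""
    return 1.0 if abs(pos - goal) <= tol else 0.0


def potential(pos, goal):
    """Assumed potential function: closer to the goal means higher potential."""
    return -abs(pos - goal)


def shaped_reward(pos, next_pos, goal, gamma=0.99):
    """Sparse reward plus the potential-based shaping term."""
    shaping = gamma * potential(next_pos, goal) - potential(pos, goal)
    return sparse_reward(next_pos, goal) + shaping


# Far from the goal, the sparse reward is 0 for both moves,
# while the shaped reward already distinguishes them.
toward = shaped_reward(0.0, 0.1, goal=1.0)
away = shaped_reward(0.0, -0.1, goal=1.0)
```

With the sparse reward alone the agent gets no signal until it happens to reach the goal; the shaped version gives feedback on every step.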

OpenAI Baselines

rllab

Use multiple tasks and multiple seeds to test robustness.

Don't trust the result of a single trial; it could just be a lucky run, unless the improvement is huge.
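A minimal sketch of comparing two variants across seeds, assuming you already have one final return per seed. The numbers are made up for illustration, and the "gap larger than k standard errors" rule is a rough heuristic chosen here, not something prescribed in the lecture.

```python
import math
import statistics


def summarize(returns_by_seed):
    """Mean and standard error of the final returns, one value per seed."""
    mean = statistics.mean(returns_by_seed)
    stderr = statistics.stdev(returns_by_seed) / math.sqrt(len(returns_by_seed))
    return mean, stderr


def clearly_better(baseline, candidate, k=2.0):
    """Rough check: is the gap in means larger than k combined standard errors?"""
    mean_a, se_a = summarize(baseline)
    mean_b, se_b = summarize(candidate)
    return (mean_b - mean_a) > k * math.sqrt(se_a ** 2 + se_b ** 2)


# Made-up returns over 5 seeds, for illustration only.
baseline = [100, 120, 90, 110, 105]
small_gain = [108, 125, 95, 112, 101]   # gap is within seed-to-seed noise
huge_gain = [300, 320, 290, 310, 305]   # gap dwarfs the noise
```

Here `clearly_better(baseline, small_gain)` is false even though the mean went up, which is the situation the note warns about.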

A KL divergence of 0.1 between the old and new policy is a small update.

A KL divergence of 10 is a large update.
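To get a feel for these magnitudes, the sketch below computes the KL divergence (in nats) between two discrete action distributions. The direction KL(old || new) and the example probabilities are assumptions made for illustration.

```python
import math


def kl_categorical(p, q):
    """KL(p || q) in nats for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


old_policy = [0.5, 0.5]

# A gentle nudge of the action probabilities gives a KL far below 0.1.
small_step = kl_categorical(old_policy, [0.55, 0.45])

# Collapsing almost all probability onto one action gives a KL close to 10.
large_step = kl_categorical(old_policy, [1e-9, 1 - 1e-9])
```

Logging this quantity after each policy update is a cheap way to see whether the step size is in a sensible range.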

DQN is not effective enough on many problems, especially continuous control problems.

But that doesn't mean it's a bad algorithm.

So you shouldn't expect any algorithm to solve everything without tuning, at least for now.

Batch norm, dropout, or big networks? No, we use 2 layers with 64 units each.

At least for now, these techniques are not well suited to RL.
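For a sense of how small such a network is, here is a dependency-free sketch of a policy network with two hidden layers of 64 units. The tanh activation, the initialization scheme, and the example observation and action sizes are assumptions; the notes only give the layer count and width.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One dense layer: small uniform weights, zero biases (assumed init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_mlp(obs_dim, act_dim, hidden=(64, 64), seed=0):
    """Build the layer list obs_dim -> 64 -> 64 -> act_dim."""
    rng = random.Random(seed)
    sizes = [obs_dim, *hidden, act_dim]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, obs):
    """Forward pass with tanh on hidden layers and a linear output layer."""
    x = list(obs)
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x


def count_params(params):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


# Hypothetical sizes: 11-dimensional observation, 3-dimensional action.
policy = init_mlp(obs_dim=11, act_dim=3)
action_means = forward(policy, [0.0] * 11)
```

With these example sizes the whole network has 5123 parameters, tiny compared with typical supervised learning models.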

If you don't care much about sample complexity, policy gradient (PG) methods are probably the way to go.

With Q-learning it is less transparent what is going on inside, while PG is just stochastic gradient ascent on the expected return.

DQN and its relatives work well on games with image input, while policy-based methods work better on continuous control tasks such as locomotion.
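The "just a gradient method" view of PG can be seen in a few lines. The sketch below runs a plain REINFORCE-style update for a softmax policy on a two-armed bandit; the bandit, its rewards, and the hyperparameters are hypothetical and chosen only to keep the example small.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(rewards=(0.0, 1.0), steps=2000, lr=0.1, seed=0):
    """Sample an arm, then step the logits along reward * grad log pi(arm)."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        for i in range(len(logits)):
            # Gradient of log softmax w.r.t. logit i: indicator(i == action) - probs[i]
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad_log_prob
    return softmax(logits)


final_probs = train_bandit()
```

After training, nearly all probability mass sits on the rewarding arm. Each step is an ordinary stochastic gradient step on the policy parameters, with no bootstrapped value targets involved.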

https://en.wikipedia.org/wiki/Sample_complexity

"I use random initialization of hyperparameters..."

(audience laughs)

Reposted from: https://www.cnblogs.com/ecoflex/p/8977582.html
