Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation
Published: 2019-06-06

A better-shaped reward function gives the robot fast feedback, so learning can be much faster.
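
To make the idea of reward shaping concrete, here is a minimal sketch, assuming a hypothetical 1-D reaching task. The names `GOAL`, `GAMMA`, `sparse_reward`, `potential`, and `shaped_reward` are illustrative and do not come from the lecture. The shaping term used here is potential-based shaping, which is one common choice and not necessarily the one shown on the slides.

```python
GOAL = 10.0   # hypothetical goal position (assumption)
GAMMA = 0.99  # discount factor (assumption)

def sparse_reward(next_pos):
    """Reward only when the agent ends up within 0.5 of the goal."""
    return 1.0 if abs(next_pos - GOAL) < 0.5 else 0.0

def potential(pos):
    """Potential that grows as the agent gets closer to the goal."""
    return -abs(pos - GOAL)

def shaped_reward(pos, next_pos):
    """Sparse reward plus the potential-based term GAMMA * phi(s') - phi(s)."""
    return sparse_reward(next_pos) + GAMMA * potential(next_pos) - potential(pos)

# Far from the goal the sparse reward is silent, while the shaped reward
# already distinguishes a step toward the goal from a step away from it.
print(sparse_reward(1.0))
print(shaped_reward(0.0, 1.0))  # step toward the goal
print(shaped_reward(1.0, 0.0))  # step away from the goal
```

With the sparse reward alone, the agent gets no signal until it happens to reach the goal; the shaped version gives a signal on every step.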

OpenAI Baselines

rllab

Use multiple tasks and multiple seeds to test robustness.

Don't trust the result of a single trial. It could just be a lucky run, unless the improvement is huge.
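
A minimal sketch of the multi-seed comparison described above. `run_trial` is a hypothetical stand-in for a full training run (here it just returns an assumed true score plus seed-dependent noise), and the scores and noise level are made up for illustration.

```python
import random
import statistics

def run_trial(seed, true_score):
    """Hypothetical stand-in for one training run: true score plus seed noise."""
    rng = random.Random(seed)
    return true_score + rng.gauss(0.0, 10.0)

def summarize(scores):
    """Return the mean and the standard error of the mean across seeds."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, sem

seeds = range(10)
baseline = [run_trial(s, 100.0) for s in seeds]
variant = [run_trial(s + 1000, 105.0) for s in seeds]

for name, scores in [("baseline", baseline), ("variant", variant)]:
    mean, sem = summarize(scores)
    print(f"{name}: {mean:.1f} +/- {sem:.1f} over {len(scores)} seeds")
```

If the gap between the two means is small compared with the reported standard errors, a single favorable trial says very little about which variant is better.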

A KL divergence of 0.1 between the old and new policy is a small update.

A KL divergence of 10 is a large update.
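
As a rough illustration of this diagnostic, here is a minimal sketch that computes the mean KL divergence between an old and a new policy, assuming discrete action distributions stored as plain probability lists. The function names and the example probabilities are assumptions, not taken from the lecture.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; assumes every q[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(old_probs, new_probs):
    """Average KL(old || new) over a batch of states."""
    kls = [kl_divergence(p, q) for p, q in zip(old_probs, new_probs)]
    return sum(kls) / len(kls)

# Action probabilities for two states under the old policy (made-up values).
old = [[0.5, 0.5], [0.9, 0.1]]
# A new policy that barely moved, and one that changed drastically.
small_step = [[0.55, 0.45], [0.88, 0.12]]
large_step = [[0.01, 0.99], [0.01, 0.99]]

print(mean_kl(old, small_step))
print(mean_kl(old, large_step))
```

Logging this value after every policy update makes an unusually large step easy to spot.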

DQN is not effective enough on many problems, especially continuous control problems.

But that doesn't mean it's a bad algorithm.

So you shouldn't expect any algorithm to solve everything without tuning, at least for now.

Batch norm, dropout, or big networks? No, we use 2 layers with 64 units.

At least for now, these techniques are not well suited to RL.
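
For a sense of how small such a network is, here is a minimal pure-Python sketch of a policy network with two hidden layers of 64 units. The tanh activation, the initialization scheme, and all function names are assumptions for illustration; the notes only state the layer count and width.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Uniform weights scaled by 1/sqrt(n_in) (assumed scheme); zero biases."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def linear(layer, x):
    """Apply one affine layer to the input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def make_policy(obs_dim, act_dim, hidden=64, seed=0):
    """Two hidden layers of `hidden` units followed by a linear output layer."""
    rng = random.Random(seed)
    return [init_layer(obs_dim, hidden, rng),
            init_layer(hidden, hidden, rng),
            init_layer(hidden, act_dim, rng)]

def forward(policy, obs):
    """tanh on the hidden layers (assumption), linear on the output layer."""
    h = obs
    for layer in policy[:-1]:
        h = [math.tanh(v) for v in linear(layer, h)]
    return linear(policy[-1], h)

policy = make_policy(obs_dim=4, act_dim=2)
print(forward(policy, [0.1, -0.2, 0.3, 0.0]))
```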

If you don't care much about sample complexity, policy gradient (PG) methods are probably the way to go.

With Q-learning it is less clear what is going on inside, while PG is just gradient descent.

DQN and its relatives work well on game-like tasks with images as input, while policy-based methods work better on continuous control tasks such as locomotion.

 https://en.wikipedia.org/wiki/Sample_complexity

"I use random initialization of hyperparameters..."

(audience laughs)

Reposted from: https://www.cnblogs.com/ecoflex/p/8977582.html
