Reinforcement Learning: Fixed-Point Estimate of State-Action Value Function & Least-Squares Policy Iteration
 
Runtime: 42m 33s | Views: 1883 | Comments: 0


Video Information
 
Added: 02-11-2010

Description:
SANJEEV SHARMA (http://www.searching-eye.com/sanjeevs.html), 2nd Nov 2010. Reinforcement Learning: Phase-II, Presentation-2: ARL-10/11 (Lecture-2): Fixed-Point Estimation of State-Action Value Function & Least-Squares Policy Iteration.

CONTENTS: Fixed-Point Estimation of State-Action Value Function, Least-Squares Policy Iteration (LSPI).

DESCRIPTION: LSPI is an off-policy algorithm that builds on LSTD. Its advantage over LSTD is that it does not require new samples to be collected to compute the value function of each new policy: because it is off-policy, it can use samples gathered under any policy, including a random one. This gives LSPI a data efficiency that LSTD lacks. To evaluate a policy, LSPI uses LSQ (LSTDQ), which estimates the state-action value function and is the actual off-policy innovation in LSPI. LSPI returns the fixed-point solution, which is discussed in detail in the lecture.
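The idea in the description can be illustrated with a minimal sketch, which is not taken from the lecture. It assumes a hypothetical deterministic 4-state chain, one-hot state-action features, and a sample set containing every state-action pair exactly once; the names `step`, `phi`, `lstdq`, `greedy` and `lspi` are illustrative only. LSTDQ accumulates A = sum of phi(s,a) (phi(s,a) - gamma phi(s', pi(s')))^T and b = sum of phi(s,a) r, then solves A w = b for the fixed-point weights. LSPI alternates that evaluation with greedy improvement, reusing the same samples for every policy.

```python
import numpy as np

# Toy setup (assumed for illustration): 4 states, action 0 = left, 1 = right.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def step(s, a):
    """Deterministic chain dynamics; reward 1 when landing in a middle state."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if s_next in (1, 2) else 0.0
    return s_next, reward

def phi(s, a):
    """One-hot (tabular) feature vector for a state-action pair."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def lstdq(samples, policy):
    """Solve A w = b for the fixed-point Q-weights of `policy`."""
    k = N_STATES * N_ACTIONS
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        # The next action comes from the policy being evaluated, not from
        # whatever policy generated the sample: this is the off-policy step.
        f_next = phi(s_next, policy[s_next])
        A += np.outer(f, f - GAMMA * f_next)
        b += f * r
    return np.linalg.solve(A, b)

def greedy(w):
    """Greedy policy with respect to the Q-values encoded by w."""
    return w.reshape(N_STATES, N_ACTIONS).argmax(axis=1)

def lspi(samples, max_iter=20):
    """Alternate LSTDQ evaluation and greedy improvement on fixed samples."""
    policy = np.zeros(N_STATES, dtype=int)  # start with "always left"
    for _ in range(max_iter):
        w = lstdq(samples, policy)
        new_policy = greedy(w)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, w

# One fixed batch of (s, a, r, s') samples, reused for every policy.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next, r = step(s, a)
        samples.append((s, a, r, s_next))

policy, w = lspi(samples)
print(policy)
```

With tabular features the fixed point coincides with the exact Q-function of the sampled model. With fewer features than state-action pairs, the same linear system would instead yield the projected fixed-point approximation.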
Channels: Reinforcement Learning. 
Tags: LSPI, Fixed-Point Solution, Bellman Operator, Acrobot, Chain-Walk-Domain
