Reinforcement learning

Qzy
Joined July 2010
Denmark, 1121 Posts
Last Edited: 2011-01-17 20:11:12
January 17 2011 20:08 GMT
#1
Hi my fellow nerds =)

I'm studying for my exam in "modern artificial intelligence in games", and I'm a bit confused about some of the many types of reinforcement learning. Perhaps someone knows a good way to tell them all apart? I've got some holes in my knowledge - can someone help me fill them?

Q-learning
Q-learning looks at the next state (s_{t+1}) and updates the value of the current state-action pair like this:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

Q-learning uses bootstrapping:
Bootstrapping: Estimate how good a state is based on how good we think the next state is
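
In code, I think a single update looks roughly like this (a minimal tabular sketch; the dict-based table and the alpha/gamma values are just illustrative):

from collections import defaultdict

Q = defaultdict(float)    # tabular action values, keyed by (state, action)
alpha, gamma = 0.1, 0.9   # step size and discount factor (illustrative values)

def q_learning_update(s, a, r, s_next, actions):
    # Bootstrap on the best action in the next state (the "max" target).
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])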

TD(λ)
Exactly like Q-learning, but uses λ to control how far back it bootstraps; with λ = 0 you recover the one-step update, so TD(0) behaves like Q-learning.
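
As far as I can tell, λ is usually implemented with eligibility traces: every visited state keeps a decaying trace, and each one-step TD error is applied to all traced states at once. A rough state-value sketch (all names are mine):

from collections import defaultdict

V = defaultdict(float)    # state-value estimates
E = defaultdict(float)    # eligibility traces
alpha, gamma, lam = 0.1, 0.9, 0.8   # illustrative values

def td_lambda_update(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]   # the same error as in TD(0)
    E[s] += 1.0                            # mark s as just visited
    for state in list(E):
        # Every traced state shares in the error, weighted by recency.
        V[state] += alpha * delta * E[state]
        E[state] *= gamma * lam            # lam = 0 kills the traces -> TD(0)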

SARSA
The name spells out the quintuple it uses: State(t), Action(t), Reward(t+1), State(t+1), Action(t+1). The update is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

(What's the difference between SARSA and Q-learning? They look very alike.)

MC (Monte Carlo)
Monte Carlo methods use no bootstrapping.
A state is updated purely from the actual returns observed after visiting it.
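
For state values, I think the first-visit flavor looks roughly like this (episode format and names are my own; note there is no bootstrapping, only actual sampled returns):

from collections import defaultdict

V = defaultdict(float)    # state-value estimates
visits = defaultdict(int)
gamma = 0.9

def mc_update(episode):
    # episode: list of (state, reward) pairs from one complete run.
    G, returns = 0.0, []
    for s, r in reversed(episode):
        G = r + gamma * G              # actual discounted return from s
        returns.append((s, G))
    seen = set()
    for s, G in reversed(returns):     # forward order; first visit only
        if s not in seen:
            seen.add(s)
            visits[s] += 1
            V[s] += (G - V[s]) / visits[s]   # running average of returns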

Dynamic Programming
It's a bit out of scope, but I have no idea how it works.

Any input on these subjects is appreciated - many papers on this are poorly explained (well, I think so at least).

Thanks!

darmousseh
Joined May 2010
United States, 3437 Posts
January 17 2011 20:51 GMT
#2
Dynamic programming is a different topic altogether: it is an algorithmic technique rather than anything to do with AI.

You have Q-learning correct. TD is the base method; it updates all of the previous states, but by a varying factor depending on how relevant the current state is to each previous state.

Q-learning follows a fixed pattern for how it learns. SARSA is like a dynamic Q-learning method, in that it is also learning the most efficient way of getting new information.


Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual parts, such as the shortest path problem.
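
To make that concrete, here is a toy sketch (the graph and the function name are hypothetical, just to show the "solve each part once, reuse it" idea):

from functools import lru_cache
import math

# Hypothetical weighted DAG: node -> list of (successor, edge_cost).
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}

@lru_cache(maxsize=None)
def shortest_to_goal(node):
    # Each subproblem (shortest path from a node) is solved once and reused.
    if node == "D":
        return 0.0
    return min((cost + shortest_to_goal(nxt) for nxt, cost in graph[node]),
               default=math.inf)

print(shortest_to_goal("A"))   # 4.0, via A -> B -> C -> D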
ScrubS
Joined September 2010
Netherlands, 436 Posts
January 17 2011 20:53 GMT
#3
I am not really into all of this, but I find it really interesting. Wikipedia does wonders:

Difference between SARSA and Q-learning:
'The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkin's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy'
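
In update-rule terms, that quote boils down to a one-line difference in the bootstrap target; a sketch with a tabular Q (everything here is illustrative):

def q_learning_target(Q, r, gamma, s_next, actions):
    # Off-policy: bootstrap on the best next action, regardless of
    # what the exploring agent actually does next.
    return r + gamma * max(Q[(s_next, a)] for a in actions)

def sarsa_target(Q, r, gamma, s_next, a_next):
    # On-policy: bootstrap on the next action the agent actually takes
    # under its own (e.g. epsilon-greedy) policy.
    return r + gamma * Q[(s_next, a_next)]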

TD is a combination of Dynamic Programming and MC:
'TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping).'

I could probably find more if I kept looking. As I only understand half of this stuff it might not help you, but I really did find this very interesting.

Qzy
Joined July 2010
Denmark, 1121 Posts
January 17 2011 21:12 GMT
#4
On January 18 2011 05:51 darmousseh wrote:
SARSA is like a dynamic Q-learning method, in that it is also learning the most efficient way of getting new information.

Thanks, I'm still a bit confused about SARSA. Could you give an example?
darmousseh
Joined May 2010
United States, 3437 Posts
January 17 2011 21:19 GMT
#5
On January 18 2011 06:12 Qzy wrote:
Thanks, I'm still a bit confused about SARSA. Could you give an example?


If you already have full information about the environment (such as chess), then you would use Q-learning, since you would already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically used with a neural network. For example, a maze-solving algorithm with no information about the maze other than simple feedback.
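
A sketch of that maze setup (the env object and its reset()/step() methods are invented for illustration; the reward is the only feedback):

import random

def sarsa_episode(env, Q, actions, alpha=0.1, gamma=0.9, eps=0.1):
    # Q: tabular action values, e.g. collections.defaultdict(float).
    def policy(s):
        # epsilon-greedy: mostly exploit current Q, sometimes explore.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    s = env.reset()
    a = policy(s)
    done = False
    while not done:
        s_next, r, done = env.step(a)   # the only feedback is the reward
        a_next = policy(s_next)         # pick the next action first...
        # ...so the update can use the action actually taken (on-policy).
        Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
        s, a = s_next, a_next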
Qzy
Joined July 2010
Denmark, 1121 Posts
January 17 2011 21:39 GMT
#6
On January 18 2011 06:19 darmousseh wrote:
If you have little to no information about the environment, you would likely use SARSA, since it is typically used with a neural network. For example, a maze-solving algorithm with no information about the maze other than simple feedback.


I assume it's due to the exploration vs. exploitation trade-off in Q-learning? SARSA doesn't use such a thing - it builds its own?