Reinforcement learning

Qzy
Last Edited: 2011-01-17 20:11:12
January 17 2011 20:08 GMT
#1
Hi my fellow nerds =)

I'm studying for my exam in "modern artificial intelligence in games", and I'm a bit confused about some of the many types of reinforcement learning. Perhaps someone knows a good way to tell them all apart? I have some holes in my knowledge - can someone help me fill them?

Q-learning
Q-learning looks at the next state (s_{t+1}) and updates the current state's value as follows:

Q(s_t, a_t) ← Q(s_t, a_t) + α · [ r_{t+1} + γ · max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t) ]

Q-learning uses bootstrapping:
Bootstrapping: Estimate how good a state is based on how good we think the next state is
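To make the update concrete, here is a minimal tabular sketch in Python (the names Q, alpha, and gamma are my own illustration, not from any particular library):

```python
from collections import defaultdict

Q = defaultdict(float)    # Q[(state, action)] -> estimated action value
alpha, gamma = 0.1, 0.9   # learning rate, discount factor

def q_learning_update(s, a, r, s_next, actions):
    # Bootstrapped target: reward plus the discounted value of the *best*
    # action in the next state, regardless of what the agent actually does
    # next (this is what makes Q-learning off-policy).
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```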

TD(λ)
Exactly like Q-learning, but uses λ to control how far back the bootstrapping reaches. TD(0) = Q-learning.

SARSA
Named for the quintuple it looks at: State(t), Action(t), Reward(t+1), State(t+1), Action(t+1).
Q(s_t, a_t) ← Q(s_t, a_t) + α · [ r_{t+1} + γ · Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]
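For comparison, the same sketch with the SARSA target (reusing Q, alpha, and gamma from above); the only change is that the target uses the action the agent actually takes next, not the max:

```python
def sarsa_update(s, a, r, s_next, a_next):
    # Bootstrapped target: reward plus the discounted value of the action
    # a_next that the agent really takes in s_next (on-policy).
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```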

(What's the difference between SARSA and Q-learning? They look very alike.)

Monte Carlo (MC)
Monte Carlo methods use no bootstrapping.
They update a state purely based on the returns observed from playing episodes out from that state.
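A first-visit Monte Carlo sketch under the same illustrative setup - no bootstrapping, each estimate comes only from returns observed by playing episodes out to the end:

```python
returns = defaultdict(list)   # returns[state] -> list of observed returns
V = {}                        # V[state] -> value estimate

def mc_update(episode):
    # episode: list of (state, reward) pairs, in order, up to the terminal state.
    G = 0.0
    for t in range(len(episode) - 1, -1, -1):        # walk backwards
        s, r = episode[t]
        G = r + gamma * G                            # accumulate the actual return
        if s not in (s2 for s2, _ in episode[:t]):   # first visit only
            returns[s].append(G)
            V[s] = sum(returns[s]) / len(returns[s])
```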

Dynamic Programming
It's a bit out of scope, but I have no idea how it works.

Any input on these subjects is appreciated - many papers on this are poorly explained (well, I think so at least).

Thanks!

darmousseh
January 17 2011 20:51 GMT
#2
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything to do with AI.

You have Q-learning correct. TD is the base method; it updates all of the previous states, but at a varying factor depending on how relevant the current state is to those previous states.

Q-learning follows a specific pattern for how to learn. SARSA is like a dynamic Q-learning method, in that it is learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving its individual parts, such as the shortest path problem.
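One concrete DP instance that sits right next to these RL methods is value iteration: given a full model of the environment, it repeatedly backs up every state from its successors. A sketch, with hypothetical model containers P and R of my own naming, not any particular library's API:

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    # P[(s, a)] -> list of (probability, next_state); R[(s, a)] -> expected reward.
    # Unlike Q-learning/SARSA, this needs the full model up front.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v_new = max(R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in P[(s, a)])
                        for a in actions)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:          # stop once the sweep barely changes anything
            return V
```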
ScrubS
January 17 2011 20:53 GMT
#3
I am not really into all of this, but I find it really interesting. Wikipedia does wonders:

Difference between SARSA and Q-learning:
'The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkin's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy'

TD is a combination of Dynamic Programming and MC:
'TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping).'

I could probably find some more if I kept looking. As I only understand half of this stuff, it might not help you, but I really did find this very interesting.
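To make that quoted distinction concrete: from the same transition, the two methods build different targets whenever the exploring behavior policy picks a non-greedy action. A sketch reusing the names from the snippets above:

```python
import random

def epsilon_greedy(s, actions, eps=0.1):
    # Behavior policy: explore with probability eps, otherwise act greedily.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def both_targets(s_next, r, actions):
    a_next = epsilon_greedy(s_next, actions)  # what the agent will actually do
    sarsa_target = r + gamma * Q[(s_next, a_next)]                 # on-policy
    q_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)  # off-policy
    return sarsa_target, q_target  # differ exactly when a non-greedy a_next was picked
```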

Qzy
January 17 2011 21:12 GMT
#4
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything to do with AI.

You have Q-learning correct. TD is the base method; it updates all of the previous states, but at a varying factor depending on how relevant the current state is to those previous states.

Q-learning follows a specific pattern for how to learn. SARSA is like a dynamic Q-learning method, in that it is learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving its individual parts, such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

darmousseh
January 17 2011 21:19 GMT
#5
On January 18 2011 06:12 Qzy wrote:
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything to do with AI.

You have Q-learning correct. TD is the base method; it updates all of the previous states, but at a varying factor depending on how relevant the current state is to those previous states.

Q-learning follows a specific pattern for how to learn. SARSA is like a dynamic Q-learning method, in that it is learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving its individual parts, such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?



If you already have full information about the environment (such as chess), then you would use Q-learning, since you already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically used with an annotated neural network - for example, a maze-solving algorithm with no information about the maze other than simple feedback.
Qzy
January 17 2011 21:39 GMT
#6
On January 18 2011 06:19 darmousseh wrote:
On January 18 2011 06:12 Qzy wrote:
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything to do with AI.

You have Q-learning correct. TD is the base method; it updates all of the previous states, but at a varying factor depending on how relevant the current state is to those previous states.

Q-learning follows a specific pattern for how to learn. SARSA is like a dynamic Q-learning method, in that it is learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving its individual parts, such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?



If you already have full information about the environment (such as chess), then you would use Q-learning, since you already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically used with an annotated neural network - for example, a maze-solving algorithm with no information about the maze other than simple feedback.


I assume it's due to the exploration vs. exploitation trade-off in Q-learning? SARSA doesn't use such a thing - it builds its own?