Reinforcement learning

Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-01-17 20:11:12
January 17 2011 20:08 GMT
#1
Hi my fellow nerds =)

I'm studying for my exam in "modern artificial intelligence in games", and I'm a bit confused about some of the many types of reinforcement learning. Perhaps someone knows a good way to tell them all apart? I have some holes in my knowledge - can someone help me fill them?

Q-learning
Q-learning looks at the next state (s+1), and updates the current state as such:

Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]

Q-learning uses bootstrapping:
Bootstrapping: Estimate how good a state is based on how good we think the next state is
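To make the bootstrapped update concrete, here is a minimal tabular sketch on a hypothetical toy chain MDP (the environment, constants, and names are made up for illustration, not from the course material):

```python
import random

# Hypothetical toy chain MDP: states 0..3, action 0 moves left,
# action 1 moves right; reaching state 3 pays reward 1 and ends the episode.
N_STATES, N_ACTIONS = 4, 2
alpha, gamma, eps = 0.5, 0.9, 0.1

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # Bootstrapping: the target uses our current estimate of the
        # best action in the next state, max_a Q(s', a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(Q[s][1], 2) for s in range(3)])  # values of "right" at states 0..2
```

With γ = 0.9 the learned values of moving right should approach 0.81, 0.9, 1.0 from left to right, i.e. the reward discounted once per step from the goal.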

TD(λ)
Is like one-step TD learning, but uses λ (via eligibility traces) to control how far back the bootstrapped error is propagated to earlier states. TD(0) is the one-step case.

SARSA
Looks at the tuple State(t), Action(t), Reward(t+1), State(t+1), Action(t+1) - hence the name.
Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + γ Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]

(What's the difference between SARSA and Q-learning? They look very alike.)
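For what it's worth, the standard distinction is a single term in the update target: Q-learning bootstraps from the greedy next action (off-policy), while SARSA bootstraps from the next action actually taken by the behaviour policy (on-policy). A minimal sketch with hypothetical numbers:

```python
alpha, gamma = 0.5, 0.9

def q_learning_update(q_sa, r, q_next_row):
    # Off-policy target: the best next action according to current estimates
    return q_sa + alpha * (r + gamma * max(q_next_row) - q_sa)

def sarsa_update(q_sa, r, q_next_row, a_next):
    # On-policy target: the action the behaviour policy actually chose
    return q_sa + alpha * (r + gamma * q_next_row[a_next] - q_sa)

q_next = [0.2, 1.0]  # estimated values of the two actions in the next state
print(q_learning_update(0.0, 0.0, q_next))   # bootstraps from max(q_next) = 1.0
print(sarsa_update(0.0, 0.0, q_next, 0))     # exploratory a' = 0, bootstraps from 0.2
```

When the next action is exploratory (here action 0), SARSA's update is dragged down by it while Q-learning ignores it; that is the whole difference.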

MC
Monte Carlo methods use no bootstrapping.
They update a state purely based on the actual returns observed after visiting it in complete episodes.
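A sketch of first-visit Monte Carlo evaluation on a hypothetical random walk (made up for illustration; terminal reward only, γ = 1): each state's value is just the average of complete-episode returns, with no bootstrapping anywhere.

```python
import random

random.seed(1)

def episode():
    # Random walk over states 0..4: start at 2, step left or right uniformly,
    # terminate at 0 (return 0) or at 4 (return 1).
    s, visited = 2, []
    while 0 < s < 4:
        visited.append(s)
        s += random.choice((-1, 1))
    return visited, (1.0 if s == 4 else 0.0)

returns = {1: [], 2: [], 3: []}
for _ in range(5000):
    visited, G = episode()
    for s in set(visited):  # first-visit: each state counted once per episode
        returns[s].append(G)

V = {s: sum(g) / len(g) for s, g in returns.items()}
print({s: round(v, 2) for s, v in V.items()})  # true values are 0.25, 0.5, 0.75
```

By the gambler's-ruin argument the exact values are s/4, so the estimates should land near 0.25, 0.5 and 0.75; note the update never looks at the value of any other state.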

Dynamic Programming
It's a bit out of scope, but I have no idea how it works.

Any input on these subjects is appreciated - many papers on this are poorly explained (well, I think so at least).

Thanks!

*****
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Profile Blog Joined May 2010
United States3437 Posts
January 17 2011 20:51 GMT
#2
Dynamic programming is a completely different topic altogether - it's a general algorithmic technique rather than anything specific to AI.

You have Q learning correct. TD is the base method and updates all of the previous states, but at a varying factor depending on how much the current state is relevant to previous states.

Q learning follows a specific pattern for how to learn. Sarsa is like a dynamic Q learning method where it is learning the most efficient way of getting new information.


Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by combining the solutions to its subproblems, such as the shortest path problem.
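For instance, shortest paths on a DAG can be sketched as a DP that assembles each node's answer from already-solved subproblems (the graph below is hypothetical):

```python
import math

# Hypothetical weighted DAG: edges[u] is a list of (neighbour, cost) pairs.
edges = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 6)],
    'C': [('D', 3)],
    'D': [],
}

# dist[u] = cheapest known cost from 'A' to u; processing nodes in
# topological order means each node's answer is final before it is used.
dist = {u: math.inf for u in edges}
dist['A'] = 0
for u in ('A', 'B', 'C', 'D'):  # a topological order of this DAG
    for v, cost in edges[u]:
        dist[v] = min(dist[v], dist[u] + cost)

print(dist['D'])  # cheapest A -> ... -> D cost
```

Here A→B→C→D (cost 6) beats the direct-looking A→B→D (cost 7); the point is that dist['D'] is computed purely from the smaller, already-solved problems dist['B'] and dist['C'].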
Developer for http://mtgfiddle.com
ScrubS
Profile Joined September 2010
Netherlands436 Posts
January 17 2011 20:53 GMT
#3
I am not really into all of this, but I find it really interesting. Wikipedia does wonders:

Difference between Q-learning and SARSA:
'The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkin's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy'

TD is a combination of Dynamic Programming and MC:
'TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping).'

I could probably find more if I kept looking. As I only understand half of this stuff it might not help you, but I really did find this very interesting.
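That combination shows up directly in the TD(0) value update: the transition is sampled from the environment like Monte Carlo, but the target bootstraps from the current estimate of the next state like dynamic programming. A tiny sketch with made-up numbers:

```python
alpha, gamma = 0.1, 0.9
V = {'s': 0.0, 's_next': 0.5}  # current value estimates (hypothetical)
r = 1.0                        # sampled reward on the transition s -> s_next

# Sample-based like Monte Carlo, bootstrapped like dynamic programming:
V['s'] += alpha * (r + gamma * V['s_next'] - V['s'])
print(V['s'])
```

The target r + γ·V(s') mixes a real sampled reward with a previously learned estimate, which is exactly the "combination of DP and MC" the quote describes.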

Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
January 17 2011 21:12 GMT
#4
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether and is an algorithm rather than anything to do with AI.

You have Q learning correct. TD is the base method and updates all of the previous states, but at a varying factor depending on how much the current state is relevant to previous states.

Q learning follows a specific pattern for how to learn. Sarsa is like a dynamic Q learning method where it is learning the most efficient way of getting new information.


Monte Carlo is as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses monte carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving the individual parts such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Profile Blog Joined May 2010
United States3437 Posts
January 17 2011 21:19 GMT
#5
On January 18 2011 06:12 Qzy wrote:
Thanks, I'm still a bit confused about SARSA. Could you give an example?


If you already have full information about the environment (such as chess) then you would use Q-learning, since you already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment you would likely use SARSA, since they typically use a neural network with it. For example, a maze solving algorithm with no information about the maze other than simple feedback.
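In that maze setting, both methods would typically share the same ε-greedy behaviour policy; they only differ in whether the update target follows it. A minimal sketch of that policy (values are hypothetical):

```python
import random

def epsilon_greedy(q_row, eps=0.1):
    # With probability eps explore a random action,
    # otherwise exploit the best current estimate.
    if random.random() < eps:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

random.seed(0)
actions = [epsilon_greedy([0.2, 0.7, 0.1]) for _ in range(1000)]
print(actions.count(1) / len(actions))  # mostly the greedy action 1
```

SARSA evaluates the policy actually producing these actions, exploration included; Q-learning uses them only to gather experience while its target always assumes the greedy choice.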
Developer for http://mtgfiddle.com
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
January 17 2011 21:39 GMT
#6
On January 18 2011 06:19 darmousseh wrote:
If you already have full information about the environment (such as chess) then you would use Q-learning, since you already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment you would likely use SARSA, since they typically use a neural network with it. For example, a maze solving algorithm with no information about the maze other than simple feedback.


I assume it's due to the exploration vs. exploitation trade-off in Q-learning? SARSA doesn't use such a thing, it builds its own?
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
The contents of this webpage are copyright © 2026 TLnet. All Rights Reserved.