Reinforcement learning

Qzy
Joined July 2010
Denmark, 1121 Posts
Last Edited: 2011-01-17 20:11:12
January 17 2011 20:08 GMT
#1
Hi my fellow nerds =)

I'm studying for my exam in "modern artificial intelligence in games". I'm a bit confused about some of the many types of reinforcement learning. Perhaps someone knows a good way to tell them all apart? I have some holes in my knowledge - can someone help me fill them?

Q-learning
Q-learning looks at the next state (s_{t+1}) and updates the current state-action value as follows:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

Q-learning uses bootstrapping:
Bootstrapping: Estimate how good a state is based on how good we think the next state is
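
A minimal tabular sketch of this update in Python (my own illustration; the state/action names and constants are arbitrary):

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99   # learning rate and discount factor, arbitrary
Q = defaultdict(float)     # maps (state, action) -> estimated value

def q_update(s, a, r, s_next, actions):
    # Bootstrap on the *best* next action, regardless of what we do next.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```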

TD(λ)
Exactly like Q-learning, but uses λ (via eligibility traces) to determine how far back it should bootstrap. TD(0) bootstraps a single step; Watkins's Q(λ) with λ = 0 is exactly Q-learning.
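
A sketch of how λ spreads the one-step error backwards with eligibility traces (state values here; plain dicts, constants arbitrary):

```python
def td_lambda_update(V, E, s, r, s_next, alpha=0.1, gamma=0.99, lam=0.8):
    # V: state -> value estimate, E: state -> eligibility trace
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # one-step TD error
    E[s] = E.get(s, 0.0) + 1.0      # bump the trace for the state just visited
    for state in E:
        V[state] = V.get(state, 0.0) + alpha * delta * E[state]
        E[state] *= gamma * lam     # with lam = 0 only the latest state learns
```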

SARSA
Named after the quintuple it updates on: State(t), Action(t), Reward(t+1), State(t+1), Action(t+1).
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]

(What's the difference between SARSA and Q-learning? They look very alike.)
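
Side by side with the q_update sketch above, the only change is the bootstrap target: SARSA backs up the value of the action actually taken next, not the best one:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # Q is a defaultdict(float) as above. Bootstrap on the action we actually
    # take next (on-policy), instead of the max over actions (off-policy).
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
```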

MC
Monte Carlo methods use no bootstrapping.
They update a state purely based on the full returns observed after visiting that state.
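
A first-visit Monte Carlo evaluation sketch (again my own illustration; `episode` is a finished list of (state, reward) pairs):

```python
def mc_update(V, counts, episode, gamma=0.99):
    # Compute the full return from each step backwards through the episode.
    G, returns = 0.0, []
    for s, r in reversed(episode):
        G = r + gamma * G
        returns.append((s, G))
    seen = set()
    for s, G in reversed(returns):    # forward order, first visits only
        if s not in seen:
            seen.add(s)
            counts[s] = counts.get(s, 0) + 1
            V[s] = V.get(s, 0.0) + (G - V.get(s, 0.0)) / counts[s]  # running mean
```

No bootstrapping anywhere: each state's value comes only from returns actually observed after visiting it.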

Dynamic Programming
It's a bit out of scope, but I have no idea how it works.

Any input on these subjects is appreciated - many papers on this are poorly explained (well, I think so at least).

Thanks!

*****
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Joined May 2010
United States, 3437 Posts
January 17 2011 20:51 GMT
#2
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything specific to AI.

You have Q-learning correct. TD is the base method and updates all of the previous states, but by a factor that varies with how relevant the current state is to each previous state.

Q-learning follows a fixed pattern for how to learn. SARSA is like a dynamic Q-learning method: it is also learning the most efficient way of getting new information.


Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It only works in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual subproblems, such as the shortest path problem.
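
For instance, a classic bottom-up DP (a toy example, nothing to do with RL specifically): the cheapest path through a cost grid, built from the cheapest paths to each cell:

```python
def grid_min_path(cost):
    # Cheapest top-left -> bottom-right path, moving only right or down.
    rows, cols = len(cost), len(cost[0])
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            prev = []
            if i > 0: prev.append(dp[i - 1][j])
            if j > 0: prev.append(dp[i][j - 1])
            dp[i][j] = cost[i][j] + (min(prev) if prev else 0.0)
    return dp[-1][-1]
```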
Developer for http://mtgfiddle.com
ScrubS
Joined September 2010
Netherlands, 436 Posts
January 17 2011 20:53 GMT
#3
I am not really into all of this, but I find it really interesting. Wikipedia does wonders:

Difference between SARSA and Q-learning:
'The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkin's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy'

TD is a combination of Dynamic Programming and MC:
'TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping).'

I could probably find more if I kept looking. As I only understand half of this stuff, it might not help you, but I did find this very interesting.

Qzy
Joined July 2010
Denmark, 1121 Posts
January 17 2011 21:12 GMT
#4
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything specific to AI.

You have Q-learning correct. TD is the base method and updates all of the previous states, but by a factor that varies with how relevant the current state is to each previous state.

Q-learning follows a fixed pattern for how to learn. SARSA is like a dynamic Q-learning method: it is also learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It only works in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual subproblems, such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Joined May 2010
United States, 3437 Posts
January 17 2011 21:19 GMT
#5
On January 18 2011 06:12 Qzy wrote:
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything specific to AI.

You have Q-learning correct. TD is the base method and updates all of the previous states, but by a factor that varies with how relevant the current state is to each previous state.

Q-learning follows a fixed pattern for how to learn. SARSA is like a dynamic Q-learning method: it is also learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It only works in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual subproblems, such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?



If you already have full information about the environment (such as chess), then you would use Q-learning, since you would already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically used with a neural network. For example, a maze-solving algorithm with no information about the maze other than simple feedback.
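
Something like this on-policy loop, roughly sketched (`env` is a hypothetical maze interface with reset() -> state and step(action) -> (state, reward, done), not any particular library):

```python
import random

def epsilon_greedy(Q, s, actions, eps=0.1):
    # Explore with probability eps, otherwise exploit current estimates.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def sarsa_episode(env, Q, actions, alpha=0.1, gamma=0.99):
    s = env.reset()
    a = epsilon_greedy(Q, s, actions)
    done = False
    while not done:
        s_next, r, done = env.step(a)
        a_next = epsilon_greedy(Q, s_next, actions)
        # Learn the value of the exploring policy we actually follow.
        target = r + (0.0 if done else gamma * Q.get((s_next, a_next), 0.0))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
        s, a = s_next, a_next
```

Because the update uses the ε-greedy action that will actually be executed, SARSA learns a policy that accounts for its own exploration, which is exactly the on-policy/off-policy difference quoted earlier.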
Developer for http://mtgfiddle.com
Qzy
Joined July 2010
Denmark, 1121 Posts
January 17 2011 21:39 GMT
#6
On January 18 2011 06:19 darmousseh wrote:
On January 18 2011 06:12 Qzy wrote:
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether; it's an algorithmic technique rather than anything specific to AI.

You have Q-learning correct. TD is the base method and updates all of the previous states, but by a factor that varies with how relevant the current state is to each previous state.

Q-learning follows a fixed pattern for how to learn. SARSA is like a dynamic Q-learning method: it is also learning the most efficient way of getting new information.

Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It only works in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual subproblems, such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?



If you already have full information about the environment (such as chess), then you would use Q-learning, since you would already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically used with a neural network. For example, a maze-solving algorithm with no information about the maze other than simple feedback.


I assume it's due to the exploration vs. exploitation policy in Q-learning? SARSA doesn't use such a thing, it builds its own?
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery