• Log InLog In
  • Register
Liquid`
Team Liquid Liquipedia
EDT 22:14
CET 03:14
KST 11:14
  • Home
  • Forum
  • Calendar
  • Streams
  • Liquipedia
  • Features
  • Store
  • EPT
  • TL+
  • StarCraft 2
  • Brood War
  • Smash
  • Heroes
  • Counter-Strike
  • Overwatch
  • Liquibet
  • Fantasy StarCraft
  • TLPD
  • StarCraft 2
  • Brood War
  • Blogs
Forum Sidebar
Events/Features
News
Featured News
ByuL: The Forgotten Master of ZvT30Behind the Blue - Team Liquid History Book19Clem wins HomeStory Cup 289HomeStory Cup 28 - Info & Preview13Rongyi Cup S3 - Preview & Info8
Community News
BGE Stara Zagora 2026 cancelled10Blizzard Classic Cup - Tastosis announced as captains12Weekly Cups (March 2-8): ByuN overcomes PvT block4GSL CK - New online series18BSL Season 224
StarCraft 2
General
BGE Stara Zagora 2026 cancelled BGE Stara Zagora 2026 announced ByuL: The Forgotten Master of ZvT Terran AddOns placement Blizzard Classic Cup - Tastosis announced as captains
Tourneys
[GSL CK] Team Maru vs. Team herO StarCraft Evolution League (SC Evo Biweekly) WardiTV Team League Season 10 Master Swan Open (Global Bronze-Master 2) RSL Season 4 announced for March-April
Strategy
Custom Maps
Publishing has been re-enabled! [Feb 24th 2026] Map Editor closed ?
External Content
The PondCast: SC2 News & Results Mutation # 516 Specter of Death Mutation # 515 Together Forever Mutation # 514 Ulnar New Year
Brood War
General
BGH Auto Balance -> http://bghmmr.eu/ ASL21 General Discussion BW General Discussion Gypsy to Korea Are you ready for ASL 21? Hype VIDEO
Tourneys
[Megathread] Daily Proleagues [BSL22] Open Qualifiers & Ladder Tours IPSL Spring 2026 is here! ASL Season 21 Qualifiers March 7-8
Strategy
Simple Questions, Simple Answers Soma's 9 hatch build from ASL Game 2 Fighting Spirit mining rates Zealot bombing is no longer popular?
Other Games
General Games
Stormgate/Frost Giant Megathread Path of Exile Nintendo Switch Thread PC Games Sales Thread No Man's Sky (PS4 and PC)
Dota 2
Official 'what is Dota anymore' discussion The Story of Wings Gaming
League of Legends
Heroes of the Storm
Simple Questions, Simple Answers Heroes of the Storm 2.0
Hearthstone
Deck construction bug Heroes of StarCraft mini-set
TL Mafia
Five o'clock TL Mafia Mafia Game Mode Feedback/Ideas Vanilla Mini Mafia TL Mafia Community Thread
Community
General
US Politics Mega-thread Things Aren’t Peaceful in Palestine Russo-Ukrainian War Thread Mexico's Drug War NASA and the Private Sector
Fan Clubs
The IdrA Fan Club
Media & Entertainment
Movie Discussion! [Req][Books] Good Fantasy/SciFi books [Manga] One Piece
Sports
Formula 1 Discussion 2024 - 2026 Football Thread General nutrition recommendations Cricket [SPORT] TL MMA Pick'em Pool 2013
World Cup 2022
Tech Support
Laptop capable of using Photoshop Lightroom?
TL Community
The Automated Ban List
Blogs
Iranian anarchists: organize…
XenOsky
FS++
Kraekkling
Shocked by a laser…
Spydermine0240
Gaming-Related Deaths
TrAiDoS
Unintentional protectionism…
Uldridge
ASL S21 English Commentary…
namkraft
Customize Sidebar...

Website Feedback

Closed Threads



Active: 1632 users

Reinforcement learning

Blogs > Qzy
Post a Reply
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-01-17 20:11:12
January 17 2011 20:08 GMT
#1
Hi my fellow nerds =)

I'm studying for my exam in "modern artificial intelligence in games". I'm a bit confused about some of the many types of reinforcement learning. Perhaps someone knows a good way to tell them all apart? I got some holes in my knowledge - can someone help me fill them?

Q-learning Link
Q-learning looks at the next state (s+1), and updates the current state as such:

[image loading]

Q-learning uses bootstrapping:
Bootstrapping: Estimate how good a state is based on how good we think the next state is

TD(λ)
Is exactly like Q-learning, but uses λ to find out how far it should bootstrap. TD(0) = Q-learning.

SARSA - Link
Looks at State(t+1), Action(t+1), Reward(t+2), State(t+2), Action(t+2).
[image loading]

(What's the difference between SARSA and Q-learning? Looks very alike)

MC Link
Monte Carlo methods uses no bootstrapping.
Updates a state purely based on values returned by performing actions in the given state.

Dynamic Programming
It's a bit out of scope, but I have no idea how it works.

Any input on these subjects is appreciated - many papers on this is poorly explained (well I think so at least).

Thanks!

*****
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Profile Blog Joined May 2010
United States3437 Posts
January 17 2011 20:51 GMT
#2
Dynamic programming is a completely different topic altogether and is an algorithm rather than anything to do with AI.

You have Q learning correct. TD is the base method and updates all of the previous states, but at a varying factor depending on how much the current state is relevant to previous states.

Q learning follows a specific pattern for how to learn. Sarsa is like a dynamic Q learning method where it is learning the most efficient way of getting new information.


Monte Carlo is as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses monte carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving the individual parts such as the shortest path problem.
Developer for http://mtgfiddle.com
ScrubS
Profile Joined September 2010
Netherlands436 Posts
January 17 2011 20:53 GMT
#3
I am not really into all of this, but I find this really intresting. Wikipedia does wonders:

Diffrence between TD and SARSA:
'The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkin's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy'

TD is a combination of Dynamic Programming and MC:
'TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping).'

Probably could find some more if I would keep on looking. As I only understand half of this stuff, it might not help you but I did really found this to be very intresting

Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
January 17 2011 21:12 GMT
#4
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether and is an algorithm rather than anything to do with AI.

You have Q learning correct. TD is the base method and updates all of the previous states, but at a varying factor depending on how much the current state is relevant to previous states.

Q learning follows a specific pattern for how to learn. Sarsa is like a dynamic Q learning method where it is learning the most efficient way of getting new information.


Monte Carlo is as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses monte carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving the individual parts such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Profile Blog Joined May 2010
United States3437 Posts
January 17 2011 21:19 GMT
#5
On January 18 2011 06:12 Qzy wrote:
Show nested quote +
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether and is an algorithm rather than anything to do with AI.

You have Q learning correct. TD is the base method and updates all of the previous states, but at a varying factor depending on how much the current state is relevant to previous states.

Q learning follows a specific pattern for how to learn. Sarsa is like a dynamic Q learning method where it is learning the most efficient way of getting new information.


Monte Carlo is as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses monte carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving the individual parts such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

.


If you already have full information about the environment (such as chess) then you would use Q learning since you would know how to exploit the environment already. The goal in chess is to capture to opponents king

If you have little to no information about the environment you would likely use SARSA since they typically use an annotated nueral network with it. For example, a maze solving algorithm with no information about the maze other than simple feedback.
Developer for http://mtgfiddle.com
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
January 17 2011 21:39 GMT
#6
On January 18 2011 06:19 darmousseh wrote:
Show nested quote +
On January 18 2011 06:12 Qzy wrote:
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a completely different topic altogether and is an algorithm rather than anything to do with AI.

You have Q learning correct. TD is the base method and updates all of the previous states, but at a varying factor depending on how much the current state is relevant to previous states.

Q learning follows a specific pattern for how to learn. Sarsa is like a dynamic Q learning method where it is learning the most efficient way of getting new information.


Monte Carlo is as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses monte carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm which solves a problem by solving the individual parts such as the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

.


If you already have full information about the environment (such as chess) then you would use Q learning since you would know how to exploit the environment already. The goal in chess is to capture to opponents king

If you have little to no information about the environment you would likely use SARSA since they typically use an annotated nueral network with it. For example, a maze solving algorithm with no information about the maze other than simple feedback.


I assume it's due to the exploration vs. exploitation in Q-learning? SARSA doesn't utilize such thing it builds its own?
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Please log in or register to reply.
Live Events Refresh
Replay Cast
00:00
Code For Giants Cup #28
CranKy Ducklings125
Liquipedia
[ Submit Event ]
Live Streams
Refresh
StarCraft 2
PiLiPiLi 16
StarCraft: Brood War
NaDa 60
Dota 2
monkeys_forever462
LuMiX1
Counter-Strike
taco 929
Super Smash Bros
C9.Mang0349
hungrybox314
Other Games
summit1g13046
JimRising 412
ViBE65
RuFF_SC246
Organizations
Other Games
gamesdonequick2136
ComeBackTV 100
StarCraft 2
Blizzard YouTube
StarCraft: Brood War
BSLTrovo
sctven
[ Show 19 non-featured ]
StarCraft 2
• Hupsaiya 198
• davetesta84
• CranKy Ducklings SOOP4
• sooper7s
• Migwel
• LaughNgamezSOOP
• IndyKCrew
• Kozan
• intothetv
• AfreecaTV YouTube
StarCraft: Brood War
• RayReign 59
• HerbMon 8
• BSLYoutube
• STPLYoutube
• ZZZeroYoutube
Dota 2
• masondota21226
League of Legends
• Doublelift5181
• Lourlo349
Other Games
• Scarra1253
Upcoming Events
CranKy Ducklings
7h 46m
RSL Revival
7h 46m
MaxPax vs Rogue
Clem vs Bunny
WardiTV Team League
9h 46m
uThermal 2v2 Circuit
14h 46m
BSL
17h 46m
Sparkling Tuna Cup
1d 7h
RSL Revival
1d 7h
ByuN vs SHIN
Maru vs Krystianer
WardiTV Team League
1d 9h
Patches Events
1d 14h
BSL
1d 17h
[ Show More ]
Replay Cast
1d 21h
Replay Cast
2 days
Wardi Open
2 days
Monday Night Weeklies
2 days
WardiTV Team League
3 days
GSL
4 days
The PondCast
5 days
WardiTV Team League
5 days
Replay Cast
5 days
WardiTV Team League
6 days
Liquipedia Results

Completed

Proleague 2026-03-13
WardiTV Winter 2026
Underdog Cup #3

Ongoing

KCM Race Survival 2026 Season 1
Jeongseon Sooper Cup
BSL Season 22
RSL Revival: Season 4
Nations Cup 2026
ESL Pro League S23 Finals
ESL Pro League S23 Stage 1&2
PGL Cluj-Napoca 2026
IEM Kraków 2026
BLAST Bounty Winter 2026
BLAST Bounty Winter Qual

Upcoming

CSL Elite League 2026
ASL Season 21
Acropolis #4 - TS6
2026 Changsha Offline CUP
Acropolis #4
IPSL Spring 2026
CSLAN 4
HSC XXIX
uThermal 2v2 2026 Main Event
NationLESS Cup
Stake Ranked Episode 2
CS Asia Championships 2026
IEM Atlanta 2026
Asian Champions League 2026
PGL Astana 2026
BLAST Rivals Spring 2026
CCT Season 3 Global Finals
IEM Rio 2026
PGL Bucharest 2026
Stake Ranked Episode 1
BLAST Open Spring 2026
TLPD

1. ByuN
2. TY
3. Dark
4. Solar
5. Stats
6. Nerchio
7. sOs
8. soO
9. INnoVation
10. Elazer
1. Rain
2. Flash
3. EffOrt
4. Last
5. Bisu
6. Soulkey
7. Mini
8. Sharp
Sidebar Settings...

Advertising | Privacy Policy | Terms Of Use | Contact Us

Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2026 TLnet. All Rights Reserved.