Reinforcement learning

Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-01-17 20:11:12
January 17 2011 20:08 GMT
#1
Hi my fellow nerds =)

I'm studying for my exam in "modern artificial intelligence in games". I'm a bit confused about some of the many types of reinforcement learning. Perhaps someone knows a good way to tell them all apart? I have some gaps in my knowledge; can someone help me fill them?

Q-learning
Q-learning looks at the next state (s_{t+1}) and updates the value of the current state-action pair as follows:

[image: Q-learning update rule]
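
For reference, the standard one-step Q-learning update is usually written as below. The symbols α (step size) and γ (discount factor) are the usual textbook names and are assumed here, since the post does not name them:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```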

Q-learning uses bootstrapping:
Bootstrapping: Estimate how good a state is based on how good we think the next state is
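
To make bootstrapping concrete, here is a minimal sketch of a single tabular Q-learning update. The function name, the dictionary-based table, and the default values for `alpha` and `gamma` are assumptions made for illustration, not something from the course material:

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrapped estimate of the future: the best value we currently
    # believe is available from the next state (0 if the episode ended).
    best_next = 0.0 if terminal else max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

Q = defaultdict(float)        # unseen (state, action) pairs start at 0
Q[("s1", "right")] = 1.0      # pretend we already think s1 is a good place
new_value = q_learning_update(Q, "s0", "right", reward=0.0,
                              next_state="s1", actions=["left", "right"])
print(new_value)
```

Even though the move from s0 earned no reward, its value rises because the estimate for s1 is already high. That is the bootstrapping step.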

TD(λ)
Works like Q-learning, but uses λ to decide how far it should bootstrap. TD(0) is the one-step case, which is the kind of update Q-learning uses.
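
A rough sketch of what λ does, assuming the common "accumulating eligibility traces" formulation. Note that TD(λ) is usually presented for state values V(s) rather than Q-values; the one-step case shares its bootstrapping idea with Q-learning, which additionally takes a max over actions. The function name, the three-state episode, and the parameter defaults are made up for illustration:

```python
def td_lambda_episode(V, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Run TD(lambda) with accumulating traces over one episode.

    transitions is a list of (state, reward, next_state) tuples;
    next_state is None when the episode ends.
    """
    traces = {s: 0.0 for s in V}
    for state, reward, next_state in transitions:
        next_value = 0.0 if next_state is None else V[next_state]
        td_error = reward + gamma * next_value - V[state]
        traces[state] += 1.0              # mark the state we just left
        for s in V:
            # Every state is nudged in proportion to how recently it was seen.
            V[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam      # older visits fade away
    return V

episode = [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, None)]
v_td0 = td_lambda_episode({"A": 0.0, "B": 0.0, "C": 0.0}, episode, lam=0.0)
v_td8 = td_lambda_episode({"A": 0.0, "B": 0.0, "C": 0.0}, episode, lam=0.8)
print(v_td0)
print(v_td8)
```

With `lam=0.0` only the last state before the reward is updated; with `lam=0.8` the earlier states A and B also receive a share of the credit, scaled down the further back they are.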

SARSA
Looks at State(t), Action(t), Reward(t+1), State(t+1), Action(t+1).
[image: SARSA update rule]

(What's the difference between SARSA and Q-learning? They look very much alike.)
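
One way to see the difference is to compare the target each method bootstrap from. This is a minimal sketch; the nested-dict Q-table and the numbers in it are invented for the example:

```python
def sarsa_target(Q, reward, next_state, next_action, gamma=0.9):
    # On-policy: bootstrap from the action the agent will actually take next.
    return reward + gamma * Q[next_state][next_action]

def q_learning_target(Q, reward, next_state, gamma=0.9):
    # Off-policy: bootstrap from the best-looking next action,
    # whatever the agent actually does next.
    return reward + gamma * max(Q[next_state].values())

Q = {"s1": {"left": 0.2, "right": 1.0}}
# Suppose exploration makes the agent pick the worse action "left" in s1.
sarsa_value = sarsa_target(Q, reward=0.0, next_state="s1", next_action="left")
q_value = q_learning_target(Q, reward=0.0, next_state="s1")
print(sarsa_value, q_value)
```

The two targets only differ when the next action is not the greedy one, which is why the update rules look so similar on paper.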

MC
Monte Carlo methods use no bootstrapping.
Updates a state's value purely from the returns actually observed after visiting that state.
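
A small sketch of that idea, assuming the "every-visit" variant with a running average. The function name, the episode format, and `gamma` are assumptions for illustration:

```python
from collections import defaultdict

def mc_every_visit_update(V, counts, episode, gamma=0.9):
    """Update V in place from one finished episode.

    episode is a list of (state, reward) pairs, where reward is what
    was received after leaving that state.
    """
    G = 0.0
    # Walk backwards so G is always the discounted return from that state on.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        counts[state] += 1
        # Running average of the returns seen so far; no other estimate is used.
        V[state] += (G - V[state]) / counts[state]
    return V

V = defaultdict(float)
counts = defaultdict(int)
mc_every_visit_update(V, counts, [("A", 0.0), ("B", 0.0), ("C", 1.0)])
print(dict(V))
```

Unlike the TD-style updates, nothing can be learned until the episode has finished, because the real return is needed.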

Dynamic Programming
It's a bit out of scope, but I have no idea how it works.

Any input on these subjects is appreciated - many papers on this are poorly explained (well, I think so at least).

Thanks!

TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
darmousseh
Profile Blog Joined May 2010
United States3437 Posts
January 17 2011 20:51 GMT
#2
Dynamic programming is a different topic altogether; it is a general algorithmic technique rather than something specific to AI.

You have Q-learning correct. TD is the base method and updates all of the previous states, each by a factor that depends on how relevant the current state is to that previous state.

Q-learning follows a specific pattern for how to learn. SARSA is like a dynamic Q-learning method where it is learning the most efficient way of getting new information.


Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual parts, as in the shortest path problem.
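
As a sketch of the shortest path case: the cheapest route from a node is built from the cheapest routes of its neighbors, and each sub-answer is cached so it is only computed once. The graph and its edge costs are hypothetical. In RL, value iteration applies the same idea to state values when the model of the environment is known.

```python
from functools import lru_cache

# Hypothetical weighted directed acyclic graph: node -> {neighbor: edge cost}.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 6},
    "C": {"D": 3},
    "D": {},
}

@lru_cache(maxsize=None)
def shortest_cost(node, goal="D"):
    """Cheapest cost from node to goal, built from the neighbors' answers."""
    if node == goal:
        return 0
    options = [cost + shortest_cost(nxt, goal)
               for nxt, cost in GRAPH[node].items()]
    # A dead end that is not the goal has no route at all.
    return min(options) if options else float("inf")

print(shortest_cost("A"))
```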
Developer for http://mtgfiddle.com
ScrubS
Profile Joined September 2010
Netherlands436 Posts
January 17 2011 20:53 GMT
#3
I am not really into all of this, but I find this really interesting. Wikipedia does wonders:

Difference between SARSA and Q-learning:
'The difference may be explained as SARSA learns the Q values associated with taking the policy it follows itself, while Watkins's Q-learning learns the Q values associated with taking the exploitation policy while following an exploration/exploitation policy'
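
A common example of such an exploration/exploitation policy is epsilon-greedy. This is only a sketch; the function name and the dict of action values are made up for illustration:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict of action -> estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))     # explore: any action
    return max(q_values, key=q_values.get)    # exploit: best-looking action

q_values = {"left": 0.2, "right": 1.0}
greedy_choice = epsilon_greedy(q_values, epsilon=0.0)
random_choice = epsilon_greedy(q_values, epsilon=1.0, rng=random.Random(0))
print(greedy_choice, random_choice)
```

Both SARSA and Q-learning can follow a policy like this while acting. SARSA learns the value of this policy, exploration included, while Q-learning learns the value of the purely greedy policy.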

TD is a combination of Dynamic Programming and MC:
'TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping).'

I could probably find more if I kept looking. As I only understand half of this stuff, it might not help you, but I did find it very interesting.

Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
January 17 2011 21:12 GMT
#4
On January 18 2011 05:51 darmousseh wrote:
Dynamic programming is a different topic altogether; it is a general algorithmic technique rather than something specific to AI.

You have Q-learning correct. TD is the base method and updates all of the previous states, each by a factor that depends on how relevant the current state is to that previous state.

Q-learning follows a specific pattern for how to learn. SARSA is like a dynamic Q-learning method where it is learning the most efficient way of getting new information.


Monte Carlo is, as you said, simply a method of evaluating a specific move by taking a huge sample. The best Go program in the world uses Monte Carlo and has no information other than the current state. It can only work in certain situations.

Dynamic programming is any algorithm that solves a problem by solving its individual parts, as in the shortest path problem.


Thanks, I'm still a bit confused about SARSA. Could you give an example?

darmousseh
Profile Blog Joined May 2010
United States3437 Posts
January 17 2011 21:19 GMT
#5
On January 18 2011 06:12 Qzy wrote:
Thanks, I'm still a bit confused about SARSA. Could you give an example?



If you already have full information about the environment (such as chess), then you would use Q-learning, since you would already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically paired with an artificial neural network. For example, a maze-solving algorithm with no information about the maze other than simple feedback.
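
A minimal sketch of the maze idea, using a five-cell corridor as a stand-in for a maze and a plain table instead of a neural network. The environment, the reward of 1 at the exit, and all parameter values are assumptions for illustration. The agent only ever sees the reward signal; swapping the target for the max over next actions would turn the same loop into Q-learning, which does not need a model of the maze either.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the exit
LEFT, RIGHT = 0, 1

def step(state, action):
    """Hypothetical environment: move one cell, reward 1 only at the exit."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose(Q, state, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon or Q[state][LEFT] == Q[state][RIGHT]:
        return rng.choice([LEFT, RIGHT])
    return LEFT if Q[state][LEFT] > Q[state][RIGHT] else RIGHT

def train_sarsa(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = choose(Q, state, epsilon, rng)
        done = False
        while not done:
            nxt, reward, done = step(state, action)
            nxt_action = choose(Q, nxt, epsilon, rng)
            # SARSA target: uses the action actually chosen for the next step.
            future = 0.0 if done else Q[nxt][nxt_action]
            Q[state][action] += alpha * (reward + gamma * future
                                         - Q[state][action])
            state, action = nxt, nxt_action
    return Q

Q = train_sarsa()
policy = ["R" if q[RIGHT] > q[LEFT] else "L" for q in Q[:-1]]
print(policy)
```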
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
January 17 2011 21:39 GMT
#6
On January 18 2011 06:19 darmousseh wrote:
If you already have full information about the environment (such as chess), then you would use Q-learning, since you would already know how to exploit the environment. The goal in chess is to capture the opponent's king.

If you have little to no information about the environment, you would likely use SARSA, since it is typically paired with an artificial neural network. For example, a maze-solving algorithm with no information about the maze other than simple feedback.


I assume it's due to the exploration vs. exploitation trade-off in Q-learning? SARSA doesn't use such a thing and builds its own instead?