Replay Classification Project

StRyKeR
Profile Blog Joined January 2006
United States1739 Posts
Last Edited: 2008-11-19 08:06:43
November 19 2008 07:05 GMT
#1
I've embarked on a personal project of mine involving a high-tech way to classify the players in a Starcraft replay.

Basically, imagine training a computer to learn to recognize who played in a replay with an accuracy of prediction better than (paladin)roMAD's.

I was inspired by romad's superhuman ability to recognize who played in a replay just by looking at hotkey signatures. I use his example as a benchmark for my machine.

Since hotkey usage is not that sophisticated to measure and compile, I figured that the high-tech machine learning tools that I've been learning in class this term could easily apply to this domain.

Early results are very promising.

The procedure is rather simple. I take a replay, grab 212 features, and put them into a 212-dimensional vector. I note who played in the replay (for the training set, I need to know who played) and label the vector accordingly. I do this for a whole bunch of replays and apply a machine learning algorithm that uses this set of vectors and labels to train itself.

Then I go about testing the machine by giving it examples it hasn't seen yet and seeing how accurate it is.
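
A minimal sketch of this train-then-test loop in Python, not the MATLAB code used for the project. The post does not name the learning algorithm, so a nearest-centroid classifier is swapped in purely for illustration, and the toy 3-dimensional vectors and player names stand in for real 212-dimensional replay features.

```python
import math
from collections import defaultdict

def train_centroids(vectors, labels):
    """Average each player's feature vectors into one centroid per player."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, vec):
    """Return the player whose centroid is closest (Euclidean) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

def accuracy(centroids, vectors, labels):
    """Fraction of held-out vectors assigned to the right player."""
    correct = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)

# Hypothetical toy data: labelled training vectors and unseen test vectors.
train_X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.2, 0.0, 0.8]]
train_y = ["player_a", "player_a", "player_b", "player_b"]
test_X = [[0.85, 0.15, 0.0], [0.0, 0.2, 0.8]]
test_y = ["player_a", "player_b"]

model = train_centroids(train_X, train_y)
print(accuracy(model, test_X, test_y))
```

The important part is the split: accuracy is measured only on vectors the model never saw during training.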

Currently, for all matchups except TvP, I can achieve about 90% accuracy. I have to add though, that as I add more players into the mix, the accuracy might go down. Right now I'm classifying among 10 players and 90% seems to be the resulting accuracy.

For some reason, the machine finds TvP hard to learn. I think this makes sense, since there aren't many variations in strategy or unit composition in TvP. I don't play T (I play Z), so it's harder for me to know how to fix TvP learning.

Currently, I'm using Taiche's RepASM library to convert mass replays into mass 212-dimensional vectors. While his library is awesome, I'd like more features. Right now, his library cannot tell which unit is actually being clicked or saved into a hotkey. All I know is the unit's ID, which doesn't tell me anything. Having that information might bump up the accuracy to 95% or even higher.

Having said that, it's quite amazing how even stupid things like hotkey typing frequencies are consistent across one player's games and help in training the machine. Right now, 10 dimensions (one for each hotkey) in the feature vector simply count the relative frequencies of the hotkeys being used. If someone prefers to use 1 a lot, it would be reflected as a high percentage in one of these 10 dimensions.
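
As a rough illustration of those 10 dimensions, here is a small Python sketch. The input format (a list of key strings) and the function name are assumptions made for the example; they are not how RepASM exposes the data.

```python
from collections import Counter

HOTKEYS = list("1234567890")   # the ten hotkeys; this ordering is an assumption
HOTKEY_SET = set(HOTKEYS)

def hotkey_frequencies(presses):
    """Return 10 relative frequencies, one per hotkey.

    The values sum to 1, or are all 0.0 if no hotkey was pressed.
    """
    counts = Counter(k for k in presses if k in HOTKEY_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(HOTKEYS)
    return [counts[k] / total for k in HOTKEYS]

# Hypothetical player who leans heavily on hotkey 1.
freqs = hotkey_frequencies(["1", "1", "1", "2", "3", "1"])
print(freqs)
```

Using relative frequencies rather than raw counts keeps a long game and a short game by the same player comparable.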

Some actions are habitual, even if the player does not recognize them.

I'm going to eventually make a lot of this automated, so that it can be of help to a lot of people wondering who's playing in a replay. This would prove useful especially for the iCCup Who's Who thread.

I'm setting up a server to do this, but I'm still learning how to set up an automated system, since I'm using MATLAB to train the machine. I'm thinking that I could manually train the machine and provide an interface that could automatically classify replays. I might retrain the machine every week or so (retraining would take some time, especially with more and more replays).

One important thing is for me to get as many replays as possible, especially the pro replays. I have a ton of TSL replays, so right now I can classify a lot of foreigners with a good amount of accuracy.

One caveat is that I need a lot of replays for training. Preferably over 50. 20 might be okay. 10 is probably not enough.

I'll probably set up a website where you guys can upload replays with labels on them (remember, I need labels for training). The labels had better be right though; otherwise, it would confuse the machine (wouldn't be catastrophic but still).

I'll issue a self-challenge. Send me a replay of unknown players and ask me to classify it. I'll post the results and analysis from feeding the replay into my machine.

Here's an example. I trained the machine using TSL replays and then fed it the replay of Mondragon playing MistrZZZ (recent replay, look in the Replay section). Here are the results:

[image loading]


Okay, ng.stryker is me, and I never played in the TSL. But I thought I might include myself.

The confidence reflects how many replays I had for training that particular classifier. Basically, the more replays I used to train, the more likely it is to be correct.

Positive similarities indicate, well, good similarity.
A value of 1 means that the classifier thinks the keystrokes are pretty damn similar.
A value of -1 means that the keystrokes are pretty damn different.

Just to let you know that these features need some work, here is a bad example.

[image loading]


Apparently, the machine thinks the player is very similar to David. However, the player is none other than Jaedong. In its defense, maybe if we had trained a Jaedong classifier, we'd get a Jaedong similarity score of 2 or something which would beat out David's.

I took a look at the replay, and there are some obvious differences between Jaedong and David that the 212 dimensions do not yet capture. That will be work for tomorrow.

Because I have very few replays of progamers, if you give me a progamer replay, the machine may try to fit the player as a foreigner and give strange predictions. However, we would still be able to observe how close the player's signature is compared to the foreigners, even though the player may not be one.

EDIT: I'm adding some interesting things I've found.

Interesting Things
* Some keystrokes are habitual and consistent for a player, even if the player does not know about them.
* Analyzing the entire replay is often worse than just looking at the first 10 minutes or so. Right now, I've capped the replay analysis to 9 minutes. I've filtered out replays that were too short (I think right now I only admit replays over 4 minutes long).
* APM is factored into the algorithm, but I don't really know how useful it is. It's pretty consistent for a player, so I suppose it helps. There are two dimensions for it -- one is the APM average for the first minute, the second is the average for the entire game. I figured that the first-minute average is very unlikely to change regardless of the opponent.
* I do an initial screen based on matchup. For example, if I get a replay that is ZvP, I won't be asking the machine to identify whether Nada played it (unless I trained some examples of Nada playing Zerg).
* The ZvP keystrokes of the player Machine seem to be radically different from all the other players I've seen. Whenever I test Player X's replay, where X is not Machine, Machine consistently ranks last in similarity.
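
The length filter, the 9-minute cap, the two APM dimensions, and the matchup screen from the list above can be sketched in Python as follows. Everything here is a hypothetical illustration: the function names, the input format (action timestamps in seconds), and the choice to compute whole-game APM over the uncapped replay are assumptions, not the actual implementation.

```python
MIN_LENGTH_S = 4 * 60   # replays must be longer than this to be admitted
CAP_S = 9 * 60          # actions after this point are ignored for analysis

def preprocess(action_times_s, game_length_s):
    """Return capped actions plus the two APM features, or None if too short."""
    if game_length_s <= MIN_LENGTH_S:
        return None
    capped = [t for t in action_times_s if t < CAP_S]
    # Actions in the first 60 seconds are already a per-minute rate.
    first_minute_apm = float(sum(1 for t in action_times_s if t < 60))
    whole_game_apm = len(action_times_s) / (game_length_s / 60)
    return {
        "actions": capped,
        "apm_first_minute": first_minute_apm,
        "apm_whole_game": whole_game_apm,
    }

def candidates_for(matchup, trained_matchups):
    """Keep only players who have training examples for this matchup."""
    return sorted(p for p, seen in trained_matchups.items() if matchup in seen)

# Hypothetical replay: one action per second for 10 minutes.
result = preprocess(list(range(600)), 600)
print(result["apm_first_minute"], result["apm_whole_game"])

# Hypothetical record of which matchups each player was trained on.
trained = {"Nada": {"TvZ", "TvP", "TvT"}, "Mondragon": {"ZvP", "ZvT", "ZvZ"}}
print(candidates_for("ZvP", trained))
```

Screening by matchup first means the classifier only has to choose among players it has actually seen in that matchup.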

EDIT 2: I'm wondering if I should make this blog day-by-day and have a new entry every time or just keep updating this one. Okay. This entry is getting too long. I will update on new ones from now on.

****
Ars longa, vita brevis, principia aeturna.
Grobyc
Profile Blog Joined June 2008
Canada18410 Posts
November 19 2008 07:08 GMT
#2
Sounds really wicked. If you complete this me <3 you long time!
If you watch Godzilla backwards it's about a benevolent lizard who helps rebuild a city and then moonwalks into the ocean.
EtherealDeath
Profile Blog Joined July 2007
United States8366 Posts
November 19 2008 07:08 GMT
#3
Someone give him a replay of that olympic ranked player o.O
sixghost
Profile Blog Joined November 2007
United States2096 Posts
November 19 2008 07:11 GMT
#4
http://www.teamliquid.net/forum/viewmessage.php?topic_id=65607

There's a bunch of pro gamer replays in there if you haven't already found that. I'm not sure if they are still up though.

mG.sixghost @ iCCup || One ling, two ling, three ling, four... Camp four gas, then ultra-whore . -Saracen
StRyKeR
Profile Blog Joined January 2006
United States1739 Posts
Last Edited: 2008-11-19 07:12:18
November 19 2008 07:11 GMT
#5
On November 19 2008 16:11 lgdDante wrote:
http://www.teamliquid.net/forum/viewmessage.php?topic_id=65607

There's a bunch of pro gamer replays in there if you haven't already found that. I'm not sure if they are still up though.



Yea, I found that a few weeks ago except the links are all dead.

EDIT: Hmm, some are good.
Ars longa, vita brevis, principia aeturna.
sixghost
Profile Blog Joined November 2007
United States2096 Posts
November 19 2008 07:15 GMT
#6
Have you noticed if there's general minimum length the replay has to be for the accuracy to be near the average?
mG.sixghost @ iCCup || One ling, two ling, three ling, four... Camp four gas, then ultra-whore . -Saracen
OneOther
Profile Blog Joined August 2004
United States10774 Posts
November 19 2008 07:31 GMT
#7
Wow this would be so sick.
GHOSTCLAW
Profile Blog Joined February 2008
United States17042 Posts
November 19 2008 07:43 GMT
#8
If you can manage this that would be amazing. I'm not sure how reliable it would be, but it would be very interesting.
Photographer
Liquipedia. Drop me a pm if you've got questions/need help.
MasterReY
Profile Blog Joined August 2007
Germany2708 Posts
November 19 2008 07:44 GMT
#9
wow, i never thought a thing like that could work, but it's nice to hear you already have about 90% accuracy.
Please work hard to complete this program.
People will love you forever lol :D
gj
https://www.twitch.tv/MasterReY/ ~ Biggest Reach fan on TL.net (Don't even dare to mention LR now) ~ R.I.P Violet ~ Developer of SCRChart
TL+ Member
pachi
Profile Joined October 2006
Melbourne5338 Posts
November 19 2008 07:49 GMT
#10
If you want some replays for source material, here's an old pack of several Babara Seasons collected by yakii, which should contain many Korean reps, most of which are probably by players who are pros by now.

http://rapidshare.com/files/151429614/babara.rar.html

(Also here is the replay pack from Itemmania, where everyone (koreans + draco) played under a number instead of their usual nick. http://rapidshare.com/files/151430307/Itemmania.rar.html)
Moderator
pachi fanclub http://goto.tl/6DI9 。◕‿◕。
magusmind
Profile Blog Joined May 2008
50 Posts
November 19 2008 07:57 GMT
#11
Wow, this idea is brilliant.

Out of curiosity, what ML algorithm are you using? I assume it's a variant of perceptron. (You mention noisy data would be bad but not disastrous, and perceptron is somewhat resilient to noise) Also, how are you choosing the 212 features? I'm wondering because you couldn't have hand picked all of them.

As a side note, every replay (assuming 1v1) would actually be 2 examples, right? How are you counting your accuracy? (+1 for guessing both players correctly? +1 for guessing each correct player?) I'm also curious... does classifying one player correctly correlate with classifying the other one correctly?

Anyway, great idea. Keep us updated. I'm very interested in the results.
BanZu
Profile Blog Joined June 2008
United States3329 Posts
November 19 2008 08:02 GMT
#12
Oh snap this is some nifty stuff

I'd like to see how efficient you can get this thing to be
Sun Tzu once said, "Defiler becomes useless at the presences of a vessel."
StRyKeR
Profile Blog Joined January 2006
United States1739 Posts
Last Edited: 2008-11-19 08:38:19
November 19 2008 08:33 GMT
#13
On November 19 2008 16:49 pachi wrote:
If you want some replays for source material, here's an old pack of several Babara Seasons collected by yakii, which should contain many Korean reps, most of which are probably by players who are pros by now.

http://rapidshare.com/files/151429614/babara.rar.html

(Also here is the replay pack from Itemmania, where everyone (koreans + draco) played under a number instead of their usual nick. http://rapidshare.com/files/151430307/Itemmania.rar.html)


Hehe, this gets fun.

I ran some replays from Itemmania through the machine.

Interestingly, I got the following results for a particular player.

itemmania_29
I found 3 replays, PvT, 2 replays, PvZ. They all point to Draco.

Result for one PvT:
[image loading]


I'm gonna guess that itemmania_29 is Draco.

Also, I got pretty weak results for itemmania_16, but is it mistrzzz?
2 replays, PvT
[image loading]

[image loading]
Ars longa, vita brevis, principia aeturna.
magusmind
Profile Blog Joined May 2008
50 Posts
November 19 2008 09:02 GMT
#14
Dude, I just noticed you can extend your current project to look at how similarly progamers play.

If you train using the multiclass perceptron (one weight vector for each player), at the end, each vector is pretty much a "characteristic" of how that player plays. Since these are just vectors in a high dimensional space, their dot product tells you how similar they are. In addition, if you run some sort of clustering algorithm on the vectors, you might see some interesting groupings.
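
A minimal Python sketch of that idea, on made-up 2-dimensional data. The training loop is the standard multiclass perceptron update; the cosine (a length-normalized dot product) is used instead of the raw dot product so that scores are comparable across players, which is a choice made for this example only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(weights, x):
    """Pick the player whose weight vector scores x highest."""
    return max(sorted(weights), key=lambda p: dot(weights[p], x))

def train_perceptron(vectors, labels, epochs=10):
    """Multiclass perceptron: one weight vector per player."""
    dim = len(vectors[0])
    weights = {p: [0.0] * dim for p in set(labels)}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            guess = predict(weights, x)
            if guess != y:
                # Pull the true player's vector toward x, push the wrong one away.
                weights[y] = [w + xi for w, xi in zip(weights[y], x)]
                weights[guess] = [w - xi for w, xi in zip(weights[guess], x)]
    return weights

def cosine(a, b):
    """Normalized dot product in [-1, 1]; higher means more similar styles."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical toy features for two players.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = ["player_a", "player_a", "player_b", "player_b"]

weights = train_perceptron(X, y)
print(cosine(weights["player_a"], weights["player_b"]))
```

With one weight vector per player, the same vectors could also be fed to a clustering routine to look for the groupings mentioned above.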

This can let us ask some really interesting questions, eg:
Who is really boxer's protege? (Which SKT terran plays most similarly to boxer?)
Is there a clear distinction between how aggro zergs (eg: July) and macro zergs (eg: sAviOr) play? (Do all the aggro zergs and macro zergs cluster into distinct groups?)

Of course this requires a lot of training data that probably isn't available. But if you one day get your hands on them, I can totally imagine these types of questions being answered (As well as many others I can't think of right now).

If you couldn't tell, I am well versed in the art of machine learning. If you have any questions or run into any difficulties I'd be glad to help.
AttackZerg
Profile Blog Joined January 2003
United States7517 Posts
November 19 2008 11:12 GMT
#15
This project has my stamp of approval.

This is a neat thing you're doing. I'm interested to see how hard it is to fake your machine out.
NarutO
Profile Blog Joined December 2006
Germany18839 Posts
November 19 2008 11:39 GMT
#16
That's pretty damn sick! Keep up the work, awesome dude! Really.
Commentator
Polt | MMA | Jjakji | BoxeR | NaDa | MVP | MKP ... truly inspiring.
roMAD
Profile Blog Joined April 2004
Russia2355 Posts
Last Edited: 2008-11-29 20:29:57
November 29 2008 20:29 GMT
#17
Wait. Itemmania_29 is NOT Draco by any means. It's not even close. It's beast[fOu]. And itemmania_16 is not MistrZZZ at all. Your program is very inaccurate.
MasterOfChaos
Profile Blog Joined April 2007
Germany2896 Posts
November 29 2008 22:13 GMT
#18
Impressive. What do you use to analyse the vector? And which observables do you use to create it?
Liquipedia
One eye to kill. Two eyes to live.
SpiritoftheTunA
Profile Blog Joined August 2006
United States20903 Posts
November 29 2008 22:27 GMT
#19
romad desperately avoids being outsourced~

jk, i'd trust romad over a program any day
posting on liquid sites in current year
LosingID8
Profile Blog Joined December 2006
CA10830 Posts
November 29 2008 22:40 GMT
#20
On November 30 2008 05:29 (paladin)roMAD wrote:
Wait. Itemmania_29 is NOT Draco by any means. It's not even close. It's beast[fOu]. And itemmania_16 is not MistrZZZ at all. Your program is very inaccurate.

true it's inaccurate right now, but that's probably because he doesn't have a good amount of reps for beast[fou] and other players.

once he gets a decent amount of reps for all known gamers then i feel that it would be pretty decent.


but i'd still want romad's opinion to be 100% sure
Moderator
Resident K-POP Elitist