Basically, imagine training a computer to recognize who played in a replay with better predictive accuracy than (paladin)roMAD's.
I was inspired by roMAD's superhuman ability to recognize who played in a replay just by looking at hotkey signatures. I use his ability as the benchmark for my machine.
Since hotkey usage is not that hard to measure and compile, I figured that the high-tech machine learning tools I've been learning in class this term could easily apply to this domain.
Early results are very promising.
The procedure is rather simple. I take a replay, extract 212 features, and put them into a 212-dimensional vector. I note who played in the replay (for the training set, I need to know who played) and label the vector accordingly. I do this for a whole bunch of replays, then apply a machine learning algorithm that trains itself on this set of vectors and labels.
Then I go about testing the machine by giving it examples it hasn't seen yet and seeing how accurate it is.
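To make the pipeline concrete, here's a toy sketch in Python. The real training happens in MATLAB and the post never names the algorithm, so this nearest-centroid classifier is just a stand-in (and the 4-feature toy data is made up): each player's labeled vectors are averaged, and a new replay is assigned to whichever player's average is closest.

```python
def train_centroids(vectors, labels):
    """Average the feature vectors per player (a stand-in for whatever
    classifier the actual MATLAB code trains)."""
    sums, counts = {}, {}
    for v, name in zip(vectors, labels):
        acc = sums.setdefault(name, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[name] = counts.get(name, 0) + 1
    return {name: [x / counts[name] for x in acc] for name, acc in sums.items()}

def classify(centroids, v):
    """Return the player whose centroid is nearest (Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(centroids, key=lambda name: dist2(centroids[name]))

# Toy data: two hypothetical "players" with different hotkey habits, 4 features each
train_X = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0],
           [0.1, 0.1, 0.4, 0.4], [0.0, 0.2, 0.4, 0.4]]
train_y = ["PlayerA", "PlayerA", "PlayerB", "PlayerB"]
model = train_centroids(train_X, train_y)
print(classify(model, [0.85, 0.15, 0.0, 0.0]))  # → PlayerA
```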
Currently, for all matchups except TvP, I can achieve about 90% accuracy. I should add, though, that as I add more players into the mix, the accuracy might go down. Right now I'm classifying among 10 players, and 90% is the resulting accuracy.
For some reason, the machine finds TvP hard to learn. I think this makes sense, since there aren't many variations in strategy or unit composition in TvP. I don't play T (I play Z), so it's harder for me to know how to fix TvP learning.
Currently, I'm using Taiche's RepASM library to convert mass replays into mass 212-dimensional vectors. While his library is awesome, I'd like more features. Right now, his library cannot tell which unit is actually being clicked or saved into a hotkey. All I know is the unit's ID, which doesn't tell me anything. Having that information might bump up the accuracy to 95% or even higher.
Having said that, it's quite amazing how even stupid things like hotkey typing frequencies are consistent for one player's games and helps in training the machine. Right now, 10 dimensions (one for each hotkey) in the feature vector simply count the relative frequencies of the hotkeys being used. If someone prefers to use 1 a lot, it would be reflected as a high percentage in one of these 10 dimensions.
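Those 10 dimensions can be sketched in a few lines of Python (illustrative only; the event format here is hypothetical, with each hotkey press recorded as a digit 0-9):

```python
from collections import Counter

def hotkey_frequencies(hotkey_events):
    """Turn a replay's stream of hotkey presses (digits 0-9) into the
    10 relative-frequency features described above."""
    counts = Counter(hotkey_events)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return [counts.get(k, 0) / total for k in range(10)]

# Hypothetical event stream: a player who leans heavily on hotkey 1
events = [1, 1, 2, 1, 4, 1, 1, 2, 1, 1]
feats = hotkey_frequencies(events)
print(feats[1])  # 0.7 — hotkey 1 accounts for 70% of all presses
```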
Some actions are habitual, even if the player does not recognize them.
I'm going to eventually automate a lot of this, so it can help the many people wondering who's playing in a replay. This would be especially useful for the iCCup Who's Who thread.
I'm setting up a server to do this, but I'm still learning how to set up an automated system, since I'm using MATLAB to train the machine. I'm thinking I could manually train the machine and provide an interface that automatically classifies replays. I might retrain the machine every week or so (retraining would take some time, especially with more and more replays).
One important thing is for me to get as many replays as possible, especially the pro replays. I have a ton of TSL replays, so right now I can classify a lot of foreigners with a good amount of accuracy.
One caveat is that I need a lot of replays for training. Preferably over 50. 20 might be okay. 10 is probably not enough.
I'll probably set up a website where you guys can upload replays with labels on them (remember, I need labels for training). The labels had better be right, though; otherwise, they would confuse the machine (it wouldn't be catastrophic, but still).
I'll issue a self-challenge: send me a replay of unknown players and ask me to classify it. I'll post the results and analysis from feeding the replay into my machine.
Here's an example. I trained the machine using TSL replays and I just fed the replay of Mondragon playing MistrZZZ (recent replay, look in Replay section). Here are the results:
Okay, ng.stryker is me, and I never played in the TSL. But I thought I might include myself.
The confidence indicates how many replays I had to train that particular classifier; basically, the more replays I used for training, the more likely it is to be correct.
Positive similarities indicate, well, good similarity. A value of 1 means the classifier thinks the keystrokes are pretty damn similar; a value of -1 means they are pretty damn different.
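The post doesn't say how those scores are produced, but cosine similarity is one score that lives on exactly this [-1, 1] scale; a sketch in Python (illustrative only, not necessarily the scoring the MATLAB code uses):

```python
import math

def similarity(u, v):
    """Cosine similarity: +1 for identical directions, -1 for opposite ones.
    (One possible scoring function with the same [-1, 1] range as in the post.)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

same = [0.7, 0.2, 0.1]
print(round(similarity(same, same), 6))   # 1.0 — identical signatures
print(similarity([1, 0], [-1, 0]))        # -1.0 — completely opposed
```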
Just to let you know that these features need some work, here is a bad example.
Apparently, the machine thinks the player is very similar to David. However, the player is none other than Jaedong. In its defense, if we had trained a Jaedong classifier, maybe we'd get a Jaedong similarity score of 2 or something that would beat out David's.
I took a look at the replay, and there are some obvious differences between Jaedong and David that the 212 dimensions do not yet capture. That will be work for tomorrow.
Because I have very few replays of progamers, if you give me a progamer replay, the machine may try to fit the player as a foreigner and give strange predictions. However, we would still be able to see how close the player's signature is to the foreigners', even if the player isn't one.
EDIT: I'm adding some interesting things I've found.
Interesting Things
* Some keystrokes are habitual and consistent for a player, even if the player does not know about them.
* Analyzing the entire replay is often worse than just looking at the first 10 minutes or so. Right now, I've capped the replay analysis to 9 minutes. I've filtered out replays that were too short (I think right now I only admit replays over 4 minutes long).
* APM is factored into the algorithm, but I don't really know how useful it is. It's pretty consistent for a player, so I suppose it helps. There are two dimensions for it -- one is the APM average for the first minute, the second is the average for the entire game. I figured that the first-minute average is very unlikely to change regardless of the opponent.
* I do an initial screen based on matchup. For example, if I get a replay that is ZvP, I won't be asking the machine to identify whether Nada played it (unless I trained some examples of Nada playing Zerg).
* Machine's (the player's, not my classifier's) ZvP keystrokes seem to be radically different from those of all the other players I've seen. Whenever I test Player X's replay, where X is not Machine, Machine consistently ranks last in similarity.
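The two APM dimensions from the list above could be computed from a list of action timestamps like this (Python for illustration; the function name and input format are hypothetical):

```python
def apm_features(action_times, game_length_s):
    """Compute the two APM dimensions described above from action timestamps
    in seconds: first-minute APM, then average APM over the whole game."""
    first_minute = sum(1 for t in action_times if t < 60)       # actions in minute 1
    whole_game = len(action_times) * 60.0 / game_length_s if game_length_s else 0.0
    return [float(first_minute), whole_game]

# Hypothetical replay: 150 actions in the first minute, 1800 total over 10 minutes
times = [i * 0.4 for i in range(150)] + [60 + i * 0.327 for i in range(1650)]
print(apm_features(times, 600))  # [150.0, 180.0]
```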
EDIT 2: I was wondering whether to make this blog day-by-day with a new entry each time, or just keep updating this one. Okay, this entry is getting too long; I'll post updates in new entries from now on.