I'm working on an algorithm that takes a replay and determines who played in it. I always thought this would be possible, because humans can pretty easily spot patterns in hotkey signatures and work out who played. For more background, see my first blog entry.
Basically, the machine needs training data (previous replays). Once it has trained on that data, we can begin to ask it questions about new replays.
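To make the train-then-ask idea concrete, here is a minimal sketch. It is not the actual classifier behind these results: it assumes a simple nearest-centroid approach, and the names `train`, `predict`, `player_a`, and `player_b`, along with the tiny 3-number feature vectors, are all made up for illustration.

```python
from collections import defaultdict


def train(samples):
    """Average the feature vectors of each player into one profile per player.

    samples: list of (player_name, feature_vector) pairs from labeled replays.
    """
    sums = {}
    counts = defaultdict(int)
    for player, vec in samples:
        if player not in sums:
            sums[player] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[player][i] += x
        counts[player] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


def predict(profiles, vec):
    """Return the player whose averaged profile is closest to vec."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(profiles, key=lambda p: squared_distance(profiles[p], vec))


# Hypothetical training data: two players, two labeled replays each.
training = [
    ("player_a", [0.9, 0.1, 0.0]),
    ("player_a", [0.8, 0.2, 0.1]),
    ("player_b", [0.1, 0.7, 0.9]),
    ("player_b", [0.2, 0.9, 0.8]),
]
model = train(training)

# "Asking a question": who most likely played this unseen replay?
guess = predict(model, [0.85, 0.15, 0.05])
```

The point is only the two-phase shape: training happens once on labeled replays, and after that every new replay is a cheap lookup against what the machine has already learned.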
As a test, I've applied the current machine to Nony's six replays from the Replays section. It's a good sign that all six were correctly classified as Nony's.
The confidence is basically the number of samples I had available to train the machine. The more samples the machine has seen, the better it should be at classifying new examples, just as more practice leads to better results in action. Of course, the relationship isn't one-to-one. A machine trained on only a few samples might still end up being amazingly accurate.
The similarity measure is how closely the machine thinks the replay matches a given player. A value of 1 means a pretty damn close match (as shown below, with Nony). A value of -1 means a very poor match.
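For a feel of what a score between -1 and 1 can look like, here is a small sketch using cosine similarity. This is an assumption for illustration only: cosine similarity is one common measure with that range, and the function name `cosine_similarity` and the sample vectors are hypothetical, not taken from the actual machine.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, always in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical player profile and three replays to compare against it.
profile = [1.0, 2.0, 3.0]
same_direction = cosine_similarity(profile, [2.0, 4.0, 6.0])     # close to 1
opposite = cosine_similarity(profile, [-1.0, -2.0, -3.0])        # close to -1
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # exactly 0
```

A measure like this ignores overall magnitude and only looks at the shape of the pattern, which is why a replay with the same proportions scores near 1 even if its raw numbers differ.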
I'm still working on making the machine logistics easier to deal with (organizing replays and compiling data). Basically, that means I'm trying to write scripts that automate a lot of things. That's been my focus lately more than tweaking and improving the algorithm itself.
That said, here are some new results.
I put the six new Nony replays (found in the Replays section) through the machine and got the following results.
An overwhelming vote for Nony. Whew, it didn't get it wrong.
The classifier used for these graphs was trained on 502-dimensional feature vectors extracted from replays of 9 well-known StarCraft Protoss players. In this example, the machine clearly points to Nony. In other cases, several players may show close resemblances, which makes the problem harder.