Foreword
TL;DR: What I've created thus far is just a hodge-podge of random stuff I've found I can garner from replays these days, as well as what I think people could find useful in them. This is meant to answer a set of questions about large numbers of games of StarCraft.
Before I go any further: Shoutouts to TLO for letting me use all his ladder games. What you're seeing is information from his last 380 ladder games.
With the 2.0.8 patch last year, I found out that a lot more information was being stored in replays. In the past it was just raw user inputs that were simulated in the game when you ran the replay. However, now this is not the case. You have seen with things like SCELight that you can get the buildings you created, upgrades made, etc. With tools like that giving you qualitative analysis of each of your games, and sites such as Aligulac giving you quantitative analysis of players and how they do in tournaments, I thought maybe I could combine the two. Turns out, I can!
I started out thinking that, if tournaments release their replays, I can grab all the games from all the players and use this to profile them. Create an overview of this player in a tournament setting. Could I use this to predict what they would do on any given map in a given matchup? Could I say "Ziggy says there's a 74% chance that MC will go quick stargate on this map vs Zerg"? Well, maybe, but as I was working on it and talking to some people about it, it turned out that we could turn this into a training tool, or in fact a tool that could actually do quite a few things.
Introduction over.
Parsing Replays: This tool uses Blizzard's S2 Protocol (https://github.com/Blizzard/s2protocol - Thanks, Blizzard!). I have not written my own replay parsing engine as of yet. I might do so if I get the time and/or inclination. For now, s2protocol gives me what I'm after!
It takes ~15-60 seconds to parse a replay, depending on the length of the game. Because of this, replays are then stored off in a format that just keeps track of what I'm interested in. This means I can load replays much faster after they've been parsed. I do this because, unlike qualitative tools, I'm not interested in how you hotkey your army. I'm just here to give you STATS ABOUT ALL YOUR GAMES AT ONCE. RIGHT TO YOUR FACE.
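The parse-once-then-cache idea can be sketched roughly like this. It is only an illustration, not the tool's actual storage format: `load_summary`, `replay_key`, the JSON layout and the `slow_parse` callback (standing in for whatever does the real 15-60 second parse) are all made-up names.

```python
import hashlib
import json
from pathlib import Path


def replay_key(replay_path):
    """Hash the file contents so a renamed replay still hits the cache."""
    return hashlib.sha1(Path(replay_path).read_bytes()).hexdigest()


def load_summary(replay_path, cache_dir, slow_parse):
    """Return the trimmed-down summary, running the slow parse only once.

    slow_parse is a hypothetical callable that takes a replay path and
    returns a JSON-serialisable dict of just the data we care about.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (replay_key(replay_path) + ".json")
    if cached.exists():
        # Fast path: this replay was parsed on an earlier run.
        return json.loads(cached.read_text())
    summary = slow_parse(replay_path)
    cached.write_text(json.dumps(summary))
    return summary
```

Keying the cache on a content hash rather than the filename is a design choice for the sketch: ladder replays get renamed and moved around a lot.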
Player Matchup Overviews: You want to know how well you've been doing in each of your matchups? NO PROBLEM! Obviously I have no real information from the last 2-7 days, because he's been at SSC. So far, so Aligulac, but there's more!
Build Orders: "What does a build look like?"
I have included the ability to configure build orders. With this you can fill in any length of build, beginning and ending at any point. This is a build where Zerg goes onto 3 bases, at least one queen per base, gets a reasonably fast ling speed (7 minutes) and at least 24 drones.
When configuring the build, each section has its own start/end point. If you wanted to, you could add another 8 lings that start and end at a different point, it's all good.
Or you could use it to filter replays based on whether a roach warren was built at all:
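One way to picture a build configuration is as a list of requirements, each with its own time window, checked against the things a player made in a game. This is a minimal sketch under that assumption; the `Step` record, the unit names and the exact windows are illustrative, not the tool's real configuration format.

```python
from collections import namedtuple

# One section of a build: "at least `count` of `unit` between start_s and end_s".
Step = namedtuple("Step", "unit count start_s end_s")


def matches_build(events, build):
    """events: list of (time_s, unit_name) for one player in one game."""
    for step in build:
        made = sum(
            1
            for time_s, unit in events
            if unit == step.unit and step.start_s <= time_s <= step.end_s
        )
        if made < step.count:
            return False
    return True


# Hypothetical 3 base Zerg build: two extra hatcheries, three queens,
# ling speed started by 7:00 and 24 drones, all within the first 8 minutes.
three_base = [
    Step("Hatchery", 2, 0, 480),
    Step("Queen", 3, 0, 480),
    Step("ZerglingSpeed", 1, 0, 420),
    Step("Drone", 24, 0, 480),
]

# A filter that only asks "was a roach warren built at all?"
any_roach_warren = [Step("RoachWarren", 1, 0, float("inf"))]

# Toy game data, made up for the example.
game = [(20 + 10 * i, "Drone") for i in range(24)] + [
    (150, "Hatchery"), (300, "Hatchery"),
    (200, "Queen"), (260, "Queen"), (350, "Queen"),
    (400, "ZerglingSpeed"),
]
```

With the toy data above, `matches_build(game, three_base)` holds while `matches_build(game, any_roach_warren)` does not, since no roach warren appears in the event list.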
Player Build Overviews: "What do my games look like?" A straight up look at how your games have been going.
Can be toggled to show upgrades/units, too!
I tried to add some extra information and functionality to this, so - Want to see the times instead of food values for these, and group them up to the first 7:30? YOU SURE CAN!
It will also give you information in each grouped build item. You can see first/average/latest time they were placed:
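Those three numbers are cheap to derive once you have the placement times for a grouped build item. A tiny sketch, assuming times are stored as seconds of game time (the function names are made up):

```python
def placement_stats(times_s):
    """First, average and latest placement time for one grouped build item."""
    return {
        "first": min(times_s),
        "average": sum(times_s) / len(times_s),
        "latest": max(times_s),
    }


def fmt(seconds):
    """Render seconds as m:ss, the way game timers are usually shown."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"
```

For example, placement times of 95, 100 and 120 seconds give a first of 1:35, an average of 1:45 and a latest of 2:00.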
As you can see, it gives you your overall win percentage, as well as the lengths of all the games you've played. On the right, you can see it copies that information. This is actually updated based on which of the replays you've selected, as such:
From this we can see that TLO's 3 most common builds up to the 7:30 mark actually give him a slightly higher win rate than anything else. It is also possible to filter by map.
You can see a bunch of strings on the right under the filtered win percentages. Those are all the replays that correspond to the highlighted rows. If you have access to them and they are in the location from which they were parsed, you can copy them off to another folder for a closer look, if you like.
Why would you do this? It's possible to filter, as you can see in the top right, based on your saved build order configurations. Useful if you want to practice a build and copy out your games where you've done them. You can tag replays, too, which will allow you to easily filter those replays in future.
Build Order Scoring: "How good am I getting at this build?" This one is very much a work in progress. Is it possible to rate how well you do a build? Still in the process of finding out, but this is what I have so far:
So, you can pick a player and a build and it'll show information on that build. You can see the best, average and worst times for any replay that fulfils the build order criteria. This extends to each element of the build.
The graphs show: 1) The completion times and how often that time is hit, as well as how many of those games you win with. 2) The time at which you completed the build over... time. If you see what I mean.
On the right you can see the "best" values for each build. The bottom right allows you to compare an unparsed replay. If it contains the build you're filtering against it'll show you where it comes in relation to the already parsed replays. COMPARE YOURSELF TO THE PROS, YO.
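As a rough idea of how a build could be scored, here is a sketch that treats the build as "N of each unit" and takes the completion time as the moment the last requirement is met. Ranking a new replay against already parsed ones is then a simple comparison. The function names and the scoring rule are assumptions, not necessarily what the tool does.

```python
def completion_time(events, build):
    """build: {unit_name: count}. Returns the game time (seconds) at which the
    last requirement was met, or None if the build was never completed."""
    finish = 0
    for unit, count in build.items():
        times = sorted(time_s for time_s, made in events if made == unit)
        if len(times) < count:
            return None
        finish = max(finish, times[count - 1])
    return finish


def percentile_rank(new_time, known_times):
    """Percentage of already parsed replays that completed the build more slowly."""
    slower = sum(1 for t in known_times if t > new_time)
    return 100.0 * slower / len(known_times)
```

With events `[(100, "Hatchery"), (200, "Hatchery"), (150, "Queen")]` and a build of two hatcheries plus one queen, the completion time is 200 seconds. Against known times of 180, 210, 240 and 300 seconds, that replay beats three of the four.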
Build Order Wars: "What should I be doing against X?" The results here are generally already known, but this lets you compare build orders against each other. Some basic ones: 3 base Zerg (3 hatches before 8 minutes) vs 3 base Protoss (3 nexuses before 10 minutes):
Zerg wins 53 out of 85 of these games. You can see how the matchup swings back and forth over time. I have no idea why. This, I think, is just straight up for numbers. We never see enough numbers in StarCraft casts. One thing you saw in the first link I showed was a heatmap. I bet you're all wondering where those are. FEAR NOT!
Heat Maps: That's right. I have heat maps for two different kinds of things. Buildings:
You can filter by map, matchup, spawn location, building type, replay filter. You want to know where Protoss builds proxy pylons before the 9 minute mark on Daedalus, when they spawn top left? I GOT YOU, BRO.
Obviously this is going to be a really small subset of games, hence there aren't many (it's all Protoss vs TLO games on this map!). As well as being useful to know where people are building proxy pylons at you, you can see the glory that is where all the units have died.
This can, again, be filtered in the same way (except not by building, that would be odd). Aspiring map makers may be able to use this to make sure that fights are varied and spread over the map, that all bases are taken where possible and that it's not just all focused on one area the whole time. I'm basically just asking you not to make Habitation Station the "All Mid" of SC2 maps.
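For the curious, a heat map like the ones described here boils down to counting events per grid cell after applying the filters. A minimal sketch, assuming each event carries map coordinates, a unit or building name and a game time (the field names and the grid approach are illustrative, not lifted from the tool):

```python
def heat_grid(events, map_w, map_h, cells=4, unit=None, before_s=None):
    """Count events per cell of a cells x cells grid laid over the map.

    unit and before_s are optional filters, e.g. unit="Pylon", before_s=540
    for "pylons placed before the 9 minute mark".
    """
    grid = [[0] * cells for _ in range(cells)]
    for event in events:
        if unit is not None and event["unit"] != unit:
            continue
        if before_s is not None and event["time_s"] >= before_s:
            continue
        # Clamp so a point exactly on the far edge lands in the last cell.
        col = min(int(event["x"] / map_w * cells), cells - 1)
        row = min(int(event["y"] / map_h * cells), cells - 1)
        grid[row][col] += 1
    return grid


# Toy data on a hypothetical 100 x 100 map.
placements = [
    {"x": 10, "y": 90, "unit": "Pylon", "time_s": 300},
    {"x": 12, "y": 95, "unit": "Pylon", "time_s": 400},
    {"x": 80, "y": 20, "unit": "Pylon", "time_s": 600},
    {"x": 50, "y": 50, "unit": "Gateway", "time_s": 100},
]
early_pylons = heat_grid(placements, 100, 100, cells=2, unit="Pylon", before_s=540)
```

Map and spawn location filters would just narrow down which replays feed `events` in the first place. Rendering is then a matter of colouring each cell by its count over the map image.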
Anyway, that's all I've got so far.
Pre-Emptive Answers
It's rather ugly: Yes, I know it's not very pretty. I'm bad at art based stuff, and I basically learned WPF as I was making this, so it's a bit sloppy in places. However, it works!
Can I use this? I'll be potentially releasing this for use in the next couple of weeks. I'll bundle everything that you would need to use to run it where I can. However, you will most likely need to install Python (2.7) yourself to make it work. This is to run S2Protocol. This requirement will possibly be removed in later iterations if I can get the time to write my own replay parsing engine.
That's a dumb name: I can't think of a better one. I've already used StarGraphed. Maybe "StarGraphed: The Next Generation" and I could rename all the buttons that do things to "Make It So".
Feel free to hit me up here on a PM or grab me on Twitter @Gowerly if you have any questions or suggestions. I'll add the most common ones to the OP here.
On March 12 2014 06:33 Yapa wrote: Hi Dave, Robert here, I was wondering what you are going to do next? Do I have to use my own replays or can I use replays from pro players and tournaments too?
You can set it up to automatically parse any replays that are added to a folder, so it's always up to date with your laddering progress.
You can also get it to just parse a bunch of replays in a folder, too. They don't have to be yours (as you can see by me using TLO's) they just have to be replays.
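Keeping a folder up to date can be as simple as rescanning it every so often and parsing whatever hasn't been seen before. This is a sketch of that polling idea, not necessarily how the tool does it; `find_new_replays` and the set of already parsed filenames are made up for the example.

```python
from pathlib import Path


def find_new_replays(folder, already_parsed):
    """Return replay files under folder whose names are not in already_parsed.

    already_parsed is a set of filenames recorded on earlier scans.
    """
    return sorted(
        path
        for path in Path(folder).rglob("*.SC2Replay")
        if path.name not in already_parsed
    )
```

A caller would run this on a timer, parse each returned file, and add its name to `already_parsed`. Since it only looks at file extensions, it works the same for your own ladder games or a folder of tournament replays.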
I'm hoping to grab replays from SSC (if they're released) and/or IEM and do some analysis on them. I'm looking to get things like:
- Most common proxy locations (rax, pylons, etc): From this I'm hoping to be able to figure out the best scouting routes to check for these things.
- Most used builds/Most successful builds: Which builds were used the most and which were the most successful? Which matchup outcomes changed the most over the duration of a game?
- Anything that warrants discussion: Do we see that Zergs end up being defensive? Maybe that Protoss vs Terran ends up with a slow push from the Terran base to Protoss'?
I think there's a lot that can be discussed by looking at a lot of replays. I'm hoping to get the chance to do that!
Oh my God! All of TLO's secrets out in the Open!!!!111eleven
Seriously, great work! Nice to hear that even Blizz themselves are now giving some official attention to data mining SC2 replays. But just as with StarGraphed, all credit goes to you for turning raw data into insight. Beautiful!
I need to finish up some steps:
- I don't have pictures for all the maps and it's tedious using the map editor to get them all
- I need to make sure I can't make it crash by doing things like mashing buttons or something.
I'd like to make sure that when it's released I don't get 50% of the people using it saying "My PC exploded and now there's metal in my face" or something.
Also, it'll be free. I'm not looking to make any money from this. I just thought it was something cool to do.
Thanks to iHiro for this one: I should be able to have this work for Starbow maps as well. It should take a little bit of modification to work with the new unit types. There was an invitational that was done recently with it, so I will try to do some small analysis on those games. I'll post more when it's done.
On March 15 2014 10:52 Valeranth wrote: This looks really cool. Can't wait to try it out. Will you ever try and support things like macro mechanics (e.g. larvae injections, avg energy on CC / nexus) and screen switches per minute like Scelight?
Potentially. At the moment, I'm more interested in high level analysis, such as builds, building placements, engagement locations, etc.
However, all the information is available from the replays, as you said. If I can find a way to make that easily viewable over a large number of matches, I'll definitely be getting that in there, too.
I'm adding all of the upgrade icons (parsing the replays is hilarious: Did you know that the upgrade "hydraliskspeed" is the range upgrade for them? And that "HighCapacityBarrels" is Blue Flame?) and Starbow's data as well and then I'm pretty much ready to do a first release.
I am also eagerly awaiting this release. I have python installed already, so if you have the current version available that would be greatly appreciated!
I'll look into it. It's mainly all .NET/WPF related at the moment, so OSX will have to be a complete port. Java makes me sad, but maybe a port to Qt would be possible.
It turns out I'm dumb and SC2Reader exists (so many random things available!) so I'm investigating whether that will make getting information from replays easier for me.
Stay tuned, I really do hope to get this out and usable within a week.
On April 07 2014 01:24 And G wrote: In the "show" drop-down list of the heat map window, what else is there available apart from "buildings" and "unit deaths"? As a mapmaker, I think it's not only interesting where fights occurred, but also what the common harass routes are and so on. Basically, if a heat map would show just where units generally are (e.g. captured in 10-second intervals) that would be very helpful. If possible, showing only specific kinds of units would be even more awesome, e.g.:
Ground army units on move (or a-move) command
Air units
Overlords
Observers
Burrowed banelings
Units that have just killed a worker
Not sure how feasible any of that is. Ideally, when grouping several unit types together, they would also be weighted by supply or cost, and maybe in addition to the "up to" slider you'd also have an analogous "up from" slider, since on most maps the early game will focus on certain "mid lanes". I think for example HS will look much better when excluding the first 10 or 15 minutes.
Just some crazy ideas here...
Unsure about this one as of yet. The problem with this is that unit positions aren't really stored in snapshots of the game (for some reason, unless they're damaged), so finding out where units are doesn't seem doable. I could be wrong, however, and I'll keep looking.
The reason this doesn't exist is because it doesn't need to. The game re-simulates the replay from the data that's saved (which is generally just user input), so unit positions don't need to be saved, as they'll be worked out when the game is simulated.
This is the reason why you can't really just jump forward in a replay: the game simulates at x8 - x16 speed to get to that point.
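To make that concrete, here is a toy model of a replay that stores only inputs. It is nothing like the real engine (one unit, one dimension, made-up rules), but it shows why you can't look up a position at a given time without stepping through every tick before it.

```python
def simulate(commands, until_tick, speed=1.0):
    """Toy deterministic replay: one unit walking along a line.

    commands maps a tick number to a move target, which is all the "replay"
    stores. The unit's position is never saved; it only exists as the result
    of running every tick from the start.
    """
    x = 0.0
    target = 0.0
    for tick in range(until_tick):
        if tick in commands:
            target = commands[tick]
        # Move towards the target, at most `speed` per tick.
        x += max(-speed, min(speed, target - x))
    return x


# Two recorded inputs: walk to 5 at tick 0, walk back to 0 at tick 10.
recorded = {0: 5.0, 10: 0.0}
```

Asking for `simulate(recorded, 12)` has to replay ticks 0 to 11 to find the unit at 3.0. Because the same inputs always give the same result, storing positions would be redundant, which is also why pulling them out of a replay file is hard.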
I'm still porting some of the information over to SC2Reader and getting unit statistics from it. It's taking a bit longer than I'd hoped because I'm slow and easily distracted.
On the plus side it's helping me get the tool to differentiate between HotS games and Starbow games, which means I can release the tool for both games at the same time!
Just wondering if there will be an update for this soon.
I am considering subscribing to a replay analyser program, but none of them have nearly as much to offer as yours, in terms of positional analysis and depth of multiple-replay analysis.
GGTracker is amazing... individually
sc2replaystats... also amazing, but more so for improvement over time
and yours gives this raw data of what's working and what's not working, where pylons are, and the ability to filter through and find key timings that are causing grief.
I'm not really sure what to get... I wish there were trials lol, but you should really get a similar website up and running, because as I said, I really like the depth of your analyzer :D
Anyway, just checking in on the status of whether or not you will release this :D
I've successfully ported it to use SC2Reader to give me the data from the replays. I'm now adding what I hope to be a player overview, which I think could be cool, as a quick look at a player, effectively grading them at aspects of the game. I'm having trouble making the SC2Reader Python code into a standalone executable, but I can just thrust Python at you until then.
But it runs again now so I'll be hunkering down at it over the next week.
Yeah, I'll be posting a blog today about the progress and plans for release.
I've been trying to put in a rudimentary search system so that you can plot X against Y for a bunch of things, e.g. "Number of Marines made in a game vs Winrate", to see if correlations happen.
A lot of that has become how to present the data and how to make it useful. However, my data science knowledge is terrible, so I'm making it up as I go along, which is slowing me down!
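One simple way to put a number on "does X go with Y" is the Pearson correlation coefficient. This is a swapped-in textbook technique for illustration, not something the tool is confirmed to use, and the marine counts below are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: +1 means they rise together, -1 means opposite,
    near 0 means no linear relationship. Fails if either list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data: marines made per game, and whether that game was won (1) or lost (0).
marines = [10, 25, 40, 55]
won = [0, 0, 1, 1]
r = pearson(marines, won)
```

A high value only says the two move together in these games. Longer games naturally have more marines in them, so game length would need controlling for before reading anything into it.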
Hello, hello. I was having trouble getting useful comparators for the actual graphing section. I'll update tonight with the garbage I'd managed to put on the screen.
I will probably see if I can package what I have up for people to try out this weekend. Sorry it's been taking so long, I've been rather distracted.
Changes
- Added league filtering
- Added custom/ladder filtering
All that standard stuff one would expect from a replay searching tool.
Also: I have most of the graphing done. Quite important when writing something called StarGraphed.
Everything was looking pretty good, and then I got 8000 replays. Suddenly, things weren't looking so good:
Useful, right? Nice to know when you're winning or losing.
I've been trying to add multiple things that can be searched against... well, other things. I haven't gotten as far as effective SQL-type queries (although it's something I've considered doing, and really want to).
So, what I've been experimenting with is adding some sort of granularity to the graphs. Average values over small timeframes to make sure that I don't just have 3000 data points smooshed into 600 pixels.
This seems to be working reasonably well, but means that some areas of the graph have more data points packed into one than others. However, considering the alternative is to potentially have half of the X Axis as one data point, it makes a lot of sense to do it this way.
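The averaging described above might look something like this: split the X axis into fixed-width time buckets and plot one averaged point per bucket. A minimal sketch, assuming data points are `(time_s, value)` pairs; the function name and the bucket width are made up.

```python
from collections import defaultdict


def bucket_average(points, bucket_s):
    """Average values over fixed-width time buckets.

    Returns (bucket_start_s, average_value, points_in_bucket) per bucket,
    so the graph can show how much data sits behind each plotted point.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for time_s, value in points:
        start = (time_s // bucket_s) * bucket_s
        sums[start][0] += value
        sums[start][1] += 1
    return [(start, total / n, n) for start, (total, n) in sorted(sums.items())]
```

For instance, `bucket_average([(10, 1.0), (20, 0.0), (70, 1.0)], 60)` gives one point averaging two games and one point backed by a single game. Keeping the count around is what exposes that unevenness.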
What does that look like? Well, let's take Snute's question of "Mutas in ZvT - Show me how many and when vs winrate". Okay:
Earlier mutas seem to be better
Also more mutas seem to be better
My Data Science studies continue (thanks, Coursera!) so I hope to be able to understand enough to create more impressive data to show in the near future.
As it stands, I'm looking to release something within a couple of weekends (before I head off to DH Stockholm to get humiliated). I get fibre internet on the 18th (after a FOUR AND A HALF YEAR DELAY), so that should help with me getting an upload!
Fixed a bunch of crashes and updated the way you enter a filter. I am now deep in working on a parsing engine so people who want to use it don't have to go through the crappy process of installing python, setuptools, s2protocol, etc. Would rather people just downloaded it and used it!
I got off my ass (or onto my ass, pick one) and made it go. It's not pretty and some of the stuff doesn't work properly right now, but you can put it on your machine and it'll load your replays and give you some stats.
I'll see about getting a trial version out imminently. I'm sorry for the hilarious lag on doing this.
I am at the point where I think I can get some people to test some of the features for me. Bear in mind that this will be a pretty new thing, so the UI is ass and expect standard issues, etc.
If you are interested, please PM me and I'll set you up with what you need to run it.
Hooray! I am at a release. I am excited to see what people can do with it! Here's what's available at the moment:
Configurations: Select what folder(s) from which to load replays. Tag your replays - Create multiple tags for extra filtering when exploring builds. Build orders - Set up builds (some included) to filter games. Examples below.
Build Matchups: Compare builds. Here is any protoss vs any zerg. However, any build you can create in the configuration pane can be applied here. Find hidden counters in here!
Build Overview: The first page I ever made for this. What builds are the most popular? Can be shown by time or supply. Adjustable slider at the top to help show when different builds diverge. Figure out when the best time to scout is! Filtered replays can also be copied out of the massive amount of replays you have. Easily grab those replays.
Heat Maps: The pretty page. See where your units are dying most. Here's all Protoss games where they spawn bottom left on Overgrowth.
Where are you hiding those raxes, Terrans?!
Matchups: See how you're doing in all the matchups over time. Are you getting better at your vs T? Are you improving in general? A really general view for you here. I don't have any recent replays here, so you just get Life's play in the latter half of 2014's competitions.
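A per-matchup view over time can be sketched as a win rate per matchup per month. The tuple layout and the monthly grouping are assumptions for the example, and the games are invented.

```python
from collections import defaultdict


def matchup_trend(games):
    """games: iterable of (date 'YYYY-MM-DD', matchup, won).

    Returns {(matchup, 'YYYY-MM'): win rate} so each matchup can be plotted
    as a line over the months.
    """
    tally = defaultdict(lambda: [0, 0])
    for date, matchup, won in games:
        key = (matchup, date[:7])
        tally[key][0] += int(won)
        tally[key][1] += 1
    return {key: wins / played for key, (wins, played) in sorted(tally.items())}


# Toy data.
games = [
    ("2014-07-03", "ZvT", True),
    ("2014-07-19", "ZvT", False),
    ("2014-08-02", "ZvT", True),
    ("2014-07-10", "ZvP", True),
]
trend = matchup_trend(games)
```

Months with only a handful of games will swing wildly, so showing the game count next to each point would be a sensible addition.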
If you're interested in helping me check this out, please let me know. I'm excited to hopefully get some more eyes on it before I just do a general release, so it's not me going "Hey look at this" and everyone saying "IT CRASHES AND NOW MY PC IS A DONUT WHY WOULD YOU DO THIS".