|
Introduction: Hello everyone. I have been a TL member and an SC2 player for a while. I love the game, and TL and Liquipedia are the places I come to check for new strategies, VODs, streams, etc. That said, what’s this post all about?
Well, I think we all agree that SC2 is a strategy game. And as in any strategy game (e.g. Chess), our first moves are what matter the most. A small edge achieved in the first minutes of a game will produce a huge advantage later on. This is why Chess players have a great literature on the openings, and if you want to play Chess competitively you have to study the openings and their counters. We do the same with SC2: we study build orders. We study our replays, we watch professional players' streams, we check the strategy forums, etc. in order to learn and discover new build orders and strategies that give us that small advantage in the first minutes of a match. But I find this approach to have a (quite obvious) problem: apart from our personal perceptions, there is, in my opinion, a lack of data to build an objective idea on.
SC2Stats.net: With this in mind, back in November I got an idea: we aren't actually missing data, we're just missing something to process it and get information and possibly knowledge out of it. We know that there are places on the web that host saved matches (i.e. replays), and this is the raw data we need. What we were missing is something that, given a replay, tells you what strategy was played: that’s what I tried to do and what I think I did. Once we can infer the build order from a replay, we can see, over a bunch of replays using that same build order, how much success that strategy has. And this is what I did: I ran my ‘build order recognizer’ on over 9000 (pun not intended) ‘pro’ replays and hosted the results on www.sc2stats.net. There you can see, for each race, what the most successful build orders are, how often they win, how often they are played, etc.
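To make the idea concrete, here is a minimal sketch of the aggregation step, assuming the recognizer has already turned each replay into a (build order label, won) pair. The function name, the labels and the sample data are made up for illustration; this is not the site's actual code.

```python
from collections import defaultdict

def win_rates(records):
    """Aggregate (build_order_label, won) pairs into per-build statistics.

    `records` is a hypothetical recognizer output: one pair per player per replay.
    """
    tally = defaultdict(lambda: [0, 0])  # label -> [wins, games]
    for label, won in records:
        tally[label][1] += 1
        if won:
            tally[label][0] += 1
    return {
        label: {"games": games, "wins": wins, "win_rate": wins / games}
        for label, (wins, games) in tally.items()
    }

# Made-up sample data, only to show the shape of the input.
sample = [
    ("2-rax expand", True),
    ("2-rax expand", False),
    ("6 pool", True),
    ("2-rax expand", True),
]
stats = win_rates(sample)
```

Sorting `stats` by `games` or by `win_rate` would give the "most played" and "most successful" lists respectively.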
This Is It?: No, actually, that’s just the homepage. If you register on the website (which is completely free, obviously) and upload your replays, you will have access to features like:
- Replay Analysis: you will be able to see the list of the orders you and your opponent gave during the game.
- Build Order Statistics: what build orders you play the most, what your winning strategies are in a certain match-up, which openings you don’t seem to get along with, etc.
- Race and Map Statistics: keep track of and visualize your win ratio for each match-up and its trend over time; you can see joint break-downs too (e.g. on which maps you seem to win the most against Zerg).
- APM Statistics: check your APM in each game you played and its average trend over time.
- More to come: the website is really new and can be considered a beta at the moment. We plan to add new features that we already have in mind, but we’re also very open to community suggestions; so if you think we're missing something important that you want to see included, just tell us at feedback@sc2stats.net.
I have probably also forgotten features that are already included, so it’s better if you see it in person.
So yeah, this is it. If you have questions for me or want to let me know what you think, I’ll be here to answer, and of course you can contact me at feedback@sc2stats.net or hit me up on BNet (EU, riso.269). And if you want to check out the website, the URL is www.sc2stats.net of course.
riso
|
Build orders aren't the only factor that wins games.
|
On February 26 2011 21:33 limonovich wrote: Build orders aren't the only factor that wins games.
This is true indeed, and it makes the statistics on the replays not completely accurate. But you will agree that the build order plays a big role in winning a match. So if, out of 100 replays using the same build order, you win 80, there is probably a correlation between the build order and the win ratio. And on a big data set you can be more confident that the relations you find between build orders and win ratios are real. (Technically, the uncertainty of the estimated win ratio shrinks as the sample size grows.)
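The effect of sample size can be illustrated with a confidence interval on the win ratio. The sketch below uses the Wilson score interval, which is a standard choice for proportions but is my pick for illustration, not something the site is known to compute; the function name is hypothetical.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win ratio (z=1.96 is roughly 95% confidence)."""
    if games <= 0:
        raise ValueError("need at least one game")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

# Same 80% observed win ratio, very different certainty.
small = wilson_interval(8, 10)
large = wilson_interval(80, 100)
```

With 8 wins in 10 games the interval is wide enough to include a coin flip, while with 80 wins in 100 games it stays well above 50%.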
|
i can 6 pool against a terran on shakuras with far positions and still win even if he walls off. it doesnt mean that 6 pool is a viable strat vs t on shakuras with far spawns.
i agree with the statistics gathering, i just think it can be done a little bit more effectively in order to reach far more solid conclusions.
|
On February 26 2011 21:58 limonovich wrote: i can 6 pool against a terran on shakuras with far positions and still win even if he walls off. it doesnt mean that 6 pool is a viable strat vs t on shakuras with far spawns.
i agree with the statistics gathering, i just think it can be done a little bit more effectively in order to reach far more solid conclusions.
There are 3 possible scenarios in my opinion:
1. You play an 'easy' opponent: you will win every time.
2. You play an opponent stronger than you: you should never win.
3. You play an opponent of your same skill level: you win x times.
The first 2 cases aren't interesting, and if you actually include them you just add noise to the data. So let's say that you only analyse games that fall into the third category, and that you analyse a lot of them. Let's say you analyse 100 replays of 6pool played by Idra against Jinro (or any equivalent 'high level' players). If 6pool wins 80 times, I think you can say, with some confidence, that 6pool is a winning strategy against Terran.
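As a rough check on the "80 out of 100" example: if the two players really were evenly matched and the build order made no difference (each game a 50/50 coin flip, which is an assumption made only for this illustration), how likely would 80 or more wins be? A minimal sketch using the exact binomial tail:

```python
from math import comb

def binomial_tail(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): the chance of doing at least
    this well if every game were won independently with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

chance = binomial_tail(80, 100)  # tiny: such a record is very unlikely by luck alone
```

This only says the result is unlikely to be luck. It does not separate the build order from other factors such as map, spawn positions or the players' form.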
|
Interesting. I'll be checking back to see how this grows. I'm not sure how much data you can get out of a replay file but there are tons of possibilities depending on the data available.
|
On February 27 2011 06:42 DiDigital wrote: Interesting. I'll be checking back to see how this grows. I'm not sure how much data you can get out of a replay file but there are tons of possibilities depending on the data available.
Thank you! Anyway, the parsing of a replay is done with the PHP library provided by http://code.google.com/p/phpsc2replay/, so the data we can get is basically the list of actions that were played in the game plus other 'meta information' like player numbers and names, the date the game was played on, the game duration, and the map on which the game was played. For a complete list you might want to check their API. What we store for the build order statistics is actually much less: a simplified action list and whether the build order won or not.
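As an illustration of "a simplified action list and whether the build order won", here is a small sketch. The tuple layout, the action kinds and the record fields are all hypothetical; they do not reflect the phpsc2replay API (which is PHP) or the site's real schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical action kinds that matter for a build order; everything else
# (selections, right clicks, camera moves) is dropped.
BUILD_ACTIONS = {"build", "train", "research"}

@dataclass
class BuildOrderRecord:
    race: str
    steps: List[str]  # simplified action list, in game order
    won: bool

def simplify(actions: List[Tuple[int, str, str]], race: str, won: bool,
             limit: int = 10) -> BuildOrderRecord:
    """Reduce a full action list to its first `limit` build-related steps.

    `actions` holds made-up (game_seconds, kind, target) tuples standing in
    for whatever a real parser returns.
    """
    ordered = sorted(actions)  # sort by game time
    steps = [target for _, kind, target in ordered if kind in BUILD_ACTIONS]
    return BuildOrderRecord(race=race, steps=steps[:limit], won=won)

record = simplify(
    [(75, "build", "Spawning Pool"), (10, "train", "Drone"), (40, "move", "")],
    race="Zerg",
    won=True,
)
```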
|
Is there a reason that unit locations and other command inputs are not able to be parsed? Is that because they are not stored or because no one has discovered how to access them? In order for a replay to properly function there seems to need to be a little more information stored in the file.
|
Interesting. It seems to confirm what everyone already knew for Zerg against Terran: if you go ling/muta you have a high win percentage... Though it doesn't seem to take into account bunker rushes etc. vs fast expands.
|
This is great! Will need to check this out.
Now, I also happen to play chess and have seen the chess databases. What the databases will show is the winning percentage if a particular move is chosen. If the "build order analyser" can somehow recognise the different variations of an opening and churn out statistics based on pro-level replays, this will be incredibly helpful.
For example, take TvZ 2-rax openings. The analyser can categorise them something like this: - 2-rax expand: T wins 55% - 2-rax marine/SCV all-in: T wins 40% - 2-1-1: T wins 35% etc
I also thought of something: Is it possible to use the match history of top players in order to do this analysis? Because this would give a proper representation. Generally, people don't save replays if the game is not good (i.e. the early rush ends it).
|
Firstly riso, you are a cadillac of men for doing this.
Secondly, I am wondering how sensitive your analysis is to game length. In the case of a macro game, the opening is designed to allow you to enter the midgame on an even footing (or perhaps with a lead if your opponent made some control errors, etc). So, for a macro type game the winning or losing actually happens in the midgame. My point therefore is that different openings could have led to a similar midgame and thus the opening doesn't determine whether the game was won or not (apart from allowing a player to reach the midgame of course). Naturally things are different for an all-in type strategy, where it either works or it doesn't.
What are your thoughts? Did I miss something obvious here, or should this midgame factor be taken into account?
Thanks, and keep being awesome!
|
You rock man, great job. I've been using an XLS sheet to analyze my data.
It'd be a nice addition to show apm averages (and improvement over time).
Also, it would be great but don't know if really possible, to see statistics of ur opponent's ladder points and league. At least, that's what I also like to store on my XLS sheet. You can remove the bonus pool factor from ur opponent's points and check if u are facing (and defeating) stronger opponents. Don't know if u can get this data from blizzard profile page, or even from sc2ranks? It would be HUGE.
I know this isn't maybe ur site objective, but well, i'm just throwing this out =)
Thank u anyways, i'll be using this a lot.
PS: Pardon my engrish prease.
|
On February 28 2011 17:45 DiDigital wrote: Is there a reason that unit locations and other command inputs are not able to be parsed? Is that because they are not stored or because no one has discovered how to access them? In order for a replay to properly function there seems to need to be a little more information stored in the file.
This is a bit of a gray area in my knowledge of replay parsing, so don't trust me too much; the guys who developed the parsing libraries are really helpful and a lot more knowledgeable than me on this topic. Anyway, this is what I think:
1. Information like 'right click on position (x,y)' is certainly available in the replay. There are other tools (Sc2gears comes to mind) that show a map of your mouse movements during a match, so it must be there. As for the parsing libraries we're using, I'm not actually sure that they parse the position (I think they do), but they do parse commands like 'right click'. ATM we're just ignoring them.
2. Assuming that a replay contains something like 'select marine and issue attack move to position (x,y)', can we say that the marine will actually move to position (x,y)? This is a bit tricky. When you run this command through the game engine, it is enough information: the marine will compute its pathing to the location, start moving, and if it finds something to shoot it will start shooting and eventually die as any good marine should. We can't do the same by simply reading the replay; so if this is the only information available, you can't know things like the position of unit X at time t.
Anyway, my friend (I never said this here, but there are actually two of us developing this) has the idea of implementing some kind of JavaScript replay player (you would see what happened during a match on a minimap), so if this ever sees the light it means that we found a way of reading such information.
DrBoo wrote: Interesting. It seems to confirm what everyone already knew for Zerg against Terran: if you go ling/muta you have a high win percentage... Though it doesn't seem to take into account bunker rushes etc. vs fast expands.
Azzur wrote: This is great! Will need to check this out.
Now, I also happen to play chess and have seen the chess databases. What the databases will show is the winning percentage if a particular move is chosen. If the "build order analyser" can somehow recognise the different variations of an opening and churn out statistics based on pro-level replays, this will be incredibly helpful.
For example, take TvZ 2-rax openings. The analyser can categorise them something like this: - 2-rax expand: T wins 55% - 2-rax marine/SCV all-in: T wins 40% - 2-1-1: T wins 35% etc
I also thought of something: Is it possible to use the match history of top players in order to do this analysis? Because this would give a proper representation. Generally, people don't save replays if the game is not good (i.e. the early rush ends it).
I think these can be addressed together. The question is whether some kind of "what-if" analysis would be possible, i.e. what is the success of strategy X when used as a response to strategy Y. The answer is that I'm not sure. It is certainly possible to do, but I'm not sure the results would hold any value. The problem is that chess is different from SC: in the former, an opening is an exact sequence of moves, so if you play 2 games and use the same strategy, you will make the exact same moves. In the latter, you might be playing the same strategy but your moves might differ a bit. In the FAQ page (www.sc2stats.net/st_faq.php) there is a description of the algorithm that does the build order recognition; you might want to check it out to get a better idea. Anyway, just keep in mind that build order recognition is never exact, and intuitively you can understand that if you apply the algorithm 2 times (once per player) you amplify the possibility of an error. This is why I'm unsure of the effectiveness of a "what-if" analysis. But it would certainly be interesting, and I'm not saying that it's impossible, so I'll add it to my 'to do' list and see whether it is possible or not.
As for which replays we use for the build order analysis, the answer is that we already use replays from top players :D.
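A rough sketch of what such a "what-if" table could look like, together with the error-compounding concern. Everything here is hypothetical: the input format, the function names, and the assumption that the recognizer's mistakes on the two players are independent.

```python
from collections import Counter

def matchup_table(games):
    """Win ratio of build A against build B.

    `games` holds made-up (build_a, build_b, a_won) triples, i.e. the
    recognizer has been applied to both players of each replay.
    """
    played = Counter()
    won = Counter()
    for build_a, build_b, a_won in games:
        played[(build_a, build_b)] += 1
        if a_won:
            won[(build_a, build_b)] += 1
    return {pair: won[pair] / count for pair, count in played.items()}

def both_labels_correct(accuracy):
    """Chance that BOTH players' builds are recognized correctly, assuming
    the two recognitions fail independently (an assumption for illustration)."""
    return accuracy * accuracy

table = matchup_table([
    ("2-rax expand", "hatch first", True),
    ("2-rax expand", "hatch first", False),
    ("2-1-1", "hatch first", False),
])
```

Under that independence assumption, a recognizer that is right 90% of the time labels both sides of a game correctly only about 81% of the time, and the table also splits the same replays over many more cells, so each cell rests on fewer games.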
ZXRP wrote: Firstly riso, you are a cadillac of men for doing this.
Secondly, I am wondering how sensitive your analysis is to game length. In the case of a macro game, the opening is designed to allow you to enter the midgame on an even footing (or perhaps with a lead if your opponent made some control errors, etc). So, for a macro type game the winning or losing actually happens in the midgame. My point therefore is that different openings could have led to a similar midgame and thus the opening doesn't determine whether the game was won or not (apart from allowing a player to reach the midgame of course). Naturally things are different for an all-in type strategy, where it either works or it doesn't.
What are your thoughts? Did I miss something obvious here, or should this midgame factor be taken into account?
Thanks, and keep being awesome!
You are right. We're missing some kind of weighting on the game length. The opening in a 40 min replay is certainly less important than the opening in a 10 min one. The good news is that we already thought of this! It's a thing I wanted in and will do asap, since it shouldn't be much of a problem to add.
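One possible way to do such a weighting, purely as a sketch: let each game's weight halve every `half_life` minutes of game length, then compute a weighted win ratio. The exponential form and the 15-minute half-life are arbitrary choices for illustration, not what the site does or will do.

```python
def length_weight(minutes, half_life=15.0):
    """Weight of a game's opening: 1.0 for a 0-minute game, halving every
    `half_life` minutes (hypothetical decay, chosen for illustration)."""
    return 0.5 ** (minutes / half_life)

def weighted_win_rate(games, half_life=15.0):
    """games: iterable of (duration_minutes, won) pairs for one build order."""
    total = 0.0
    wins = 0.0
    for minutes, won in games:
        weight = length_weight(minutes, half_life)
        total += weight
        if won:
            wins += weight
    if total == 0:
        raise ValueError("no games given")
    return wins / total

# A quick win counts for more than a long loss.
rate = weighted_win_rate([(10, True), (40, False)])
```

Here the unweighted win ratio would be 50%, while the weighted one comes out higher because the 10-minute win is attributed more strongly to the opening than the 40-minute loss.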
And honestly, being a cadillac of men is one of the most awesome things someone ever told me :D.
Finally, I want to thank everyone for the feedback. I also want to add that we made a first round of small bug fixes yesterday and made the registration process a bit easier. We're currently working on a major bug that prevents many replays from being analysed, a more complete build order browser (you can see only some selected build orders now) and a logo.
|
On February 28 2011 21:11 Fede wrote: You rock man, great job. I've been using an XLS sheet to analyze my data.
It'd be a nice addition to show apm averages (and improvement over time).
Also, it would be great but don't know if really possible, to see statistics of ur opponent's ladder points and league. At least, that's what I also like to store on my XLS sheet. You can remove the bonus pool factor from ur opponent's points and check if u are facing (and defeating) stronger opponents. Don't know if u can get this data from blizzard profile page, or even from sc2ranks? It would be HUGE.
I know this isn't maybe ur site objective, but well, i'm just throwing this out =)
Thank u anyways, i'll be using this a lot.
PS: Pardon my engrish prease.
Hey, the APM shouldn't really be a problem. The parsing libraries actually already provide an APM reading, so it's just a matter of storing it and drawing a chart. So we'll definitely do this.
As for the opponent statistics, I fear this is unfeasible, since without the character code (which is not included in the replay afaik) there is no way to identify your opponent. I'll look into this but don't get your hopes up!
|
Awesome, thanks for the quick reply.
Are you sure the charcodes are not in the replay? I think after watching a replay in-game and going to the score screen you can check the player's profile. It wouldn't be possible without the charcodes there, right? I'm probably wrong since u must certainly be better informed about this.
GL !
|
There is no score screen included in the replay.^^
|
Yeah, the score screen is not included in the replay. I thought that since you can check the player's profile even after the game (i.e. in your match history) maybe the character code was contained in the replay, but apparently it's just saved somewhere else, maybe on the BNet servers or in some other file.. :\
|
First of all, I don't mean to be annoying; I am really looking forward to seeing your project grow big, as I think it could possibly be huge.
I just remembered that when uploading a replay to sc2replayed u can actually check the player's bnet profile, so they are able to get that from the replay. No idea how, obviously.
|
Just the fact that people are looking into the statistical side of SC2 and creating these huge projects like this, and other awesome resources (sc2gears etc.), is really cool. I'll upload my replays to the site to try and help contribute to this project, and analyze my own replays of course =). Thanks for all of your hard work and good luck!
There are two main concerns that come to my mind when thinking of a project like this though:
1. How do you deal with duplicate replays? For example two players play a game vs. each other and both upload the replay. Is that replay incorrectly weighted twice or are they identified as duplicates and counted correctly as a single game? I could see this really being a problem when pro players release replay packs and people upload them.
2. StarCraft, even in the early game, is not a game of completely static/rigid build orders but rather a game of adaptation. For example, 14 hatch 14 pool, 14 hatch 15 pool, and 15 hatch 15 pool are essentially all the same opening, with slight variations based on the map and scouting information. The difference between these openers is not significant enough to win or lose a game, but they may (or may not?) appear as different strategies in the database. From my perspective, hatch first then pool should be considered a single opener regardless of the supply count at which things go down. Is this the way it works, or are they counted separately in its current state?
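Both concerns can be sketched in a few lines. This is only an illustration under stated assumptions: that the two players' copies of a replay are byte-identical (I am not certain they are; if not, a key built from map, player names, date and duration would be needed instead), and that openers are written as labels like "14 hatch 15 pool". All names here are hypothetical.

```python
import hashlib
import re

def replay_fingerprint(data: bytes) -> str:
    """Content hash of an uploaded replay file."""
    return hashlib.sha256(data).hexdigest()

def dedupe(uploads):
    """Keep the first copy of each distinct replay.

    `uploads` is an iterable of raw file contents (bytes). Assumes both
    players' copies of the same game are byte-identical.
    """
    seen = set()
    unique = []
    for data in uploads:
        fingerprint = replay_fingerprint(data)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(data)
    return unique

def normalize_opener(label: str) -> str:
    """Collapse hatch-first variants regardless of supply count,
    e.g. '14 hatch 15 pool' and '15 hatch 15 pool' -> 'hatch first'.
    Any other label is returned unchanged."""
    buildings = re.findall(r"\d+\s+(hatch|pool)", label.lower())
    if buildings and buildings[0] == "hatch":
        return "hatch first"
    return label

games = dedupe([b"replay-A", b"replay-B", b"replay-A"])
openers = {normalize_opener(x) for x in ["14 hatch 14 pool", "15 Hatch 15 Pool"]}
```

How coarse the grouping should be is a judgment call: merging too much hides real differences (a 6 pool is not a 14 pool), which is why the sketch leaves everything except hatch-first labels untouched.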
|