Data analysis on 8 million games - Page 6
earob84
Germany172 Posts
| ||
Darkwhite
Norway348 Posts
On October 16 2023 07:07 TMNT wrote: You know what, one thing that has been bugging me is the 60-70% win rate around 2-3 minutes in PvT across all MMR ranges, because it makes little sense. I can understand that win rate in all the Zerg matchups because of 4 pool. But for PvT, if you do proxy gate, the first zealot arrives at Terran's base around 2:30-2:40, so the Terrans just leave the game immediately without a fight? But if that is the case, the win rate must be close to 100% for Protoss. So I guess the 30-40% Terran wins at that minute mark is due to Protoss leaving the game immediately after BBS. So no one bothers to pull probes and SCVs to micro?

It only shows the win rates for the given duration. Longer games probably mostly end in a normal way. When filtering for games that are shorter than what is reasonable, the data is likely dominated by abnormal stuff. It's hard to guess what fraction of the super short games are actual decisive results and what fraction are basically ragequits: from losing a worker, from not wanting to play vs gas steal or proxy, or from accidentally cancelling a supply depot.

Also, note that the win-by-duration graphs do not actually show what you might expect them to show. Matchmaking sort-of guarantees that the win rate over all possible game lengths will be close to 50%, so a strong early timing cannot change the overall balance of the matchup (because matchmaking controls for it), but must instead "move" or "borrow" win percentage from other game durations to the timing window. An easy way of understanding this is imagining a player who learns a great PvT DT build and suddenly wins a lot more games around 8 minutes. Even if he still plays exactly the same as before in all his non-DT games, his win rate at other durations will drop because his MMR adjusts and his opponents are better. | ||
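To put rough numbers on the "borrowing" argument, here is a minimal sketch. It assumes matchmaking pins the overall win rate at exactly 50% and that the timing window accounts for a fixed share of all games. The function name and the example numbers (10% of games, 70% win rate) are made up for illustration and are not taken from the data set.

```python
def win_rate_outside_window(window_share, window_win_rate, overall=0.5):
    """Win rate required outside a timing window so the total equals `overall`.

    window_share:    fraction of all games that end inside the window (assumed)
    window_win_rate: win rate inside the window (assumed)
    overall:         win rate enforced by matchmaking (assumed to be 50%)
    """
    if not 0 <= window_share < 1:
        raise ValueError("window_share must be in [0, 1)")
    # overall = share * p_window + (1 - share) * p_rest, solved for p_rest
    return (overall - window_share * window_win_rate) / (1 - window_share)


# Hypothetical example: 10% of games end in the window at a 70% win rate.
rest = win_rate_outside_window(0.10, 0.70)
print(f"win rate outside the window: {rest:.3f}")
```

With these made-up inputs the win rate at all other durations has to drop below 50%, which is the "borrowed" percentage described above.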
TentativePanda
United States800 Posts
On October 17 2023 01:20 Kraekkling wrote: You could try going to repmastered.app and examining this type of game, maybe you'll be able to find some pattern? Other than that, we're looking at a small number of games overall. There is also some baseline probability of players just randomly quitting games or disconnecting, which results in a number of games within this period with a win rate of 50% for both races. This effect could be quantified by looking at the win rate in the 0-1 minute period, where we expect basically no player interaction at all. This information is not available to us, because games shorter than 2 minutes were filtered out in the pre-selection of data. The only useful estimate I can provide here is that the overall rate of disconnects is somewhere around 2% of all games. The win rate in PvT around 2-3 minutes is 58% for all players combined. In the MMR-bracket plot, all data points except the one with the biggest error bar are close to 60%.

Some speculative thoughts:

Scenarios that might make a Terran quit around 2-3 minutes:
- scouting probe enters before depot finishes
- barracks delayed due to scouting probe
- scouting probe kills scv
- gas stolen
- manner pylon
- Terran scouts proxy gate(s)
- zealot enters

Scenarios that might make a Protoss quit around 2-3 minutes:
- something involving 12nex?
- ???

I'm not even joking when I say this, I think there is an effect from Protoss players who leave when they don't scout the Terran first. I have definitely gone back into a replay to see why a Protoss left, and it was when they reached the first empty base with their scouting probe. They are sick people LOL | ||
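The point in the quote about random quits pulling the number toward 50% can be sketched as a simple mixture. This assumes every game in the bin is either a random quit or disconnect (50% win rate) or a "real" result with some underlying win rate. The share of random quits inside the 2-3 minute bin is not known (the 2% figure is for all games, not this bin), so the 20% used below is purely hypothetical, as are the function names.

```python
def observed_win_rate(true_rate, quit_share):
    """Win rate seen in a bin where `quit_share` of games are coin flips."""
    return quit_share * 0.5 + (1 - quit_share) * true_rate


def true_win_rate(observed, quit_share):
    """Invert the mixture: recover the win rate of the non-random games."""
    if not 0 <= quit_share < 1:
        raise ValueError("quit_share must be in [0, 1)")
    return (observed - 0.5 * quit_share) / (1 - quit_share)


# 58% is the observed PvT value from the quote; 20% random quits is an assumption.
print(f"{true_win_rate(0.58, 0.20):.2f}")
```

The larger the assumed share of random quits, the more lopsided the remaining "real" games would have to be to still produce 58%.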
Physician
United States4146 Posts
| ||
ajmbek
Italy459 Posts
| ||
Dakota_Fanning
Hungary2332 Posts
On October 20 2023 06:50 ajmbek wrote: As this work looks good and meticulous, I would like to point out just one small detail. There is no such thing as a 7:00 - 8:00 one-minute interval. It can be 7:00 - 7:59 or 7:01 - 8:00. I believe that does not change the data in any way, but it can make you count a good number of games twice, precisely 1 in every 30.

I'm sure the "7-8 minute" phrase is just for easy comprehension. I'm sure the upper bound is exclusive and no game is counted in more than one period. It's much easier to read and write "7-8 minute" than "7:00-7:59". | ||
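For illustration, this is how an exclusive upper bound is commonly implemented. It is only a sketch and assumes durations are stored as whole seconds; how the actual analysis bins its games is not stated in the thread, and the function names are made up.

```python
def minute_bin(duration_seconds):
    """Index k of the half-open bin [k min, k+1 min) a game falls into."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Integer division: 7:00 through 7:59 map to 7, and 8:00 maps to 8.
    return duration_seconds // 60


def bin_label(k):
    """Human-friendly label for bin k, e.g. '7-8 minute'."""
    return f"{k}-{k + 1} minute"


for seconds in (419, 420, 479, 480):  # 6:59, 7:00, 7:59, 8:00
    print(seconds, bin_label(minute_bin(seconds)))
```

Because each duration maps to exactly one index, a game lasting exactly 8:00 lands only in the "8-9 minute" bin and nothing is counted twice.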
moktira
Ireland1542 Posts
| ||
zimp
Hungary951 Posts
I see that the majority of the games are new, but what was wrong with reps before 2018? Do you consider the dataset too small before that? | ||
JieXian
Malaysia4677 Posts
| ||
Kraekkling
Romania367 Posts
If you really enjoyed the insights, consider donating a small amount to repmastered, which made this possible. Many more working hours were needed for that project and there are recurring costs to host all those replays...

On October 17 2023 03:08 Cryoc wrote: Very nice analysis. Could you maybe also add the cumulative distribution functions of your histograms? This would paint a better picture of how much a time frame with a very one-sided win rate actually matters for the overall win rate.

I thought about this but couldn't come up with a non-cluttered way to visualize it. In particular, for the plots with multiple selections you'd need a cumulative distribution for each of them. Also, when looking at multiple ones on the same plot, they'd need to be normalized. And one might need to adjust some of the selection criteria to deal with low statistics in the samples. That's where I mostly stopped thinking about it. If you have something specific in mind, I might look into it.

On October 22 2023 22:43 zimp wrote: I see that the majority of the games are new, but what was wrong with reps before 2018? Do you consider the dataset too small before that?

No particular reason for that, the cutoff is arbitrary. But yes, we're losing a negligible amount of data. | ||
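As a rough idea of what the normalized cumulative distributions discussed above would involve, here is a minimal sketch. It assumes each selection is already available as a list of game counts per one-minute bin; the function name and the example counts are invented for illustration and are not taken from the plots.

```python
from itertools import accumulate


def normalized_cdf(counts):
    """Cumulative share of games up to and including each duration bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("selection contains no games")
    # Dividing by the selection's own total puts every selection on a 0..1 scale,
    # so selections of very different sizes can share one plot.
    return [running / total for running in accumulate(counts)]


# Hypothetical per-minute game counts for two selections of different size.
selection_a = [10, 30, 40, 20]
selection_b = [1, 1, 2]
print(normalized_cdf(selection_a))
print(normalized_cdf(selection_b))
```

Each selection still needs its own curve, so this does not solve the clutter problem, and bins with very few games would still make the curves noisy.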
scroogec
1 Post
| ||
LUCKY_NOOB
Bulgaria1339 Posts
On October 22 2023 22:43 zimp wrote: thank you! great! I see that the majority of the games are new, but what was wrong with reps before 2018? Do you consider the dataset too small before that?

I wouldn't say it's a small data set even before 2018 (just comparatively small). I think the newer the data the better, as more things have been figured out with time. | ||
tankgirl
314 Posts
Is it possible to search the repmastered database for matching hotkeys between different player names? E.g. analyze hotkey usage from old FlaSh replays and then cross-check recent/new replays to see if he is active again... | ||
Diggity
United States806 Posts
I would love to see a comparison of race win percentages vs MMR, which could then be correlated with the timings to see what players of a particular race potentially need to work on in each MMR bracket. For example: +1 timings seem effective in TvP up until this MMR range, at which point Terrans need to incorporate new strategies.

As an aside, is it possible to replace the yellow with another color? It is challenging to track visually. | ||
AntiHack
Switzerland552 Posts
On October 11 2023 12:21 TT1 wrote: Is it possible to do this with progamer replays or 2500+ ladder games from cwal (the ladder games would prob be way better cus of sample size)? I know the sample size would be way less but the quality of games is way more important for this type of analysis. Thanks for this tho, great work.

There's already one with all the ASL/KSL games, featured in a recent Tasteless video.
| ||