AlphaStar AI goes 10-1 against human pros in demonstration…

Acrofales
Profile Joined August 2010
Spain18030 Posts
January 30 2019 11:38 GMT
#281
On January 30 2019 19:42 Grumbels wrote:
On January 30 2019 09:51 maybenexttime wrote:
Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?

What I'm wondering is whether they could make an evolutionary algorithm that is trained to reconstruct a replay from one player's perspective. It's very different from simply teaching it to win. Such an approach would teach it how to model the state of the game from incomplete information. The main problem would be quantifying how faithful the reconstruction of a replay is.

Then they could turn it into a module and incorporate it into AlphaStar, and make it model the game it is currently playing in real time (assuming it can simulate numerous games of SC2 that quickly). It could come up with realistic scenarios explaining what the AI already knows about the opponent. It could create working hypotheses regarding what has been happening behind the fog of war, and perhaps even verify them via scouting.

Is what I'm proposing very far-fetched?

I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies via watching the replay. I haven't read the paper they released yet, but from some comments I read I don't think it has these capabilities currently.

Also, I don't like spreading misinformation, but I /recall/ having heard that the figure of 200 years is the playtime of the agent which has played the longest time. The week of training probably also includes the initial stage of imitation learning from replays. Depending on how long this lasted, it would mean that if the agent playing vs TLO had 200 years of practice, then the one playing vs Mana, which trained for another week, would have at least 400 years of experience, but possibly much more.

But it might be best to read the paper. I mean, the ratio of a week to 200 years is like 1 : 10,000 , and I'm pretty sure you can't speed up SC2 that much even with good hardware and eliminating graphics. So a single agent has to be able to train in parallel with itself.


This is a good point. I'm not sure. It would mean that a game of SC2 that normally takes ~30 minutes would be played in 0.2 seconds. Even with the map and everything loaded into memory in advance, that seems *very* fast to simulate SC2 with two quite heavy RL algorithms making the decisions on both sides. On the other hand, they are running it on a rather powerful setup; 16 TPUs can run a pretty hefty NN in very little time. However, the SC2 engine itself is not easily parallelized, and it still needs to compute every unit's actions at every step of the simulation.
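As a back-of-the-envelope check of the ratio being discussed, the arithmetic works out as follows (the 200-years-per-week and ~30-minute figures come from the thread; the per-instance speed is purely an assumed number):

```python
# Rough arithmetic behind the ~1:10,000 figure discussed above.
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600

experience = 200 * SECONDS_PER_YEAR            # claimed per-agent experience
wall_clock = SECONDS_PER_WEEK                  # training time being discussed

speedup = experience / wall_clock
print(f"required speed-up vs. real time: ~{speedup:,.0f}x")     # ~10,429x

game_length = 30 * 60                          # a ~30-minute game, in seconds
print(f"one game at that rate: {game_length / speedup:.2f} s")  # ~0.17 s

# If one headless instance runs at, say, 8x real time (an assumption, not a
# measured figure), the parallelism needed for a single agent would be:
per_instance = 8
print(f"parallel instances needed: ~{speedup / per_instance:,.0f}")  # ~1,300
```

In other words, the figure only adds up if each agent is effectively playing on the order of a thousand games simultaneously, which is consistent with the later point in the thread about running many headless instances in parallel.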


archonOOid
Profile Blog Joined March 2011
1983 Posts
January 30 2019 11:44 GMT
#282
I found it weird that Mana had to play AlphaStar without any practice sessions, because it seems like the AI agent had a distinct playing style. Against human opponents Mana is aware of player tendencies, so the matchup gains weight from mind games rooted in the meta the two players develop between them and the broader meta Mana engages with online while laddering. Would it not have made sense for Mana and AlphaStar to play practice games before the big five matches?
I'm Quotable (IQ)
maybenexttime
Profile Blog Joined November 2006
Poland5608 Posts
Last Edited: 2019-01-30 12:16:13
January 30 2019 12:10 GMT
#283
On January 30 2019 19:42 Grumbels wrote:
On January 30 2019 09:51 maybenexttime wrote:
Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?

What I'm wondering is whether they could make an evolutionary algorithm that is trained to reconstruct a replay from one player's perspective. It's very different from simply teaching it to win. Such an approach would teach it how to model the state of the game from incomplete information. The main problem would be quantifying how faithful the reconstruction of a replay is.

Then they could turn it into a module and incorporate it into AlphaStar, and make it model the game it is currently playing in real time (assuming it can simulate numerous games of SC2 that quickly). It could come up with realistic scenarios explaining what the AI already knows about the opponent. It could create working hypotheses regarding what has been happening behind the fog of war, and perhaps even verify them via scouting.

Is what I'm proposing very far-fetched?

I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies via watching the replay. I haven't read the paper they released yet, but from some comments I read I don't think it has these capabilities currently.

Also, I don't like spreading misinformation, but I /recall/ having heard that the figure of 200 years is the playtime of the agent which has played the longest time. The week of training probably also includes the initial stage of imitation learning from replays. Depending on how long this lasted, it would mean that if the agent playing vs TLO had 200 years of practice, then the one playing vs Mana, which trained for another week, would have at least 400 years of experience, but possibly much more.

But it might be best to read the paper. I mean, the ratio of a week to 200 years is like 1 : 10,000 , and I'm pretty sure you can't speed up SC2 that much even with good hardware and eliminating graphics. So a single agent has to be able to train in parallel with itself.


Not exactly. The training stage of that module would take place before it would be used in actual games. It would involve trying to recreate replays having information from one player's perspective only. So it would use replays to verify its "predictions" regarding how the game unfolded, but only at the training stage. In the final implementation, where it'd be playing actual opponents (AI or human), the AI would model the game up to the current point in real time. It would rely on early scouting information to narrow down the number of game tree paths to consider - similar to how humans analyze the game. The scouting information would serve as the boundary conditions/anchors.

E.g. let's say the AI sees Nexus first, followed by two Gates and a bunch of Zealots. Firstly, it would reject game tree paths with proxy openings as very unlikely. Secondly, it would simulate various scenarios of how the opponent got there and choose the most probable ones. After the early game it would have a certain belief, as you put it, as to how the game has progressed for both sides so far. This would narrow down the number of game tree paths for it to consider in the mid game. The process would closely resemble what humans currently do, i.e. creating a mental image of the game.
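As a rough illustration of the kind of scouting-consistent belief filtering being proposed, here is a minimal sketch; the scenario fields, priors and consistency rules are all invented for the example and are not anything AlphaStar is documented to do:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    opening: str     # e.g. "nexus_first", "proxy_gates", "one_base_tech"
    gateways: int
    zealots: int
    prior: float     # prior probability of this line in the current meta

def consistent(s: Scenario, obs: dict) -> bool:
    """Reject scenarios that contradict what scouting has actually seen."""
    if obs.get("nexus_first") and s.opening == "proxy_gates":
        return False
    if s.gateways < obs.get("min_gateways_seen", 0):
        return False
    if s.zealots < obs.get("min_zealots_seen", 0):
        return False
    return True

def update_belief(scenarios, obs):
    """Keep only consistent scenarios and renormalise their weights."""
    alive = [s for s in scenarios if consistent(s, obs)]
    total = sum(s.prior for s in alive) or 1.0
    return [(s.prior / total, s)
            for s in sorted(alive, key=lambda s: s.prior, reverse=True)]

# Example: we scouted Nexus first, two Gateways and a handful of Zealots.
obs = {"nexus_first": True, "min_gateways_seen": 2, "min_zealots_seen": 3}
candidates = [
    Scenario("nexus_first", 2, 4, prior=0.30),
    Scenario("nexus_first", 3, 3, prior=0.15),
    Scenario("proxy_gates", 2, 6, prior=0.25),    # rejected: contradicts Nexus first
    Scenario("one_base_tech", 1, 1, prior=0.30),  # rejected: too few Gateways seen
]
for weight, s in update_belief(candidates, obs):
    print(f"{weight:.2f}  {s.opening} ({s.gateways} gates, {s.zealots} zealots)")
```

The proposed replay-reconstruction training would, in effect, be learning the scenario generator and the likelihood scoring instead of hand-coding them as above.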

The implementation I'm proposing would need to be able to simulate SC2 games in quasi-real time. Like you're saying, the ratio of 1 to 10,000 seems excessive. But is it simply a matter of having enough processing power? I'd have to check what sort of hardware they're using to train the AI and then to play against human opponents.


edit: @Acrofales

Would you actually need to parallelize SC2? By that, do you mean simply running one client in parallel with another or something else? Because doing this internally in SC2 could be difficult, but would running multiple clients be a problem? And, as Grumbels said, you'd have to do away with any sort of graphics.
deacon.frost
Profile Joined February 2013
Czech Republic12129 Posts
January 30 2019 13:04 GMT
#284
They don't play SC2 per se; they use some binary build which they launch.
I imagine France should be able to take this unless Lilbow is busy practicing for Starcraft III. | KadaverBB is my fairy ban mother.
Zzoram
Profile Joined February 2008
Canada7115 Posts
January 30 2019 13:13 GMT
#285
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.
Xophy
Profile Joined June 2012
Germany79 Posts
January 30 2019 13:30 GMT
#286
On January 30 2019 22:13 Zzoram wrote:
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.


That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they keep losing. Thus, such agents might learn to adapt during a BoX series.
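As a toy illustration of what "deviating from the default playstyle after losing" could look like mechanically, here is a small sketch; the styles, win rates and the multiplicative penalty are invented, and nothing here reflects how AlphaStar is actually trained:

```python
import random

STYLES = ["blink_stalkers", "phoenix_opening", "chargelot_archon"]

def pick(weights):
    """Sample a style proportionally to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for style, w in weights.items():
        r -= w
        if r <= 0:
            return style
    return style  # fallback for floating-point edge cases

def simulate_game(style):
    # Placeholder: pretend the opponent has figured out blink_stalkers.
    return random.random() < (0.2 if style == "blink_stalkers" else 0.55)

def play_series(best_of=5, penalty=0.5):
    weights = {s: 1.0 for s in STYLES}
    wins = losses = 0
    needed = best_of // 2 + 1
    while wins < needed and losses < needed:
        style = pick(weights)
        if simulate_game(style):
            wins += 1
        else:
            losses += 1
            weights[style] *= penalty  # de-emphasise a style that just lost
    return wins, losses

print(play_series())
```

Whether an end-to-end trained agent would pick up this kind of behaviour purely from Bo5 self-play, rather than needing it built in, is exactly the open question.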
Polypoetes
Profile Joined January 2019
20 Posts
Last Edited: 2019-01-30 16:19:56
January 30 2019 16:17 GMT
#287
On January 30 2019 10:08 cpower wrote:
On January 28 2019 21:15 Polypoetes wrote:
But an AI doesn't get fatigued. Why would you hard-code in artificial fatigue so that the NN develops to avoid the effect of fatigue that it doesn't suffer from in the first place? Also, I don't think even for a human playing a Bo5, fatigue plays a big role. Unless you are jet-lagged or something. I assume you mean mental fatigue, which is hard to notice yourself. From my experience, humans have no obvious problems concentrating for 5x30 minutes.

I don't understand why you say that an AI is not useful unless it has all the flaws humans have.

I may have put in in a wrong way but misclicks do happen a lot in real games and AI is not designed to have misclicks so it's not really a fair battle to start with. I actually have talked with some developers on this program and see if they will try to implement that in the next phases.


Again, what does 'fair' really mean? Humans always blunder in chess. No one in the chess community has ever demanded that chess engines blunder on purpose for it to be 'fair' to claim that computers are better at chess than humans.

Yes, there is the excellent point made earlier about contempt settings. A chess engine doesn't estimate the strength of the opponent. Say you are playing chess and you can make two different moves. One move solidifies your advantage: there is no clear win, but you get a good position with equal material. The other move presents your opponent with big tactical challenges: they have 3 or 4 candidate moves and it is unclear what positions they lead to. They have to calculate deeply, every new position has several candidate moves, and incorrect play will cost them a piece. But you have seen all these positions already (because you are a strong engine), and you know the best moves will let your opponent win a pawn and get a slightly more active position.

Clearly the best move is to keep the position simple and keep the advantage. The other move would throw away your advantage, and you would lose a pawn as well. But if you know your opponent will never find the best moves, you can win the game in the next few moves by playing the continuation you have already seen to be objectively inferior.

Clearly, the same is true in StarCraft. There is no reason to play around your opponent's dangerous micro when you have identified that they are not capable of it.
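The contempt argument can be made concrete with a tiny expected-value calculation; the evaluations and probabilities below are invented purely to illustrate the point:

```python
# Expected value of each candidate line against opponents of different strength.
def expected_value(lines, p_opponent_finds_best):
    """Each line maps to (value_if_opponent_plays_best, value_if_opponent_errs)."""
    p = p_opponent_finds_best
    return {name: p * best + (1 - p) * err
            for name, (best, err) in lines.items()}

lines = {
    # values from our point of view, in "pawns": positive is good for us
    "solid_move": (+0.4, +0.4),   # keeps the edge regardless of the reply
    "sharp_move": (-0.6, +3.0),   # objectively worse, but wins if they slip
}

for label, p in [("engine opponent", 0.95), ("club player", 0.40)]:
    ev = expected_value(lines, p)
    best = max(ev, key=ev.get)
    print(f"{label}: {ev} -> choose {best}")
```

Against the strong opponent the solid move wins on expectation; against the weak one the objectively inferior move does, which is exactly the trade-off a contempt setting tries to capture.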

We don't know if AlphaStar has some special properties in its NN. It probably does, but not necessarily. An ordinary neural network is deterministic: you put in a matrix of data, and the weights and biases give as output a new matrix of data that its training has taught it belongs to that input. So given exactly the same input, it will do the exact same thing. But there may be many stochastic effects that are irrelevant to the outcome and yet lead the AI to do different things. So an AI might go DTs or not based on something as irrelevant as building placement.

We also don't know whether, in the internal league, the agents knew and learned which opponent they were up against. If they don't know, and you let them alternately play thousands of games against two different agents, they will use the adaptations made against A also against B, and vice versa.

But if you let the same agent play against two static opponent agents, and part of the input is which opponent it is facing, the NN has the potential to overfit and exploit opponents A and B independently. Similarly, you can select or train AIs against the ability of other agents to find and exploit their weaknesses. You take an agent you want to improve and keep it static while you evolve several different agents to exploit it. Then you train your agent of interest against these overfitted exploiter NNs. In parallel, you also need your NN to maintain its original winrate against normal NNs.

This will discard decision paths and micro tricks that have the potential to be exploited.
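Written as a loop, that exploiter idea looks roughly like the sketch below; `train_exploiter`, `finetune` and `winrate` are hypothetical placeholders standing in for real RL training code, so this shows only the control flow of the idea, not DeepMind's actual setup:

```python
import copy

def harden(agent, league, n_exploiters=4, rounds=3, min_league_winrate=0.55):
    for _ in range(rounds):
        frozen = copy.deepcopy(agent)                # keep the target static
        exploiters = [train_exploiter(opponent=frozen)
                      for _ in range(n_exploiters)]  # overfit to its weaknesses
        # Mix exploiters with ordinary league opponents so the agent patches
        # the holes without forgetting how to beat normal play.
        agent = finetune(agent, opponents=exploiters + league)
        if winrate(agent, league) < min_league_winrate:
            return frozen                            # regressed: roll back
    return agent

# Minimal stubs so the sketch runs end to end.
def train_exploiter(opponent):  return {"kind": "exploiter", "vs": id(opponent)}
def finetune(agent, opponents): return dict(agent, tuned=True)
def winrate(agent, league):     return 0.60

print(harden({"kind": "main"}, league=[{"kind": "league_agent"}]))
```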

You need special tricks for an AI to do mind games. You could write an NN that does nothing but select the right agent for the right game. You have access to a bunch of replays of the human players you will face; you match the patterns you see in those games against what this NN knows about the strengths and weaknesses of your agents, and then you select the agent most likely to beat the human.


As for running SC2 during training, you can run games in parallel. If you have thousands of games to simulate, there is no need to run one game on more than one core; just put one game in one thread. Also, you do not need to render graphics; you just need to know the outcome or generate a replay. I don't know how SC2 was written, but in the ideal case this cuts computation down by a lot, as most of the power required goes into rendering the graphics. I don't think SC2 has any physics engine that actually affects the game outcome, right? So it just needs to keep track of where each unit is, which way it is facing, what command it was given, and so on.
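On the "one game per core, no graphics" point, the usual pattern is a worker pool where each worker drives one headless game and returns only the result. `play_one_game` below is a stand-in; a real setup would drive the StarCraft II API (e.g. through a library such as pysc2) against the headless Linux build rather than faking an outcome:

```python
import multiprocessing as mp
import random

def play_one_game(seed):
    random.seed(seed)
    # ... launch a headless client, step the game loop, record the result ...
    return {"seed": seed,
            "winner": random.choice(["agent_a", "agent_b"]),
            "game_loops": random.randint(10_000, 40_000)}

def run_batch(n_games, workers=None):
    workers = workers or mp.cpu_count()
    with mp.Pool(processes=workers) as pool:
        return pool.map(play_one_game, range(n_games))

if __name__ == "__main__":
    results = run_batch(64)
    wins_a = sum(r["winner"] == "agent_a" for r in results)
    print(f"agent_a won {wins_a}/{len(results)} games")
```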
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
January 30 2019 17:38 GMT
#288
On January 31 2019 01:17 Polypoetes wrote:
Again, what does 'fair' really mean? Humans always blunder in chess. No one in the chess community has ever demanded that chess engines blunder on purpose for it to be 'fair' to claim that computers are better at chess than humans.

While I don't think this question is necessarily very interesting for DeepMind, there is a market for chess engines that can simulate human players of arbitrary skill. Think about how "ladder anxiety" is a real phenomenon and how beneficial it would be for casual players to be able to face off against engines that play human-like and are capable of learning, while dynamically lowering their Elo to sit just under yours. If there were an inexpensive method of achieving this, it would have meaningful economic value for the gaming industry (about 180 billion dollars in revenue yearly). Current bots aren't capable of this; they can be exploited too easily and they don't play like humans.
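The "dynamically keep the engine just under the player's strength" idea maps directly onto the textbook Elo expected-score formula; the sketch below simply inverts it to pick a target rating (the numbers are hypothetical and nothing here comes from the AlphaStar work):

```python
import math

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def target_bot_rating(player_rating, desired_player_winrate=0.55):
    """Rating the bot should emulate so the player wins ~desired_player_winrate."""
    p = desired_player_winrate
    return player_rating + 400 * math.log10((1 - p) / p)

player = 3200                      # hypothetical ladder rating
bot = target_bot_rating(player)    # about 35 points below the player
print(f"bot target rating: {bot:.0f}, "
      f"player win probability: {expected_score(player, bot):.2f}")
```

The hard part, of course, is not picking the target rating but making the engine actually play like a human of that rating, which is the capability being argued for above.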
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
Acrofales
Profile Joined August 2010
Spain18030 Posts
Last Edited: 2019-01-30 17:40:14
January 30 2019 17:39 GMT
#289
On January 30 2019 22:30 Xophy wrote:
On January 30 2019 22:13 Zzoram wrote:
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.


That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they keep losing. Thus, such agents might learn to adapt during a BoX series.

I don't think this is even a fair qualifier. Each agent right now embodies a specific, heavily optimized build order with assorted micro, contingency plans for when things go wrong, and so on. While I agree the more interesting route from an AI point of view is to see whether adaptive play can be learned this way (although you'll probably need some way to "model" your opponent), for the sake of the competition they might just as well have said it was all a single agent that had learned five different strategies and would use any one of them. Internally, the "single agent" selects one of the AlphaStar league agents at random and loads it. In order to preempt exploitation of a "single strategy" bot, having some RNG is very useful.
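A minimal sketch of that "meta-agent that randomly loads a specialist each game" framing is below; the pool and the weights are invented, though in the blog's terms the weights would come from the Nash distribution over the league:

```python
import random

# Invented pool of specialist agents and mixing weights.
AGENT_POOL = {
    "stalker_heavy":    0.35,
    "phoenix_opening":  0.25,
    "mass_probe_macro": 0.30,
    "proxy_aggression": 0.10,
}

def load_agent(name):
    # Placeholder for loading the chosen specialist's network weights.
    return f"<agent:{name}>"

def pick_agent(pool=AGENT_POOL):
    names, weights = zip(*pool.items())
    return load_agent(random.choices(names, weights=weights, k=1)[0])

# Each game the outer "meta-agent" commits to one specialist, which makes a
# single fixed strategy much harder to scout out and exploit across games.
for game in range(1, 6):
    print(f"game {game}: {pick_agent()}")
```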
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-30 18:31:45
January 30 2019 18:31 GMT
#290
On January 31 2019 02:39 Acrofales wrote:
On January 30 2019 22:30 Xophy wrote:
On January 30 2019 22:13 Zzoram wrote:
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.


That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they keep losing. Thus, such agents might learn to adapt during a BoX series.

I don't think this is even a fair qualifier. Each agent right now embodies a specific, heavily optimized build order with assorted micro, contingency plans for when things go wrong, and so on. While I agree the more interesting route from an AI point of view is to see whether adaptive play can be learned this way (although you'll probably need some way to "model" your opponent), for the sake of the competition they might just as well have said it was all a single agent that had learned five different strategies and would use any one of them. Internally, the "single agent" selects one of the AlphaStar league agents at random and loads it. In order to preempt exploitation of a "single strategy" bot, having some RNG is very useful.

It's not really a highly specific build order, though. It's not chess, where you can specify an opening sequence, because in StarCraft II the exact sequence of which actions to execute when differs literally every game due to stochastic effects and opponent interaction. I don't think it's that easy to have the agent choose randomly from a catalogue of openings; that strikes me as an AI challenge in itself.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-31 11:20:11
January 31 2019 11:18 GMT
#291
So, I'm reading through the blog post (I was calling it a paper, but that's not released yet).

Can anyone explain this picture to me?
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#image-34674

This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.


The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
maybenexttime
Profile Blog Joined November 2006
Poland5608 Posts
Last Edited: 2019-01-31 11:32:23
January 31 2019 11:28 GMT
#292
They also used a seriously flawed APM benchmark, and didn't seem bothered by the fact that TLO's APM peaked way above what is physically possible. While their work is very interesting, there are many serious issues with how they present it. :<

If what that quote claims is correct, reconstructing the game in quasi-real time seems very possible.
Polypoetes
Profile Joined January 2019
20 Posts
Last Edited: 2019-01-31 11:33:23
January 31 2019 11:31 GMT
#293
I don't know how they can place Mana and TLO on that graph. Maybe this is their Blizzard ladder MMR? Anyway, you cannot compare the MMRs of two distinct populations. Furthermore, MMR doesn't take into account that there is a rock-paper-scissors effect where certain styles counter others. This is clearly the case in SC in general, and DeepMind has pointed out several times that their best agent is consistently beaten by an agent that isn't rated as highly. And one reason to hold this match is to see how playing strength in their agent league translates to playing strength in the human realm.

So I guess this chart refers to the MMR of the agents inside their agent league. That would mean the camera-restricted agent was quite strong compared to the non-restricted agents. But their camera-restricted agent bugged out, for whatever reason, so it lost. That may be down to it having to use the camera view, simply less training, Mana adapting and trying harder to find an exploit, or luck.

BTW, 'only seven days' means nothing. If you run the training session on seven times more TPUs, it takes one day.


They likely don't have their best technical people working on all the material presented on their site. TLO held down buttons, resulting in insane APM, while their agent's APM peaks at crucial micro moments. So this whole APM comparison is nonsense. I don't even know why they bother with it, actually.
Poopi
Profile Blog Joined November 2010
France12887 Posts
January 31 2019 11:36 GMT
#294
On January 31 2019 20:18 Grumbels wrote:
So, I'm reading through the blog post (I was calling it a paper, but that's not released yet).

Can anyone explain this picture to me?
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#image-34674

This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.


The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.

Their estimated MMR is probably based on their AlphaStar league but using the same MMR calculation that Blizzard uses; however, they compare it to MaNa's real MMR. If they were to let the agents play a lot of games on the ladder in real time, the agents' MMR would probably differ from the estimate (especially if they have flaws for everyone to abuse ^_^).

Plus, the facts that their APM is not calculated the same way as Blizzard's (Blizzard counts 2 for building something, for example, because you need to press two keys, and 0 for the player's camera movement, whereas AlphaStar counts 1 for everything) and that their APM graphics are misleading (TLO's inflated numbers overshadow the agent's APM in the graph, but they didn't specify that this was due to rapid-fire keys and essentially useless spam from TLO) make these MMR comparisons even more pointless.
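The counting difference can be shown on a toy action log; the per-action weights follow the description above (2 for a build command, 0 for camera movement, versus a flat 1 per action), while the log itself is invented:

```python
# One minute of (fake) play, as a flat list of actions.
actions = (["camera_move"] * 120 + ["build"] * 30 +
           ["move"] * 90 + ["attack"] * 60)

def blizzard_style_apm(log):
    weights = {"build": 2, "camera_move": 0}   # per the description above
    return sum(weights.get(a, 1) for a in log)

def flat_apm(log):
    return len(log)                            # one count per action

print("Blizzard-style APM:", blizzard_style_apm(actions))  # 210
print("Flat per-action APM:", flat_apm(actions))           # 300
```

Depending on how much of a player's input is camera movement and spam, the two conventions can diverge in either direction, which is why comparing the raw numbers is so misleading.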

edit: https://blog.usejournal.com/an-analysis-on-how-deepminds-starcraft-2-ai-s-superhuman-speed-could-be-a-band-aid-fix-for-the-1702fb8344d6 they acknowledged that TLO's apm is due to rapid fire in an update of their article apparently, a step in the right direction.
WriterMaru
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-31 11:56:51
January 31 2019 11:49 GMT
#295
On January 31 2019 20:31 Polypoetes wrote:
BTW, 'only seven days' means nothing. If you run the training session on seven times more TPUs, it takes one day.

I highlighted 'seven days' to contrast with the fourteen days for the agents that beat Mana. I think it's reasonable, based on their play, that those five agents are actually incredibly strong and nigh undefeatable with conventional play. But not the agent that trained for half the time with a camera restriction and was defeated rather plainly by Mana. And not the agents that played amateurishly against TLO; those might have been pretty good, but clearly not top-tier level.

On January 31 2019 20:36 Poopi wrote:
On January 31 2019 20:18 Grumbels wrote:
So, I'm reading through the blog post (I was calling it a paper, but that's not released yet).

Can anyone explain this picture to me?
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#image-34674

This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.


The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.

edit: https://blog.usejournal.com/an-analysis-on-how-deepminds-starcraft-2-ai-s-superhuman-speed-could-be-a-band-aid-fix-for-the-1702fb8344d6 they acknowledged that TLO's apm is due to rapid fire in an update of their article apparently, a step in the right direction.

Yeah, this is the quote they added underneath the image that offended everyone.
CLARIFICATION (29/01/19): TLO’s APM appears higher than both AlphaStar and MaNa because of his use of rapid-fire hot-keys and use of the “remove and add to control group” key bindings. Also note that AlphaStar's effective APM bursts are sometimes higher than both players.

And this was there before:
In its games against TLO and MaNa, AlphaStar had an average APM of around 280, significantly lower than the professional players, although its actions may be more precise.

So that seems pretty fair, but then if you look at the conclusion it says:
"These results suggest that AlphaStar’s success against MaNa and TLO was in fact due to superior macro and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface."

Which is of course pretty ridiculous, since they had just admitted that the agents had a superior click-rate, and when they changed the interface it lost in embarrassing fashion.

But in my opinion it's not that interesting to continuously litigate this point, and it's a pity that Deepmind wasn't more careful in their presentation. If they had only been a bit more cautious they wouldn't have this level of blowback.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
maybenexttime
Profile Blog Joined November 2006
Poland5608 Posts
January 31 2019 11:59 GMT
#296
On January 31 2019 20:31 Polypoetes wrote:
They likely don't have their best technical people working on all the material presented on their site. TLO held down buttons, resulting in insane APM, while their agent's APM peaks at crucial micro moments. So this whole APM comparison is nonsense. I don't even know why they bother with it, actually.


They bother because their goal seems to be to design an AI that beats humans by outstrategizing them in a game of incomplete information. If they have an AI that is poor at decision making in an incomplete information environment but makes up for it with insane mechanics, that completely defeats the purpose.

I think you're very wrong in your thinking that they want to make an AI that is good at SC2 in general, regardless of what it excels in.
Polypoetes
Profile Joined January 2019
20 Posts
Last Edited: 2019-01-31 13:10:25
January 31 2019 12:55 GMT
#297
I think you are very wrong to think that making an AI good at SC2 is going to result in an AI that can trick and outmindgame humans. If you think so, you are delusional about what kind of game SC2 is. And the AI will actually show this.

I have asked this before and I still haven't really gotten an answer from anyone. How did you think the AI would outsmart the human and win without out-microing them and making better battle decisions? What would that look like? Maybe it is because of my experience and background, but I think the AI would just always seem 'lucky' and simply win, which is exactly what we saw in these games. People say the AI didn't scout for DTs and would have auto-lost to DTs, for example. I don't think so. I think it knew. Maybe not the TLO one, but the one Mana played against, pretty surely. Same with the Pylon built in TLO's base, and the game where the AI used all those Shield Batteries to keep one Immortal alive. I don't think it bugged out and placed a Pylon there; I think it placed it there to be killed, so the opponent's Stalkers wouldn't do something else like scout the choke. Same with the Stargate it built at the ramp, then cancelled when it was scouted, then rebuilt in the back of the base.
Another obvious thing is the AI building more Probes, while SC2 players think two per patch is enough because Blizzard put n/16 over each Nexus.

Do I think humans could beat these AIs by playing against the same identical AI over and over? Probably, so in that respect it is different from chess or Go. But that doesn't really matter, because you can generate many agents that are all strong enough and different enough that they cannot be exploited by the same thing.

The AI just knows what to value, because it always weighs every parameter there is to weigh, and it has been trained enough to give each parameter a very good weight. So in chess AlphaZero always seems to know when to play for material and when to play positionally, or to find moves that work across all the possible positions. Humans are irrational and simply not able to do this; they have their tendencies and playstyle, and that weakens them. It doesn't look impressive when the AI decides to give up an expansion and keep its army alive, like in the Carrier game against TLO, but it is a complex calculation which humans have a hard time evaluating.

I think this holds true in general: when an AI that cannot be beaten by humans makes decisions that seem to be mistakes, they are likely not mistakes. You can only show them to be mistakes by exploiting them and winning. We saw this in Go, where top players said AlphaGo made mistakes and weak moves. Of course this was especially relevant in Go, because AlphaGo always knew when to pick the move leading to a position it wins 52% of the time by only one point over one it wins 51% of the time but by a larger point difference; it was able to ignore the margin by which it would win. So if this translates to StarCraft, the AI will rarely win decisively, but as a result it loses far less. Humans don't just want to win, they want to win convincingly. If you win but it feels like you never played any better than your opponent, you aren't that satisfied.
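That "maximise win probability, ignore the margin" point fits in a two-line comparison; all numbers are invented:

```python
# Each candidate move has an estimated win probability and an expected
# winning margin (points in Go, or an army/economy lead in SC2 terms).
moves = {
    "safe_consolidating_move": {"win_prob": 0.52, "margin": 1},
    "flashy_aggressive_move":  {"win_prob": 0.51, "margin": 15},
}

by_win_prob = max(moves, key=lambda m: moves[m]["win_prob"])
by_margin   = max(moves, key=lambda m: moves[m]["margin"])

print("AlphaGo-style choice (win probability):", by_win_prob)
print("'Convincing win' choice (margin):      ", by_margin)
```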

I already said I agreed that in the future it will be interesting to make AIs that play weak like a human. But that is certainly not what Deepmind's goal is so far. They want to show they can 'solve' the problem of SC2 by winning against the best player.

To all the people who think the bot didn't out-strategize anyone: you will never be satisfied by the bot's ability to strategize, because the nature of SC2 is not what you think it is. For you to get what you want, you would need a different game.
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-31 16:02:53
January 31 2019 15:42 GMT
#298
TLO said (on the Pylon Show) that DeepMind told him not to make hallucinated units, because they confuse the AI. Someone else said it doesn't understand burrowed units due to the Blizzard API, and apparently it sort of bugged out as Terran because it would lift its buildings in order to not lose, and thereby not make progress in training.

Another funny bug that TLO mentioned was that the AI learned to kinda crash the game in order to avoid losing during training.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
Dazed.
Profile Blog Joined March 2008
Canada3301 Posts
January 31 2019 15:54 GMT
#299
So how valid are the builds it used for human-versus-human play? And how has experimentation gone with overbuilding workers, etc.?

Stalker/Oracle PvP seemed interesting.
Never say Die! ||| Fight you? No, I want to kill you.
Dangermousecatdog
Profile Joined December 2010
United Kingdom7084 Posts
January 31 2019 16:24 GMT
#300
Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally care more about winning at all than about winning decisively; you get the same ladder points and tournament money no matter how much you think you won or lost a game by. Overmaking workers is an opportunity cost after 16, where you don't get your money back until about 2.5 minutes after you queued the worker; it makes sense if you are planning to transfer workers or are losing them to harass. Zerg players, for example, notably do overdrone. The point of DeepMind's PR stunt was not to show it can win against the best human players (MaNa and TLO aren't even close to the best) but to show that it could out-strategize humans, yet in general it just outmuscled them with massive and accurate spikes of APM.