AlphaStar AI goes 10-1 against human pros in demonstration…

Acrofales
Profile Joined August 2010
Spain18030 Posts
January 30 2019 11:38 GMT
#281
On January 30 2019 19:42 Grumbels wrote:
On January 30 2019 09:51 maybenexttime wrote:
Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?

What I'm wondering is whether they could make an evolutionary algorithm that is trained to reconstruct a replay from one player's perspective. It's very different from simply teaching it to win. Such an approach would teach it how to model the state of the game from incomplete information. The main problem would be quantifying how faithful the reconstruction of a replay is.

Then they could turn it into a module and incorporate it into AlphaStar, and make it model the game it is currently playing in real time (assuming it can simulate numerous games of SC2 that quickly). It could come up with realistic scenarios explaining what the AI already knows about the opponent. It could create working hypotheses regarding what has been happening behind the fog of war, and perhaps even verify them via scouting.

Is what I'm proposing very far-fetched?

I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies via watching the replay. I haven't read the paper they released yet, but from some comments I read I don't think it has these capabilities currently.

Also, I don't like spreading misinformation, but I /recall/ having heard that the figure of 200 years is the playtime of the agent which has played the longest time. The week of training probably also includes the initial stage of imitation learning from replays. Depending on how long this lasted, it would mean that if the agent playing vs TLO had 200 years of practice, then the one playing vs Mana, which trained for another week, would have at least 400 years of experience, but possibly much more.

But it might be best to read the paper. I mean, the ratio of a week to 200 years is like 1 : 10,000 , and I'm pretty sure you can't speed up SC2 that much even with good hardware and eliminating graphics. So a single agent has to be able to train in parallel with itself.


This is a good point. I'm not sure. It would mean that a game of SC2 that normally takes ~30 minutes would be played in 0.2 seconds. Even with the map and everything loaded into memory in advance, that seems *very* fast to simulate SC2 with two quite heavy RL algorithms making the decisions on both sides. On the other hand, they are running it on a rather powerful setup; 16 TPUs can run a pretty hefty NN in very little time. However, the SC2 engine itself is not easily parallelized, and it still needs to compute every unit's actions at every step of the simulation.
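As a back-of-the-envelope check of the ratio being discussed, the arithmetic works out as follows (the 200-years-per-week and ~30-minute figures come from the thread; the per-instance speed is purely an assumed number):

```python
# Rough arithmetic behind the ~1:10,000 figure discussed above.
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600

experience = 200 * SECONDS_PER_YEAR            # claimed per-agent experience
wall_clock = SECONDS_PER_WEEK                  # training time being discussed

speedup = experience / wall_clock
print(f"required speed-up vs. real time: ~{speedup:,.0f}x")     # ~10,429x

game_length = 30 * 60                          # a ~30-minute game, in seconds
print(f"one game at that rate: {game_length / speedup:.2f} s")  # ~0.17 s

# If one headless instance runs at, say, 8x real time (an assumption, not a
# measured figure), the parallelism needed for a single agent would be:
per_instance = 8
print(f"parallel instances needed: ~{speedup / per_instance:,.0f}")  # ~1,300
```

In other words, the figure only adds up if each agent is effectively playing on the order of a thousand games simultaneously, which is consistent with the later point in the thread about running many headless instances in parallel.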


archonOOid
Profile Blog Joined March 2011
1983 Posts
January 30 2019 11:44 GMT
#282
I found it weird that Mana had to play AlphaStar without any practice sessions, because it seems like the AI agent had a distinct playing style. Against human opponents Mana is aware of player tendencies, so the matchup gains weight from mind games rooted in the meta the two players develop between them and the broader meta Mana engages with online while laddering. Would it not have made sense for Mana and AlphaStar to play practice games before the big five matches?
I'm Quotable (IQ)
maybenexttime
Profile Blog Joined November 2006
Poland5608 Posts
Last Edited: 2019-01-30 12:16:13
January 30 2019 12:10 GMT
#283
On January 30 2019 19:42 Grumbels wrote:
On January 30 2019 09:51 maybenexttime wrote:
Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?

What I'm wondering is whether they could make an evolutionary algorithm that is trained to reconstruct a replay from one player's perspective. It's very different from simply teaching it to win. Such an approach would teach it how to model the state of the game from incomplete information. The main problem would be quantifying how faithful the reconstruction of a replay is.

Then they could turn it into a module and incorporate it into AlphaStar, and make it model the game it is currently playing in real time (assuming it can simulate numerous games of SC2 that quickly). It could come up with realistic scenarios explaining what the AI already knows about the opponent. It could create working hypotheses regarding what has been happening behind the fog of war, and perhaps even verify them via scouting.

Is what I'm proposing very far-fetched?

I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies via watching the replay. I haven't read the paper they released yet, but from some comments I read I don't think it has these capabilities currently.

Also, I don't like spreading misinformation, but I /recall/ having heard that the figure of 200 years is the playtime of the agent which has played the longest time. The week of training probably also includes the initial stage of imitation learning from replays. Depending on how long this lasted, it would mean that if the agent playing vs TLO had 200 years of practice, then the one playing vs Mana, which trained for another week, would have at least 400 years of experience, but possibly much more.

But it might be best to read the paper. I mean, the ratio of a week to 200 years is like 1 : 10,000 , and I'm pretty sure you can't speed up SC2 that much even with good hardware and eliminating graphics. So a single agent has to be able to train in parallel with itself.


Not exactly. The training stage of that module would take place before it would be used in actual games. It would involve trying to recreate replays having information from one player's perspective only. So it would use replays to verify its "predictions" regarding how the game unfolded, but only at the training stage. In the final implementation, where it'd be playing actual opponents (AI or human), the AI would model the game up to the current point in real time. It would rely on early scouting information to narrow down the number of game tree paths to consider - similar to how humans analyze the game. The scouting information would serve as the boundary conditions/anchors.

E.g. let's say the AI sees Nexus first, followed by two Gates and a bunch of Zealots. Firstly, it would reject game tree paths with proxy openings as very unlikely. Secondly, it would simulate various scenarios of how the opponent got there and choose the most probable ones. After the early game it would have a certain belief, as you put it, as to how the game has progressed for both sides so far. This would narrow down the number of game tree paths for it to consider in the mid game. The process would closely resemble what humans currently do, i.e. creating a mental image of the game.
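As a rough illustration of the kind of scouting-consistent belief filtering being proposed, here is a minimal sketch; the scenario fields, priors and consistency rules are all invented for the example and are not anything AlphaStar is documented to do:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    opening: str     # e.g. "nexus_first", "proxy_gates", "one_base_tech"
    gateways: int
    zealots: int
    prior: float     # prior probability of this line in the current meta

def consistent(s: Scenario, obs: dict) -> bool:
    """Reject scenarios that contradict what scouting has actually seen."""
    if obs.get("nexus_first") and s.opening == "proxy_gates":
        return False
    if s.gateways < obs.get("min_gateways_seen", 0):
        return False
    if s.zealots < obs.get("min_zealots_seen", 0):
        return False
    return True

def update_belief(scenarios, obs):
    """Keep only consistent scenarios and renormalise their weights."""
    alive = [s for s in scenarios if consistent(s, obs)]
    total = sum(s.prior for s in alive) or 1.0
    return [(s.prior / total, s)
            for s in sorted(alive, key=lambda s: s.prior, reverse=True)]

# Example: we scouted Nexus first, two Gateways and a handful of Zealots.
obs = {"nexus_first": True, "min_gateways_seen": 2, "min_zealots_seen": 3}
candidates = [
    Scenario("nexus_first", 2, 4, prior=0.30),
    Scenario("nexus_first", 3, 3, prior=0.15),
    Scenario("proxy_gates", 2, 6, prior=0.25),    # rejected: contradicts Nexus first
    Scenario("one_base_tech", 1, 1, prior=0.30),  # rejected: too few Gateways seen
]
for weight, s in update_belief(candidates, obs):
    print(f"{weight:.2f}  {s.opening} ({s.gateways} gates, {s.zealots} zealots)")
```

The proposed replay-reconstruction training would, in effect, be learning the scenario generator and the likelihood scoring instead of hand-coding them as above.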

The implementation I'm proposing would need to be able to simulate SC2 games in quasi-real time. Like you're saying, the ratio of 1 to 10,000 seems excessive. But is it simply a matter of having enough processing power? I'd have to check what sort of hardware they're using to train the AI and then to play against human opponents.


edit: @Acrofales

Would you actually need to parallelize SC2? By that, do you mean simply running one client in parallel with another or something else? Because doing this internally in SC2 could be difficult, but would running multiple clients be a problem? And, as Grumbels said, you'd have to do away with any sort of graphics.
deacon.frost
Profile Joined February 2013
Czech Republic12129 Posts
January 30 2019 13:04 GMT
#284
They don't play SC2 per se; they use some binary build which they launch.
I imagine France should be able to take this unless Lilbow is busy practicing for Starcraft III. | KadaverBB is my fairy ban mother.
Zzoram
Profile Joined February 2008
Canada7115 Posts
January 30 2019 13:13 GMT
#285
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.
Xophy
Profile Joined June 2012
Germany79 Posts
January 30 2019 13:30 GMT
#286
On January 30 2019 22:13 Zzoram wrote:
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.


That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they keep losing. Thus, such agents might learn to adapt during a BoX series.
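As a toy illustration of what "deviating from the default playstyle after losing" could look like mechanically, here is a small sketch; the styles, win rates and the multiplicative penalty are invented, and nothing here reflects how AlphaStar is actually trained:

```python
import random

STYLES = ["blink_stalkers", "phoenix_opening", "chargelot_archon"]

def pick(weights):
    """Sample a style proportionally to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for style, w in weights.items():
        r -= w
        if r <= 0:
            return style
    return style  # fallback for floating-point edge cases

def simulate_game(style):
    # Placeholder: pretend the opponent has figured out blink_stalkers.
    return random.random() < (0.2 if style == "blink_stalkers" else 0.55)

def play_series(best_of=5, penalty=0.5):
    weights = {s: 1.0 for s in STYLES}
    wins = losses = 0
    needed = best_of // 2 + 1
    while wins < needed and losses < needed:
        style = pick(weights)
        if simulate_game(style):
            wins += 1
        else:
            losses += 1
            weights[style] *= penalty  # de-emphasise a style that just lost
    return wins, losses

print(play_series())
```

Whether an end-to-end trained agent would pick up this kind of behaviour purely from Bo5 self-play, rather than needing it built in, is exactly the open question.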
Polypoetes
Profile Joined January 2019
20 Posts
Last Edited: 2019-01-30 16:19:56
January 30 2019 16:17 GMT
#287
On January 30 2019 10:08 cpower wrote:
On January 28 2019 21:15 Polypoetes wrote:
But an AI doesn't get fatigued. Why would you hard-code in artificial fatigue so that the NN develops to avoid the effect of fatigue that it doesn't suffer from in the first place? Also, I don't think even for a human playing a Bo5, fatigue plays a big role. Unless you are jet-lagged or something. I assume you mean mental fatigue, which is hard to notice yourself. From my experience, humans have no obvious problems concentrating for 5x30 minutes.

I don't understand why you say that an AI is not useful unless it has all the flaws humans have.

I may have put in in a wrong way but misclicks do happen a lot in real games and AI is not designed to have misclicks so it's not really a fair battle to start with. I actually have talked with some developers on this program and see if they will try to implement that in the next phases.


Again, what does 'fair' really mean? Humans always blunder in chess. No one in the chess community has ever demanded that chess engines blunder on purpose for it to be 'fair' to claim that computers are better at chess than humans.

Yes, there is the excellent point made earlier about contempt settings. A chess engine doesn't estimate the strength of the opponent. Say you are playing chess and you can make two different moves. One move solidifies your advantage: there is no clear win, but you get a good position with equal material. The other move presents your opponent with big tactical challenges: they have 3 or 4 candidate moves and it is unclear what positions they lead to. They have to calculate deeply, every new position has several candidate moves, and incorrect play will cost them a piece. But you have seen all these positions already (because you are a strong engine), and you know the best moves will let your opponent win a pawn and get a slightly more active position.

Clearly the best move is to keep the position simple and keep the advantage. The other move would throw away your advantage, and you would lose a pawn as well. But if you know your opponent will never find the best moves, you can win the game in the next few moves by playing the continuation you have already seen to be objectively inferior.

Clearly, the same is true in StarCraft. There is no reason to play around your opponent's dangerous micro when you have identified that they are not capable of it.
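The contempt argument can be made concrete with a tiny expected-value calculation; the evaluations and probabilities below are invented purely to illustrate the point:

```python
# Expected value of each candidate line against opponents of different strength.
def expected_value(lines, p_opponent_finds_best):
    """Each line maps to (value_if_opponent_plays_best, value_if_opponent_errs)."""
    p = p_opponent_finds_best
    return {name: p * best + (1 - p) * err
            for name, (best, err) in lines.items()}

lines = {
    # values from our point of view, in "pawns": positive is good for us
    "solid_move": (+0.4, +0.4),   # keeps the edge regardless of the reply
    "sharp_move": (-0.6, +3.0),   # objectively worse, but wins if they slip
}

for label, p in [("engine opponent", 0.95), ("club player", 0.40)]:
    ev = expected_value(lines, p)
    best = max(ev, key=ev.get)
    print(f"{label}: {ev} -> choose {best}")
```

Against the strong opponent the solid move wins on expectation; against the weak one the objectively inferior move does, which is exactly the trade-off a contempt setting tries to capture.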

We don't know if AlphaStar has some special properties in its NN. It probably does, but not necessarily. An ordinary neural network is deterministic: you put in a matrix of data, and the weights and biases give as output a new matrix of data that its training has taught it belongs to that input. So given exactly the same input, it will do the exact same thing. But there may be many stochastic effects that are irrelevant to the outcome and yet lead the AI to do different things. So an AI might go DTs or not based on something as irrelevant as building placement.

We also don't know whether, in the internal league, the agents knew and learned which opponent they were up against. If they don't know, and you let them alternately play thousands of games against two different agents, they will use the adaptations made against A also against B, and vice versa.

But if you let the same agent play against two static opponent agents, and part of the input is which opponent it is facing, the NN has the potential to overfit and exploit opponents A and B independently. Similarly, you can select or train AIs against the ability of other agents to find and exploit their weaknesses. You take an agent you want to improve and keep it static while you evolve several different agents to exploit it. Then you train your agent of interest against these overfitted exploiter NNs. In parallel, you also need your NN to maintain its original winrate against normal NNs.

This will discard decision paths and micro tricks that have the potential to be exploited.
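Written as a loop, that exploiter idea looks roughly like the sketch below; `train_exploiter`, `finetune` and `winrate` are hypothetical placeholders standing in for real RL training code, so this shows only the control flow of the idea, not DeepMind's actual setup:

```python
import copy

def harden(agent, league, n_exploiters=4, rounds=3, min_league_winrate=0.55):
    for _ in range(rounds):
        frozen = copy.deepcopy(agent)                # keep the target static
        exploiters = [train_exploiter(opponent=frozen)
                      for _ in range(n_exploiters)]  # overfit to its weaknesses
        # Mix exploiters with ordinary league opponents so the agent patches
        # the holes without forgetting how to beat normal play.
        agent = finetune(agent, opponents=exploiters + league)
        if winrate(agent, league) < min_league_winrate:
            return frozen                            # regressed: roll back
    return agent

# Minimal stubs so the sketch runs end to end.
def train_exploiter(opponent):  return {"kind": "exploiter", "vs": id(opponent)}
def finetune(agent, opponents): return dict(agent, tuned=True)
def winrate(agent, league):     return 0.60

print(harden({"kind": "main"}, league=[{"kind": "league_agent"}]))
```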

You need special tricks for an AI to do mind games. You could write an NN that does nothing but select the right agent for the right game. You have access to a bunch of replays of the human players you will face; you match the patterns you see in those games against what this NN knows about the strengths and weaknesses of your agents, and then you select the agent most likely to beat the human.


As for running SC2 during training, you can run games in parallel. If you have thousands of games to simulate, there is no need to run one game on more than one core; just put one game in one thread. Also, you do not need to render graphics; you just need to know the outcome or generate a replay. I don't know how SC2 was written, but in the ideal case this cuts computation down by a lot, as most of the power required goes into rendering the graphics. I don't think SC2 has any physics engine that actually affects the game outcome, right? So it just needs to keep track of where each unit is, which way it is facing, what command it was given, and so on.
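On the "one game per core, no graphics" point, the usual pattern is a worker pool where each worker drives one headless game and returns only the result. `play_one_game` below is a stand-in; a real setup would drive the StarCraft II API (e.g. through a library such as pysc2) against the headless Linux build rather than faking an outcome:

```python
import multiprocessing as mp
import random

def play_one_game(seed):
    random.seed(seed)
    # ... launch a headless client, step the game loop, record the result ...
    return {"seed": seed,
            "winner": random.choice(["agent_a", "agent_b"]),
            "game_loops": random.randint(10_000, 40_000)}

def run_batch(n_games, workers=None):
    workers = workers or mp.cpu_count()
    with mp.Pool(processes=workers) as pool:
        return pool.map(play_one_game, range(n_games))

if __name__ == "__main__":
    results = run_batch(64)
    wins_a = sum(r["winner"] == "agent_a" for r in results)
    print(f"agent_a won {wins_a}/{len(results)} games")
```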
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
January 30 2019 17:38 GMT
#288
On January 31 2019 01:17 Polypoetes wrote:
Again, what does 'fair' really mean? Humans always blunder in chess. No one in the chess community has ever demanded that chess engines blunder on purpose for it to be 'fair' to claim that computers are better at chess than humans.

While I don't think this question is necessarily very interesting for DeepMind, there is a market for chess engines that can simulate human players of arbitrary skill. Think about how "ladder anxiety" is a real phenomenon and how beneficial it would be for casual players to be able to face off against engines that play human-like and are capable of learning, while dynamically lowering their Elo to sit just under yours. If there were an inexpensive method of achieving this, it would have meaningful economic value for the gaming industry (about 180 billion dollars in revenue yearly). Current bots aren't capable of this; they can be exploited too easily and they don't play like humans.
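The "dynamically keep the engine just under the player's strength" idea maps directly onto the textbook Elo expected-score formula; the sketch below simply inverts it to pick a target rating (the numbers are hypothetical and nothing here comes from the AlphaStar work):

```python
import math

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def target_bot_rating(player_rating, desired_player_winrate=0.55):
    """Rating the bot should emulate so the player wins ~desired_player_winrate."""
    p = desired_player_winrate
    return player_rating + 400 * math.log10((1 - p) / p)

player = 3200                      # hypothetical ladder rating
bot = target_bot_rating(player)    # about 35 points below the player
print(f"bot target rating: {bot:.0f}, "
      f"player win probability: {expected_score(player, bot):.2f}")
```

The hard part, of course, is not picking the target rating but making the engine actually play like a human of that rating, which is the capability being argued for above.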
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
Acrofales
Profile Joined August 2010
Spain18030 Posts
Last Edited: 2019-01-30 17:40:14
January 30 2019 17:39 GMT
#289
On January 30 2019 22:30 Xophy wrote:
On January 30 2019 22:13 Zzoram wrote:
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.


That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they keep losing. Thus, such agents might learn to adapt during a BoX series.

I don't think this is even a fair qualifier. Each agent right now embodies a specific, heavily optimized build order with assorted micro, contingency plans for when things go wrong, and so on. While I agree the more interesting route from an AI point of view is to see whether adaptive play can be learned this way (although you'll probably need some way to "model" your opponent), for the sake of the competition they might just as well have said it was all a single agent that had learned five different strategies and would use any one of them. Internally, the "single agent" selects one of the AlphaStar league agents at random and loads it. In order to preempt exploitation of a "single strategy" bot, having some RNG is very useful.
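A minimal sketch of that "meta-agent that randomly loads a specialist each game" framing is below; the pool and the weights are invented, though in the blog's terms the weights would come from the Nash distribution over the league:

```python
import random

# Invented pool of specialist agents and mixing weights.
AGENT_POOL = {
    "stalker_heavy":    0.35,
    "phoenix_opening":  0.25,
    "mass_probe_macro": 0.30,
    "proxy_aggression": 0.10,
}

def load_agent(name):
    # Placeholder for loading the chosen specialist's network weights.
    return f"<agent:{name}>"

def pick_agent(pool=AGENT_POOL):
    names, weights = zip(*pool.items())
    return load_agent(random.choices(names, weights=weights, k=1)[0])

# Each game the outer "meta-agent" commits to one specialist, which makes a
# single fixed strategy much harder to scout out and exploit across games.
for game in range(1, 6):
    print(f"game {game}: {pick_agent()}")
```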
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-30 18:31:45
January 30 2019 18:31 GMT
#290
On January 31 2019 02:39 Acrofales wrote:
On January 30 2019 22:30 Xophy wrote:
On January 30 2019 22:13 Zzoram wrote:
It's pretty obvious in hindsight that a single AlphaStar agent would be heavily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that's why DeepMind only let each agent play one time: to prevent any agent from being figured out.


That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they keep losing. Thus, such agents might learn to adapt during a BoX series.

I don't think this is even a fair qualifier. Each agent right now embodies a specific, heavily optimized build order with assorted micro, contingency plans for when things go wrong, and so on. While I agree the more interesting route from an AI point of view is to see whether adaptive play can be learned this way (although you'll probably need some way to "model" your opponent), for the sake of the competition they might just as well have said it was all a single agent that had learned five different strategies and would use any one of them. Internally, the "single agent" selects one of the AlphaStar league agents at random and loads it. In order to preempt exploitation of a "single strategy" bot, having some RNG is very useful.

It's not really a highly specific build order, though. It's not chess, where you can specify an opening sequence, because in StarCraft II the exact sequence of which actions to execute when differs literally every game due to stochastic effects and opponent interaction. I don't think it's that easy to have the agent choose randomly from a catalogue of openings; that strikes me as an AI challenge in itself.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-31 11:20:11
January 31 2019 11:18 GMT
#291
So, I'm reading through the blog post (I was calling it a paper, but that's not released yet).

Can anyone explain this picture to me?
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#image-34674

This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.


The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
maybenexttime
Profile Blog Joined November 2006
Poland5608 Posts
Last Edited: 2019-01-31 11:32:23
January 31 2019 11:28 GMT
#292
They also used a seriously flawed APM benchmark, and didn't seem bothered by the fact that TLO's APM peaked way above what is physically possible. While their work is very interesting, there are many serious issues with how they present it. :<

If what that quote claims is correct, reconstructing the game in quasi-real time seems very possible.
Polypoetes
Profile Joined January 2019
20 Posts
Last Edited: 2019-01-31 11:33:23
January 31 2019 11:31 GMT
#293
I don't know how they can place Mana and TLO on that graph. Maybe this is their Blizzard ladder MMR? Anyway, you cannot compare the MMRs of two distinct populations. Furthermore, MMR doesn't take into account that there is a rock-paper-scissors effect where certain styles counter others. This is clearly the case in SC in general, and DeepMind has pointed out several times that their best agent is consistently beaten by an agent that isn't rated as highly. And one reason to hold this match is to see how playing strength in their agent league translates to playing strength in the human realm.

So I guess this chart refers to the MMR of the agents inside their agent league. That would mean the camera-restricted agent was quite strong compared to the non-restricted agents. But their camera-restricted agent bugged out, for whatever reason, so it lost. That may be down to it having to use the camera view, simply less training, Mana adapting and trying harder to find an exploit, or luck.

BTW, 'only seven days' means nothing. If you run the training session on seven times more TPUs, it takes one day.


They likely don't have their best technical people working on all the material presented on their site. TLO held down buttons, resulting in insane APM, while their agent's APM peaks at crucial micro moments. So this whole APM comparison is nonsense. I don't even know why they bother with it, actually.
Poopi
Profile Blog Joined November 2010
France12887 Posts
January 31 2019 11:36 GMT
#294
On January 31 2019 20:18 Grumbels wrote:
So, I'm reading through the blog post (I was calling it a paper, but that's not released yet).

Can anyone explain this picture to me?
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#image-34674

This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.


The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.

Their estimated MMR is probably based on their AlphaStar league but using the same MMR calculation that Blizzard uses; however, they compare it to MaNa's real MMR. If they were to let the agents play a lot of games on the ladder in real time, the agents' MMR would probably differ from the estimate (especially if they have flaws for everyone to abuse ^_^).

Plus, the facts that their APM is not calculated the same way as Blizzard's (Blizzard counts 2 for building something, for example, because you need to press two keys, and 0 for the player's camera movement, whereas AlphaStar counts 1 for everything) and that their APM graphics are misleading (TLO's inflated numbers overshadow the agent's APM in the graph, but they didn't specify that this was due to rapid-fire keys and essentially useless spam from TLO) make these MMR comparisons even more pointless.
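The counting difference can be shown on a toy action log; the per-action weights follow the description above (2 for a build command, 0 for camera movement, versus a flat 1 per action), while the log itself is invented:

```python
# One minute of (fake) play, as a flat list of actions.
actions = (["camera_move"] * 120 + ["build"] * 30 +
           ["move"] * 90 + ["attack"] * 60)

def blizzard_style_apm(log):
    weights = {"build": 2, "camera_move": 0}   # per the description above
    return sum(weights.get(a, 1) for a in log)

def flat_apm(log):
    return len(log)                            # one count per action

print("Blizzard-style APM:", blizzard_style_apm(actions))  # 210
print("Flat per-action APM:", flat_apm(actions))           # 300
```

Depending on how much of a player's input is camera movement and spam, the two conventions can diverge in either direction, which is why comparing the raw numbers is so misleading.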

edit: https://blog.usejournal.com/an-analysis-on-how-deepminds-starcraft-2-ai-s-superhuman-speed-could-be-a-band-aid-fix-for-the-1702fb8344d6 they acknowledged that TLO's apm is due to rapid fire in an update of their article apparently, a step in the right direction.
WriterMaru
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-31 11:56:51
January 31 2019 11:49 GMT
#295
On January 31 2019 20:31 Polypoetes wrote:
BTW, 'only seven days' means nothing. If you run the training session on seven times more TPUs, it takes one day.

I highlighted 'seven days' to contrast with the fourteen days for the agents that beat Mana. I think it's reasonable, based on their play, that those five agents are actually incredibly strong and nigh undefeatable with conventional play. But not the agent that trained for half the time with a camera restriction and was defeated rather plainly by Mana. And not the agents that played amateurishly against TLO; those might have been pretty good, but clearly not top-tier level.

On January 31 2019 20:36 Poopi wrote:
On January 31 2019 20:18 Grumbels wrote:
So, I'm reading through the blog post (I was calling it a paper, but that's not released yet).

Can anyone explain this picture to me?
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#image-34674

This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.


The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.

edit: https://blog.usejournal.com/an-analysis-on-how-deepminds-starcraft-2-ai-s-superhuman-speed-could-be-a-band-aid-fix-for-the-1702fb8344d6 they acknowledged that TLO's apm is due to rapid fire in an update of their article apparently, a step in the right direction.

Yeah, this is the quote they added underneath the image that offended everyone.
CLARIFICATION (29/01/19): TLO’s APM appears higher than both AlphaStar and MaNa because of his use of rapid-fire hot-keys and use of the “remove and add to control group” key bindings. Also note that AlphaStar's effective APM bursts are sometimes higher than both players.

And this was there before:
In its games against TLO and MaNa, AlphaStar had an average APM of around 280, significantly lower than the professional players, although its actions may be more precise.

So that seems pretty fair, but then if you look at the conclusion it says:
"These results suggest that AlphaStar’s success against MaNa and TLO was in fact due to superior macro and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface."

Which is of course pretty ridiculous, since they had just admitted that the agents had a superior click-rate, and when they changed the interface it lost in embarrassing fashion.

But in my opinion it's not that interesting to continuously litigate this point, and it's a pity that Deepmind wasn't more careful in their presentation. If they had only been a bit more cautious they wouldn't have this level of blowback.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
maybenexttime
Profile Blog Joined November 2006
Poland5608 Posts
January 31 2019 11:59 GMT
#296
On January 31 2019 20:31 Polypoetes wrote:
They likely don't have their best technical people working on all the material presented on their site. TLO held down buttons, resulting in insane APM, while their agent's APM peaks at crucial micro moments. So this whole APM comparison is nonsense. I don't even know why they bother with it, actually.


They bother because their goal seems to be to design an AI that beats humans by outstrategizing them in a game of incomplete information. If they have an AI that is poor at decision making in an incomplete information environment but makes up for it with insane mechanics, that completely defeats the purpose.

I think you're very wrong in your thinking that they want to make an AI that is good at SC2 in general, regardless of what it excels in.
Polypoetes
Profile Joined January 2019
20 Posts
Last Edited: 2019-01-31 13:10:25
January 31 2019 12:55 GMT
#297
I think you are very wrong to think that making an AI good at SC2 is going to result in an AI that can trick and outmindgame humans. If you think so, you are delusional about what kind of game SC2 is. And the AI will actually show this.

I have asked this before and I still haven't really gotten an answer from anyone. How did you think the AI would outsmart the human and win without out-microing them and making better battle decisions? What would that look like? Maybe it is because of my experience and background, but I think the AI would just always seem 'lucky' and simply win, which is exactly what we saw in these games. People say the AI didn't scout for DTs and would have auto-lost to DTs, for example. I don't think so. I think it knew. Maybe not the TLO one, but the one Mana played against, pretty surely. Same with the Pylon built in TLO's base, and the game where the AI used all those Shield Batteries to keep one Immortal alive. I don't think it bugged out and placed a Pylon there; I think it placed it there to be killed, so the opponent's Stalkers wouldn't do something else like scout the choke. Same with the Stargate it built at the ramp, then cancelled when it was scouted, then rebuilt in the back of the base.
Another obvious thing is the AI building more Probes, while SC2 players think two per patch is enough because Blizzard put n/16 over each Nexus.

Do I think humans could beat these AIs by playing against the same identical AI over and over? Probably, so in that respect it is different from chess or Go. But that doesn't really matter, because you can generate many agents that are all strong enough and different enough that they cannot be exploited by the same thing.

The AI just knows what to value, because it always weighs every parameter there is to weigh, and it has been trained enough to give each parameter a very good weight. So in chess AlphaZero always seems to know when to play for material and when to play positionally, or to find moves that work across all the possible positions. Humans are irrational and simply not able to do this; they have their tendencies and playstyle, and that weakens them. It doesn't look impressive when the AI decides to give up an expansion and keep its army alive, like in the Carrier game against TLO, but it is a complex calculation which humans have a hard time evaluating.

I think this holds true in general: when an AI that cannot be beaten by humans makes decisions that seem to be mistakes, they are likely not mistakes. You can only show them to be mistakes by exploiting them and winning. We saw this in Go, where top players said AlphaGo made mistakes and weak moves. Of course this was especially relevant in Go, because AlphaGo always knew when to pick the move leading to a position it wins 52% of the time by only one point over one it wins 51% of the time but by a larger point difference; it was able to ignore the margin by which it would win. So if this translates to StarCraft, the AI will rarely win decisively, but as a result it loses far less. Humans don't just want to win, they want to win convincingly. If you win but it feels like you never played any better than your opponent, you aren't that satisfied.
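That "maximise win probability, ignore the margin" point fits in a two-line comparison; all numbers are invented:

```python
# Each candidate move has an estimated win probability and an expected
# winning margin (points in Go, or an army/economy lead in SC2 terms).
moves = {
    "safe_consolidating_move": {"win_prob": 0.52, "margin": 1},
    "flashy_aggressive_move":  {"win_prob": 0.51, "margin": 15},
}

by_win_prob = max(moves, key=lambda m: moves[m]["win_prob"])
by_margin   = max(moves, key=lambda m: moves[m]["margin"])

print("AlphaGo-style choice (win probability):", by_win_prob)
print("'Convincing win' choice (margin):      ", by_margin)
```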

I already said I agreed that in the future it will be interesting to make AIs that play weak like a human. But that is certainly not what Deepmind's goal is so far. They want to show they can 'solve' the problem of SC2 by winning against the best player.

To all the people who think the bot didn't out-strategize anyone: you will never be satisfied by the bot's ability to strategize, because the nature of SC2 is not what you think it is. For you to get what you want, you would need a different game.
Grumbels
Profile Blog Joined May 2009
Netherlands7031 Posts
Last Edited: 2019-01-31 16:02:53
January 31 2019 15:42 GMT
#298
TLO said (on the Pylon Show) that DeepMind told him not to make hallucinated units, because they confuse the AI. Someone else said it doesn't understand burrowed units due to the Blizzard API, and apparently it sort of bugged out as Terran because it would lift its buildings in order to not lose, and thereby not make progress in training.

Another funny bug that TLO mentioned was that the AI learned to kinda crash the game in order to avoid losing during training.
Well, now I tell you, I never seen good come o' goodness yet. Him as strikes first is my fancy; dead men don't bite; them's my views--amen, so be it.
Dazed.
Profile Blog Joined March 2008
Canada3301 Posts
January 31 2019 15:54 GMT
#299
So how valid are the builds it used for human-versus-human play? And how has experimentation gone with overbuilding workers, etc.?

Stalker/Oracle PvP seemed interesting.
Never say Die! ||| Fight you? No, I want to kill you.
Dangermousecatdog
Profile Joined December 2010
United Kingdom7084 Posts
January 31 2019 16:24 GMT
#300
Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally care more about winning at all than about winning decisively; you get the same ladder points and tournament money no matter how much you think you won or lost a game by. Overmaking workers is an opportunity cost after 16, where you don't get your money back until about 2.5 minutes after you queued the worker; it makes sense if you are planning to transfer workers or are losing them to harass. Zerg players, for example, notably do overdrone. The point of DeepMind's PR stunt was not to show it can win against the best human players (MaNa and TLO aren't even close to the best) but to show that it could out-strategize humans, yet in general it just outmuscled them with massive and accurate spikes of APM.