|
To sum up most of the debate so far:
- was the setup "fair" or not?
- did the AI play well or not?
In the chess community we were blown away by the games AlphaZero played about a year ago. We had never seen anything like it. However, it was also very disappointing to realize Deepmind wasn't interested in chess at all. It was nothing but a playing field to demonstrate the capabilities of their neural net. As soon as the experiment concluded successfully, the Deepmind team moved on and left the chess world wondering what could have been if they had access. Imagine somebody allowing you to peek into a treasure chest full of amazing content, but then closing it and storing it away, never to be opened again.
Realistically we are in the same situation with StarCraft of course. Once Deepmind "beats the game" they will move on without missing a beat. But
"GOOD NEWS EVERYONE" (Prof. Farnsworth)
Independent of the two questions above (fair setup? good play?) I think it's safe to say StarCraft still holds plenty of challenges for the AI. And even if at some point the peak of strategic depth has been reached in all matchups on arbitrary maps with a configurable average / max APM parameter, there is still the world of custom games to explore.
exciting!
|
On January 26 2019 00:20 imp42 wrote:Show nested quote +On January 25 2019 21:14 Grumbels wrote:On January 25 2019 13:00 counting wrote:On January 25 2019 12:45 imp42 wrote:On January 25 2019 12:14 vesicular wrote:On January 25 2019 09:40 TheDougler wrote: You don't know that it was the the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs. [...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting. A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did). Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game. The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc. This actually has a technical term in machine learning community called ensemble learning. But I don't think it is that easy to implement as of yet. And for efficiency sake the single agent is actually very different from a group of agents which will absolutely require quite a bit of parallel processing to achieve (it is not as simple as installing more GPU can solve). And indeed these agents choose to represent the group of all agents in the AlphaStar league will be those encounter many different strategies and still win for the most part overall. It actually is a very difficult problem to introduce "novelty" and still able to adapt mid-game. The current system is simply not having any learning capability on the fly (within one game, in machine learning term, it is a system with offline learning, instead of active/online learning which is much much more difficult). I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games. Well, you're not wrong. It's just that if you do that and actually want the bot to learn adaption patterns over multiple games, then you need to feed it the previously played games as input. If you design that mechanism manually, the most simple approach I can think of is to feed it the history of wins/losses together with the respective agent as additional input: Game 1: Agent "Mass Blinker" - loss Game 2: Agent "Proxy Gates" - win ... and so on (the agent names are chosen for illustration only - to the AI it would just be agent 1, agent 2, ...). But if you don't want to do design anything manually = let the bot self-learn, then you'd have to feed the complete history of entire games as input, which blows up the input quite a bit.
There are already matchmaking probability parameters in the reinforcement learning process, as shown in the blog post.
In evolutionary algorithm terms, it is similar to a mating and ranking mechanism put together: if an agent has already played against a certain agent with certain strategies, it shouldn't "mate" with that same agent as often, but really needs to "mate/match" with newly introduced agents/variations, hence novelty search for new blood. However, I assume the probability of playing against agents it has beaten should still be higher, so the policy learning can reward it a bit more and the agent can solidify its successful strategies (try a few more times just to make sure, so to speak).
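To make that concrete, here is a minimal sketch of what such matchmaking probabilities could look like (plain Python; the bonus weights, agent names and the formula itself are my own illustration, not anything from the blog post):

```python
import random

def matchmaking_weights(history, novelty_bonus=1.0, revisit_bonus=0.5):
    """Toy sampling weights for picking the next training opponent.

    history: {opponent_name: (games_played, win_rate_vs_opponent)}
    Opponents we have rarely faced get a novelty bonus; opponents we tend
    to beat keep a smaller bonus so successful strategies can be reinforced
    a few more times.  Purely illustrative numbers.
    """
    weights = {}
    for opp, (games, win_rate) in history.items():
        novelty = novelty_bonus / (1.0 + games)   # decays as we replay them
        revisit = revisit_bonus * win_rate        # keep revisiting beaten agents a bit
        weights[opp] = novelty + revisit
    total = sum(weights.values())
    return {opp: w / total for opp, w in weights.items()}

def pick_opponent(history):
    weights = matchmaking_weights(history)
    opponents, probs = zip(*weights.items())
    return random.choices(opponents, weights=probs, k=1)[0]

# Example league snapshot: (games_played, win_rate) -- hypothetical agents
league = {"mass_blink": (40, 0.8), "proxy_gates": (3, 0.2), "new_mutant": (0, 0.0)}
print(matchmaking_weights(league))
print(pick_opponent(league))
```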
|
I had been a proponent of oversaturating minerals for a long time after I saw Hister do it long ago. I gave up on it later, but I think I will go back to it.
There is so much to learn from an AI that doesn't care about norms or customs and does whatever it thinks gives it the best chance to win. As much as I try to avoid the pressure of following whatever everyone else does, it gets to me.
Deepmind has a very long way to go to beat Starcraft. A human can regularly take down the cheating SC2 AI with ease; I think Deepmind would lose to the cheating AI at this point. Its micro isn't good enough to overcome the massive economic advantage the cheating AI gets.
And why didn't anyone cannon rush the AI? AI is always going to be weak to cheese. It will never play mind games better than a mind.
|
On January 26 2019 01:05 BronzeKnee wrote: [...] And why didn't anyone cannon rush the AI? AI is always going to be weak to cheese. It will never play mind games better than a mind. My honest opinion? Because TLO and Mana were pulled aside in a quiet moment and politely asked to please tend towards macro games. We spotted some weakness in the defense, so trying to exploit that via a cannon rush would be a pretty straight-forward move...
just speculating though...
|
People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win.
|
On January 26 2019 00:51 counting wrote:Show nested quote +On January 26 2019 00:20 imp42 wrote:On January 25 2019 21:14 Grumbels wrote:On January 25 2019 13:00 counting wrote:On January 25 2019 12:45 imp42 wrote:On January 25 2019 12:14 vesicular wrote:On January 25 2019 09:40 TheDougler wrote: You don't know that it was the the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs. [...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting. A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did). Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game. The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc. This actually has a technical term in machine learning community called ensemble learning. But I don't think it is that easy to implement as of yet. And for efficiency sake the single agent is actually very different from a group of agents which will absolutely require quite a bit of parallel processing to achieve (it is not as simple as installing more GPU can solve). And indeed these agents choose to represent the group of all agents in the AlphaStar league will be those encounter many different strategies and still win for the most part overall. It actually is a very difficult problem to introduce "novelty" and still able to adapt mid-game. The current system is simply not having any learning capability on the fly (within one game, in machine learning term, it is a system with offline learning, instead of active/online learning which is much much more difficult). I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games. Well, you're not wrong. It's just that if you do that and actually want the bot to learn adaption patterns over multiple games, then you need to feed it the previously played games as input. If you design that mechanism manually, the most simple approach I can think of is to feed it the history of wins/losses together with the respective agent as additional input: Game 1: Agent "Mass Blinker" - loss Game 2: Agent "Proxy Gates" - win ... and so on (the agent names are chosen for illustration only - to the AI it would just be agent 1, agent 2, ...). 
But if you don't want to do design anything manually = let the bot self-learn, then you'd have to feed the complete history of entire games as input, which blows up the input quite a bit. There is already a matchmaking probabilities parameters in the reinforcement learning process as shown in the blog post In evolutionary algorithm term, it is similar to mating or ranking mechanism put together, if an agent already played against certain agent with certain strategies, it shouldn't "mate" with the same agent as often, but really need to "mate/match" with newly introduced agents/variations, hence novelty search for new blood. However, the probability for playing against losing agents should still be higher I assume, so the policy learning will be able to reward a bit more and certain agents can solidify its sucessful strategies (try a few more times just to make sure so to speak) yeah that's not what I was talking about. Basically, if you want the decisions to depend on history, you need to feed the history. And if you want pure self-learning you need to feed all of it. I.e. the complete history of the boX match.
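To make the "manual" variant a bit more concrete, the extra input could be as simple as this toy encoding (plain Python; the agent names and roster are hypothetical, purely to illustrate the shape of the input):

```python
# Toy example: encode the boX history as a small fixed-size feature vector
# that is appended to the per-frame observation.  The agent names and the
# 3-agent "roster" are made up for illustration.

ROSTER = ["mass_blink", "proxy_gates", "phoenix_opener"]   # hypothetical sub-agents

def encode_history(history, max_games=5):
    """history: list of (agent_name, won) tuples from earlier games in the series.

    Each game becomes a one-hot agent id plus a win/loss flag; unused slots
    are zero-padded so the vector length is fixed regardless of how many
    games have been played so far.
    """
    features = []
    for i in range(max_games):
        one_hot = [0.0] * len(ROSTER)
        result = 0.0
        if i < len(history):
            agent, won = history[i]
            one_hot[ROSTER.index(agent)] = 1.0
            result = 1.0 if won else -1.0
        features.extend(one_hot + [result])
    return features

# After two games of a Bo5:
print(encode_history([("mass_blink", False), ("proxy_gates", True)]))
```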
|
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time.
|
On January 26 2019 01:28 imp42 wrote:Show nested quote +On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time. I don't agree, the other matchups could present very different challenges.
Also, oversaturate your minerals, human noobs! That was embarrassing to see; how could we have been so blind? The value is not in the minerals but in the ability to absorb probe losses and to expand at full capacity.
|
So is this project over, or will it continue?
|
On January 26 2019 01:24 imp42 wrote:Show nested quote +On January 26 2019 00:51 counting wrote:On January 26 2019 00:20 imp42 wrote:On January 25 2019 21:14 Grumbels wrote:On January 25 2019 13:00 counting wrote:On January 25 2019 12:45 imp42 wrote:On January 25 2019 12:14 vesicular wrote:On January 25 2019 09:40 TheDougler wrote: You don't know that it was the the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs. [...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting. A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did). Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game. The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc. This actually has a technical term in machine learning community called ensemble learning. But I don't think it is that easy to implement as of yet. And for efficiency sake the single agent is actually very different from a group of agents which will absolutely require quite a bit of parallel processing to achieve (it is not as simple as installing more GPU can solve). And indeed these agents choose to represent the group of all agents in the AlphaStar league will be those encounter many different strategies and still win for the most part overall. It actually is a very difficult problem to introduce "novelty" and still able to adapt mid-game. The current system is simply not having any learning capability on the fly (within one game, in machine learning term, it is a system with offline learning, instead of active/online learning which is much much more difficult). I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games. Well, you're not wrong. It's just that if you do that and actually want the bot to learn adaption patterns over multiple games, then you need to feed it the previously played games as input. If you design that mechanism manually, the most simple approach I can think of is to feed it the history of wins/losses together with the respective agent as additional input: Game 1: Agent "Mass Blinker" - loss Game 2: Agent "Proxy Gates" - win ... and so on (the agent names are chosen for illustration only - to the AI it would just be agent 1, agent 2, ...). 
But if you don't want to do design anything manually = let the bot self-learn, then you'd have to feed the complete history of entire games as input, which blows up the input quite a bit. There is already a matchmaking probabilities parameters in the reinforcement learning process as shown in the blog post In evolutionary algorithm term, it is similar to mating or ranking mechanism put together, if an agent already played against certain agent with certain strategies, it shouldn't "mate" with the same agent as often, but really need to "mate/match" with newly introduced agents/variations, hence novelty search for new blood. However, the probability for playing against losing agents should still be higher I assume, so the policy learning will be able to reward a bit more and certain agents can solidify its sucessful strategies (try a few more times just to make sure so to speak) yeah that's not what I was talking about. Basically, if you want the decisions to depend on history, you need to feed the history. And if you want pure self-learning you need to feed all of it. I.e. the complete history of the boX match.
I'd bet truly random strategy selection (over the set of good enough strategies) is unbeatable in BoX
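A toy illustration of why, with a made-up cheese/standard/greedy win-probability matrix (numbers invented just to make the matchup cyclic):

```python
# Toy "strategy selection" game.  WIN_PROB[(mine, theirs)] = P(I win), with
# made-up numbers arranged rock-paper-scissors style: standard beats cheese,
# cheese beats greedy, greedy beats standard.
STRATS = ["cheese", "standard", "greedy"]
WIN_PROB = {
    ("cheese", "cheese"): 0.5,   ("cheese", "standard"): 0.35, ("cheese", "greedy"): 0.65,
    ("standard", "cheese"): 0.65, ("standard", "standard"): 0.5, ("standard", "greedy"): 0.35,
    ("greedy", "cheese"): 0.35,  ("greedy", "standard"): 0.65, ("greedy", "greedy"): 0.5,
}

def value_of_mix(mix):
    """Worst-case win probability of a mixed strategy against any pure counter."""
    return min(
        sum(mix[mine] * WIN_PROB[(mine, theirs)] for mine in STRATS)
        for theirs in STRATS
    )

uniform = {s: 1.0 / len(STRATS) for s in STRATS}
print(value_of_mix(uniform))                                          # 0.5: can't be pushed below a coin flip
print(value_of_mix({"cheese": 1.0, "standard": 0.0, "greedy": 0.0}))  # 0.35: pure strategies are exploitable
```

Of course the real strategy space isn't a neat 3x3 matrix, but the principle is the same: a pure strategy can be counter-picked over a series, while a sensible random mix cannot be pushed below its game value.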
|
On January 26 2019 02:00 jalstar wrote:Show nested quote +On January 26 2019 01:24 imp42 wrote:On January 26 2019 00:51 counting wrote:On January 26 2019 00:20 imp42 wrote:On January 25 2019 21:14 Grumbels wrote:On January 25 2019 13:00 counting wrote:On January 25 2019 12:45 imp42 wrote:On January 25 2019 12:14 vesicular wrote:On January 25 2019 09:40 TheDougler wrote: You don't know that it was the the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs. [...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting. A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did). Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game. The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc. This actually has a technical term in machine learning community called ensemble learning. But I don't think it is that easy to implement as of yet. And for efficiency sake the single agent is actually very different from a group of agents which will absolutely require quite a bit of parallel processing to achieve (it is not as simple as installing more GPU can solve). And indeed these agents choose to represent the group of all agents in the AlphaStar league will be those encounter many different strategies and still win for the most part overall. It actually is a very difficult problem to introduce "novelty" and still able to adapt mid-game. The current system is simply not having any learning capability on the fly (within one game, in machine learning term, it is a system with offline learning, instead of active/online learning which is much much more difficult). I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games. Well, you're not wrong. It's just that if you do that and actually want the bot to learn adaption patterns over multiple games, then you need to feed it the previously played games as input. If you design that mechanism manually, the most simple approach I can think of is to feed it the history of wins/losses together with the respective agent as additional input: Game 1: Agent "Mass Blinker" - loss Game 2: Agent "Proxy Gates" - win ... and so on (the agent names are chosen for illustration only - to the AI it would just be agent 1, agent 2, ...). 
But if you don't want to do design anything manually = let the bot self-learn, then you'd have to feed the complete history of entire games as input, which blows up the input quite a bit. There is already a matchmaking probabilities parameters in the reinforcement learning process as shown in the blog post In evolutionary algorithm term, it is similar to mating or ranking mechanism put together, if an agent already played against certain agent with certain strategies, it shouldn't "mate" with the same agent as often, but really need to "mate/match" with newly introduced agents/variations, hence novelty search for new blood. However, the probability for playing against losing agents should still be higher I assume, so the policy learning will be able to reward a bit more and certain agents can solidify its sucessful strategies (try a few more times just to make sure so to speak) yeah that's not what I was talking about. Basically, if you want the decisions to depend on history, you need to feed the history. And if you want pure self-learning you need to feed all of it. I.e. the complete history of the boX match. I'd bet truly random strategy selection (over the set of good enough strategies) is unbeatable in BoX Especially because the agents don't have specific tendencies like human players. Unlike a human, AlphaStar can switch to a very different agent executing a very different strategy or even switch races without loss of performance.
|
On January 26 2019 01:34 Maenander wrote:Show nested quote +On January 26 2019 01:28 imp42 wrote:On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time. I don't agree, the other matchups could present very different challenges. Also, oversaturate your minerals human noobs! That was embarassing to see, how could we have been so blind. The value is not in the minerals but in the ability to absorb probe losses and to expand at full capacity. It's interesting whether it applies to humans as well. This tactic might be more applicable to agents with superior unit control and spending regime. But then again - maybe it's a new meta.
|
On January 26 2019 01:28 imp42 wrote:Show nested quote +On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time.
6 times? My man... that isn't how this works. That isn't how any of this works. It's far greater than 6 times because of the exponential increase in variables.
We used to talk a lot about Starsense: the ability to know something was happening without actually seeing it. The AI has no Starsense, clearly. Mana wrecked it with a two-Immortal drop that crippled it.
The AI is always going to favor aggression for the same reason I do: if you are attacking, you are controlling the tempo of the game. That makes it more likely to win because it reduces the number of variables in an interaction. Defending takes on many more forms.
Attack the AI and it will fall apart. Hold its attacks (which is harder, because defending takes on more variables and it has near-perfect micro), and it will fall apart.
|
I think it's hard to judge from just one game whether it is that vulnerable to opponents attacking. The sad thing from their AMA thread is that they will stop using this current version of their agent without even playing more games with the camera interface to try to understand its flaws better.
|
On January 26 2019 01:05 BronzeKnee wrote: I had been a proponent of oversaturating minerals for a long time after I saw Hister do it long ago. I gave up on it later, but I think I will go back to it.
There is so much to learn from an AI that doesn't care about norms or customs and does whatever it thinks gives it the best chance to win. As much as I try to avoid the pressure of following whatever everyone else does, its get me.
Deepmind has a very long way to go to beat Starcraft. A human can regularly take down the cheating SC2 AI with ease, I think Deepmind loses to the cheating AI at this point. It's micro isn't good enough to overcome the massive economic advantage the cheating AI gets.
And why didn't anyone cannon rush the AI? AI is always going to be weak to cheese. It will never play mind games better than a mind.
Lol, my man, Tencent already did that back in September, with a bot 'only' around Platinum level that would get obliterated by AlphaStar: arxiv.org
Respectfully, it might be a good idea to do some research here...
|
On January 26 2019 03:23 Poopi wrote: I think it’s hard to judge from just one game if it is that vulnerable to opponents attacking. Sad thing from their AMA thread is that they will stop using this current version of their agent, without even playing more games in camera interface to try to understand its flaws better.
Don't worry, they will just look for a more planning-oriented approach and iterate on it. Given the resources they've already invested in this, there's zero chance they write it off as a sunk cost, and since they have followed the AlphaGo playbook to the letter so far (with Mana as Fan Hui), this is definitely going all the way to AlphaStar vs (Serral, Maru).
|
On January 26 2019 02:21 nimdil wrote:Show nested quote +On January 26 2019 01:34 Maenander wrote:On January 26 2019 01:28 imp42 wrote:On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time. I don't agree, the other matchups could present very different challenges. Also, oversaturate your minerals human noobs! That was embarassing to see, how could we have been so blind. The value is not in the minerals but in the ability to absorb probe losses and to expand at full capacity. It's interesting whether it applies to humans as well. This tactic might be more applicable to agents with superior unit control and spending regime. But then again - maybe it's a new meta.
I wonder if overbuilding probes is to anticipate probe losses, or if it's more something along the lines of balancing the risk-reward of expanding. If you get your expansion denied you will have better income if you oversaturate, so maybe it's safer against aggressive builds. Like, if you get one-based and instead of getting the nexus you have 8 more probes in the main, you will be ahead in economy against the other player while having only one base to defend, and when you do get the expansion the reward will be immediate, so it's less of a risk.
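Roughly what that trade-off looks like with a crude saturation model (the mining rates below are ballpark community numbers and the two-tier model is a simplification, so treat the output as illustrative only):

```python
# Crude one-base income model with diminishing returns past 2 probes per
# patch (8 patches).  Rates are rough ballpark numbers, for illustration only.

PATCHES = 8
RATE_FIRST_TWO = 58      # ~minerals/min for the 1st and 2nd probe on a patch
RATE_THIRD = 24          # the 3rd probe on a patch adds much less

def income_per_min(probes_on_minerals):
    full = min(probes_on_minerals, 2 * PATCHES)
    extra = min(max(probes_on_minerals - 2 * PATCHES, 0), PATCHES)
    return full * RATE_FIRST_TWO + extra * RATE_THIRD

# Expansion denied: the "16 probes + nexus" plan is stuck mining with 16,
# while the oversaturated plan still mines with 24 from one base.
print(income_per_min(16))   # 928
print(income_per_min(24))   # 1120
```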
|
On January 26 2019 01:24 imp42 wrote:Show nested quote +On January 26 2019 00:51 counting wrote:On January 26 2019 00:20 imp42 wrote:On January 25 2019 21:14 Grumbels wrote:On January 25 2019 13:00 counting wrote:On January 25 2019 12:45 imp42 wrote:On January 25 2019 12:14 vesicular wrote:On January 25 2019 09:40 TheDougler wrote: You don't know that it was the the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs. [...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting. A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did). Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game. The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc. This actually has a technical term in machine learning community called ensemble learning. But I don't think it is that easy to implement as of yet. And for efficiency sake the single agent is actually very different from a group of agents which will absolutely require quite a bit of parallel processing to achieve (it is not as simple as installing more GPU can solve). And indeed these agents choose to represent the group of all agents in the AlphaStar league will be those encounter many different strategies and still win for the most part overall. It actually is a very difficult problem to introduce "novelty" and still able to adapt mid-game. The current system is simply not having any learning capability on the fly (within one game, in machine learning term, it is a system with offline learning, instead of active/online learning which is much much more difficult). I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games. Well, you're not wrong. It's just that if you do that and actually want the bot to learn adaption patterns over multiple games, then you need to feed it the previously played games as input. If you design that mechanism manually, the most simple approach I can think of is to feed it the history of wins/losses together with the respective agent as additional input: Game 1: Agent "Mass Blinker" - loss Game 2: Agent "Proxy Gates" - win ... and so on (the agent names are chosen for illustration only - to the AI it would just be agent 1, agent 2, ...). 
But if you don't want to do design anything manually = let the bot self-learn, then you'd have to feed the complete history of entire games as input, which blows up the input quite a bit. There is already a matchmaking probabilities parameters in the reinforcement learning process as shown in the blog post In evolutionary algorithm term, it is similar to mating or ranking mechanism put together, if an agent already played against certain agent with certain strategies, it shouldn't "mate" with the same agent as often, but really need to "mate/match" with newly introduced agents/variations, hence novelty search for new blood. However, the probability for playing against losing agents should still be higher I assume, so the policy learning will be able to reward a bit more and certain agents can solidify its sucessful strategies (try a few more times just to make sure so to speak) yeah that's not what I was talking about. Basically, if you want the decisions to depend on history, you need to feed the history. And if you want pure self-learning you need to feed all of it. I.e. the complete history of the boX match.
I got what you said. There used to be a distinction in reinforcement learning between completely recording all past actions and behaviors in archives versus always training each episode from the very beginning independently, only treating correlation between scenarios probabilistically. The problem with utilizing archives is usually training instability, as well as a scaling problem: the space of possible samples becomes less and less well represented as we feed longer and longer training sequences to RNNs. (Actually, the input dimension of an LSTM or RNN doesn't have to change; the training samples just become longer, and you most likely need more hidden layers/nodes for memory.)
In essence, an LSTM should be able to treat a series of games as one super-long sequence and find patterns across games: the agent's LSTM state would not be re-initialized when a new game starts, but would retain the previous game's final hidden output as its new initial hidden input (think of it as the network remembering its final "mental state" after a match and carrying it into the next one). But the first problem is that the supervised learning data doesn't contain enough of these player-vs-player, match-to-match game sequences, so the initial policy networks cannot be supervised-learned on them. So it won't work unless the whole structure is redesigned akin to AlphaGo Zero, without any prior supervised learning, or someone painstakingly pieces together enough consecutive high-level Bo5/Bo7 sequences from many high-level players, covering all kinds of strategy combinations, to use as training examples.
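A minimal sketch of what "retaining the hidden state across games" means mechanically, in PyTorch with toy dimensions (this is my own illustration, not the actual AlphaStar core):

```python
import torch
import torch.nn as nn

# Toy dimensions, nothing to do with the real AlphaStar network.
OBS_DIM, HIDDEN_DIM = 32, 64
core = nn.LSTM(input_size=OBS_DIM, hidden_size=HIDDEN_DIM, batch_first=True)

def play_series(games, carry_state_across_games=True):
    """games: list of observation tensors, each of shape (1, T, OBS_DIM).

    If carry_state_across_games is True, the final (h, c) of game k becomes
    the initial state of game k+1, so the network can condition on what
    happened earlier in the boX series.  Otherwise every game starts fresh.
    """
    state = None  # None == zero-initialized hidden state
    for game_obs in games:
        outputs, state = core(game_obs, state)
        # ... policy/value heads would consume `outputs` here ...
        if not carry_state_across_games:
            state = None
    return state

series = [torch.randn(1, 100, OBS_DIM) for _ in range(3)]  # a fake Bo3
final_state = play_series(series)
```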
|
On January 26 2019 03:38 Nakajin wrote:Show nested quote +On January 26 2019 02:21 nimdil wrote:On January 26 2019 01:34 Maenander wrote:On January 26 2019 01:28 imp42 wrote:On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time. I don't agree, the other matchups could present very different challenges. Also, oversaturate your minerals human noobs! That was embarassing to see, how could we have been so blind. The value is not in the minerals but in the ability to absorb probe losses and to expand at full capacity. It's interesting whether it applies to humans as well. This tactic might be more applicable to agents with superior unit control and spending regime. But then again - maybe it's a new meta. I wonder if overbuilding probes is for anticipate probes loss or if it's more something along the line of balancing the risk-reward of expanding. If you get your expand denied you will have a better incomed if you oversaturate so maybe it's safer against agressive build. Like if you get 1 based and instead of getting the nexus you have 8 more probe in the main you will be ahead in econnomy against the other player witb having only 1 base to defend and when you do get the expand the reward will be imediate so it's less of a risk.
I think it's the latter. I've gone and looked at the minerals lost in the final game, where it was sacrificing 2 oracles each time to one-shot Mana's workers. The mineral trade looks even in the moment, but killing workers makes AlphaStar better off in the long run, so it takes a very clinical, quantified view of future income. Implicitly embedded in the reinforcement learning algorithm is a very strong Monte-Carlo optimization engine. So once agents are fully trained (in 3 to 6 months IMHO), if they still do whacko stuff like oversaturating minerals, that is probably not a local optimum but the actual thing for us humans to copy.
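Back-of-the-envelope version of that "future income" view (the unit costs and the ~50 minerals/min per probe figure are rough community numbers, and lumping gas into mineral value at 1.5x is my own simplification):

```python
# Rough break-even estimate for trading units for workers.
# All numbers are approximate and purely illustrative.

ORACLE_COST_MINERALS = 150
ORACLE_COST_GAS = 150
GAS_TO_MINERAL_FACTOR = 1.5          # crude way to lump gas into "mineral value"
PROBE_COST = 50
PROBE_INCOME_PER_MIN = 50            # ~40-60 depending on saturation

def breakeven_minutes(oracles_lost, probes_killed):
    """Minutes of the opponent's lost mining after which the trade pays off."""
    oracle_value = oracles_lost * (ORACLE_COST_MINERALS
                                   + GAS_TO_MINERAL_FACTOR * ORACLE_COST_GAS)
    immediate_value = probes_killed * PROBE_COST
    income_swing_per_min = probes_killed * PROBE_INCOME_PER_MIN
    remaining = max(0.0, oracle_value - immediate_value)
    return remaining / income_swing_per_min

# 2 oracles traded for ~10 probes:
print(f"{breakeven_minutes(2, 10):.1f} minutes of lost mining to break even")
```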
|
On January 26 2019 03:00 BronzeKnee wrote:Show nested quote +On January 26 2019 01:28 imp42 wrote:On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win. From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time. 6 times? My man... that isn't how this works. That isn't how any of this works. It far greater than 6 times because of the exponential increase of variables. Well, to expand from 1 mirror matchup to 3 you most definitely just need ~3 times the resources.
I admit that in a non-mirror matchup you have roughly twice as many different unit types. But whether that actually increases the number of variables exponentially depends on the implementation of the NN. For example, if you feed raw pixels to a fully connected NN, then it doesn't matter how many unit types you identify on a higher level of abstraction. You will have the same number of nodes and edges in the neural net.
this is loosely related: https://www.quora.com/How-does-the-training-time-for-a-convolutional-neural-network-scale-with-input-image-size
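To put toy numbers on the raw-pixel point (all sizes below are made up; a real net would be convolutional, whose filter parameters don't grow with input size at all):

```python
# Toy parameter count for the first layer of a network, to illustrate that
# with raw pixel input the parameter count does not depend on how many unit
# types exist in the matchup, whereas a per-unit encoding does.

H, W, CHANNELS, HIDDEN = 128, 128, 3, 512

def fc_params_from_pixels():
    return (H * W * CHANNELS) * HIDDEN + HIDDEN          # weights + biases

def fc_params_from_unit_list(num_unit_types, max_units=200, feats_per_unit=8):
    # A per-unit encoding typically one-hot encodes the unit type, so the
    # input width grows with the number of unit types in the matchup.
    per_unit = num_unit_types + feats_per_unit
    return (max_units * per_unit) * HIDDEN + HIDDEN

print(fc_params_from_pixels())            # identical for mirror and non-mirror
print(fc_params_from_unit_list(18))       # ~mirror: one race's unit types (made-up count)
print(fc_params_from_unit_list(36))       # ~non-mirror: roughly twice as many
```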
I am a bit unsure which of us doesn't understand how any of this works at the moment... maybe both of us 
|