|
As excited as I was about the announcement, I was just as disappointed that they left so many fundamental questions unanswered. They talked about APM limitations and incomplete information and stuff like that, but they didn't talk about the more basic challenges and how they plan to approach them. One of the basic questions I had was: how can you deep-learn a game without fixed rules? And how useful is that?
They said an AI that can play Starcraft will need the ability to make long-term game plans. But your game plan, at the beginning of the game as well as during it, depends heavily on the map and the version of the game you are playing. One of the most important skills of a human SC player is to look at a new map or a new balance patch and at least have a basic idea of how it will impact a certain matchup. But this is impossible for an AI, at least for the type of AI DeepMind has developed so far.
All that DeepMind did with AlphaGo was massively reduce the search space of Go with advanced policy and value networks, fed with millions of Go games. AlphaGo did not somehow gain awareness of complex Go concepts or anything like that; it just used deep learning to be very efficient at move selection and board evaluation. But it's not "intelligent", it's still a non-thinking machine that just gets better at Go.
Now the problem with Starcraft 2 is: Starcraft 2 is not one game, it's many games. It's as many games as there are combinations of versions and maps. So the question is: how do you feed its networks?
Let's imagine for a second we had millions of replays of the current version of the game, but only on maps like Blistering Sands and Scrap Station. If a network deep-learned only from those replays, it would never be able to compete with a pro player on the maps of the current map pool.
What might sound like an extreme example is true for every change of the game. If you fed the network replays from any variety of versions and maps, it would only learn the most basic things that held true on those past versions and maps, things that might become obsolete on future versions and maps.
If you ever plan to beat a pro player in a showmatch, you have to determine the version and map pool many months in advance, then somehow produce enough replays on those maps to train the networks (which might well be impossible for Starcraft), just to trash your AI with the next balance patch after the showmatch: its value network would become so inaccurate (if not completely useless) that it would have to deep-learn all over again which game plans, strategies and unit compositions are good or bad.
Of course it could very well be that DeepMind comes up with completely different things, but that's my point: they didn't talk about any of this at all.
|
I agree with your skepticism of AlphaGo. I would also add that the fog of war creates an almost infinite number of possible proxy combinations. It's hard to say which direction AlphaGo will take without its limitations being public knowledge; the question remains whether it will learn about proxies or lean on its computer-precise micro. It would be interesting if AlphaGo were used to play against human players instead of random computer opponents.
|
Yeah, it's going to be interesting to see how it plays out. In my mind, if someone adds a new map never seen before, the AI would need to function very competently on it based on previous knowledge and predictions, just like any pro player would. Its ability has to transfer from map to map. They would probably have trouble talking about this in specifics because it's not built yet. They might not even know the approach yet; how the whole thing works might not be like Go at all. To me, a game like Civilization 6 would be more ideal than a game where APM and such things count for so much.
|
There's this concept in medical imaging called "segmentation" where you use heuristics, context learning, etc. to segment medical images into component parts to be able to automatically get metrics on organ component size, abnormalities, etc.
This will be even easier with something like a Starcraft map, because it doesn't rely on guesswork the way images do.
I imagine a good AI for Starcraft would have significant levels of preprocessing of input information outside of the "learning" portion. Preprocessing the map would be extremely useful, both in terms of "mapping" it and in learning in specific map-related contexts (e.g. attacking up ramps). By doing so, the AI could more easily develop an understanding of map concepts, so that upon preprocessing the layout of a new map it could draw on its "segmentation" and the learning within segmented contexts to get a feel for maps it has never seen before.
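To make the segmentation idea concrete, here's a minimal sketch (the grid encoding and every name in it are my own invention, not anything DeepMind described): flood-fill a terrain-height grid into connected regions, so cells only join a region if they share the same height. Cliffs therefore split segments, and a ramp's cells fall out as their own small segment.

```python
from collections import deque

def segment(height_grid):
    """Map each walkable cell (x, y) to a region id via flood fill."""
    labels, next_id = {}, 0
    for start in height_grid:
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (nb in height_grid and nb not in labels
                        and height_grid[nb] == height_grid[(x, y)]):
                    labels[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return labels

# Toy map: a low-ground strip next to a high-ground strip.
grid = {(x, 0): 1 for x in range(5)}
grid.update({(x, 1): 2 for x in range(5)})
regions = segment(grid)  # two segments, one per height level
```

A real bot would run something like this once per new map and attach learned behaviors ("attacking up ramps") to segment boundaries rather than to raw coordinates.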
|
Also, I'm sorry for double posting, but I'm taking a machine learning/AI project class next semester and I really hope I'll be able to do something related to Starcraft AI. My university in particular has some professors working on AI in the context of imaging and economics, both of which I could perhaps leverage in my project. Really looking forward to seeing what other people do with this project!
|
Very interesting to see how they will approach it indeed.
I still think the APM question is gonna be a big one, because if you have perfect control, I feel like it will really reduce the complexity of the game. Let's say DeepMind can worker rush every game and win right at the start, at least against 99% of players; then the search for the optimal way of playing is gonna be a lot simpler and way less interesting, because the computer can just limit the amount of unknown information.
|
On November 07 2016 09:14 Chocolate wrote: There's this concept in medical imaging called "segmentation" where you use heuristics, context learning, etc. to segment medical images into component parts to be able to automatically get metrics on organ component size, abnormalities, etc.
This will be even easier with something like a Starcraft map, because it doesn't rely on guesswork the way images do.
I imagine a good AI for Starcraft would have significant levels of preprocessing of input information outside of the "learning" portion. Preprocessing the map would be extremely useful, both in terms of "mapping" it and in learning in specific map-related contexts (e.g. attacking up ramps). By doing so, the AI could more easily develop an understanding of map concepts, so that upon preprocessing the layout of a new map it could draw on its "segmentation" and the learning within segmented contexts to get a feel for maps it has never seen before.
Yes, a ramp is mostly the same on every map, and the AI can reliably rely on its policy network for ramp-specific choices. But that's it. Its values will only be useful for choices that are "right" on the maps the networks were trained on, but useless for really map-specific choices.
I get the concept of segmentation, but I'm talking about things you can't segment. I'm talking about expansion timings and expansion patterns, pushout timings, those kinds of things, you know. Which expansion is hard to take or hard to attack? A timing attack off three bases? Is it crucial to deny the opponent's fourth base? Most high-level game plans and decisions are really map-dependent.
|
I'd argue that segmentation could actually be quite helpful in that area. Segmenting an expansion: use choke points, a distance heuristic, etc. to separate it from the rest of the map. Then calculate the time it takes for your slowest combat unit (perhaps depending on the stage of the game) to travel from its edge to a different segmented area, e.g. your opponent's expansion.
The "goodness" of an expansion will definitely be harder to solve because I think this is prone to overfitting. I'd actually posit, though, that this isn't as big of a problem as you think. In most maps there isn't a huge amount of choice in where to place your expansions until the late game.
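That travel-time heuristic is cheap to sketch (the grid, coordinates and movement speed here are all made up for illustration): a breadth-first search over walkable cells gives the step distance between two points, and dividing by a unit's speed approximates its travel time.

```python
from collections import deque

def steps_between(walkable, src, dst):
    """Shortest number of orthogonal steps from src to dst, or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        (x, y), d = queue.popleft()
        if (x, y) == dst:
            return d
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in walkable and nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return None  # unreachable, e.g. an island expansion

walkable = {(x, y) for x in range(10) for y in range(3)}
steps = steps_between(walkable, (0, 0), (9, 0))  # 9 steps on open ground
travel_time = steps / 2.25  # seconds for a unit moving 2.25 cells/second
```

The `None` case even falls out naturally: an island expansion has no ground path, which is itself useful information for the AI.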
|
The issues you bring up are exactly the reasons why Deepmind wants to develop an AI for Starcraft. I'm surprised that you almost seem upset with them for taking on this task.
One of the basic questions I had was: how can you deep learn a game without fixed rules?
A neural network is designed to optimize some output based on input. It has a function that represents how 'good' the output is, and tends to produce those sorts of outputs. With an Atari game, this is score. I'm not sure what you mean when you say that SC2 doesn't have fixed rules.
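A bare-bones toy of what "optimize some output based on input" means in practice (nothing like DeepMind's actual networks): the policy here is just a weight vector, the environment returns a reward (the "score"), and training nudges the weights toward whichever decisions scored.

```python
import random

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(3)]

def act(state):
    # pick action 1 if the weighted sum of the input is positive
    return 1 if sum(w * x for w, x in zip(weights, state)) > 0 else 0

def reward(state, action):
    # hidden rule the learner never sees: action 1 is right iff state[0] > 0
    return 1.0 if action == (1 if state[0] > 0 else 0) else 0.0

for _ in range(2000):
    state = [random.gauss(0, 1) for _ in range(3)]
    action = act(state)
    r = reward(state, action)
    direction = 1.0 if action == 1 else -1.0
    # reinforce decisions that scored, weaken those that didn't
    weights = [w + 0.01 * (r - 0.5) * direction * x
               for w, x in zip(weights, state)]

correct = sum(reward(s, act(s))
              for s in ([random.gauss(0, 1) for _ in range(3)]
                        for _ in range(500)))
accuracy = correct / 500  # ends up well above chance
```

The learner never gets told the rule; it only sees which outputs earned reward, which is the same reason a Starcraft network wouldn't "know" why a move works.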
AlphaGo used two networks: one for guessing the opponent's next move, and one for estimating the chance of victory for any particular state of the board.
I don't think anyone here can answer your questions about how Deepmind can achieve success in writing AI's to play RTS games. Your best bet would be to read their research papers yourself and guess at how they might use these new inventions to write an AI for Starcraft.
|
On November 07 2016 18:37 alexanderzero wrote: Your best bet would be to read their research papers yourself
I did. Have you?
Like Chess and many other turn-based board games with perfect information, Go is a game with a two-dimensional search space: breadth b and depth d. Breadth is the number of possible moves per position, and depth is the game length. The search space is given by b^d. For games like Chess and Go this number is far too big to brute-force a value function.
There are two ways to reduce this search space. First, by reducing the depth through position evaluation. This is nothing new and is exactly how it's done in any chess engine (admittedly it is a lot easier to evaluate a position in Chess than in Go, because of parameters like material and so on). The second way is reducing the breadth, and this was the more unique thing about AlphaGo.
Yes, AlphaGo uses two networks. One is called the policy network, and it outputs a probability distribution over the board. The search then branches only into the most probable moves, and thereby reduces the breadth from all possible moves to only a fraction of them. The network "learned" the values of this probability distribution through a reward function: the outcome of a game (win or lose) would afterwards update the values for every move of that game.
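To put rough numbers on the breadth reduction (b and d are the ballpark figures usually quoted for Go; k, the number of moves kept per position, is an arbitrary choice for illustration):

```python
b, d = 250, 150      # rough breadth and depth of the Go game tree
k = 10               # moves the policy network keeps per position

full_tree = b ** d           # positions a naive search would consider
pruned_tree = k ** d         # positions after breadth reduction
shrink_factor = full_tree // pruned_tree  # == (b // k) ** d == 25 ** 150
```

Even this crude pruning shrinks the tree by a factor of 25^150, which is the whole trick: the network doesn't understand Go, it just makes the search tractable.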
Now the thing with Starcraft is: how do you train a network when the rules of the game change?
Let's say we train an AI with millions of replays of the current version of the game on the current map pool. If we do, the network will "learn" to build marauders against roaches or to build overseers against DTs. That means those "moves" will have resulted in more wins in those situations and therefore have higher probabilities.
And now let's imagine Blizzard releases a balance patch where overseers lose the detector ability and marauders lose their bonus damage against armored. A human player figures out that overseers are no longer good against DTs and that marauders are weaker against roaches just by reading the patch notes. But a neural network doesn't. It will still build marauders against roaches like before, because it doesn't care about stats and attributes and counters and all that. In deep learning nothing is scripted; there is no "do this if that".
A neural network doesn't "know" that you have to build detection against invisible units. It doesn't build overseers against DTs because of detection, but because doing so proved to be successful. The information that the overseer lost its detector ability doesn't mean anything to it. The move "build an overseer if you spot DTs" still has a high value in its policy network. And it will stay this way until the network has lost enough games against DTs; only then will the value drop so low that it chooses another move instead.
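The same point as toy arithmetic (all numbers invented): treat the value of "build overseers vs DTs" as a running win rate, and watch how slowly it decays after a patch that makes the move strictly bad.

```python
wins, games = 900, 1000            # pre-patch record: looks like a great move
pre_patch_value = wins / games     # 0.9

# The patch itself changes nothing inside the network; only new games do.
for _ in range(200):               # 200 straight post-patch losses
    games += 1                     # each loss adds a game but no win

stale_value = wins / games         # 900 / 1200 = 0.75
```

Even after 200 consecutive losses the move still looks like a 75% winner, which is exactly why the network keeps picking it until far more evidence piles up.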
This is what I mean by no fixed rules. A balance patch is a change of the game, and if you change the game, the policy network of your AI becomes useless.
|
On November 07 2016 10:19 Chocolate wrote: I'd argue that segmentation could actually be quite helpful in that area. Segmenting an expansion: use choke points, a distance heuristic, etc. to separate it from the rest of the map. Then calculate the time it takes for your slowest combat unit (perhaps depending on the stage of the game) to travel from its edge to a different segmented area, e.g. your opponent's expansion.
The "goodness" of an expansion will definitely be harder to solve because I think this is prone to overfitting. I'd actually posit, though, that this isn't as big of a problem as you think. In most maps there isn't a huge amount of choice in where to place your expansions until the late game.
OK, I might be really bad at explaining what I mean in general terms, so I'm gonna make an extreme example:
Let's say you train your AI with millions of replays, but only on maps of the current map pool. And after that, you let it play against a human on an island map, where your starting location is an island and every expansion is an island. A human player at least has a basic idea of what to change in his play and how to approach the matchup. But the AI, which was trained for months and would have crushed any player on the maps it learned, is completely, hopelessly lost. It won't break down or anything, it will still play, but it won't know what the right thing to do is, because it never saw someone build a warp prism off one base and be successful. The things it's supposed to do in this situation are all "bad" things to it, things with very low values in its policy network.
I know this is extreme, but it's just to illustrate what's true for any two maps, because maps are different. No matter how similar they are, they are different.
|
On November 08 2016 02:10 tztztz wrote: Let's say you train your AI with millions of replays, but only on maps of the current map pool. And after that, you let it play against a human on an island map [...] No matter how similar they are, they are different.
Basically you are stating the problem of overfitting an AI to a specific map, aren't you? Given enough variance in the training maps, I would expect an AI to figure out correlations based on rush distance, choke sizes, etc.
Regarding the changing rules: this is one of several reasons why I don't really understand the choice of picking SC2 over BW, except for marketing/publicity reasons. Also, my advice @Chocolate is to at least check out BWAPI before deciding to go with SC2 in an AI term project.
|
People, people, BW players back in the day became pros by just playing Lost Temple, remember? Core skills generalize to any Starcraft context, independent of the map.
|
On November 08 2016 04:25 KelvaroN wrote: People, people, BW players back in the day became pros by just playing Lost Temple, remember? Core skills generalize to any Starcraft context, independent of the map.
It probably took those pros a while to figure out a new map though, so we should let an AI have some test runs as well 
|
@tztztz: What's your point? What do you want? DeepMind wants to make progress on one of the hardest challenges in AI, and you complain that they didn't come up with solutions on the spot? Anyone who is sure of solutions after no thought is a fool. You come up with multiple bad ways to approach the problem; yes, as you say, they won't work, and yes, DeepMind will go a completely different way. You misjudge what a good approach with transfer capabilities could look like. Let them advance AI a few steps.
I also think Go is an "easy" exercise compared to Starcraft, and deep learning will hit some hard limits with Starcraft. But reasonable Starcraft AIs exist, and if anyone can combine them with deep learning architectures in a significant way, it's DeepMind.
|