DeepMind sets AlphaGo's sights on SCII - Page 10

The Bottle

242 Posts

March 30 2016 18:44 GMT

#181

On March 30 2016 02:09 Mendelfist wrote:

I'm not familiar with the script API either (or we should probably call it the map editor, because that's what I think it is), but I think it's pretty safe to assume that the API tells you if something is hidden behind the fog or war, so you don't have to map hack. The map editor seems to be very powerful and l think it's also safe to assume that there is a wealth of information available, but you don't have to use all of it. It's more than just build order anyway. The point is that the bot-scripts use this information today with pretty good results, and I don't think they cheat that much, unless you tell them to. Put this information into a DeepMind AI instead. Or why not have several. A self learning micro AI maybe, and a supervising macro AI. Maybe we should have one micro AI per unit. The possibilities are endless.

That's true, there is the information from a map editor (which unit/part of building occupies each grid coordinate, binary variable to indicate whether or not it's revealed by the fog of war for each player). I haven't thought of that. If they could think of a way to learn information by simply feeding a NN a large series of still frames from random games (like they do for the Alpha Go policy network stage), that would be feasible. Although I can't think off the bat what they could infer from that. In Go, they fed the network board position-action pairs, which you can't really do in this case, not with discrete actions. It still doesn't solve the two major problems I illustrated with coarse graining a set of actions. The first problem is solved if you just take any mapping and call it "good enough" (which might actually be ok, since the strategy in Starcraft isn't as deep as in Go) but I think you would have to work a lot harder than that if you want it to learn useful unscripted strategies that are essential for winning. And the second problem, well, I must say I have no idea how that would be solved.

Without getting too specific, I guess I can lay out the problem like this. If you asked me to write an algorithm for learning Go strategies given a large data set of played games, before even reading that paper by Deepmind, I could think of one really easily. Not a good one, mind you, not one that could come even close to performing as well as theirs. But the fact that the game is so simple (not the strategies, but the building blocks) with an extremely well defined discretized set of board states and actions, means I have to do almost no work in transforming the data in a way that can be read by a good supervised learning algorithm, or translating the problem to a metric for the algorithm. For Starcraft, I really don't know how I would go about it. I mean, I have ideas, but they seem extremely hard to implement, and it seems like I will constantly run into problems with the way I set things up, just to bring the data to the same level as Go already is right off the bat. You might say that this is for a lack of imagination on my part, and you might be right. But the scientists at Deepmind are not supermen, it's pretty easy to see that these issues will be a big hurdle for them.

Mendelfist

Sweden356 Posts

March 30 2016 20:00 GMT

#182

I have an idea for a start at least. If its' feasible I don't know because I don't work with neural nets. If we start at the micro end I would want one type of neural net for each unit type. I would train them by simulating, or by using replays, local battles. A humongous amount of battles of course. Unlike Go the outcome of the battle is very easy to see. I wouldn't even need to play it to the end. Once one side start to win I would end it and record a win or loss. I would probably start with replays like the DeepMind team did, and then continue with self play. I would now expect (if I were good at this like the DeepMind guys) that the nets would learn micro, like the nets they used in the Atari games did. They would learn to cooperate with other. Medivacs heal marines and things like that. Now we have building blocks that a macro AI can use. This AI would not need to consider each unit, but could instead give orders, like "attack there".

It would probably be useful to organize the input to the nets in some more intelligent way than just coordinates, like "distance to closest enemy unit" and things like that. I think they did similar things for the Go problem.

I have no idea what I would do next. :-)

Edit: I want to remind you that the DeepMind team did NOT succeed at first. They used neural nets. They trained them. They used them in combination with Monte Carlo. It didn't work. Well, it worked of course, but other people had already created Go AIs that played at high amateur level. The breakthrough came first with the idea of a policy net and a value net. I may be that similar breakthroughs are needed for Starcraft. Just throwing nerual nets at a problem doesn't work.

Acrofales

Spain18112 Posts

March 30 2016 20:50 GMT

#183

My approach would probably to do something like the bag of visual words approach for vision to start building a feature set. Just take a huge DB of replays and let a pattern matching algorithm find commonalities. You might need multiple levels (such as first releasing it on the maps to analyse commonalities, and then use these commonalities in the higher level) such as "send 3 marines to ramp" instead of "send 3 marines to position (2012, 3437)". But you can easily test whether such patterns that it finds make sense by figuring out whether it finds common build orders, scouting patterns, etc. Obviously a gimongous number of those patterns are going to be complete nonsense to a human observer.

Once you have your features more or less working, you can take a similar approach to the way the Go system worked. You chop games into 5 second sequences (or half a second, or 2 second, or 10 second) and consider the "actions" the same as a move in Go (of course, it's real-time so you might need new methods to deal with the changing world, and unknowns in the game world, that occur while you consider a single player's actions), given only the additional label that the game was ultimately won or lost. That should at least give you a way to start understanding whether an approach like this is a viable way to "win" at SC2.

The Bottle

242 Posts

March 30 2016 22:28 GMT

#184

On March 31 2016 05:00 Mendelfist wrote:
I have an idea for a start at least. If its' feasible I don't know because I don't work with neural nets. If we start at the micro end I would want one type of neural net for each unit type. I would train them by simulating, or by using replays, local battles. A humongous amount of battles of course. Unlike Go the outcome of the battle is very easy to see. I wouldn't even need to play it to the end. Once one side start to win I would end it and record a win or loss. I would probably start with replays like the DeepMind team did, and then continue with self play. I would now expect (if I were good at this like the DeepMind guys) that the nets would learn micro, like the nets they used in the Atari games did. They would learn to cooperate with other. Medivacs heal marines and things like that. Now we have building blocks that a macro AI can use. This AI would not need to consider each unit, but could instead give orders, like "attack there".

It would probably be useful to organize the input to the nets in some more intelligent way than just coordinates, like "distance to closest enemy unit" and things like that. I think they did similar things for the Go problem.

I have no idea what I would do next. :-)

Edit: I want to remind you that the DeepMind team did NOT succeed at first. They used neural nets. They trained them. They used them in combination with Monte Carlo. It didn't work. Well, it worked of course, but other people had already created Go AIs that played at high amateur level. The breakthrough came first with the idea of a policy net and a value net. I may be that similar breakthroughs are needed for Starcraft. Just throwing nerual nets at a problem doesn't work.

I am starting to see the disconnect between us. You are talking at a much higher level than I am. That is, you're thinking about it from a sort-of business sense, whereas I'm thinking about it as someone who has to actually write the algorithm. In your explanation, you didn't even attempt to explain what the form of an input vector or target variable would be, and what you're explicitly trying to predict. Is your input vector the board state in a local set of grid points around the unit? Is the target a sequence of actions taken by the unit itself? (That would be a very complicated target; you'd have to get into doing Markov Chain Monte Carlo simulations or something like that, and I don't know how well that would go in training an AI). Or is your input the local board state around a unit and the sequence of actions afterwards? You mentioned that you'd take, as an input, a small interval of time during a battle, for a unit, with your target being who won the battle. If you take that for the maximal possible information, this is already intractable for training a NN (for a typical NN with a single hidden layer and number of nodes equal to number of features, which should be your minimum, the complexity of its calculation goes as order(D^2) where D is your number of features). If you want to transform the data into one of less than maximal information for that battle, again this is an extremely difficult task.

In Go all my questions are extremely easy to answer. An input vector is a 361 vector of discrete variables having one of three values (0, 1, or 2 corresponding to black, white, or empty). The target variable is single categorical variable with 361 possible values (what action was taken). This would work well enough to train something at the very least, that's tractable and would give me an algorithm that can actually play Go. (In the case of training Alpha Go's policy network, their training set was more complicated; it's 361x11 as the size of the input vector, since they included variables for each square such as how many liberties there are around the set it's pointing to, how many moves were made in that square since the beginning, etc; still tractable and easy to code).

All you can do with your explanation is to find a machine learning expert, give them your basic outline, and tell them to figure out how to do this. However you can actually take my explanation of the Go learning process, find a database of Go games, and actually write the algorithm and apply it, because I gave all the relevant information needed. That's what I'm trying to figure out for Starcraft, and this is a really tough problem for all the reasons I explained.

Your example also requires somebody to go through every single training set and give it a label. You can't sample 20 million battles from the data base and expect there to be the data available of who won that battle... you can't even expect to sample 20 million random intervals and local grid spaces and expect them all to be a battle, so how would you find them automatically? You have to go through it personally, and then your training set would be on the order of hundreds (after you spend a really long time) rather than 20 million like Alpha Go's. And you have to do that for each unit type. In their case, you can easily take (and they did) 20 million randomly sampled board state-action pairs without needing anyone to figure out if it's what they want, or label the value of the target variable. (And before you tell me that "hundreds will do just fine", that's absolutely false. If you're training a NN with a large feature space of a unit's actions, your solution will be overfitted like crazy.)

Mendelfist

Sweden356 Posts

March 31 2016 05:47 GMT

#185

On March 31 2016 07:28 The Bottle wrote:
I am starting to see the disconnect between us. You are talking at a much higher level than I am. That is, you're thinking about it from a sort-of business sense, whereas I'm thinking about it as someone who has to actually write the algorithm. In your explanation, you didn't even attempt to explain what the form of an input vector or target variable would be, and what you're explicitly trying to predict.

Yes. You are correct. I don't see the problem. In the Atari example they used pixels as the input vector. It worked. How about this: The input vector could be distance and angle to each unit (within close range). You would also need health value, type of unit, and upgrade information. The target would just be the next action taken by the unit, not a sequence. I say the same as you, while I'm sure this isn't optimal why wouldn't it do *something*? They already have nets playing Atari games. I consider the basic problem already solved. I don't believe that SC2 units on a map would be more complicated than pixels in an Atari game.

However you can actually take my explanation of the Go learning process, find a database of Go games, and actually write the algorithm and apply it, because I gave all the relevant information needed. That's what I'm trying to figure out for Starcraft, and this is a really tough problem for all the reasons I explained.

Your idea for the Go problem would fail. It would play Go, but it would be no better than all the other attempts. It required an ingenious idea by DeepMind to make it actually work.

Your example also requires somebody to go through every single training set and give it a label. You can't sample 20 million battles from the data base and expect there to be the data available of who won that battle...

No. Seriously. You are trying very hard to make everything sound a lot harder than it is. This is NOT a hard problem. I could put a novice directly from the university at this for some time and he would come up with a decent idea for an automated process. I maybe even don't need human games. I could use the Expert AI and simulate a lot of battles. I only need this to get a starting point, because the rest would be done with self play.

The Bottle

242 Posts

March 31 2016 20:00 GMT

#186

On March 31 2016 14:47 Mendelfist wrote:

Show nested quote +

My input vector suggestion was better (not the sequence of actions, but the local grid around the unit, of fixed size, with variables explaining the occupation of each grid element, with variables such as unit type, percentage of full health, etc). My way allows for a consistently structured vector whose dimensionality is the same for every input. You say the target is the unit's next action. What does that look like. Is it a softmax function whose levels consist of all possible actions? That's intractable. Is it the same, but with a large set of actions being restricted? Do you take the set of all possible actions and coarse grain it, then use the softmax of that? (Again a very difficult problem).

Show nested quote +

Your idea for the Go problem would fail. It would play Go, but it would be no better than all the other attempts. It required an ingenious idea by DeepMind to make it actually work.

No, you just misinterpreted the point I was making in that passage. Remember, I said,

This would work well enough to train something at the very least, that's tractable and would give me an algorithm that can actually play Go.

The process I described would get something that plays Go, and in a way that's better than random moves, maybe as good as a novice player. I never claimed it would do anywhere nearly as well as AlphaGo. My entire point in that section was just how the simplicity of Go makes for a structure that already gives you such readily available training data sets for training a tractable model. Something that Starcraft doesn't even come close to doing. This is the main reason why training a Starcraft AI is much, much harder than a Go AI.

Show nested quote +

The reason I said that you would need to label the sets manually is because in your original quote on the idea,

If we start at the micro end I would want one type of neural net for each unit type. I would train them by simulating, or by using replays, local battles. A humongous amount of battles of course. Unlike Go the outcome of the battle is very easy to see. I wouldn't even need to play it to the end. Once one side start to win I would end it and record a win or loss.

you talked about using "local battles" and "win/loss" when referring to the outcome of the battle (rather than the game). These are both very subjective terms. There is no notion of a "battle" in the data of a Starcraft game, nor is there any such thing as a win when it comes to a battle. These are all subjective terms that human spectators give to what they're seeing, based on very ad hoc conditions. For example, if a probe picks at an SCV while it's building a ramp depot, nobody considers that a battle. But if there's 10 probes trying to pick at that SCV, and then 13 SCVs are taken off the line to fend off the probes, everyone would call that a battle. And what happens if 8 probes and 12 SCVs die? Well someone observing that battle in isolation might say the Protoss won, but in the bigger picture it could very well be the Terran, especially if this was an all-in by the Protoss and the Terran's intent wasn't so much to get a more efficient trade, but to hold off the probes as well as possible before a hellion is pumped out.

I will concede this though. You could automate the process as long as you come up with a very strict definition of what constitutes a battle, and what constitutes a win/loss of a battle. It would have to be something like "a unit takes damage from another faction within a square grid of size K - centred around the damaged unit - within which at least N supply of both factions are present" where K and N are adjustable parameters. And the victory condition could be "the player who lost less resources worth of units between the time of battle initiation (as defined above) and M seconds after no unit takes any damage." Of course there are major problems with this definition (for example an algorithm trained like this would never retreat its whole army), but it would give you something that works without manual interference. Or, instead of trying to define a "victory" for a battle, you could just use whoever won the actual game, which is what AlphaGo did. You don't need to play an entire replay for that, the metadata surely has some binary variable that you can access. In fact, I think that would actually be a lot better, because then you'd have the network trained to do things for bigger picture reasons (such as retreating an army) rather than very tunnel-vision reasons.

beg

991 Posts

March 31 2016 21:36 GMT

#187

DeepMind has inspired me to learn both programming and setting up neural networks.

Honestly, I've never had that much fun for the last decades of my life. I'm loving this so much.

All of this is pure beauty.

I came to believe that neural networks (and recent techniques like deep learning) are among the most meaningful inventions we ever created. On the same level as steam engines, computers, the internet... :D

Sorry, fanboy talk. I just wanted to get this off my chest

Tuffy

Canada23 Posts

March 31 2016 22:14 GMT

#188

Google should stream a bronze to GM Deepmind thing on twitch

DuckloadBlackra

225 Posts

April 01 2016 00:07 GMT

#189

On April 01 2016 07:14 Tuffy wrote:
Google should stream a bronze to GM Deepmind thing on twitch

That would be epic!

cheekymonkey

France1387 Posts

April 01 2016 01:21 GMT

#190

On April 01 2016 07:14 Tuffy wrote:
Google should stream a bronze to GM Deepmind thing on twitch

And then it gets flamed for "hacking", and the most despicable things will be said.

turtles

Australia360 Posts

April 01 2016 04:05 GMT

#191

No bot today in Starcraft is doing a strategy that it gained from supervised or reinforced learning. It's all scripted.

There are plenty of BW AI's which are using genetic algorithms to decide strategy and AFAIK there are ones which are also using neural nets to help shape some of it's behaviour.

All of the unit data in the game is available if you wanted to use it but for the sake of not cheating you can restrict yourself to only things which would be seen by a human player. For instance in my AI I have an array of known enemy units and an array of currently visible units. I have a function that runs every 0.x seconds (0.5 or some value like that) which calculates which units a human would be able to see. If it sees a unit for the first time it adds it to the known units array and takes a unit out when it dies. I deliberately only ever access data (such as position/health) about units which are currently visisble so as to restrict the AI in a way simillar to a human player. But I can still use the array of "known enemy units" to make strategic descissions such as what units to counter with. Even if Blizzard does not provide some kind of API which only gives the AI human knowledge it would be easy enough for Google to write their own API to interact with and release so others can verify that it isn't cheating.

I've also been able to get genetic algorithms to work for making macro descissions so that the macro is completely unscripted and decisions such as when to expand / when to drone up / when to rush / what to build / what tech paths to follow are decided on the fly. So it is not even true that SC2 bots are scripted.

This was a demo from a long time ago before I got genetic algorithms to work. (The AI in this vid is attempting to immitate builds it's copied from online replays).

Your AI sucks and the casting is unprofessional

yes I know my AI sucks and would be ranked in bronze league. it's only in the very first stages of development and this is one of the first games it had ever played. Pretty much as soon as I could get a field test to work I uploaded a replay to /r/castit for an aspiring caster to get some casting experience with.

Hopefully no one minds me pimping my own work. I just thought it was relevent to the discussion. I've had some limited experience playing with neural networks and always think they are cool.

As for what counts as cheating mechanically and what restrictions should be placed on deep mind that is going to be different for each person. But honestly I don't care. An APM cap of say 300 (very possible for a pro human) seems reasonable but otherwise what's the point of restricting it? If Google can build a bot which can showcase top tier strategic play that will be amazing! No matter what restrictions they impose upon themselves or none at all it will be cool enough for me and I garuntee I would watch that showmatch having nerdgasms through the whole event. Maybe they could even use Archon mode to let the AI vs two human pros simultaniously so it can out micro AND out strategize two humans at once :D

Mendelfist

Sweden356 Posts

April 01 2016 08:25 GMT

#192

On April 01 2016 05:00 The Bottle wrote:
My input vector suggestion was better

Fine. Lets go with you input vector. You apparently want me to go inte details so you can get something to shoot down. It's a pointless excercise. The point with my example and this:
"Your idea for the Go problem would fail."
Was that we can both come up with AIs that can play badly, and training a neural net to micro a single SC2 unit in a half-intelligent way can't be that much harder than training a net to play an Atari game. I will readily agree that there probably is a lot more details in making an SC2 AI than a Go AI, but that's beside the point.

The goal is however not to make an AI that plays badly. This has already been done, both for Go and SC2. I think a good indication for how hard a problem is is to see how far you can get with a moderate effort. How far would you get by trying to make a Go AI from scratch? You wouldn't get anywhere. Trust me. This was an unsolved problem for years. There was no solution in sight, until the Monte Carlo engines came. The AIs STILL played like amateurs. Another revolutionary idea was required, this time by the DeepMind team. No one else in the world has managed this. This is how hard Go is. In hindsight you can say that the solution isn't very complex, and therefore not very hard (compared to SC2) but that's dishonest.

How hard is it to make an SC2 AI then? With moderate effort you get pretty far. You need no revolutionary ideas. This tells me that SC2 is an easier problem. Yes, I'm extrapolating. Then we can argue to days end about how hard it is to select training data for nets. Fine. Don't use nets for micro then. Handcode it. I said from the beginning that the problem of making an SC2 AI is only as hard as you make it, and therefore not very interesting.

TheFish7

United States2824 Posts

April 02 2016 00:59 GMT

#193

On March 31 2016 03:35 Naracs_Duc wrote:
Would it be more interesting if we have DeepMind get setup, then have a pro and programmer decide what "strategy" deepmind should execute perfectly and have them face off another deepmind + pro combo?

Flash with DeepMind micro vs Jaedong with DeepMind micro?

Holy crap. Archon mode with DeepMind.

DuckloadBlackra

225 Posts

April 02 2016 01:11 GMT

#194

On April 01 2016 13:05 turtles wrote:

Show nested quote +

There are plenty of BW AI's which are using genetic algorithms to decide strategy and AFAIK there are ones which are also using neural nets to help shape some of it's behaviour.

[

Why are all these AI's in BW? Where are the SC2 AIs? The only one I know of is Green Tea AI...

turtles

Australia360 Posts

April 02 2016 01:46 GMT

#195

Why are all these AI's in BW? Where are the SC2 AIs?

Part of it is history. There is a healthy and competitive scene for BW AI's which has been around long enough to foster a community.

Mostly it is because there is an interface available for 3rd party tools to interact with BW. Blizzard obviously does not want to cooperate with 3rd party tools which would make cheating and hacking easier on the ladder. So basically BW AI's exist because Blizzard allows them to.

It is possible to create an AI internally through their scripting tools but there are many restrictions. As powerful as the galaxy scripting language is it does not compare to any of the usual programming languages. A lot of AI teams are university teams which would have to worry about liscencing for SC2 as well.

LetaBot

Netherlands557 Posts

April 02 2016 12:12 GMT

#196

On April 02 2016 10:11 DuckloadBlackra wrote:

Show nested quote +

Why are all these AI's in BW? Where are the SC2 AIs? The only one I know of is Green Tea AI...

Making custom StarCraft 2 AI violates the TOS, and the scripting language in SC2 doesn't have file I/O which some bots rely on to store/retrieve megabytes of data.