Flash on DeepMind: "I think I can win" - Page 8
Haukinger
Germany131 Posts
That's the core problem why StarCraft is boring to play and boring to watch for most people: mechanics play an overwhelming part in winning. You can get to GM just by cannon rushing or 4-gating mechanically well, and I'm sure a bot would win GSL just by worker rushing. That means players have to know their maps completely and choose more or less static build orders, because there's no time in the game to think.
coolprogrammingstuff
906 Posts
Make it play like a human. Don't restrict the APM - that's not how algorithms operate; they'd have EAPM close to 100%. Restrict that instead, to a human level. Make it an actual contest of natural ability - see if it can micro better through logical splitting, positioning, and general human-tier control, rather than by just maneuvering ridiculously. Make it execute build orders, rather than a 2 hatch muta all-in every game with impossible micro. Making it play like a human and compete in a way that's human-esque is what makes it interesting; otherwise no human can stop even a perfect 4 pool. Besides that, I think that at this stage it'd be close if it went up against Flash soon, with Flash pulling ahead. However, as mentioned before, if the bot were just fed Brood War for two years, hypothetically, I think even Flash at his peak would have no chance. And it'd be fascinating to watch how it plays.
Dromar
United States2145 Posts
On March 11 2016 18:18 heqat wrote: The question is at what level the AI can access the game. Normally in AI research, the software cannot access the internal state of the game (or the 3D scene). For instance, it should not be able to simply read the positions of the (visible) units. So for a true test, the AI should also move the camera and try to figure out what it sees on screen, with a chance of missing some information (which happens all the time to humans in SC2). If it can simply access the game state like the current SC2 AI, it is not a true test from my point of view.

Well, the game played will be Brood War, but even if it were SC2, the AI could control everything without moving the screen. It could simply hotkey every unit as it is produced, remember its location, and from that hotkey select and give commands to each individual unit. Isn't there also a "Select Army" button?
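Dromar's "hotkey every unit as it is produced, remember its location" idea can be sketched as a bot-side registry that tracks units and issues orders without ever moving the camera. This is a toy illustration only: the `Unit` class, the event-handler names, and the order format are all invented, not any real Brood War API.

```python
class Unit:
    """Minimal stand-in for a game unit: an id and a position."""
    def __init__(self, uid, pos):
        self.uid, self.pos = uid, pos

class UnitRegistry:
    """Tracks every unit the bot produces, so commands need no camera."""
    def __init__(self):
        self.units = {}

    def on_unit_created(self, unit):
        # "hotkey" the unit the moment it is produced
        self.units[unit.uid] = unit

    def on_unit_moved(self, uid, pos):
        # remember its latest location
        self.units[uid].pos = pos

    def command_each(self, order):
        # issue an individual order to every tracked unit
        return [(u.uid, order) for u in self.units.values()]

# Example: two units produced, one moves, then all get an order
registry = UnitRegistry()
registry.on_unit_created(Unit(1, (10, 20)))
registry.on_unit_created(Unit(2, (12, 20)))
registry.on_unit_moved(1, (15, 25))
orders = registry.command_each("attack")  # one order per tracked unit
```

The point of the sketch is that once state is mirrored bot-side, "screen position" becomes irrelevant, which is exactly why heqat argues a fair test should force the AI through the human interface.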
heqat
Switzerland96 Posts
On March 11 2016 20:26 Dromar wrote: Well, the game played will be Brood War, but even if it were SC2, the AI could control everything without moving the screen. It could simply hotkey every unit as it is produced, remember its location, and from that hotkey select and give commands to each individual unit. Isn't there also a "Select Army" button?

Sorry, yes, it would be BW. Regarding your point, what I mean is that for a perfect test, the AI should use the same user interface as a human. It should make decisions using a flat 2D picture and control the game using hotkeys, scrolling, etc. (no need for a physical robot, just wire the data to the AI software). In regular game AI (such as the SC2 AI), the software has access to the complete internal game state and can make decisions at every step by simply checking unit positions, states, etc., with some specific rules to avoid cheating (like preventing the AI from accessing non-visible units). Now I guess it would become much more difficult for the AI if it had to play from the exact same user interface as a human (which makes sense for a true SC human/machine match, contrary to Go/chess, where the user interface does not change the result of the performance). It would require some very advanced real-time visual recognition algorithms, for instance.
ETisME
12083 Posts
Anyway, I really don't think it is going to pose any challenge for the AI. I am not an expert, but it can certainly just scout every once in a while, deduce the most probable and threatening strategy/timing coming, and then win through perfect attention to everything, perfect micro, perfect reactionary decisions, etc. Each harass/engagement just removes more and more uncertainty for the AI.
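The "each engagement removes more uncertainty" intuition above can be sketched as a Bayesian update over a handful of candidate enemy openings. The opening names, prior, and likelihood numbers below are entirely invented for illustration; a real bot would need these distributions learned from replays or hand-tuned.

```python
# Prior belief over enemy openings (made-up numbers)
prior = {"4pool": 0.2, "9pool": 0.3, "12hatch": 0.5}

# P(observation | opening): chance of the scout seeing "no early gas"
# under each opening (also made-up numbers)
likelihood = {"4pool": 0.9, "9pool": 0.4, "12hatch": 0.7}

def update(prior, likelihood):
    """One Bayes step: posterior is proportional to likelihood * prior."""
    unnorm = {s: likelihood[s] * prior[s] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

posterior = update(prior, likelihood)  # belief after one scouting pass
```

Each subsequent scout or engagement would just apply `update` again with the new observation's likelihoods, so the belief sharpens over the game, which is the mechanism the post is gesturing at.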
rockslave
Brazil318 Posts
On March 11 2016 13:01 ZAiNs wrote: Deep learning needs a dataset for the AI to be trained on, though. For AlphaGo they trained two separate networks (one designed to predict the next move, and the other designed to predict the final winner) on 30 million discrete moves from games played by human experts. After that, it trained itself by actually playing Go against itself a ridiculous number of times. A Go game can be perfectly modelled by a simple list of positions describing which square had a stone placed on it each turn; it's going to be very hard to get enough useful data (replays) to significantly help with the training. And without the initial training it's going to have to learn mostly by playing against itself, which will be difficult because of the ridiculous number of game states. At least that's my understanding of things, I could be wrong, but it seems to be a lot harder than Go.

That is a fair point. But I think you can break the game into several mini-games, with a little algorithm to guess who has the advantage, based on material advantage, positioning, etc. (just as you said they did for Go). While Go can be perfectly modelled, the number of possible states is intractable. Just as you need heuristics to cut the search tree in board games, you can also "cheat" in SC by having a sort of hash function on states. That's what I meant by parametrization earlier: a lot of the work involved in building neural nets is choosing the inputs. By the way: I don't really know anything about what I'm saying. I just played with machine learning, never studied it seriously. Edit: if anyone is interested, here's a great free book about it: http://neuralnetworksanddeeplearning.com . You gotta love mathematics, though.
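The "little algorithm to guess who has the advantage" mentioned above is, in its simplest form, a hand-crafted linear evaluation over a few summary features of the state. The feature names and weights below are invented purely to illustrate the idea of compressing an intractable game state into a small parametrized score.

```python
# Hypothetical summary features of a game state (positive favors player 1)
FEATURES = ["army_supply_diff", "worker_diff", "base_diff", "upgrade_diff"]
WEIGHTS = [1.0, 0.5, 2.0, 1.5]  # hand-tuned, for illustration only

def evaluate(state):
    """Linear advantage estimate: weighted sum of the feature values."""
    return sum(w * state[f] for w, f in zip(WEIGHTS, FEATURES))

# Example: player 1 is up 10 army supply, down 4 workers and a base
state = {"army_supply_diff": 10, "worker_diff": -4,
         "base_diff": -1, "upgrade_diff": 0}
score = evaluate(state)  # 10*1.0 - 4*0.5 - 1*2.0 + 0*1.5 = 6.0
```

This is the same role the value network played for AlphaGo, except there the weights (and the features themselves) were learned rather than hand-picked, which is why choosing the inputs is where so much of the work goes.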
Vasoline73
United States7675 Posts
...stuff like, how does the AI react to a map (building placement, etc.) it's never played on before? What if there's no immediate natural and it typically fast expands? When it sends its scout out onto the map, goes down the ramp and sees no natural... does it start looking for one? Scout for the enemy first? Does it change its build order to one-base play when it may just not have scouted an expansion spot yet? The clock is ticking and supply is going up. How does it play on Monty Hall or some crazy shit for the first time? Etc., etc. That stuff will make an "all-around" BW AI that beats top humans the way chess engines do, or as AlphaGo is very likely to continue doing, very difficult. Now, if they make the AI just a one-base BBS or 4-pool-plus-drones killing machine on standard maps it recognizes, then I see success being plausible quickly... probably even now, but I don't think Google is trying to win that way. Guessing they have loftier ideas for their AI and what they want it to symbolize/accomplish. All that said, it's more than possible, and it would be cool to see it happen someday sooner than expected.
MyLovelyLurker
France730 Posts
I've been watching Broodwar for 15+ years, and programming reinforcement learning engines for a few. Here are a few thoughts on why the specific DeepMind approach is going to be very hard for SC, although it might well happen in around 10 years' time:

1. We are assuming the AI teaches itself to play only from a realtime view of the pixels on the screen, and knows nothing about any score at all - *there is no score in Starcraft*, unlike the Atari 2600 titles, which have mostly been arcade games with a clear numerical objective (the score) to be maximized by the playing agent. The act of playing thereby becomes a calculus problem (to first order, set the gradients of the score function to zero). Not impossible, but harder in Starcraft.

2. Starcraft II is an imperfect information game, as opposed to chess or Go, where the board contains the whole information available to both players. While it is possible to do reinforcement learning in that setting, it is a relatively new field and adds to the difficulty - articles are being published on the subject right now.

3. The 60 or 120 APM barrier will not be broken easily. Right now the Atari 2600 DeepMind simulations rely on one or two actions per frame, which means your APM is limited by the FPS you render. Even with two policy networks - one for the keyboard and one for the mouse - you are headbutting against 120 APM pretty much. It is not impossible to think about operating several policy networks in parallel in order to enable strong multitasking (think Korean multiple drops), but it is a new area that needs to be explored - the connections between networks and their interactions would need to be thought through carefully. Some cutting-edge research with asynchronous networks goes in a similar direction.

4. Point-and-click games have not been tackled yet by RL; they are joystick- or keyboard-based, ergo with binary 'push or don't push' states, and no mouse game has been tackled by a policy network as far as I know. This brings its own set of challenges (the AI will have to figure out by itself, for instance, how to move the mouse in optimal ways, which includes making straight lines, positioning the cursor close to a Nexus or a pylon, etc.).

5. Starcraft is also 'multi-screen' - it requires frequently changing views with your F keys (moving to different bases and engagement battles) in order to correctly represent the full state of the game. So far, Atari 2600 games have been mono-screen only. Again, it is not impossible to imagine this will be overcome in the future; it is just harder to do right now.

6. Combinatorial explosion in the number of unit compositions is also hard to tackle - every time you add a potential unit to the mix, the possibilities for army composition multiply, which is why the campaign mode you learn to play from introduces units pretty much one at a time. It would objectively be much, much harder to start playing full games from laddering and without an instruction manual, which is what the DeepMind approach is.

7. The meta in SC rotates on a regular basis - it is 'non-stationary', which adds to the list of problems encountered by a machine that would learn by playing on ladder, as some of the strats and playstyles learned earlier could well be obsolete - and hard-countered - by the time they are assimilated. This happens with human players too; they have to make a conscious effort to get out of a slump, learn new information, and forget the old. Some work on policy distillation or optimal brain damage in neural networks goes, very tentatively, in that direction. Again, this is hard.

For all those reasons, it would be an incredible achievement already to have a Starcraft deep reinforcement learning AI that can teach itself to beat a very easy computer AI in a setting with only workers, and maybe a unit list restricted to just a couple, like zealots and dragoons. If you look at the performance of reinforcement learning in 2D games such as Atari, 'mechanical' games like Pong or Breakout get to much higher skill levels than games that require planning, such as Pacman. It is hence entirely possible that a Starcraft DeepMind would play mechanically correctly, but overall pretty poorly - one can only speculate. If you add up all the objection points above, you can get a feel for why there is quite a long way to go. Happy to provide a list of reference articles if required.
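The APM bound in point 3 is just arithmetic: if the policy emits at most a fixed number of actions per evaluated frame, its APM is capped by the rate at which frames are fed to the network. The 2 Hz policy frequency in the example is an assumption chosen to reproduce the 120 figure from the post, not a documented DeepMind parameter.

```python
def max_apm(policy_hz, actions_per_step=1):
    """Upper bound on actions per minute for a policy evaluated at
    policy_hz decisions per second, each emitting at most
    actions_per_step actions."""
    return policy_hz * actions_per_step * 60

# A policy evaluated twice per second, one action per step (assumed numbers):
assert max_apm(2) == 120
# Two parallel policy networks (keyboard + mouse) doubles the ceiling:
assert max_apm(2, actions_per_step=2) == 240
```

So breaking the barrier means raising either the decision frequency or the number of parallel action streams, which is exactly the multi-network direction the post sketches.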
BeStFAN
483 Posts
On March 11 2016 22:47 MyLovelyLurker wrote: I've been watching Broodwar for 15+ years, and programming reinforcement learning engines for a few. Here are a few thoughts on why the specific DeepMind approach is going to be very hard for SC [...]

Could anyone answer this: what is the significance of the AI's ability to master the game of Go in relation to its ability to play BW at a high enough level? In other words, with the developments required to beat Sedol, what tools has AI gained in relation to its ability to play SC?
rockslave
Brazil318 Posts
On March 11 2016 22:47 MyLovelyLurker wrote: 1. We are assuming the AI teaches itself to play only from a realtime view of the pixels on the screen, and knows nothing about any score at all - *there is no score in Starcraft* [...]

I don't think your first hypothesis is true: the AI would be able to read the data in the replay files and judge plays accordingly (only in the training phase). Also, there is a natural language to describe the moves: the one people use to describe AIs in BW (stuff like GTAI).
Superbanana
2369 Posts
Hard? 10 years? Are you kidding? + Show Spoiler + Just put INnoVation in a box and call it a day
Makro
France16890 Posts
On March 12 2016 22:08 Superbanana wrote: Hard? 10 years? Are you kidding? + Show Spoiler + Just put INnoVation in a box and call it a day haha
OtherWorld
France17333 Posts
On March 12 2016 22:08 Superbanana wrote: Hard? 10 years? Are you kidding? + Show Spoiler + Just put INnoVation in a box and call it a day Didn't know choking was an integral part of being an AI