I would like to offer a story and a viewpoint on "style". It took me 2 hours to flesh it out and write, so hopefully you enjoy it.
I followed Brood War through my high school years until StarCraft 2 came out during my undergrad. The first pro game I watched live over Afreeca was the infamous triple bunker rush by Boxer vs Yellow, and some of the last games I watched were from when 5 hatch hydra ZvP was becoming commonplace.
As we all know, in StarCraft or any competitive video game there is this term called the "meta". In short, the meta(game) is the set of dominant styles of play at a given time.
To put it in concrete terms, take ZvP: early on, ZvP involved securing a 2nd base while fending off early Protoss pressure, then moving into a lurker/zergling containment.
Savior brought the 3 hatch mutalisk style, which involves teching to mutalisks and lurkers simultaneously: the mutalisks harass and threaten backstabs to keep the Protoss at bay, while the lurkers secure a 3rd base and eventually a containment.
The 3 hatch muta was eventually trumped by Bisu, who brought the corsair/dark templar strategy (off of a fast forge expansion) into the meta in a landmark MSL final. (I remember watching it and jumping with joy, because I was a Protoss player and Protoss was absolutely dumpstered by Zergs in those days.)
Zerg eventually adapted into a stable 5 hatchery hydralisk style with a fast spire for scourge, to limit early corsair counts.
I chose Brood War over, say, SC2 as an example because unlike many games that are continuously tweaked and balanced, the last balance patch for Brood War was 1.08, released in 2001, and the game has not changed since. The reinvention of different styles was entirely the players' doing.
But that's enough on the story side.
The reason I want to discuss "style" rather than "strategy" is that strategy often has a connotation of optimality, whereas style is arbitrary. In the early heyday of Boxer, all he wanted to do was micro his 5 marines, and he was extremely proficient at it. This style was proven suboptimal by his own disciple iloveoov, who showed him that if you have 50 marines, maybe you don't need to micro at all.
Yet it is impossible to conceive of a human activity that we perform without a style. We were somehow able to come up with a set of almost arbitrary attributes (maximize the efficiency of the first 10 marines in a dropship) that, once emphasized, give rise to an extremely potent local optimum that you would never be able to find if you started your optimization without them.
It is also clear that if a space can be searched efficiently or exhaustively, style ceases to be relevant. We witnessed AlphaGo easily dispatch the top player, pulling out moves that amounted to giving thousands of years of philosophy and aesthetics the middle finger.
In conclusion, rather than applauding the computer's ability to pull off an "inhuman" move as a triumph, I think it shows the limitations of AI. From Deep Blue to AlphaGo, not much has changed in their style: both try to search the enormous space as exhaustively as they can, just with better computers and more clever encodings (conv NN, deep RL). It's only a matter of time before our kimchi-eating thinking machine loses to an enormous cluster on a diet of pure lightning.
In conclusion (again), whatever style we observe in these game-playing agents is a direct manifestation of the human effort behind them. Only on the day an agent plays a piece on the board and, when asked why, replies "it looked like a bird" should we be genuinely worried about our existence.
Until then, keep being stylish.
(I can probably answer some questions on ai and deep reinforcement learning as I am fairly familiar with the field)
edit: damn my first post since forever. feels good to post as an ultralisk man
I would love to see AlphaGo play chess to see what style it develops as well as whether it can teach itself to become good enough to surpass the current top engines Stockfish and Komodo.
On March 13 2016 07:53 evanthebouncy! wrote: In conclusion, rather than applauding the computer's ability to pull off an "inhuman" move as a triumph, it shows the limitations of AI. Coming from deepblue to alphaGo, not much has changed in their styles, they're both trying to exhaustively search the huge space as well as they can, and with better computers and more clever encodings (conv nn, deep rl). It's only a matter of time before our kimchi eating thinking machine lose to an enormous cluster on a diet of pure lightnings.
I think this statement fits your narrative better than the actual facts of the situation do. If you had matched AlphaGo against a pro on the Internet without telling them it was a computer, I think the vast majority of them would have been quite, quite surprised to find that out after the match. Partly because they didn't expect to be beaten by a computer, but also because the plays it makes are not that inhuman.
You also overestimate the value of distributed computing here; the distributed version of AlphaGo only has a 70% winrate over the single-machine version. This was not a victory for computing power; we've had plenty of more powerful collections of computing resources in the past. The novel part of AlphaGo, and the reason it is able to play more like a human, is the combination of the techniques it is using. Humans do the sort of optimization search you are talking about as well, but the difference (historically) has been that humans are better at what we might term "intuition" from past experience. That is, humans can synthesize complex pattern recognition into a very fast path in their mind. By doing this, humans are able to discard a vast number of possible moves as being completely irrelevant, and focus quickly on the ones that are relevant to winning the game. Computers have decidedly not been very good at this sort of thing in the past, but AlphaGo is a demonstration that this might not be an eternal limitation.
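That "fast path" can be sketched, very loosely, as a policy prior: a learned network scores every legal move, and the search only ever expands the handful of moves carrying real probability mass. A toy sketch (the move names and scores below are made up; the real policy network is a deep conv net, not a score table):

```python
import numpy as np

def softmax(scores):
    # Stable softmax: turn raw pattern-recognition scores into probabilities.
    z = np.exp(scores - np.max(scores))
    return z / z.sum()

def prune_moves(legal_moves, scores, keep_mass=0.95):
    """Keep only the smallest set of moves covering `keep_mass` of the
    policy's probability, discarding the long tail the way intuition
    discards obviously irrelevant moves before any deep calculation."""
    probs = softmax(np.asarray(scores, dtype=float))
    order = np.argsort(probs)[::-1]  # moves from most to least likely
    kept, total = [], 0.0
    for i in order:
        kept.append(legal_moves[i])
        total += probs[i]
        if total >= keep_mass:
            break
    return kept

# Ten hypothetical candidate moves; the policy concentrates on a couple.
moves = [f"move{i}" for i in range(10)]
scores = [8.0, 7.5, 1.0, 0.5, 0.2, 0.1, 0.0, -1.0, -2.0, -3.0]
print(prune_moves(moves, scores))
```

With scores like these, almost all of the probability mass sits on two moves, so the search never has to look at the other eight; that's the sense in which learned pattern recognition narrows the space before calculation starts.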
As far as the "style" that you talk about, it seems to me that this comes from a couple places:
1) Patterns of plays that develop out of the way you learned the game. I see no particular reason why neural nets wouldn't also experience this effect. The data or methods that are used to train them necessarily have an effect on the way the network activates during usage, similarly to how human brains work.
2) Optimization for different sorts of values. Humans have a tendency to optimize for things that "look cool" in their own play. Given two options that perform seemingly the same in a given game, they'd probably go for the one that makes them feel happier. If this was a desirable trait of an AI, I see no particular reason why you wouldn't be able to train such a thing given a way to value its outputs. Valuing the outputs is seemingly the hard part there, but we already have tons of people working every day to figure out how to better target ads at people using forms of AI. Is using AI to determine how an ad will appeal to someone really all that different from figuring out if a certain play/maneuver will?
Relatedly, DeepMind has talked about future plans to operate the same AI setup as AlphaGo, but beginning from "pure" foundations; that is, beginning from random play with itself, rather than training on expert Go games. I think the outcome of that work will likely be very interesting here: will the AI develop a style all its own, vastly different from the way humans have ever played the game?
Humans don't operate all that differently from what AlphaGo is doing, they just have a system of values decided by evolution and a less controlled environment/upbringing. AlphaGo, however, is a step in the direction of being able to do more of that with AI; to control less of the upbringing, to require less human intervention in their training, to generalize the solution to more and more fields. While we are prone to thinking today that AIs play games in a very clean, rigid way, I think this is only indicative of the limited number of variables they can value, and the limits in our abilities to train them organically. I don't think this is likely to be a limit for too much longer.
Full disclosure: I work for Google, although nothing related to DeepMind/AI
AlphaGo does not exhaustively search the set of possible moves from the current position; it is much more nuanced than that. A better way to conceive of how it works is "pattern-finding". An example in StarCraft terms: AlphaGo would never even consider making very early missile turrets, because that doesn't fit the general pattern of the games that cause it to win.
We can probably expect AlphaGo to come up with a pretty interesting style/meta during the phase where it plays games against itself to learn. It will probably start with a pretty standard meta (because it will initially learn from a set of past games), but this could lead to it eventually playing a completely different meta after enough simulations against itself.
On March 13 2016 08:26 DuckloadBlackra wrote: I would love to see AlphaGo play chess to see what style it develops as well as whether it can teach itself to become good enough to surpass the current top engines Stockfish and Komodo.
This was already undertaken last year by Matthew Lai's chess project 'Giraffe' ( arxiv.org ). While its Elo was slightly lower than the two aforementioned monsters, it's a one-man project that was trained by self-play only (as opposed to crazy handcrafted evaluation functions) for a couple of days on a single machine, so it's extraordinarily impressive. It is not impossible to believe that, given enough time, the program would learn enough by itself to overtake them.
It's a very curious question, and we first have to ask some basic questions:
- What race will it choose? (if not assigned one)
- What maps will it play on? (assuming it will have to train on specific maps)
If AlphaGo's descendant plays against itself at an accelerated game speed in order to learn and adapt, then it will probably learn from the very beginning, with four workers and a townhall structure. Maybe its first few hundred games will involve attacking with its first four workers and not building anything at all. Eventually it will figure out how to build units and send those more effective fighting units in to win more easily. The learning process always has to start from the beginning of a match and gradually extend. At some point, it will figure out that 2 zerglings result in a win, then 4 zerglings, then 6, and then it will learn to have its workers fight back. It probably won't even know what Battlecruisers or Defilers are until hundreds of thousands, perhaps millions of simulations, because they're so far down the tech tree and games never get that far.
It also remains to be seen at what point it will learn unit control. It could be the very first thing it learns, before even building its first structure. There's only so much you can do at the start of a game, but one thing you can do is send your workers blindly around the map and give them an attack command on the enemy base and win. It probably won't find out why Marines are good for a very long time (because it would keep losing to rushes until it learns to micro or wall). I would guess that everything it does would be centered around the early game and nothing more for months, if not years.
The really interesting thing is that because it keeps playing against itself, then it could eventually stop using rushes and start teching/early expanding because it's capable of fending off a rush. What's interesting about that is since it could theoretically have perfect micro by that point, a rush could simply beat most human opponents outright, but it wouldn't use it because it's learned that rushing is no longer an effective strategy against itself.
You seem to have the opinion that the majority (if not all) of human reasoning can be explained by pattern recognition. Am I correct in saying that?
Assuming that's correct, then yes, the NN architectures of today may well capture human-like reasoning. However, there are certain aspects of intelligence, like simulation and hypothesis forming, that cannot naturally arise from these networks unless we explicitly encode them in.
The fact that we can learn and infer a huge amount of information from very few training examples makes humans different from a network, at the moment. Now, we could make the argument that a human is a well-initialized, pre-trained network that only needs a little additional data to fit to the current environment. Maybe.
But all in all, I think the current network models are still insufficient to capture intelligence, or style, for that matter.
I think the defining difference of AlphaGo is built around emulating human "intuition", but I don't know that I would say intuition represents the vast majority of human reasoning. Humans do tend to make use of it a lot (often subconsciously), but I think the reason it's important to note in this particular case is that it is clearly important for winning at Go.
I dunno, the question of whether or not such AIs represent intelligent beings seems to be a philosophical question. If humans program something, can it actually learn things? If we introduce random events and let these mutate the program, is that the same thing as human learning? Can such a program be creative? You could go either way there, but I think we'd both probably agree on the outputs such a program could produce. It's more a matter of how you define learning or creativity or intelligence, and what that encompasses.
I don't think there's a particularly great grasp on what makes humans intelligent yet, but I also think things like AlphaGo are interesting because they can start to push the boundaries there and make you think about it. As these get better at doing things humans are able to do, we can also use them to identify the parts that *are* unique to humans and make us intelligent.
Right, except I don't think AlphaGo emulated intuition as much as we give it credit for. It merely used a better state abstraction via a convolutional NN, and after a billion gradient descent steps was able to converge on some better-than-human policy.
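That "billion gradient descent steps converging on a policy" can be boiled down to a toy: a softmax policy over actions whose logits get nudged up the gradient of expected reward until the probability mass settles on the winning action. A minimal, deterministic sketch (two actions with made-up win rates, exact gradients instead of sampled self-play, no conv net):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two "moves" with hypothetical win rates; the policy starts uniform.
win_rate = np.array([0.2, 0.8])
theta = np.zeros(2)  # policy logits
lr = 0.5

for _ in range(5000):
    p = softmax(theta)
    # Gradient of expected reward J = sum_a p_a * r_a w.r.t. the logits:
    # dJ/dtheta_a = p_a * (r_a - E[r]). Ascend it.
    grad = p * (win_rate - p @ win_rate)
    theta += lr * grad

print(softmax(theta))  # nearly all probability mass on the better move
```

AlphaGo's policy-gradient phase is of course vastly more involved (sampled self-play games and a deep network instead of two logits), but the update it repeats those billions of times has this same basic shape.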
Although I do agree that it is very useful for pushing the boundaries. Once we have a good AI for playing StarCraft I'll be more convinced, because StarCraft has far more states and its decisions are far more complex than Go's. The whole partially-observable aspect makes it fun to think about.
Incidentally, do you know if the power consumption of these machines is being tracked anywhere (inside Google or out)? I would like to make an infographic on how many hamburger-equivalents it takes to train and run these networks, if you can point me in the right direction.
I thought highly speculative posts about AlphaGo would never make it out of the SC2 section. A network that learned some patterns of one specific game is not an AI; it is a joke. But you can sell this joke to people for the sake of fun.
On March 13 2016 14:01 Excalibur_Z wrote: It's a very curious question, and we first have to ask some basic questions:
- What race will it choose? (if not assigned one)
- What maps will it play on? (assuming it will have to train on specific maps)
If AlphaGo's descendant plays against itself at an accelerated game speed in order to learn and adapt, then it will probably learn from the very beginning, with four workers and a townhall structure. Maybe its first few hundred games will involve attacking with its first four workers and not building anything at all. Eventually it will figure out how to build units and send those more effective fighting units in to win more easily. The learning process always has to start from the beginning of a match and gradually extend. At some point, it will figure out that 2 zerglings result in a win, then 4 zerglings, then 6, and then it will learn to have its workers fight back. It probably won't even know what Battlecruisers or Defilers are until hundreds of thousands, perhaps millions of simulations, because they're so far down the tech tree and games never get that far.
It also remains to be seen at what point it will learn unit control. It could be the very first thing it learns, before even building its first structure. There's only so much you can do at the start of a game, but one thing you can do is send your workers blindly around the map and give them an attack command on the enemy base and win. It probably won't find out why Marines are good for a very long time (because it would keep losing to rushes until it learns to micro or wall). I would guess that everything it does would be centered around the early game and nothing more for months, if not years.
The really interesting thing is that because it keeps playing against itself, then it could eventually stop using rushes and start teching/early expanding because it's capable of fending off a rush. What's interesting about that is since it could theoretically have perfect micro by that point, a rush could simply beat most human opponents outright, but it wouldn't use it because it's learned that rushing is no longer an effective strategy against itself.
The maps question is one I wonder about. Let's say it trains on Fighting Spirit until it has optimal scouting patterns. What happens if Flash or Boxer floats their CC to another mineral patch at the start of the game? Does it keep searching? Does it send more scouts? Same for hidden buildings. How creative can AlphaGo get? Will it ever think to 2-rax in the middle of the map?
Well, see, that we don't know. But ideally the AI would have a more abstract notion of scouting, defined as: keep searching if the enemy is not found yet. So it will keep going until it gets vision, instead of blindly doing a pattern once and forgetting.
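That abstract notion could look something like the following sketch; the game "API" here (location names, the enemy check) is entirely made up for illustration:

```python
def scout_until_found(search_order, is_enemy_at):
    """Keep visiting candidate locations until the enemy is actually
    found, rather than checking the standard spots once and giving up."""
    visited = []
    for loc in search_order:
        visited.append(loc)
        if is_enemy_at(loc):
            return loc, visited
    return None, visited  # map exhausted: widen the search order

# Even a floated CC at an unusual spot gets found eventually, because the
# goal is "gain vision of the enemy", not "run the usual pattern once".
standard_spots = ["nat_9", "nat_12", "nat_3", "nat_6"]
odd_spots = ["island_center", "mineral_only"]
found, path = scout_until_found(standard_spots + odd_spots,
                                lambda loc: loc == "island_center")
print(found)  # island_center
```

The point of the design is that the termination condition is the goal itself (enemy found), so a floated CC just makes the loop run longer instead of breaking the behavior.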