OpenAI's Dota 2 bots vs. 5 top professionals in TI - Page 12

stink123

United States241 Posts

August 23 2018 09:45 GMT

#221

Some quick thoughts about the game today:

1. Bots are really good at team fights. (Although maybe because their draft was all teamfight?)
2. Bot wards are horrible. Triple sentry in front of rosh pit? Vision ward at on top of their standing tower? Wards in front of fountain?
3. Once late game hit, bots wasted tons of ults farming. (Maybe they knew they were losing and were trying to farm their way back?) This ended up hurting them seriously during late game fights. They also wasted ults to kill 1 hero.
4. Bots are really good at diving, but this caused them to overextend sometimes.
5. Really inefficient buyback usage, at one point CM did a "rage buyback".

Then again, despite the bots playing horribly in late game, they still made the game look really close. Hopefully humans still come out ahead over the next few days, but we'll see. It's weaknesses are really glaring though.

spudde123

4814 Posts

August 23 2018 10:30 GMT

#222

I have yet to see the game fully but based on what we saw before indeed teamfights are their biggest strength. And because of that it's probably quite hard for humans to just roll over them. It's quite scary to engage against them without a significant advantage either through items or just positioning. From the clips I saw the bots for example dodged w33's Axe calls several times with blinks, euls and whatnot because their reaction time is shorter than the call cast time. If you can't even do a reliable initiation with something like Axe, you sort of have to slowly win by moving around the map better and only taking favourable fights.

polgas

Canada1770 Posts

August 23 2018 15:01 GMT

#223

Problem with OpenAI in this game was not attempting to breach high ground when they had a slight lead:
- Doesn't seem to know how to use aegis for fights in the high ground, even when they got aegis multiple times.
- Giving aegis to Lich.
- Letting Gyro get picked off alone when it had aegis.
- Blowing their ults on creeps at random times.

Maybe OpenAI calculated that it wasn't able to win the fight uphill and went into farm mode.

Tanukki

Finland579 Posts

August 23 2018 17:08 GMT

#224

OK, it can't be helped that they have to play an older version of the game.

The altered rules are all bullshit, though. It means the pros can't play at their best, so there will always be doubts even if the bot wins.

So they make a weird compromise where Blitz draws up a supposedly fair draft. AND THEN the bot goes haywire and loses the big stage match. OUCH.

OpenAI isn't a massive company, so it's understandable that they haven't quite surpassed Google in AI technology. I still can't help but to be a little disappointed at all this.

spudde123

4814 Posts

August 23 2018 17:53 GMT

#225

On August 24 2018 02:08 Tanukki wrote:
OK, it can't be helped that they have to play an older version of the game.

The altered rules are all bullshit, though. It means the pros can't play at their best, so there will always be doubts even if the bot wins.

So they make a weird compromise where Blitz draws up a supposedly fair draft. AND THEN the bot goes haywire and loses the big stage match. OUCH.

OpenAI isn't a massive company, so it's understandable that they haven't quite surpassed Google in AI technology. I still can't help but to be a little disappointed at all this.

I thought they are now playing the current patch? The match against Blitz & co was on some old patch, but I thought I read/heard that it's now playing the TI patch.

Concerning the draft, I think something like what they did is the best way to go for a one-off match. If you allow the bots to always pick their lineup, they will likely only have to play a certain kind of game. It is also interesting how well they've learnt the game overall, and not only whether they can beat humans with some specific kind of strategy. Especially as it's just a one-off match where the humans have no possibility to see the bots play first and they don't play multiple games to allow the human team to adapt. As far as I understood Blitz selected the lineups (and I presume OpenAI people also checked that their bots find them pretty balanced) and then they randomly selected which lineup goes to which team.

Of course it would be great if it would be completely without restrictions, but now that they got rid of the invulnerable couriers I think the games are alright. I think the bigger asterisk on any possible bot win is it being just a one-off match where the humans can't play against the bots first. Dota snowballs really fast, so if you lose one teamfight with a certain kind of draft it may be really hard for you to do anything. Clearly the bots are different to play against than humans, so humans need some time to adjust to what works against them and what doesn't.

Baradrist

Germany96 Posts

August 23 2018 19:54 GMT

#226

If I may ... the rules are really not THAT restrictive. If a pro can't play with those small constraints ... I mean imagine, pros played before a lot of these things existed in the game. =)

But then again it doesn't matter. It's not about beating humans. It's about seeing what can be learned in such ways. And we've seen that pro teams can pick up things from the bots. Next TI possibly even cool strats and insight into how teamplay can win the game! The scenarios of application are endless ... not to spreak about other real life applications like their cool robotic hand.

All the shortcomings of the bots come obviously from them playing only themselves. The myriads of possible situations are just too much to go through (not to speak about going through them more then once to actually make a difference in the probability distributions and thus their update!), even with playing 180 hours a day! But look at how far they have come with that!! WOW. I am in awe. Imagine them learning with more human guidance. Imagine the cool new stuff we could see. For now those "poor" bots (:D) play against an alien race ... which we are to them! We play so strange for them, so imperfect. That makes them loose. It's like having two seperate island regions train for years and then let them compete against each other. That would be weird. We have a metagame that is completely separate from theirs. They haven't seen the stuff they see from us humans now! And if they have, they even estimate their chance of winning/loosing very nicely with it. But the stuff they do ... impressive, amazing. I love them changing the way we see roles already! Support, mid, carry, offlane? They don't give a shit. They trilane with sups only and win with it. That's what I want to see. Cool stuff.

On a last note: The fixed teams is good for the humans. If the bots choose themselves, they will be already much better, I think.

Tanukki

Finland579 Posts

August 23 2018 20:57 GMT

#227

On August 24 2018 04:54 Baradrist wrote:
If I may ... the rules are really not THAT restrictive. If a pro can't play with those small constraints ... I mean imagine, pros played before a lot of these things existed in the game. =)

...The fixed teams is good for the humans. If the bots choose themselves, they will be already much better, I think.

I agree that the bot ELO goes up significantly if it gets to draft. That's because with the limited hero pool, it's like a different game mode that humans have not practiced.

But then again it doesn't matter. It's not about beating humans. It's about...

Results are everything. Maybe some programming wizards are excited about the new tech OpenAI has developed, but losing the game isn't going to generate them any hype.

I got pretty damn excited about DeepMind in the last couple years with their achievements, and it felt like soon there'll be nothing AI cannot do. But now, if anything, I'm seeing the limitations of AI.

evanthebouncy!

United States12796 Posts

August 23 2018 21:30 GMT

#228

I will re-advertise my blog post here:

https://medium.com/@evanthebouncy/understanding-openai-five-16f8d177a957

A lot of why the bot acts the way they do can be answered if you give it a read.

ASoo

2865 Posts

August 23 2018 22:04 GMT

#229

What exactly are the restrictions the bot is currently playing with? It looks like the "5x immortal couriers" thing is gone, and maybe they have more heroes than before? But they're no longer drafting.

I took a look back at their page and didn't see the updated ruleset; is it explained somewhere?

polgas

Canada1770 Posts

August 24 2018 00:11 GMT

#230

Found the rules here:
(Wiki)

The International/2018/OpenAI Showmatches

Team compositions are limited to the following two variations:
Team A: Lich, Crystal Maiden, Death Prophet, Tidehunter, Gyrocopter
Team B: Lion, Witch Doctor, Necrophos, Axe, Sniper

Team sides and hero composition are decided by coin toss.
No Divine Rapier, Bottle.
No summons/illusions.
No Scan.

Humans win Game 2!

PlayerofDota

29 Posts

August 24 2018 03:09 GMT

#231

Seems like the bots have no clue how to play past 25 minutes or so. They are really good at the early game, they are really good at team fighting, but if they are not winning by a large margin by the mid game they don't know how to play.

Past 25 minutes they start doing really stupid and noob mistakes.

To me the biggest issue for the bots seems to be that they can't communicate with one another. I think giving the bots just a simple ping option, so they can basically ping to each other like humans do will improve their game a lot.

I don't know how difficult it would be to add this, might be that there is no way for the bots to learn how to actually communicate with each other and ping when they are ganking or whatever, or even if they did learn to ping, how hard would it be for them to understand what the pings mean.

We as humans have concepts way beyond the game, so pinging makes sense, if fact most of low to mid tier games are player with little to no actual communication, other than pinging.

So yeah, the bots would benefit immensely if they could learn to communicate with one others through pings.

Oshuy

Netherlands529 Posts

August 24 2018 12:34 GMT

#232

There could be a number of reasons for it.

Could be the average game time in their self-play is short, so that a longer game goes into uncharted territory.
Could be that the existing tweaks in the reward function are no longer sufficient when the game runs late: looking at the reward function description,
- there is a (arbitrary ?) scaling that lowers the value of last hitting/denying/killing/surviving as time goes by in favor of objective taking, which could lead to incorrect decisions providing the maximum reward
- not sure how it learns of the negative impact of buying back since costs have no negative value and buying back always provides a better opportunity for short term reward
- there could be something in hero prioritization late that is also hard to learn given a death/kill provides the same reward regardless of the target
Could be that the deciding factors in the endgame suffer from a horizon effect (waiting for buyback to be available or waiting for next rosh take often more than 5 min).

Garrl

Scotland1977 Posts

August 24 2018 13:29 GMT

#233

On August 24 2018 12:09 PlayerofDota wrote:
Seems like the bots have no clue how to play past 25 minutes or so. They are really good at the early game, they are really good at team fighting, but if they are not winning by a large margin by the mid game they don't know how to play.

Past 25 minutes they start doing really stupid and noob mistakes.

To me the biggest issue for the bots seems to be that they can't communicate with one another. I think giving the bots just a simple ping option, so they can basically ping to each other like humans do will improve their game a lot.

I don't know how difficult it would be to add this, might be that there is no way for the bots to learn how to actually communicate with each other and ping when they are ganking or whatever, or even if they did learn to ping, how hard would it be for them to understand what the pings mean.

We as humans have concepts way beyond the game, so pinging makes sense, if fact most of low to mid tier games are player with little to no actual communication, other than pinging.

So yeah, the bots would benefit immensely if they could learn to communicate with one others through pings.

The problem probably isn't communication, it looks as if it's playing a good early game because it just brings every hero to every fight, which ends up making it "win" the early game but in reality the proes were just doing a basic 4 protect 1 (as dictated by the hero choices really) and the core is way ahead.

Taf the Ghost

United States11751 Posts

August 24 2018 19:30 GMT

#234

On August 24 2018 21:34 Oshuy wrote:

Show nested quote +

There could be a number of reasons for it.

Could be the average game time in their self-play is short, so that a longer game goes into uncharted territory.
Could be that the existing tweaks in the reward function are no longer sufficient when the game runs late: looking at the reward function description,
- there is a (arbitrary ?) scaling that lowers the value of last hitting/denying/killing/surviving as time goes by in favor of objective taking, which could lead to incorrect decisions providing the maximum reward
- not sure how it learns of the negative impact of buying back since costs have no negative value and buying back always provides a better opportunity for short term reward
- there could be something in hero prioritization late that is also hard to learn given a death/kill provides the same reward regardless of the target
Could be that the deciding factors in the endgame suffer from a horizon effect (waiting for buyback to be available or waiting for next rosh take often more than 5 min).

Good human teams also coordinate Item Timings and Item Choices as they go late-game, rather than being individually optimized. That's the part that will take a long time for the Bots to truly learn to do.

They'll be kings of Pubs, though.

mozoku

United States708 Posts

August 25 2018 07:31 GMT

#235

On August 18 2018 18:10 evanthebouncy! wrote:

Show nested quote +

So reinforcement learning has a very bad "sample complexity" which is to say, to learn a close-to optimal policy(strategy) you will need to play a ton of games.

The lower bound for the number of games is porportional to (1 / (1 - gamma))^3

where gamma is the "discount-factor" where it allows you to see more into the future.

So the closer gamma is to 1.0, the more into the future you can see, but the more samples you would need. This makes sense because the more and more you're planning into the future, the more and more data / games you would need to play in order to grasp what's the best startegy

OpenAI is using gamma of 0.9997, which translates to at lease 37037037037 dota games.

I emphasize AT LEASE because this is a LOWER bound, the real number is likely much much higher than this, probably 2^100 times bigger than this if not more.

So in a sense yes, the computation is the bottle neck

On August 18 2018 17:37 WolfintheSheep wrote:

Show nested quote +

Basically computational power is the limit. Chess/Go AI still have that problem. Theoretically a computer would be unstoppable because they'd just compute every possibility and win from there. But this takes so much computer power that it probably you'll never have a complete analysis like that until quantum computers.

So for Chess, much like how actual Grandmasters would play, it's simpler and better to start with a full set of game openers, then the end game scenarios, and everything off-script would analyze 2-3 moves ahead.

Dota is the same, but a lot worse. Though, to a bit fair, a complex strategy and plan is already in play with OpenAI. The end goal is to kill the enemy ancient, the opposition is 5 completely random and unpredictable agents, and the AI had to create a plan and strategy to still reach that goal.

Obviously you're talking about higher complexity of micro-strategies, but it's somewhat important to note that because of how learning AI works, it's not actually just risk/reward evaluating the game. How it acts now is the result of thousands of simulations with only supplied knowledge of the game mechanics. The AI is playing what it has determined to be the best strategy, but it's just not flexible enough to do anything else.

I think you two are looking at the problem too narrowmindedly here. Or perhaps just responding to the question within the framework it was posed under.

You're right in the sense that, with infinite compute, OpenAI Five would almost certainly be far beyond human-level. But saying "computational power is the limit" isn't entirely correct either. According to OpenAI, these bots have played the equivalent of a total game time longer than the existence of human civilization. Yet humans are still beating them.

The corollary is that either humans have some mystical powers that can't be modeled mathematically (unlikely) or
that there exist more efficient algorithms to learn Dota than the ones that the OpenAI Five are currently applying (or humanity has yet discovered, for that matter).

Most of the progress in deep learning in the past 10 years has come from improving architectures. Computer vision is possible with a single hidden layer and "sufficient compute" (and data), but computer vision became a field once convolutional neural networks started getting applied to the problem (and you can even get pretty far training them on a single GPU). Natural language processing's recent advances in the past ~10 years have stemmed more from the introduction of embeddings and attention than a revolution in computing.

I suspect reinforcement learning will follow a similar development path. Quantum computing is the big x-factor to look out for though.

As an example, take blink-call on Axe. OpenAI Five never attempted it iirc. There are two expansions I can think of: 1) at AI reaction speeds (where training occurred), blink-call often isn't a good tactic; or 2) blink-call as an offensive tactic is such a specific sequence of events that the AI never attempted it in training.

For the sake of discussion, let's think about the latter. Clearly, a human has some method of intuition-based theorizing that leads to experimentation and learning (and this method is incredibly efficient relative to current RL methods). Even with intermediate rewards and randomness, it seems plausible that it would take longer than is realistically possible to "accidentally" figure out that blink-call is a good tactic. If we could describe, mathematically, the "algorithm" that humans apply to perform such learning, it seems reasonable they AI could learn at a speed (in games played) comparable to humans -- without the limitation of humans taking 45 min to play a single game of Dota. It would seem such an AI would likely defeat humans (especially given its inherent informational and coordination advantage).

In short, I'm just not comfortable with the statement "the problem boils down to compute."

WolfintheSheep

Canada14127 Posts

August 25 2018 09:22 GMT

#236

On August 25 2018 16:31 mozoku wrote:
I think you two are looking at the problem too narrowmindedly here. Or perhaps just responding to the question within the framework it was posed under.

You're right in the sense that, with infinite compute, OpenAI Five would almost certainly be far behind human-level. But saying "computational power is the limit" isn't entirely correct either. According to OpenAI, these bots have played the equivalent of a total game time longer than the existence of human civilization. Yet humans are still beating them.

The corollary is that either humans have some mystical powers that can't be modeled mathematically (unlikely) or
that there exist more efficient algorithms to learn Dota than the ones that the OpenAI Five are currently applying (or humanity has yet discovered, for that matter).

Most of the progress in deep learning in the past 10 years has come from improving architectures. Computer vision is possible with a single hidden layer and "sufficient compute" (and data), but computer vision became a field once convolutional neural networks started getting applied to the problem (and you can even get pretty far training them on a single GPU). Natural language processing's recent advances in the past ~10 years have stemmed more from the introduction of embeddings and attention than a revolution in computing.

I suspect reinforcement learning will follow a similar development path. Quantum computing is the big x-factor to look out for though.

As an example, take blink-call on Axe. OpenAI Five never attempted it iirc. There are two expansions I can think of: 1) at AI reaction speeds (where training occurred), blink-call often isn't a good tactic; or 2) blink-call as an offensive tactic is such a specific sequence of events that the AI never attempted it in training.

For the sake of discussion, let's think about the latter. Clearly, a human has some method of intuition-based theorizing that leads to experimentation and learning (and this method is incredibly efficient relative to current RL methods). Even with intermediate rewards and randomness, it seems plausible that it would take longer than is realistically possible to "accidentally" figure out that blink-call is a good tactic. If we could describe, mathematically, the "algorithm" that humans apply to perform such learning, it seems reasonable they AI could learn at a speed (in games played) comparable to humans -- without the limitation of humans taking 45 min to play a single game of Dota. It would seem such an AI would likely defeat humans (especially given its inherent informational and coordination advantage).

In short, I'm just not comfortable with the statement "the problem boils down to compute."

But you're not really discussing the computational problem that we brought up. Neural networks and learning AI didn't eliminate the computation problem, they just reduced the load.

The Blink-Call example is a perfect example of what the bot is capable of doing (and yesterday we did indeed see the bots Blink->Call). If the learning model is shaped correctly, the bots will simulate thousands of games, find that having a Blink Dagger and jumping into range of a target and casting Call will have a higher success rate of stunning, thus leading to more consistent rewards in XP and Gold.

That's all short term analysis and reward modelling, and not the limit of the OpenAI.

What is a limitation is long-term reward analysis. For example, the 4p1 strat. There, 4 players have to continuously make conscious decisions to take less immediately rewarding actions (creep stacking, not farming, being in risky areas, etc.) for 20-30 minutes for the goal of getting another hero to a point where it will win the game.

Whether it's live or in post-data analysis, more time and more actions will lead to exponentially more factors to include. If the AI can only efficiently account for 5 minutes of analysis or prediction, it will never take the actions that require more time to be rewarded. So stacking a camp 4 times to be farmed 10 minutes later to build and item 20 minutes later would be beyond it.

Sr18

Netherlands1141 Posts

August 25 2018 13:52 GMT

#237

His point was that the fact that humans figure out the usefulness of blink into call in much fewer games than the ai does, shows that there is room to improve the ai beyond just throwing more computational power towards it. If the learning of the ai is improved in such a way, it is likely they get better at other aspects of the game as well.

WolfintheSheep

Canada14127 Posts

August 25 2018 19:38 GMT

#238

On August 25 2018 22:52 Sr18 wrote:
His point was that the fact that humans figure out the usefulness of blink into call in much fewer games than the ai does, shows that there is room to improve the ai beyond just throwing more computational power towards it. If the learning of the ai is improved in such a way, it is likely they get better at other aspects of the game as well.

It's important to note that, while not exactly 1:1 comparable, the human brain is still vastly more powerful than any super computer.

mozoku

United States708 Posts

August 25 2018 23:58 GMT

#239

What is a limitation is long-term reward analysis. For example, the 4p1 strat. There, 4 players have to continuously make conscious decisions to take less immediately rewarding actions (creep stacking, not farming, being in risky areas, etc.) for 20-30 minutes for the goal of getting another hero to a point where it will win the game.

Whether it's live or in post-data analysis, more time and more actions will lead to exponentially more factors to include. If the AI can only efficiently account for 5 minutes of analysis or prediction, it will never take the actions that require more time to be rewarded. So stacking a camp 4 times to be farmed 10 minutes later to build and item 20 minutes later would be beyond it.

This is only true when you're trying to indiscriminately process the entire action space though. It's like trying to do computer vision with multilayer perceptrons. Computer vision underwent its revolution when researchers started applying architectures (convolutional neural networks) that, more or less, force the network to learn and identify objects (i.e. abstract concepts) in images and then identify relationships between the objects present in the image and the objects' locations in the image. Object identification is aided in CNNs by taking advantage of the fact that the pixels that make up objects in images tend to be clustered together. (Note: I'm using a very loose definition of objects here).

It's easy to imagine that similar abstractions likely exist that enormously reduce the computational load necessary to make a strategic Dota bot. For example, using the relative strengths of the drafts at various game stages can be used to influence whether the AI team should spend more time fighting or farming in the early-midgame. How to nudge the network to learn useful abstractions in an elegant and flexible way is surely a difficult problem, but it's not the same as saying "OpenAI's limiting factor is compute." And seeing as recent "AI" breakthroughs have largely come from similar cases of facilitating abstraction as well other architectural improvements, I think it's reasonable to expect that any potential OpenAI Five breakthroughs are more likely to come from that direction.

FreakyDroid

Macedonia2616 Posts

August 26 2018 16:16 GMT

#240

^ Yeah, I think that to have the bots simulate human like dota game play, the break through has to come from that direction. I called it foresight, but I guess abstraction is a more appropriate term. Otherwise they'll be stuck with what they have now, which are basically (or rather equivalent of) 500 mmr players with godlike mechanical skills, that's how I see these bots. That personally doesn't impress me at all.

Prev 1 10 11 12 13 14 16 Next All

Please or register to reply.

OpenAI's Dota 2 bots vs. 5 top professionals in TI - Page 12

Completed

Ongoing

Upcoming