On August 06 2018 08:38 FreakyDroid wrote: it seemed as if they do stuff that directly rewards them, i.e. take a tower, last hit a creep, make a kill, prevent them from hitting my tower, etc. Dota is way more nuanced than that; the reward doesn't always follow a few simple linear steps, it involves a lot of prediction/foresight, something which these bots didn't have.
After reading Evan's blog, I'm glad that my observations, made without any prior knowledge of how the AI worked, are pretty spot on. So my question to Evan, or anyone who knows about it: is it hard to code an AI that can, for lack of a better word, remember a more complex network of steps, or plan a more complex strategy that isn't so dependent on the immediate reward(s)? Basically, try to mimic foresight and perhaps sacrifice an immediate reward in order to gain an advantage later on. Or is compute power (or perhaps storage) a problem?
Basically, computational power is the limit. Chess/Go AI still have that problem. Theoretically a computer would be unstoppable because it would just compute every possibility and win from there, but that takes so much computing power that you'll probably never have a complete analysis like that until quantum computers.
So for Chess, much like how actual Grandmasters play, it's simpler and better to start with a full set of game openers, then the endgame scenarios, and for everything off-script analyze 2-3 moves ahead.
Dota is the same, but a lot worse. Though, to be a bit fair, a complex strategy and plan is already in play with OpenAI. The end goal is to kill the enemy ancient, the opposition is 5 completely random and unpredictable agents, and the AI had to create a plan and strategy to still reach that goal.
Obviously you're talking about higher complexity of micro-strategies, but it's important to note that because of how learning AI works, it's not actually just risk/reward-evaluating the game as it goes. How it acts now is the result of thousands of simulations with only the game mechanics supplied as knowledge. The AI is playing what it has determined to be the best strategy; it's just not flexible enough to do anything else.
So reinforcement learning has very bad "sample complexity", which is to say, to learn a close-to-optimal policy (strategy) you need to play a ton of games.
The lower bound for the number of games is proportional to (1 / (1 - gamma))^3,
where gamma is the "discount factor", which controls how far into the future you can see.
So the closer gamma is to 1.0, the further into the future you can see, but the more samples you need. This makes sense: the further you plan into the future, the more data / games you need to play in order to grasp what the best strategy is.
OpenAI is using a gamma of 0.9997, which translates to at least 37,037,037,037 Dota games.
I emphasize AT LEAST because this is a LOWER bound; the real number is likely much, much higher, probably 2^100 times bigger if not more.
So in a sense, yes, computation is the bottleneck.
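For concreteness, the bound is easy to evaluate yourself. A quick sketch, using the exponent-3 lower bound and the gamma = 0.9997 figure quoted above:

```python
# Sketch: RL sample-complexity lower bound, proportional to (1 / (1 - gamma))^3.
# gamma = 0.9997 is the discount factor cited above for OpenAI Five.

def sample_lower_bound(gamma: float) -> float:
    """Cube of the effective horizon 1 / (1 - gamma)."""
    horizon = 1.0 / (1.0 - gamma)
    return horizon ** 3

gamma = 0.9997
print(f"effective horizon: {1.0 / (1.0 - gamma):,.0f} steps")
print(f"games (lower bound): {sample_lower_bound(gamma):,.0f}")  # ~37,037,037,037
```

That reproduces the ~37 billion figure: 1 / (1 - 0.9997) = 3333.33, and 3333.33^3 ≈ 3.7 × 10^10.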
On August 17 2018 22:39 FreakyDroid wrote: It was too obvious they have no long term strategy, my first impressions were these: […]
In OpenAI's case the bots are not directly trying to win the game; rather, they learn to maximize their manually crafted reward. Certainly some things they do require something like foresight, like prioritizing a tower kill 30 seconds from now over farming individually right now. But some things seem quite difficult to fully learn with this sort of approach.
Take warding, for example. Placing a ward alone doesn't give them any reward, so they would have to learn that placing a ward in a specific sort of game situation, in a specific spot, allows them to play better over the next six minutes or so. It'll be interesting to see how the warding develops over time. Sentries to counter invis heroes seem a bit simpler because the payoff can be more short-term (though even there it's often good to use obs+sentry combos to spot movements), and I suspect they could learn to use wards behind a tower to improve their tower pushes.
Another thing that came to mind is how they deal with the prospect of losing a game. The bots aren't taught to treat the ancient blowing up as a complete disaster; it's just one aspect that gives a negative reward at some point in the future. When behind, a human team is likely to take a risk and go for a Rosh fight, for example, because they know the enemy will likely be able to push their base if they get the Aegis. As far as I gather, that isn't necessarily what the bots would do. They may think that taking a fight now would result in losing it, and just continue farming because that maximizes their reward over the next few minutes. They may end up straight up losing shortly after, but from their perspective that isn't worse than probably getting crushed in a 5v5 fight right now.
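The warding case is a nice place to see the discount factor at work. Here is a toy calculation (all reward numbers are invented for illustration; this is not OpenAI's actual reward function): a ward that costs a little now and pays off six minutes later only beats steady farming when gamma is high enough.

```python
# Toy illustration: "place a ward now, get a pick-off later" vs "just farm".
# All reward values are made up; one step = ~10 seconds of game time.

def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over a sequence of per-step rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

ward_line = [-0.1] + [0.0] * 35 + [5.0]   # small cost now, big payoff ~6 min later
farm_line = [0.1] * 37                     # steady small farming reward

for gamma in (0.90, 0.99):
    w = discounted_return(ward_line, gamma)
    f = discounted_return(farm_line, gamma)
    print(f"gamma={gamma}: ward={w:.2f}  farm={f:.2f}  ward wins: {w > f}")
```

With gamma = 0.90 the delayed payoff is discounted almost to nothing and farming looks better; with gamma = 0.99 the ward line wins. That is exactly the "how far into the future can it see" trade-off discussed earlier in the thread.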
I'm not sure what exactly OpenAI's plans are, though. Will they continue to use Dota as a test bed for their ability to adapt to all sorts of weird situations? Or will they just stop the project once they beat pros at TI?
On August 18 2018 17:37 WolfintheSheep wrote: Basically computational power is the limit. […]
I think GPU computation will soon surpass CPU computation, and it already has in some areas, so my money would be on that rather than on quantum computers, at least in the near future. I use GPU rendering for 3D work, and while it is faster than CPU (arguably), it's got a nasty limitation with RAM: you can't combine the RAM of multiple GPUs. I'm not sure if that's also the case with GPUs for machine/deep learning. The new ones with tensor cores from Nvidia seem tailored towards these kinds of tasks, but seeing as they have zero competition at the moment, I don't think they are in any rush to improve the technology or make it available at cheaper prices.
Yeah, that was my read on the AI too, which didn't knock my socks off. I know the OpenAI team were happy with the result, but I personally didn't see the bots as anything special, because the only thing that can impress me is planning and strategy rather than godlike execution. Now that I understand the limitations due to compute power, though, I wonder what it will take to have a breakthrough in this field: better algorithms, better compute, or maybe both.
I've never been particularly good at math, but if gamma is 1, wouldn't that equation be 0, which means the AI would have to play 0 games? Perhaps I'm misunderstanding the equation... so if that's a dumb question, just leave it :D
However, even if the AI had all the computation power it needs, I still don't understand how solving the problem of planning ahead or devising a longer-term strategy would give it a more human-like read on the game, when it is still limited by the immediate rewards. Maybe I'm asking the wrong question, dunno.
I just watched this video, but sadly it's way too technical for me to understand.
As I understand these neural networks (and to be honest I may not fully), the AI learns by creating a gigantic set of data and finding the optimum actions within that set.
So the AI won't actually be planning ahead or devising a better strategy on the fly. It needs to have those already in its playset, and then the game needs to match the conditions to use that strategy.
[…] I emphasize AT LEAST because this is a LOWER bound, the real number is likely much much higher than this, probably 2^100 times bigger than this if not more.
Can you count to ten at least?
2^100 is a ridiculously large number. I'll give you some multiplication.
Imagine that 1 game takes 1 microsecond and that you use every x86 computer in the world.
Intel produces ~400 million CPUs a year; let's say that translates to 400 million computers a year, and that we combine all computers created in the last 50 years. That's 20 billion computers. Now say we've used all of them to play Dota 2 for the last 8 years. 8 years = 252,460,800 seconds ≈ 2.5 × 10^14 microseconds; round up to 2.6 × 10^14 microseconds.
Combined game power is 20 billion (2 × 10^10) computers times the combined time of 2.6 × 10^14 microseconds.
That's 5.2 × 10^24 games, which is under 2^83. In other words, even if someone ran the simulation on a completely unrealistic number of computers for a completely unrealistic time, assuming one game takes an unrealistically short time to compute, we are still nowhere near 2^100 games, much less 2^100 × 37,037,037,037 games.
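For anyone who wants to check the arithmetic, here is the same estimate as a script, with the same deliberately generous assumptions (2 × 10^10 computers, ~8 years of nonstop play, one game per microsecond per machine):

```python
import math

# Deliberately generous assumptions from the post above.
computers = 2 * 10**10        # every x86 CPU made in the last 50 years
game_time_us = 1              # one full Dota game per microsecond (!)
play_time_us = 2.6 * 10**14   # ~8 years of nonstop play, in microseconds

games = computers * play_time_us / game_time_us
print(f"games played: {games:.1e}")             # 5.2e+24
print(f"log2(games):  {math.log2(games):.1f}")  # ~82.1, under 2^83
print(f"shortfall vs 2^100: {2**100 / games:.1e}x")
```

So even this absurd scenario falls short of 2^100 games by a factor of a few hundred thousand.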
On August 18 2018 17:37 WolfintheSheep wrote: Basically computational power is the limit. Chess/Go AI still have that problem. Theoretically a computer would be unstoppable because they'd just compute every possibility and win from there. […]
That is so out of date. You can absolutely compute every move in chess; computers have been able to do that for the past 20 years. The thing is, modern AI systems use intuition: it's essentially very specialized brain power doing the same thing over and over, and it learns through rewards. The simpler the game, the faster it can maximize its play; the more complex the game, the more reward signals there are and the harder it is to master.
Anyway, Google's DeepMind AI has already beaten the chess computer, Stockfish or whatever the name was. In fact it didn't lose a single game against a computer that calculates literally up to 10k moves ahead; it uses intuition and is able to squeeze out victories.
The biggest problem in chess, for example, is computing the right number of moves: it's worthless to compute 1000 moves ahead, since the game has maybe a 1-in-10k chance of going that way. It's about understanding how your opponent plays and figuring out which move they're playing for; for professionals this is usually 2-5 moves ahead, for amateurs 0-2 moves.
So a modern AI is essentially a brain that specializes in a certain thing. With repetition it can master things like Chess, Go, and now even Dota, it seems. The biggest problem I see with today's AI is that it needs too much repetition to learn. A normal human can learn to play decent Dota in maybe 20-50 games - I'm talking about not doing dumb shit like suiciding into towers, suiciding into Rosh at level 4, or using salves while enemy heroes are hitting you, just basic stuff. For a human it takes 20-50 games to reach a level where they don't do dumb shit; for the AI it takes 2-5 million hours.
OpenAI has played hundreds of millions of hours, and that's with a limited hero pool and certain game modifications.
The AI's advantage is that it can play 180 years' worth of games in a single day, every day. Though you do need super expensive supercomputers and a gazillion hard drives to store all that data, and obviously that's a lot of electricity expended on a very specific task.
I'm no expert on AI, but everything about chess in the above post is really just wrong. You absolutely cannot 'compute every move' in chess. The conservative estimate for the number of typical (i.e. 40-move-long) chess games is 10^120, a number that far outstrips the number of atoms in the universe. Even the estimate of reasonable chess games (i.e. excluding obvious blunders) is 10^40. We aren't even close to being able to solve chess via brute force (i.e. a minimax algorithm), and if we were, no 'intuitive AI' would ever find any edge over a traditional computer, as the game would just be solved.
Saying 'a computer that literally calculates up to 10k moves ahead' is just as misleading as claiming GMs 'usually calculate 2-5 moves ahead'. Chess calculation is not a single string of white and black moves; it's a game tree starting from the present position that includes all legal moves (the brute-force method, an optimized version of which is used by traditional engines) or all candidate moves (the human method), going as far as computational or brain power allows. Calculating 4 ply (2 moves each) ahead in a complex middlegame position with 10+ candidate moves per ply is far more difficult (or often even a pointless approach for humans) than calculating 30 ply ahead for a mate in 15 where every single move is forced.
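To put a rough number on why brute force fails: take a commonly cited average branching factor of about 30 legal moves per position and an 80-ply (40-move) game — both figures are textbook approximations, not anything from this thread — and the full tree is astronomically large:

```python
import math

# Back-of-the-envelope size of the full chess game tree.
branching = 30   # rough average number of legal moves per position
plies = 80       # a "typical" 40-move game, counting both sides

leaves = branching ** plies
print(f"game continuations: ~10^{math.log10(leaves):.0f}")  # ~10^118
print(f"vs ~10^80 atoms in the universe: tree is ~10^{math.log10(leaves) - 80:.0f}x larger")
```

That lands in the same ballpark as the 10^120 Shannon-style estimate mentioned above, and is why engines prune rather than enumerate.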
But anyway, I realise the post was more about AI than chess or traditional chess engines, and I do agree that AlphaZero taking only 4 hours to surpass (a suboptimally configured version of) Stockfish is incredibly impressive. Though I'd be careful with words like intuition when talking about an AI. While it's true that AlphaZero evaluated far fewer positions than Stockfish, there could be many reasons for that. We simply don't know exactly why AlphaZero is as good as it is. That's one of the most fascinating things to me in all this: with traditional chess engines, we know exactly how they reach their conclusions (we even gave them the algorithms by which they judge positions). With an AI, we really don't, at least as far as I understand it (evan?). We give them the tools to learn, and we understand how they learn, but the results of that learning are a black box that can surpass us.
On August 22 2018 06:30 evanthebouncy! wrote: "I've never been particularly good at math, but if gamma is 1, wouldn't that equation be 0, which means the AI would have to play 0 games? Perhaps I'm misunderstanding the equation... so if that's a dumb question, just leave it :D"
It's 1 / (1 - gamma), so if gamma is 1 you get 1 / (1 - 1) = 1 / 0 = infinite.
I believe the point is that you can never have gamma = 1, yes.
I mean, if you just want gamma = 1, you'd use the episodic count H = 20000 instead.
It is fairly well agreed in the RL community that 1 / (1 - gamma) ~ H:
one is for stationary RL (i.e. you expect the game to go on forever with no definitive ending), the other for episodic RL (i.e. you expect the game to end at a certain point, e.g. Go, or a short-ish game of Dota).
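Both points are easy to see numerically (a quick sketch; H = 20000 is the episodic step count mentioned above): 1 / (1 - gamma) acts as an effective planning horizon, and it diverges as gamma approaches 1.

```python
# The discount factor gamma implies an effective planning horizon 1 / (1 - gamma).

def effective_horizon(gamma: float) -> float:
    return 1.0 / (1.0 - gamma)

for gamma in (0.9, 0.99, 0.9997):
    print(f"gamma = {gamma}: horizon ~ {effective_horizon(gamma):.0f} steps")

# gamma = 1 would make this 1 / 0: the bound diverges, which is why you switch
# to the episodic formulation with a fixed step count H instead.
H = 20000                 # episodic count mentioned above
print(f"gamma with a comparable horizon to H={H}: {1 - 1 / H}")
```

Note how OpenAI Five's gamma = 0.9997 gives a horizon of ~3333 steps, the same order of magnitude as H = 20000.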
[…] you get 1 / (1 - 1) = 1 / 0 = infinite
1/0 does not equal infinity.
1/0 is undefined, and therefore mathematically impossible.