-
So, I'm relatively new to ML/RL. I've messed around with Unity's ML-Agents, but only the basics. I'm trying to do a learning project similar to Unity's Dungeon Escape example: there are four agents, one dragon and three 'players'. The dragon wants to touch all of the players; the players want to avoid the dragon and reach the exit. The dragon gets a reward if it eats a player, and the players get a reward if they reach the exit.

The issue I'm having is how you would set up a multi-agent environment like this. My initial idea (what I have right now) was to have two managers, one for the dragon and one for the players. I was setting up a player environment in a separate scene, but after some debugging and error testing I found that the training manager only finds environments that are direct children of its node. So not only can I not build the training instance in a separate scene, I also can't just copy that scene's contents into the main scene, because I need two separate environments to track two different sets of rewards, and they can't both be direct children in any clean way.

The other thing I was considering was copying the training manager into each instance, but from my understanding, doesn't that defeat the point of having multiple instances, since each one would be learning by itself? Thanks
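Here's roughly the structure I'm trying to get to (a sketch only; the node and class names are my own placeholders, not RL Matrix's actual API):

```csharp
using Godot;

// Placeholder environment types, one per reward stream.
// These are sketches of my intent, not RL Matrix's real base classes.
public partial class DragonEnv : Node
{
    // Dragon's reward: +1 for eating a player.
    public float RewardForEating() => 1.0f;
}

public partial class PlayerEnv : Node
{
    // Players' reward: +1 for reaching the exit.
    public float RewardForExit() => 1.0f;
}

// Intended scene layout: two managers, each needing its own set of
// environment instances as direct children, which is where it gets messy:
//
//   Main
//   ├── DragonManager      (tracks DragonEnv rewards)
//   │   ├── DragonEnv #1
//   │   └── DragonEnv #2
//   └── PlayerManager      (tracks PlayerEnv rewards)
//       ├── PlayerEnv #1
//       └── PlayerEnv #2
```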
-
Hey, thanks for trying out RL Matrix. It's exactly as you said: the manager finds all the children of type T, where T is your kind of environment. If you have two different environment types, you need two managers, and exactly as you said, two such scripts cannot sit on the same (direct child) node. I set it up that way because I assumed users would typically want tens, if not hundreds, of environments. I'll replace it with a way to search all descendants instead of just direct children.
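Something like this is what I mean by searching all descendants rather than only direct children (a minimal sketch; the manager's real code is a bit more involved):

```csharp
using Godot;
using System.Collections.Generic;

public static class NodeSearch
{
    // Collect every descendant of type T anywhere under `root`,
    // instead of only checking root's direct children.
    public static List<T> FindDescendants<T>(Node root) where T : Node
    {
        var found = new List<T>();
        Collect(root, found);
        return found;
    }

    private static void Collect<T>(Node node, List<T> found) where T : Node
    {
        foreach (Node child in node.GetChildren())
        {
            if (child is T env)
                found.Add(env);
            Collect(child, found); // keep descending into the subtree
        }
    }
}
```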
-
OK, updated the NuGet package, hope it doesn't break :)
Here man, I fixed it for you. You were returning a null observation due to some reset mess; the general shape of that fix is sketched below.
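The actual fix is in the attached project; the general pattern is just to make sure a reset always builds a fresh observation before anything reads it, roughly like this (a generic sketch, not your exact code):

```csharp
using Godot;

// Generic pattern for avoiding null observations around reset
// (illustrative only; your observation type and reset logic differ).
public partial class DungeonEnv : Node
{
    private float[] _lastObservation;

    public float[] Reset()
    {
        // Rebuild state *before* handing an observation to the trainer,
        // so there is never a window where the observation is null.
        RepositionAgents();
        _lastObservation = BuildObservation();
        return _lastObservation;
    }

    public float[] GetObservation()
    {
        // Fall back to a reset if an observation is requested
        // before the first episode has started.
        return _lastObservation ?? Reset();
    }

    private void RepositionAgents() { /* move dragon and players back to spawn points */ }

    private float[] BuildObservation()
    {
        // e.g. dragon, player, and exit positions flattened into one vector
        return new float[8];
    }
}
```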
These I didn't fix for you:
3. Your agents interact with the environment for many episodes without any reward, so the algorithm will not know which direction to move in (the sparse reward problem).
4. In the unlikely event the dragon catches a player or a player reaches the exit, you only give the reward at the end of the episode, so the algorithm will not know which actions to attribute the reward to (the credit assignment problem). One generic way to address both points is sketched after the attachment below.
DungeonDragonAI.zip
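For what it's worth, a common way to tackle both problems at once is a small per-step shaping reward, e.g. based on distance to the exit, so the agent gets a dense, step-attributable signal instead of one reward at episode end (my own generic sketch, not code from the attachment):

```csharp
using Godot;

// Distance-based reward shaping sketch: a small per-step signal
// instead of a single reward when the episode ends.
public partial class PlayerRewardShaper : Node
{
    private float _prevDistanceToExit = float.NaN;

    // Call once per step with the player's and exit's positions.
    public float StepReward(Vector3 player, Vector3 exit, bool reachedExit, bool eaten)
    {
        if (reachedExit) return 1.0f;   // terminal success
        if (eaten)       return -1.0f;  // terminal failure

        float dist = player.DistanceTo(exit);
        float shaped = 0.0f;
        if (!float.IsNaN(_prevDistanceToExit))
        {
            // Positive when the player moved closer to the exit this step.
            shaped = 0.01f * (_prevDistanceToExit - dist);
        }
        _prevDistanceToExit = dist;
        return shaped;
    }

    // Call on episode reset so stale distances don't leak across episodes.
    public void ResetShaping() => _prevDistanceToExit = float.NaN;
}
```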
You should maybe…