I tried with the same model that we used for solving the CartPole env. I trained it for 30000 epochs. Every epoch up to 15000 got a reward of -200 continuously; then some epochs got a reward of around -180, but after that it went back to -200. I don't understand why it didn't get better after 15000 epochs. Should I train it for more than 30000 epochs? How do I know a sufficient number of epochs for these models? Does the size of the deque memory play an important role in learning?
What would be a better approach than this?
RL model for MountainCar Env
Hey @nuts2021, 30000 epochs is a very large number of epochs. Try increasing the complexity of the model by adding more layers, for example as in the sketch below.
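As a rough illustration, here is a minimal sketch of a slightly deeper Keras Q-network for MountainCar-v0. The layer sizes and learning rate are assumptions for the example, not the exact architecture from the course.

```python
# Sketch of a deeper Q-network for MountainCar-v0 (illustrative sizes only).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

state_size = 2   # MountainCar observation: position, velocity
action_size = 3  # push left, no push, push right

model = Sequential([
    Dense(64, input_dim=state_size, activation="relu"),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(action_size, activation="linear"),  # one Q-value per action
])
model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))
```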
What would be the answers to the other sub-questions?
Hey @nuts2021,
“How do I know a sufficient number of epochs for these models?”
-> You need to set a threshold for the exploration rate, which is not fixed. You also need to keep track of the score; it should not drop suddenly. If it does, you should stop there. A rough sketch of such a check is shown below.
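For instance, a minimal sketch of that kind of stopping check inside the training loop. The threshold values (`eps_min`, the rolling window, the drop tolerance) are assumptions, not fixed rules:

```python
# Sketch of a stopping check based on exploration rate and score trend.
# eps_min, the window length, and the drop tolerance are illustrative assumptions.
from collections import deque

eps_min = 0.01                     # exploration-rate threshold
recent_scores = deque(maxlen=100)  # rolling window of episode scores
best_avg = float("-inf")

def should_stop(epsilon, episode_score):
    """Return True once exploration has bottomed out and the rolling
    average score has dropped sharply instead of improving."""
    global best_avg
    recent_scores.append(episode_score)
    avg = sum(recent_scores) / len(recent_scores)
    best_avg = max(best_avg, avg)
    sudden_drop = avg < best_avg - 20   # tolerance is an assumption
    return epsilon <= eps_min and sudden_drop
```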
“Does the size of the deque memory play an important role in learning?”
-> Yes, of course it plays a role. It should be neither too small nor too large; values like 32, 64, or 128 will serve the purpose.
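For example, a minimal sketch of the replay memory as a `collections.deque` with minibatch sampling. `MEMORY_SIZE` uses one of the values mentioned above; note that many implementations also use much larger buffers (in the thousands), so treat the numbers as illustrative:

```python
# Sketch of a replay memory built on collections.deque.
# MEMORY_SIZE and BATCH_SIZE are illustrative values.
import random
from collections import deque

MEMORY_SIZE = 128
BATCH_SIZE = 32

memory = deque(maxlen=MEMORY_SIZE)  # oldest transitions are dropped automatically

def remember(state, action, reward, next_state, done):
    """Store one transition for later replay."""
    memory.append((state, action, reward, next_state, done))

def sample_batch():
    """Return a random minibatch once enough transitions are stored."""
    if len(memory) < BATCH_SIZE:
        return []
    return random.sample(memory, BATCH_SIZE)
```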
“What would be a better approach than this?”
So far, Q-learning is the approach that gives the most reliable results in game playing. Other alternatives are being developed, but they are still a work in progress; you would need to read blogs and similar resources, implement them, and compare the results yourself.
Hope this resolves your doubt.
Please mark the doubt as resolved in my doubts section.