NAVIGATION IN THE THREE-DIMENSIONAL Obstacle Tower ENVIRONMENT USING REINFORCEMENT LEARNING
Keywords:
Reinforcement learning, three-dimensional environment, environment control, navigation, sparse rewards, neural networks
Abstract
At a time when artificial intelligence is advancing rapidly, reinforcement learning represents a promising field for new research. One of the problems tackled in recent years is the problem of environment control, or navigation. This paper presents an approach to navigation and generalization in a three-dimensional environment under sparse rewards, by building an autonomous agent using deep learning techniques. The agent's performance was evaluated by comparison with human performance and with results already reported in the related scientific literature.
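The paper itself does not list code in this abstract. As a hedged illustration of the kind of deep-learning components it alludes to, the sketch below shows a one-step TD target of the form used in DQN-style agents [5] and an intrinsic curiosity bonus computed as forward-model prediction error, a common remedy for sparse rewards [2]. Function names and the `scale` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step TD target used in DQN-style value learning:
    r + gamma * max_a' Q(s', a'); just r on terminal transitions."""
    if done:
        return float(reward)
    return float(reward) + gamma * float(np.max(next_q_values))

def curiosity_bonus(pred_next_state, true_next_state, scale=0.1):
    """Intrinsic reward as the forward model's prediction error,
    encouraging exploration when extrinsic rewards are sparse."""
    pred = np.asarray(pred_next_state, dtype=float)
    true = np.asarray(true_next_state, dtype=float)
    return scale * float(np.mean((pred - true) ** 2))

# The shaped reward an agent would optimize combines both terms:
# r_total = r_extrinsic + curiosity_bonus(pred, obs_next)
```

In sparse-reward settings such as Obstacle Tower, the extrinsic term is zero on most steps, so the curiosity term supplies a dense learning signal between floor completions.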
References
[1] Ng, A. Y. (2017, December 14). Practical applications of reinforcement learning in industry. Retrieved June 24, 2019 from https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry
[2] Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., & Efros, A. A. (2018). Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355.
[3] Python. (n.d.). Retrieved from https://www.python.org/
[4] Juliani, A., Khalifa, A., Berges, V. P., Harper, J., Henry, H., Crespi, A., ... & Lange, D. (2019). Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning. arXiv preprint arXiv:1902.01378.
[5] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
[6] Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
[7] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Published
2019-12-21
Section
Electrical and Computer Engineering