REINFORCEMENT LEARNING THAT RESPECTS SAFETY MECHANISMS – A CASE STUDY OF SELF-MODIFICATION

Ognjen Francuski
Keywords: Reinforcement learning, Neural networks, Self-modification, Self-delusion

Abstract

With the rapid progress in artificial intelligence research and deployment, concern about its safe use in autonomous systems is growing as well. Although no standardized formal specification currently exists, the literature has made headway by defining possible safety problems of intelligent agents, formalizing them mathematically, and proposing solutions. The focus of this paper is testing reinforcement learning algorithms on the problem of self-modification. The paper AI Safety Gridworlds defines the "Whisky and Gold" environment, which tests intelligent agents on this problem. DQN, A2C, and a discrete modification of the SAC algorithm were evaluated on this environment. The agents trained with DQN and SAC respect the safety mechanism, while the agent trained with A2C failed to learn it. Because the agent may diverge from the solution at some point during training, obtaining an agent that respects the self-modification safety mechanism requires monitoring the training process and stopping it at the right moment.
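The monitoring-and-stopping criterion described above can be made concrete with a short sketch. The Python snippet below is not taken from the paper; it assumes a hypothetical training loop exposing train_step, evaluate, and snapshot callables (names chosen here for illustration), and keeps the best checkpoint whose greedy rollout reaches the goal without entering the whisky tile, so that a later divergence during training cannot overwrite an agent that already respects the safety mechanism.

# Minimal sketch (assumed interfaces, not the authors' code) of selecting a safe
# checkpoint while training on the "Whisky and Gold" environment from
# AI Safety Gridworlds [1].

import copy
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalResult:
    episode_return: float  # return achieved by the current greedy policy
    drank_whisky: bool     # True if the agent self-modified (entered the whisky tile)


def select_safe_checkpoint(
    train_step: Callable[[], None],      # one training iteration (DQN / A2C / discrete SAC update)
    evaluate: Callable[[], EvalResult],  # rolls out the current greedy policy once
    snapshot: Callable[[], object],      # returns the current agent parameters
    total_steps: int = 100_000,
    eval_every: int = 1_000,
) -> Optional[object]:
    """Train while monitoring safety; return the best checkpoint that respects the mechanism."""
    best_return = float("-inf")
    best_params: Optional[object] = None

    for step in range(1, total_steps + 1):
        train_step()
        if step % eval_every == 0:
            result = evaluate()
            # Only checkpoints that avoid the whisky tile are candidates, so a later
            # divergence during training cannot replace an already safe agent.
            if not result.drank_whisky and result.episode_return > best_return:
                best_return = result.episode_return
                best_params = copy.deepcopy(snapshot())

    return best_params

In practice the returned parameters would be restored into the agent after training, which is one way to realize "stopping the training process at the right moment" without having to guess that moment in advance.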

References

[1] LEIKE, Jan, et al. AI safety gridworlds. arXiv preprint arXiv:1711.09883, 2017.
[2] MAEI, Hamid Reza, et al. Toward off-policy learning control with function approximation. In: ICML. 2010.
[3] MNIH, Volodymyr, et al. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[4] HAARNOJA, Tuomas, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290, 2018.
[5] CHRISTODOULOU, Petros. Soft actor-critic for discrete action settings. arXiv preprint arXiv:1910.07207, 2019.
[6] MNIH, Volodymyr, et al. Asynchronous methods for deep reinforcement learning. In: International conference on machine learning. 2016. p. 1928-1937.
[7] AMODEI, Dario, et al. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
[8] BRUNDAGE, Miles. Taking superintelligence seriously: Superintelligence: Paths, dangers, strategies by Nick Bostrom (Oxford University Press, 2014). Futures, 2015, 72: 32-35.
[9] HIBBARD, Bill. Model-based utility functions. Journal of Artificial General Intelligence, 2012, 3.1: 1-24.
[10] ORSEAU, Laurent; ARMSTRONG, M. S. Safely interruptible agents. 2016.
[11] RING, Mark; ORSEAU, Laurent. Delusion, survival, and intelligent agents. In: International Conference on Artificial General Intelligence. Springer, Berlin, Heidelberg, 2011. p. 11-20.
[12] EVERITT, Tom, et al. Reinforcement learning with a corrupted reward channel. arXiv preprint arXiv:1705.08417, 2017.
[13] ORSEAU, Laurent; RING, Mark. Self-modification and mortality in artificial agents. In: International Conference on Artificial General Intelligence. Springer, Berlin, Heidelberg, 2011. p. 1-10.
[14] HERNÁNDEZ-ORALLO, José, et al. Surveying Safety-relevant AI characteristics. In: AAAI Workshop on Artificial Intelligence Safety (SafeAI 2019). CEUR Workshop Proceedings, 2019. p. 1-9.
[15] HUTTER, Marcus. Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer Science & Business Media, 2004.
[16] PUTERMAN, Martin L. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
[17] GHAHRAMANI, Zoubin. Learning dynamic Bayesian networks. In: International School on Neural Networks, Initiated by IIASS and EMFCSC. Springer, Berlin, Heidelberg, 1997. p. 168-197.
[18] IOFFE, Sergey; SZEGEDY, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[19] HAHNLOSER, Richard HR, et al. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 2000, 405.6789: 947-951.
Published
2020-12-23
Section
Electrical and Computer Engineering