AUTOMATIC DIACRITIC RESTORATION IN SERBIAN TEXTS USING MACHINE LEARNING
Keywords:
Diacritic restoration, Machine learning, Classification
Abstract
This paper describes the problem of diacritic restoration in Serbian texts. The problem is formulated as a classification task, and three machine learning methods used to obtain the results are presented: feedforward neural networks, long short-term memory (LSTM) networks, and convolutional neural networks. The methods are compared using classifier performance metrics. For the best classifier, the results are presented together with examples of how the text was rediacritized.
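The abstract frames diacritic restoration as a classification task. Here is a minimal sketch of that framing. It assumes Serbian Latin script, that đ is stripped to d (not "dj"), and a fixed window of characters on each side of the target; none of these details come from the abstract. Each stripped letter c, s, z or d becomes one example whose label is the original letter, and its context is one-hot encoded, which is a common choice for character sequences. The names `strip_diacritics`, `make_examples`, `one_hot` and `encode` are hypothetical. This is not the paper's implementation, and the neural networks themselves are not shown.

```python
"""Hypothetical sketch: diacritic restoration framed as per-character classification.

Assumptions (not taken from the paper): Serbian Latin script, đ stripped to d
(not "dj"), a fixed character window on each side, and one-hot context vectors.
"""
from typing import List, Tuple

# Lowercase diacritic letters mapped to their ASCII base letter.
STRIP = {"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "d"}

# Base letters that may hide a diacritic, so each occurrence is one classification example.
AMBIGUOUS = {"c", "s", "z", "d"}

# Output classes: every letter a model may predict for an ambiguous position.
CLASSES = ["c", "č", "ć", "s", "š", "z", "ž", "d", "đ"]

# Input alphabet of stripped text: padding symbol, space, and a-z.
ALPHABET = "_ abcdefghijklmnopqrstuvwxyz"


def strip_diacritics(text: str) -> str:
    """Replace each diacritic letter by its base letter (one char for one char)."""
    return "".join(STRIP.get(ch, ch) for ch in text)


def make_examples(text: str, window: int = 3) -> List[Tuple[str, str]]:
    """Return (context, label) pairs, one per ambiguous character in the text."""
    original = text.lower()
    stripped = strip_diacritics(original)  # same length as `original`
    padded = "_" * window + stripped + "_" * window
    examples = []
    for i, ch in enumerate(stripped):
        if ch in AMBIGUOUS:
            # Position i in `stripped` is position i + window in `padded`.
            context = padded[i : i + 2 * window + 1]
            examples.append((context, original[i]))
    return examples


def one_hot(context: str) -> List[int]:
    """Concatenate one one-hot row per context character; unknown characters stay all-zero."""
    vector = []
    for ch in context:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(ch)
        if idx >= 0:
            row[idx] = 1
        vector.extend(row)
    return vector


def encode(text: str, window: int = 3) -> Tuple[List[List[int]], List[int]]:
    """Build the feature matrix X and class-index vector y for a classifier."""
    examples = make_examples(text, window)
    X = [one_hot(context) for context, _ in examples]
    y = [CLASSES.index(label) for _, label in examples]
    return X, y
```

For example, `encode("Čaša vode", window=2)` yields three examples with contexts `"__cas"`, `"casa "` and `"vode_"` and class indices `[1, 4, 7]` (č, š, d). Each row of `X` has 5 × 28 = 140 entries. A flat window vector like this suits a feedforward network. An LSTM or a convolutional network would more naturally take the sequence of per-character one-hot rows.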
Published
2023-01-08
Section
Electrical and Computer Engineering