AUTOMATIC DETERMINATION OF BOOK TOPICS USING NATURAL LANGUAGE PROCESSING TECHNIQUES
Keywords:
Latent Dirichlet Allocation, Named Entity Recognition
Abstract
This paper analyzes the performance of an LDA model built to determine the topics that appear in a corpus of books. It describes the dataset used and all the problems that arise when implementing such a model. The four main steps in building the model are analyzed in detail: data preprocessing, the NER method, determining the optimal number of topics, and choosing a concrete algorithm for the implementation. For each step, different approaches to the problems that arise are demonstrated. The results of each approach are evaluated, after which the best-performing approach is selected to become part of the final model.
References
[1] O. Hrnjaković, V. Đurđević, D. Bujiša, Predikcija popularnosti knjiga [Book popularity prediction], Faculty of Technical Sciences, Novi Sad, 2019
[2] Goodreads. (2018). [online] Available at: https://www.goodreads.com/
[3] J. Millar, G. Peterson, M. Mendenhall, Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps, Air Force Institute of Technology, 2009
[4] S. Crossley, M. Dascalu, D. McNamara, How Important Is Size? An Investigation of Corpus Size and Meaning in both Latent Semantic Analysis and Latent Dirichlet Allocation
[5] D. Alvarez-Melis, M. Saveski, Topic Modeling in Twitter: Aggregating Tweets by Conversations, Massachusetts Institute of Technology, Cambridge, MA, USA, 2016
[6] W. Zhao, J. Chen, R. Perkins, Z. Liu, W. Ge, Y. Ding, W. Zou, A heuristic approach to determine an appropriate number of topics in topic modeling, 2015
[7] J. Murdock, C. Allen, Visualization Techniques for Topic Model Checking, Program in Cognitive Science, Indiana University, USA
[8] M. Röder, A. Both, A. Hinneburg, Exploring the Space of Topic Coherence Measures, Leipzig University, R&D, Unister GmbH, Martin-Luther University, Germany
Published
2019-12-30
Section
Electrical and Computer Engineering