https://oldena.lpnu.ua/handle/ntb/39455
Title: | Unsupervised acquisition of morphological resources for Ukrainian |
Authors: | Hamon, Thierry; Grabar, Natalia |
Affiliation: | LIMSI-CNRS, Orsay, Université Paris 13, Sorbonne Paris Cité, France; CNRS UMR 8163 STL, Université Lille 3, 59653 Villeneuve d'Ascq, France |
Bibliographic description (Ukraine): | Hamon T. Unsupervised acquisition of morphological resources for Ukrainian / Thierry Hamon, Natalia Grabar // Computational linguistics and intelligent systems (COLINS 2017) : proceedings of the 1st International conference, Kharkiv, Ukraine, 21 April 2017 / National Technical University «KhPI», Lviv Polytechnic National University. – Kharkiv, 2017. – P. 20–30. – Bibliography: 36 titles. |
Conference/Event: | Computational linguistics and intelligent systems (COLINS 2017) |
Issue Date: | 2017 |
Publisher: | National Technical University «KhPI» |
Country (code): | UA |
Place of the edition/event: | Kharkiv |
Pages: | 20-30 |
Abstract: | Morphological resources are a recurrent need because they enable the development of NLP tools and applications for a given language. Such resources provide the basic information these tools need to perform more sophisticated processing (information retrieval, morphosyntactic tagging, etc.). We propose to acquire morphological resources for the Ukrainian language. The proposed method exploits corpora to extract morphologically related words, and it has two versions: without and with processing of prefixes. The association strength between two words indicates how likely they are to be morphologically and semantically related. We use three corpora (literary, medical and general-language) and evaluate the results obtained. Depending on the corpus, precision varies between 67% and 86%. A comparison of the results across corpora shows that there is little redundancy between them. The currently available resource contains 3,315 fully validated pairs of words. |
URI: | https://ena.lpnu.ua/handle/ntb/39455 |
Content type: | Conference Abstract |
Appears in Collections: | Computational linguistics and intelligent systems. – 2017 р. |
File | Description | Size | Format |
---|---|---|---|
005-020-030.pdf | | 320.74 kB | Adobe PDF |