TY - CHAP
AU - Ligeti-Nagy, Noémi
AU - Ferenczi, Gergő
AU - Héja, Enikő
AU - Jelencsik-Mátyus, Kinga
AU - Laki, László János
AU - Vadász, Noémi
AU - Yang, Zijian Győző
AU - Váradi, Tamás
ED - Berend, Gábor
ED - Gosztolya, Gábor
ED - Vincze, Veronika
TI - HuLU: magyar nyelvű benchmark adatbázis kiépítése a neurális nyelvmodellek kiértékelése céljából
T2 - XVIII. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2022
PB - Szegedi Tudományegyetem, Informatikai Intézet
CY - Szeged
SN - 9789633068489
T3 - MSZNY ; 18.
PY - 2022
SP - 431
EP - 446
PG - 16
UR - https://m2.mtmt.hu/api/publication/32644299
ID - 32644299
LA - Hungarian
DB - MTMT
ER -

TY - CONF
AU - Yang, Zijian Győző
AU - Novák, Attila
AU - Laki, László János
ED - Kovásznai, Gergely
ED - Fazekas, István
ED - Tómács, Tibor
TI - Automatic Tag Recommendation for News Articles
T2 - Proceedings of the 11th International Conference on Applied Informatics (ICAI 2020)
PB - CEUR Workshop Proceedings
C1 - Eger
T3 - CEUR Workshop Proceedings, ISSN 1613-0073 ; 2650.
PY - 2020
SP - 442
EP - 451
PG - 10
UR - https://m2.mtmt.hu/api/publication/31436461
ID - 31436461
N1 - Scopus:error:85090846104 2022-10-12 17:49 type mismatch
AB - In this paper, we present an automatic neural tag recommendation system for Hungarian news articles and the results of our experiments concerning the effect of preprocessing applied to the texts and various parameter settings. A novelty of the approach is a combination of subword tokenization with character-n-gram-based representations, which resulted in high gains in recall. The best system yields 76% precision at 58% recall. Subjective performance is higher, because suggested labels missing from the reference often fit the document well or are similar to missing reference labels. We also created an online GUI for the tag recommendation system that makes it possible for the user to interactively set threshold parameters facilitating customization of precision and recall.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
LA - English
DB - MTMT
ER -

TY - JOUR
AU - Novák, Borbála
AU - Novák, Attila
AU - Prószéky, Gábor
TI - Context-aware correction of spelling errors in Hungarian medical documents
JF - COMPUTER SPEECH AND LANGUAGE
J2 - COMPUT SPEECH LANG
VL - 35
PY - 2016
SP - 219
EP - 233
PG - 15
SN - 0885-2308
DO - 10.1016/j.csl.2014.09.001
UR - https://m2.mtmt.hu/api/publication/2844131
ID - 2844131
AB - Owing to the growing need of acquiring medical data from clinical records, processing such documents is an important topic in natural language processing (NLP). However, for general NLP methods to work, a proper, normalized input is required. Otherwise the system is overwhelmed by the unusually high amount of noise generally characteristic of this kind of text. The different types of this noise originate from non-standard language use: short fragments instead of proper sentences, usage of Latin words, many acronyms and very frequent misspellings. In this paper, a method is described for the automated correction of spelling errors in Hungarian clinical records. First, a word-based algorithm was implemented to generate a ranked list of correction candidates for word forms regarded as incorrect. Second, the problem of spelling correction was modelled as a translation task, where the source language is the erroneous text and the target language is the corrected one. A Statistical Machine Translation (SMT) decoder performed the task of error correction. Since no orthographically correct proofread text from this domain is available, we could not use such a corpus for training the system. Instead, the word-based system was used to create translation models. In addition, a 3-gram token-based language model was used to model lexical context.
Due to the high number of abbreviations and acronyms in the texts, the behaviour of these abbreviated forms was further examined both in the case of the context-unaware word-based and the SMT-decoder-based implementations. The results show that the SMT-based method outperforms the first candidate accuracy of the word-based ranking system. However, the normalization of abbreviations should be handled as a separate task.
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Miháltz, Márton
AU - Indig, Balázs
AU - Prószéky, Gábor
ED - Tanács, Attila
ED - Varga, Viktor
ED - Vincze, Veronika
TI - Igei vonzatkeretek és tematikus szerepek felismerése nyelvi erőforrások összekapcsolásával egy kereslet-kínálat elvű szövegelemzőben
T2 - XI. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2015
PB - Szegedi Tudományegyetem Informatikai Tanszékcsoport
CY - Szeged
SN - 9789633063590
PY - 2015
SP - 298
EP - 302
PG - 4
UR - https://m2.mtmt.hu/api/publication/2811567
ID - 2811567
LA - Hungarian
DB - MTMT
ER -

TY - CHAP
AU - Orosz, György
AU - Jelencsik-Mátyus, Kinga
ED - Petr, Sojka
ED - Aleš, Horák
ED - Ivan, Kopeček
ED - Karel, Pala
TI - An MLU estimation method for Hungarian transcripts
T2 - Text, Speech, and Dialogue
PB - Springer Netherlands
CY - Cham
SN - 9783319108162
T3 - Lecture Notes in Computer Science, ISSN 0302-9743 ; 8655.
PY - 2014
SP - 173
EP - 180
PG - 8
DO - 10.1007/978-3-319-10816-2_22
UR - https://m2.mtmt.hu/api/publication/2847504
ID - 2847504
AB - Mean length of utterance (MLU) is an important indicator for measuring complexity in child language. A generally employed method for calculating MLU is to use the CLAN toolkit, which includes modules that enable the measurement of utterance length in morphemes. However, these methods are based on rules which are only available for just a few languages not involving Hungarian. Therefore, in order to automatically analyze and measure Hungarian transcripts adequate methods need to be developed.
In this paper we describe a new toolkit which is able to estimate MLU counts (in morphemes) while providing morphosyntactic tagging as well. Its components are based on existing resources; however, many of them were adapted to the language of the transcripts. The tool-chain performs the annotation task with a high precision and its MLU estimates are correlated with that of human experts.
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Orosz, György
AU - Prószéky, Gábor
ED - Tanács, Attila
ED - Varga, Viktor
ED - Vincze, Veronika
TI - Hol a határ? Mondatok, szavak, klinikák
T2 - X. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2014
PB - Szegedi Tudományegyetem Informatikai Tanszékcsoport
CY - Szeged
SN - 9789633062463
PY - 2014
SP - 177
EP - 187
PG - 11
UR - https://m2.mtmt.hu/api/publication/2847502
ID - 2847502
LA - Hungarian
DB - MTMT
ER -

TY - CHAP
AU - Novák, Borbála
AU - Novák, Attila
ED - Besacier, L
ED - Dediu, A-H
ED - Martín-Vide, C
TI - Identifying and clustering relevant terms in clinical records using unsupervised methods
T2 - Statistical Language and Speech Processing
PB - Springer Netherlands
CY - Cham
SN - 9783319113975
T3 - Lecture Notes in Computer Science, ISSN 0302-9743 ; 8791.
PY - 2014
SP - 233
EP - 243
PG - 11
DO - 10.1007/978-3-319-11397-5_18
UR - https://m2.mtmt.hu/api/publication/2843822
ID - 2843822
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Novák, Attila
ED - Calzolari, Nicoletta
ED - Khalid, Choukri
ED - Thierry, Declerck
ED - Hrafn, Loftsson
ED - Bente, Maegaard
ED - Joseph, Mariani
ED - Asunción, Moreno
ED - Jan, Odijk
ED - Stelios, Piperidis
TI - A New Form of Humor – Mapping Constraint-Based Computational Morphologies to a Finite-State Representation
T2 - LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
PB - European Language Resources Association (ELRA)
CY - Lisbon
SN - 9782951740884
PY - 2014
SP - 1068
EP - 1073
PG - 6
UR - https://m2.mtmt.hu/api/publication/2843807
ID - 2843807
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Laki, László János
AU - Orosz, György
ED - Calzolari, Nicoletta
ED - Khalid, Choukri
ED - Thierry, Declerck
ED - Hrafn, Loftsson
ED - Bente, Maegaard
ED - Joseph, Mariani
ED - Asunción, Moreno
ED - Jan, Odijk
ED - Stelios, Piperidis
TI - An Efficient Language Independent Toolkit for Complete Morphological Disambiguation
T2 - LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
PB - European Language Resources Association (ELRA)
CY - Lisbon
SN - 9782951740884
PY - 2014
SP - 1625
EP - 1630
PG - 6
UR - https://m2.mtmt.hu/api/publication/2824578
ID - 2824578
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Prószéky, Gábor
ED - Laczkó, Krisztina
ED - Tátrai, Szilárd
TI - A számítógépes nyelvészet hatása a nyelvleírásra
T2 - Elmélet és módszer
PB - ELTE Eötvös József Collegium
CY - Budapest
SN - 9786155371219
PY - 2014
SP - 315
EP - 322
PG - 8
UR - https://m2.mtmt.hu/api/publication/2797769
ID - 2797769
LA - Hungarian
DB - MTMT
ER -