TY  - CHAP
AU  - Ligeti-Nagy, Noémi
AU  - Ferenczi, Gergő
AU  - Héja, Enikő
AU  - Jelencsik-Mátyus, Kinga
AU  - Laki, László János
AU  - Vadász, Noémi
AU  - Yang, Zijian Győző
AU  - Váradi, Tamás
ED  - Berend, Gábor
ED  - Gosztolya, Gábor
ED  - Vincze, Veronika
TI  - HuLU: magyar nyelvű benchmark adatbázis kiépítése a neurális nyelvmodellek kiértékelése céljából
T2  - XVIII. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2022
PB  - Szegedi Tudományegyetem, Informatikai Intézet
CY  - Szeged
SN  - 9789633068489
T3  - MSZNY ; 18.
PY  - 2022
SP  - 431
EP  - 446
PG  - 16
UR  - https://m2.mtmt.hu/api/publication/32644299
ID  - 32644299
LA  - Hungarian
DB  - MTMT
ER  - 

TY  - CONF
AU  - Yang, Zijian Győző
AU  - Novák, Attila
AU  - Laki, László János
ED  - Kovásznai, Gergely
ED  - Fazekas, István
ED  - Tómács, Tibor
TI  - Automatic Tag Recommendation for News Articles
T2  - Proceedings of the 11th International Conference on Applied Informatics (ICAI 2020)
PB  - CEUR Workshop Proceedings
C1  - Eger
T3  - CEUR Workshop Proceedings, ISSN 1613-0073 ; 2650.
PY  - 2020
SP  - 442
EP  - 451
PG  - 10
UR  - https://m2.mtmt.hu/api/publication/31436461
ID  - 31436461
N1  - Scopus:error:85090846104 2022-10-12 17:49 type mismatch
AB  - In this paper, we present an automatic neural tag recommendation system for Hungarian news articles and the results of our experiments concerning the effect of preprocessing applied to the texts and various parameter settings. A novelty of the approach is a combination of subword tokenization with character-n-gram-based representations, which resulted in high gains in recall. The best system yields 76% precision at 58% recall. Subjective performance is higher, because suggested labels missing from the reference often fit the document well or are similar to missing reference labels. We also created an online GUI for the tag recommendation system that makes it possible for the user to interactively set threshold parameters facilitating customization of precision and recall. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
LA  - English
DB  - MTMT
ER  - 

TY  - JOUR
AU  - Novák, Borbála
AU  - Novák, Attila
AU  - Prószéky, Gábor
TI  - Context-aware correction of spelling errors in Hungarian medical documents
JF  - Computer Speech and Language
J2  - COMPUT SPEECH LANG
VL  - 35
PY  - 2016
SP  - 219
EP  - 233
PG  - 15
SN  - 0885-2308
DO  - 10.1016/j.csl.2014.09.001
UR  - https://m2.mtmt.hu/api/publication/2844131
ID  - 2844131
AB  - Owing to the growing need of acquiring medical data from clinical records, processing such documents is an important topic in natural language processing (NLP). However, for general NLP methods to work, a proper, normalized input is required. Otherwise the system is overwhelmed by the unusually high amount of noise generally characteristic of this kind of text. The different types of this noise originate from non-standard language use: short fragments instead of proper sentences, usage of Latin words, many acronyms and very frequent misspellings. In this paper, a method is described for the automated correction of spelling errors in Hungarian clinical records. First, a word-based algorithm was implemented to generate a ranked list of correction candidates for word forms regarded as incorrect. Second, the problem of spelling correction was modelled as a translation task, where the source language is the erroneous text and the target language is the corrected one. A Statistical Machine Translation (SMT) decoder performed the task of error correction. Since no orthographically correct proofread text from this domain is available, we could not use such a corpus for training the system. Instead, the word-based system was used to create translation models. In addition, a 3-gram token-based language model was used to model lexical context. Due to the high number of abbreviations and acronyms in the texts, the behaviour of these abbreviated forms was further examined both in the case of the context-unaware word-based and the SMT-decoder-based implementations. The results show that the SMT-based method outperforms the first candidate accuracy of the word-based ranking system. However, the normalization of abbreviations should be handled as a separate task.
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Miháltz, Márton
AU  - Indig, Balázs
AU  - Prószéky, Gábor
ED  - Tanács, Attila
ED  - Varga, Viktor
ED  - Vincze, Veronika
TI  - Igei vonzatkeretek és tematikus szerepek felismerése nyelvi erőforrások összekapcsolásával egy kereslet-kínálat elvű szövegelemzőben
T2  - XI. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2015
PB  - Szegedi Tudományegyetem Informatikai Tanszékcsoport
CY  - Szeged
SN  - 9789633063590
PY  - 2015
SP  - 298
EP  - 302
PG  - 4
UR  - https://m2.mtmt.hu/api/publication/2811567
ID  - 2811567
LA  - Hungarian
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Orosz, György
AU  - Jelencsik-Mátyus, Kinga
ED  - Sojka, Petr
ED  - Horák, Aleš
ED  - Kopeček, Ivan
ED  - Pala, Karel
TI  - An MLU estimation method for Hungarian transcripts
T2  - Text, Speech, and Dialogue
PB  - Springer International Publishing
CY  - Cham
SN  - 9783319108162
T3  - Lecture Notes in Computer Science, ISSN 0302-9743 ; 8655.
PY  - 2014
SP  - 173
EP  - 180
PG  - 8
DO  - 10.1007/978-3-319-10816-2_22
UR  - https://m2.mtmt.hu/api/publication/2847504
ID  - 2847504
AB  - Mean length of utterance (MLU) is an important indicator for measuring complexity in child language. A generally employed method for calculating MLU is to use the CLAN toolkit, which includes modules that enable the measurement of utterance length in morphemes. However, these methods are based on rules which are only available for just a few languages not involving Hungarian. Therefore, in order to automatically analyze and measure Hungarian transcripts adequate methods need to be developed. In this paper we describe a new toolkit which is able to estimate MLU counts (in morphemes) while providing morphosyntactic tagging as well. Its components are based on existing resources; however, many of them were adapted to the language of the transcripts. The tool-chain performs the annotation task with a high precision and its MLU estimates are correlated with that of human experts.
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Orosz, György
AU  - Prószéky, Gábor
ED  - Tanács, Attila
ED  - Varga, Viktor
ED  - Vincze, Veronika
TI  - Hol a határ? Mondatok, szavak, klinikák
T2  - X. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2014
PB  - Szegedi Tudományegyetem Informatikai Tanszékcsoport
CY  - Szeged
SN  - 9789633062463
PY  - 2014
SP  - 177
EP  - 187
PG  - 11
UR  - https://m2.mtmt.hu/api/publication/2847502
ID  - 2847502
LA  - Hungarian
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Novák, Borbála
AU  - Novák, Attila
ED  - Besacier, L
ED  - Dediu, A-H
ED  - Martín-Vide, C
TI  - Identifying and clustering relevant terms in clinical records using unsupervised methods
T2  - Statistical Language and Speech Processing
PB  - Springer International Publishing
CY  - Cham
SN  - 9783319113975
T3  - Lecture Notes in Computer Science, ISSN 0302-9743 ; 8791.
PY  - 2014
SP  - 233
EP  - 243
PG  - 11
DO  - 10.1007/978-3-319-11397-5_18
UR  - https://m2.mtmt.hu/api/publication/2843822
ID  - 2843822
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Novák, Attila
ED  - Calzolari, Nicoletta
ED  - Choukri, Khalid
ED  - Declerck, Thierry
ED  - Loftsson, Hrafn
ED  - Maegaard, Bente
ED  - Mariani, Joseph
ED  - Moreno, Asunción
ED  - Odijk, Jan
ED  - Piperidis, Stelios
TI  - A New Form of Humor – Mapping Constraint-Based Computational Morphologies to a Finite-State Representation
T2  - LREC 2014 - Ninth International Conference on Language Resources and Evaluation
PB  - European Language Resources Association (ELRA)
CY  - Lisbon
SN  - 9782951740884
PY  - 2014
SP  - 1068
EP  - 1073
PG  - 6
UR  - https://m2.mtmt.hu/api/publication/2843807
ID  - 2843807
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Laki, László János
AU  - Orosz, György
ED  - Calzolari, Nicoletta
ED  - Choukri, Khalid
ED  - Declerck, Thierry
ED  - Loftsson, Hrafn
ED  - Maegaard, Bente
ED  - Mariani, Joseph
ED  - Moreno, Asunción
ED  - Odijk, Jan
ED  - Piperidis, Stelios
TI  - An Efficient Language Independent Toolkit for Complete Morphological Disambiguation
T2  - LREC 2014 - Ninth International Conference on Language Resources and Evaluation
PB  - European Language Resources Association (ELRA)
CY  - Lisbon
SN  - 9782951740884
PY  - 2014
SP  - 1625
EP  - 1630
PG  - 6
UR  - https://m2.mtmt.hu/api/publication/2824578
ID  - 2824578
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Prószéky, Gábor
ED  - Laczkó, Krisztina
ED  - Tátrai, Szilárd
TI  - A számítógépes nyelvészet hatása a nyelvleírásra
T2  - Elmélet és módszer
PB  - ELTE Eötvös József Collegium
CY  - Budapest
SN  - 9786155371219
PY  - 2014
SP  - 315
EP  - 322
PG  - 8
UR  - https://m2.mtmt.hu/api/publication/2797769
ID  - 2797769
LA  - Hungarian
DB  - MTMT
ER  -