TY  - CHAP
AU  - Mihajlik, Péter
AU  - Balog, András
AU  - Gráczi, Tekla Etelka
AU  - Kohári, Anna
AU  - Tarján, Balázs
AU  - Mády, Katalin
ED  - Calzolari, Nicoletta
ED  - Béchet, Frédéric
ED  - Blache, Philippe
ED  - Choukri, Khalid
ED  - Cieri, Christopher
ED  - Declerck, Thierry
ED  - Goggi, Sara
ED  - Isahara, Hitoshi
ED  - Maegaard, Bente
ED  - Mariani, Joseph
ED  - Mazo, Hélène
ED  - Odijk, Jan
ED  - Piperidis, Stelios
TI  - BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
T2  - LREC 2022, Thirteenth International Conference on Language Resources and Evaluation
PB  - European Language Resources Association (ELRA)
CY  - Paris
SN  - 9791095546726
PY  - 2022
SP  - 1970
EP  - 1977
PG  - 8
UR  - https://m2.mtmt.hu/api/publication/33437265
ID  - 33437265
N1  - Hungarian Research Centre for Linguistics, Benczúr u. 33, Budapest, 1068, Hungary
N1  - Budapest University of Technology and Economics, Műegyetem rakpart 3, Budapest, 1111, Hungary
N1  - SpeechTex Inc., Madách Imre utca 47, Budapest, 1181, Hungary
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Mihajlik, Péter
AU  - Gráczi, Tekla Etelka
AU  - Kohári, Anna
AU  - Tarján, Balázs
AU  - Balog, András
AU  - Mády, Katalin
ED  - Mády, Katalin
ED  - Markó, Alexandra
TI  - A BEA továbbfejlesztése és alkalmazása kontrasztív gépi beszédfelismerési kísérletekre
T2  - Általános nyelvészeti tanulmányok 34.
PB  - Akadémiai Kiadó
CY  - Budapest
SN  - 9789634548553
PY  - 2022
SP  - 361
EP  - 380
PG  - 20
UR  - https://m2.mtmt.hu/api/publication/33283256
ID  - 33283256
LA  - Hungarian
DB  - MTMT
ER  -

TY  - JOUR
AU  - Tarján, Balázs
AU  - Fegyó, Tibor
AU  - Mihajlik, Péter
TI  - Morphology aware data augmentation with neural language models for online hybrid ASR
JF  - ACTA LINGUISTICA ACADEMICA
J2  - ACTA LING ACAD
VL  - 69
PY  - 2022
IS  - 4
SP  - 581
EP  - 598
PG  - 18
SN  - 2559-8201
DO  - 10.1556/2062.2022.00582
UR  - https://m2.mtmt.hu/api/publication/33267111
ID  - 33267111
N1  - Correspondence Address: Tarján, B.; Department of Telecommunications and Media Informatics, Hungary; email: tarjanb@tmit.bme.hu
AB  - Recognition of Hungarian conversational telephone speech is challenging due to the informal style and morphological richness of the language. Neural Network Language Models (NNLMs) can provide remedy for the high perplexity of the task; however, their high complexity makes them very difficult to apply in the first (single) pass of an online system. Recent studies showed that a considerable part of the knowledge of NNLMs can be transferred to traditional n-grams by using neural text generation based data augmentation. Data augmentation with NNLMs works well for isolating languages; however, we show that it causes a vocabulary explosion in a morphologically rich language. Therefore, we propose a new, morphology aware neural text augmentation method, where we retokenize the generated text into statistically derived subwords. We compare the performance of word-based and subword-based data augmentation techniques with recurrent and Transformer language models and show that subword-based methods can significantly improve the Word Error Rate (WER) while greatly reducing vocabulary size and memory requirements. Combining subword-based modeling and neural language model-based data augmentation, we were able to achieve 11% relative WER reduction and preserve real-time operation of our conversational telephone speech recognition system. Finally, we also demonstrate that subword-based neural text augmentation outperforms the word-based approach not only in terms of overall WER but also in recognition of Out-of-Vocabulary (OOV) words.
LA  - English
DB  - MTMT
ER  -

TY  - THES
AU  - Tarján, Balázs
TI  - Language Modeling for Hungarian Speech Recognition
PB  - Budapesti Műszaki és Gazdaságtudományi Egyetem
PY  - 2021
SP  - 123
UR  - https://m2.mtmt.hu/api/publication/32498304
ID  - 32498304
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Mihajlik, Péter
AU  - Balog, András
AU  - Tarján, Balázs
AU  - Fegyó, Tibor
ED  - Berend, Gábor
ED  - Gosztolya, Gábor
ED  - Vincze, Veronika
TI  - End-to-end és hibrid mélyneuronháló alapú gépi leiratozás magyar nyelvű telefonos ügyfélszolgálati beszélgetésekre
T2  - XVII. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2021
PB  - Szegedi Tudományegyetem, Informatikai Intézet
CY  - Szeged
SN  - 9789633067819
PY  - 2021
SP  - 139
EP  - 145
PG  - 7
UR  - https://m2.mtmt.hu/api/publication/31881360
ID  - 31881360
LA  - Hungarian
DB  - MTMT
ER  -

TY  - GEN
AU  - Tarján, Balázs
AU  - Szaszák, György
AU  - Fegyó, Tibor
AU  - Mihajlik, Péter
TI  - Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR
PY  - 2020
UR  - https://m2.mtmt.hu/api/publication/31855595
ID  - 31855595
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Tarján, Balázs
AU  - Szaszák, György
AU  - Fegyó, Tibor
AU  - Mihajlik, Péter
TI  - Improving Real-time Recognition of Morphologically Rich Speech with Transformer Language Model
T2  - 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2020)
PB  - IEEE
CY  - New York, New York
SN  - 9781728182148
T3  - International Conference on Cognitive Infocommunications, ISSN 2375-1312
PY  - 2020
SP  - 491
EP  - 496
PG  - 6
DO  - 10.1109/CogInfoCom50765.2020.9237817
UR  - https://m2.mtmt.hu/api/publication/31621427
ID  - 31621427
N1  - IEEE Computational Intelligence Chapter; IEEE Finland Section; IEEE Hungary Section; IEEE IES and RAS Chapters; IEEE Systems, Man and Cybernetics Chapter
N1  - Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics, Budapest, Hungary
N1  - SpeechTex Ltd., Budapest, Hungary
N1  - THINKTech Research Center, Vác, Hungary
N1  - Conference code: 164650
N1  - Export Date: 25 October 2022
N1  - Correspondence Address: Tarjan, B.; Budapest University of Technology and Economics, Hungary; email: tarjanb@tmit.bme.hu
N1  - Correspondence Address: Szaszak, G.; Budapest University of Technology and Economics, Hungary; email: szaszak@tmit.bme.hu
N1  - Correspondence Address: Fegyo, T.; Budapest University of Technology and Economics, Hungary; email: fegyo@speechtex.com
N1  - Correspondence Address: Mihajlik, P.; Budapest University of Technology and Economics, Hungary; email: mihajlik@tmit.bme.hu
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Tarján, Balázs
AU  - Szaszák, György
AU  - Fegyó, Tibor
AU  - Mihajlik, Péter
ED  - Sojka, Petr
ED  - Kopeček, Ivan
ED  - Pala, Karel
ED  - Horák, Aleš
TI  - On the Effectiveness of Neural Text Generation Based Data Augmentation for Recognition of Morphologically Rich Speech
T2  - Text, Speech, and Dialogue: TSD 2020
PB  - Springer Netherlands
CY  - Cham
SN  - 9783030583224
T3  - Lecture Notes in Computer Science, ISSN 0302-9743 ; 12284.
PY  - 2020
SP  - 437
EP  - 445
PG  - 9
DO  - 10.1007/978-3-030-58323-1_47
UR  - https://m2.mtmt.hu/api/publication/31608551
ID  - 31608551
N1  - Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
N1  - SpeechTex Ltd., Budapest, Hungary
N1  - THINKTech Research Center, Vác, Hungary
N1  - Cited By: 1
N1  - Export Date: 15 February 2023
N1  - Correspondence Address: Tarján, B.; Department of Telecommunications and Media Informatics, Hungary; email: tarjanb@tmit.bme.hu
LA  - English
DB  - MTMT
ER  -

TY  - JOUR
AU  - Tündik, Máté Ákos
AU  - Tarján, Balázs
AU  - Szaszák, György
TI  - A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions
JF  - COMPUTER SPEECH AND LANGUAGE
J2  - COMPUT SPEECH LANG
VL  - 63
PY  - 2020
PG  - 19
SN  - 0885-2308
DO  - 10.1016/j.csl.2020.101076
UR  - https://m2.mtmt.hu/api/publication/31483149
ID  - 31483149
N1  - WoS:error:000534481900012 2021-10-06 23:34 article identifier does not match
AB  - In Automatic Speech Recognition (ASR), inserting the punctuation marks into the word chain hypothesis has long been given low priority, as efforts were concentrated on minimizing word error rates. Punctuation, however, also has a high impact on the transcription quality perceived by the users. Prosody, textual context and their combination have since been used successfully for automatic punctuation of ASR outputs. The recently proposed RNN based solutions show encouraging performance. We believe that current bottlenecks of punctuation technology are on one hand the complex punctuation models, which, having high latency, are not suitable for use-cases with real-time requirements; and on the other hand, punctuation efforts have not been validated against human perception and user impression. The ambition of this paper is to propose a lightweight, yet powerful RNN punctuation model for on-line (real-time including low latency) environment, and also to assess user opinion, in general and also for target users living with hearing loss or impairment. The proposed online RNN punctuation model is evaluated against a Maximum Entropy (MaxEnt) baseline, for Hungarian and for English, whereas subjective assessment tests are carried out on real broadcast data subtitled with ASR (closed captioning). As it can be expected, the RNN outperforms the MaxEnt baseline system, but of course not the off-line systems: limiting the future context to minimize latency results only in a slighter performance drop, but ASR errors obviously influence punctuation performance considerably. A genre analysis is also carried out w.r.t. the punctuation performance showing that both recognition and punctuation of more spontaneous speech styles is challenging. Overall, the subjective tests confirmed that users perceive a significant quality improvement when punctuation is added, even in presence of word errors and even if punctuation is automatic and hence itself may contain further errors. For users living with hearing loss or deafness, an even higher, clear preference for the punctuated captions could be confirmed. (c) 2020 Elsevier Ltd. All rights reserved.
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Tarján, Balázs
AU  - Szaszák, György
AU  - Fegyó, Tibor
AU  - Mihajlik, Péter
TI  - N-gram Approximation of LSTM Recurrent Language Models for Single-pass Recognition of Hungarian Call Center Conversations
T2  - 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2019)
PB  - IEEE
CY  - Piscataway (NJ)
SN  - 9781728147925
T3  - International Conference on Cognitive Infocommunications, ISSN 2375-1312
PY  - 2019
SP  - 131
EP  - 136
PG  - 6
DO  - 10.1109/CogInfoCom47531.2019.9089959
UR  - https://m2.mtmt.hu/api/publication/31640248
ID  - 31640248
N1  - Conference code: 159695
N1  - Cited By: 2
N1  - Export Date: 25 October 2022
LA  - English
DB  - MTMT
ER  -