TY - GEN
AU - Mollberg, David Erik
TI - The use of subwords for Automatic Speech Recognition
PY - 2021
UR - https://m2.mtmt.hu/api/publication/32087221
ID - 32087221
LA - English
DB - MTMT
ER -

TY - JOUR
AU - Smit, Peter
AU - Virpioja, Sami
AU - Kurimo, Mikko
TI - Advances in subword-based HMM-DNN speech recognition across languages
JF - COMPUTER SPEECH AND LANGUAGE
J2 - COMPUT SPEECH LANG
VL - 66
PY - 2021
SN - 0885-2308
DO - 10.1016/j.csl.2020.101158
UR - https://m2.mtmt.hu/api/publication/31992793
ID - 31992793
LA - English
DB - MTMT
ER -

TY - JOUR
AU - Varjokallio, Matti
AU - Virpioja, Sami
AU - Kurimo, Mikko
TI - Morphologically motivated word classes for very large vocabulary speech recognition of Finnish and Estonian
JF - COMPUTER SPEECH AND LANGUAGE
J2 - COMPUT SPEECH LANG
VL - 66
PY - 2021
SN - 0885-2308
DO - 10.1016/j.csl.2020.101141
UR - https://m2.mtmt.hu/api/publication/31992800
ID - 31992800
LA - English
DB - MTMT
ER -

TY - CONF
AU - Leinonen, Juho
AU - Smit, Peter
AU - Virpioja, Sami
AU - Kurimo, Mikko
ED - Pirinen, Tommi A
ED - Riessler, Michael
ED - Rueter, Jack
ED - Trosterud, Trond
ED - Tyers, Francis M
TI - New Baseline in Automatic Speech Recognition for Northern Sámi
T2 - Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages
PB - Association for Computational Linguistics (ACL)
PY - 2018
SP - 87
EP - 97
PG - 11
UR - https://m2.mtmt.hu/api/publication/27393011
ID - 27393011
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Varjokallio, Matti
AU - Virpioja, Sami
AU - Kurimo, Mikko
ED - Stylianou, Yannis
TI - First-Pass Techniques for Very Large Vocabulary Speech Recognition of Morphologically Rich Languages
T2 - IEEE SLT 2018
PB - IEEE
CY - New York, New York
SN - 9781538643341
PY - 2018
SP - 227
EP - 234
PG - 8
DO - 10.1109/SLT.2018.8639691
UR - https://m2.mtmt.hu/api/publication/31992792
ID - 31992792
N1 - Department of Signal Processing and Acoustics, Aalto University, Finland; Utopia Analytics, Helsinki, Finland
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Smit, Peter
AU - Virpioja, Sami
AU - Kurimo, Mikko
ED - Lacerda, Francisco
ED - House, David
ED - Heldner, Matthias
ED - Gustafsson, Joakim
ED - Strömbergsson, Sofia
ED - Wlodarczak, Marcin
TI - Improved subword modeling for WFST-based speech recognition
T2 - Proceedings of Interspeech
PB - Causal Productions
T3 - INTERSPEECH, ISSN 2308-457X ; 2017.
PY - 2017
SP - 2551
EP - 2555
PG - 5
UR - https://m2.mtmt.hu/api/publication/27392949
ID - 27392949
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Szaszák, György
AU - Tündik, Máté Ákos
AU - Beke, András
ED - Fred, Ana
ED - Dietz, Jan
ED - Aveiro, David
ED - Liu, Kecheng
ED - Bernardino, Jorge
ED - Filipe, Joaquim
TI - Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer
T2 - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
PB - SciTePress
CY - Setubal
SN - 9789897582035
PY - 2016
SP - 221
EP - 227
PG - 7
DO - 10.5220/0006044802210227
UR - https://m2.mtmt.hu/api/publication/3146183
ID - 3146183
N1 - Dept. of Telecommunications and Media Informatics, Budapest University of Technology and Economics, 2 Magyar tudosok krt., Budapest, 1117, Hungary; Dept. of Phonetics, Research Institute for Linguistics of the Hungarian Academy of Sciences, 33 Benczúr utca, Budapest, 1068, Hungary
LA - English
DB - MTMT
ER -

TY - CHAP
AU - Tarján, Balázs
AU - Varga, Ádám
AU - Tobler, Zoltán
AU - Szaszák, György
AU - Fegyó, Tibor
AU - Bordás, Csaba
AU - Mihajlik, Péter
ED - Tanács, Attila
ED - Varga, Viktor
ED - Vincze, Veronika
TI - Magyar nyelvű, élő közéleti- és hírműsorok gépi feliratozása
T2 - XII. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2016
PB - Szegedi Tudományegyetem, Informatikai Intézet
CY - Szeged
SN - 9789633064504
PY - 2016
SP - 89
EP - 99
PG - 11
UR - https://m2.mtmt.hu/api/publication/3000304
ID - 3000304
LA - Hungarian
DB - MTMT
ER -

TY - CHAP
AU - Tündik, Máté Ákos
AU - Szaszák, György
ED - Tanács, Attila
ED - Varga, Viktor
ED - Vincze, Veronika
TI - Szöveg alapú nyelvi elemző kiértékelése gépi beszédfelismerő hibákkal terhelt kimenetén
T2 - XII. Magyar Számítógépes Nyelvészeti Konferencia : MSZNY 2016
PB - Szegedi Tudományegyetem, Informatikai Intézet
CY - Szeged
SN - 9789633064504
PY - 2016
SP - 111
EP - 121
PG - 11
UR - https://m2.mtmt.hu/api/publication/2996839
ID - 2996839
LA - Hungarian
DB - MTMT
ER -

TY - CHAP
AU - Varjokallio, Matti
AU - Kurimo, Mikko
AU - Virpioja, Sami
ED - Král, Pavel
ED - Martín-Vide, Carlos
TI - Class n-gram models for very large vocabulary speech recognition of Finnish and Estonian
T2 - Statistical Language and Speech Processing
PB - Springer Netherlands
CY - Cham
SN - 9783319459257
T3 - Lecture Notes in Computer Science, ISSN 0302-9743 ; 9918.
PY - 2016
SP - 133
EP - 144
PG - 12
DO - 10.1007/978-3-319-45925-7_11
UR - https://m2.mtmt.hu/api/publication/27392952
ID - 27392952
AB - We study class n-gram models for very large vocabulary speech recognition of Finnish and Estonian. The models are trained with vocabulary sizes of several millions of words using automatically derived classes. To evaluate the models on a Finnish and an Estonian broadcast news speech recognition task, we modify Aalto University’s LVCSR decoder to operate with the class n-grams and very large vocabularies. Linear interpolation of a standard n-gram model and a class n-gram model provides relative perplexity improvements of 21.3 % for Finnish and 12.8 % for Estonian over the n-gram model. The relative improvements in word error rates are 5.5 % for Finnish and 7.4 % for Estonian. We also compare our word-based models to a state-of-the-art unlimited vocabulary recognizer utilizing subword n-gram models, and show that the very large vocabulary word-based models can perform equally well or better.
LA - English
DB - MTMT
ER -

TY - JOUR
AU - Varga, A
AU - Tarján, Balázs
AU - Tobler, Z
AU - Szaszák, György
AU - Fegyó, Tibor
AU - Bordas, C
AU - Mihajlik, Péter
TI - Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach
JF - LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
J2 - LECT NOTES ARTIF INT
VL - 9319
PY - 2015
SP - 105
EP - 112
PG - 8
SN - 0302-9743
DO - 10.1007/978-3-319-23132-7_13
UR - https://m2.mtmt.hu/api/publication/2995572
ID - 2995572
AB - In this paper, the application of LVCSR (Large Vocabulary Continuous Speech Recognition) technology is investigated for real-time, resource-limited broadcast close captioning. The work focuses on transcribing live broadcast conversation speech to make such programs accessible to deaf viewers. Due to computational limitations, real time factor (RTF) and memory requirements are kept low during decoding with various models tailored for Hungarian broadcast speech recognition. Two decoders are compared on the direct transcription task of broadcast conversation recordings, and setups employing re-speakers are also tested. Moreover, the models are evaluated on a broadcast news transcription task as well, and different language models (LMs) are tested in order to demonstrate the performance of our systems in settings when low memory consumption is a less crucial factor.
LA - English
DB - MTMT
ER -