TY  - CHAP
AU  - Tóth, László
AU  - Honarmandi Shandiz, Amin
AU  - Gosztolya, Gábor
AU  - Csapó, Tamás Gábor
TI  - Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
T2  - Proceedings of the 24th International Speech Communication Association, Interspeech 2023
PB  - International Speech Communication Association (ISCA)
T3  - INTERSPEECH, ISSN 2308-457X ; 2023-August.
PY  - 2023
SP  - 1169
EP  - 1173
PG  - 5
DO  - 10.21437/Interspeech.2023-1607
UR  - https://m2.mtmt.hu/api/publication/34067268
ID  - 34067268
N1  - Institute of Informatics, University of Szeged, Hungary ELRN-SZTE Research Group on Artificial Intelligence, Szeged, Hungary Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary Export Date: 2 October 2023
AB  - Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a quick switch between users troublesome. Even for the same speaker, these models perform poorly cross-session, i.e. after dismounting and re-mounting the recording equipment. To aid quick speaker and session adaptation of ultrasound tongue imaging-based SSI models, we extend our deep networks with a spatial transformer network (STN) module, capable of performing an affine transformation on the input images. Although the STN part takes up only about 10% of the network, our experiments show that adapting just the STN module might allow to reduce MSE by 88% on the average, compared to retraining the whole network. The improvement is even larger (around 92%) when adapting the network to different recording sessions from the same speaker. © 2023 International Speech Communication Association. All rights reserved.
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Bóna, Judit
AU  - Gosztolya, Gábor
AU  - Hoffmann, Ildikó
AU  - Klivényi, Péter
AU  - Tóth, Alinka
AU  - Svindt, Veronika
AU  - Tóth, László
AU  - Lőrincz, András
ED  - Dobrić, Arnalda
ED  - Liker, Marko
TI  - Temporal variables of speech in Parkinson’s Disease in three spontaneous speaking tasks
T2  - Book of Abstracts : The 11th scientific conference with international participation Speech Research, Faculty of Humanities and Social Sciences, Zagreb, Croatia, December 8 - 10 2022
PB  - Hrvatsko filološko društvo
CY  - Zágráb
SN  - 9789532961935
PY  - 2022
SP  - 28
EP  - 29
PG  - 2
UR  - https://m2.mtmt.hu/api/publication/33576741
ID  - 33576741
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Pap, Gergely
AU  - Ádám, Krisztián
AU  - Györgypál, Zoltán
AU  - Tóth, László
AU  - Hegedűs, Zoltán
ED  - Anon, A
TI  - Depthwise Convolutions using Physicochemical Features of DNA for Transcription Factor Binding Site Classification. Physicochemical Features for DNA-Protein Classification with Depthwise Convolutions
TS  - Physicochemical Features for DNA-Protein Classification with Depthwise Convolutions
T2  - ICAAI '22: Proceedings of the 6th International Conference on Advances in Artificial Intelligence
PB  - Association for Computing Machinery (ACM)
CY  - New York, New York
SN  - 9781450396943
PY  - 2022
SP  - 15
EP  - 21
PG  - 7
DO  - 10.1145/3571560.3571563
UR  - https://m2.mtmt.hu/api/publication/33563542
ID  - 33563542
LA  - English
DB  - MTMT
ER  -

TY  - JOUR
AU  - Csapó, Tamás Gábor
AU  - Gosztolya, Gábor
AU  - Tóth, László
AU  - Honarmandi Shandiz, Amin
AU  - Markó, Alexandra
TI  - Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping
JF  - SENSORS
J2  - SENSORS-BASEL
VL  - 22
PY  - 2022
IS  - 22
PG  - 13
SN  - 1424-8220
DO  - 10.3390/s22228601
UR  - https://m2.mtmt.hu/api/publication/33220025
ID  - 33220025
N1  - Funding Agency and Grant Number: European Commission [20192.1.2-NEMZ-2020-00012]; National Research, Development and Innovation Office of Hungary [FK 142163]; Bolyai Janos Research Fellowship of the Hungarian Academy of Sciences; New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund [UNKP-22-5-BME-316]; Hungarian Ministry of Innovation and Technology NRDI Office [TKP2021-NVA-09]; Artificial Intelligence National Laboratory [RRF-2.3.1-21-2022-00004] Funding text: T.G. Csapo's research was partly supported by the APH-ALARM project (contract 20192.1.2-NEMZ-2020-00012) funded by the European Commission and the National Research, Development and Innovation Office of Hungary (FK 142163 grant), by the Bolyai Janos Research Fellowship of the Hungarian Academy of Sciences and the UNKP-22-5-BME-316 New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund. The work of G. Gosztolya and L. Toth were also supported by the Hungarian Ministry of Innovation and Technology NRDI Office (grant TKP2021-NVA-09) and by the Artificial Intelligence National Laboratory (RRF-2.3.1-21-2022-00004).
AB  - Within speech processing, articulatory-to-acoustic mapping (AAM) methods can apply ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used, which provide a wedge-shape visual image. However, this process is optimized for the visual inspection of the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, now it is possible to gain access to the raw scanline data (i.e., ultrasound echo return) without any internal post-processing. In this study, we compared the raw scanline representation with the wedge-shaped processed UTI as the input for the residual network applied for AAM, and we also investigated the optimal size of the input image. We found no significant differences between the performance attained using the raw data and the wedge-shaped image extrapolated from it. We found the optimal pixel size to be 64 × 43 in the case of the raw scanline input, and 64 × 64 when transformed to a wedge. Therefore, it is not necessary to use the full original 64 × 842 pixels raw scanline, but a smaller image is enough. This allows for the building of smaller networks, and will be beneficial for the development of session and speaker-independent methods for practical applications. AAM systems have the target application of a “silent speech interface”, which could be helpful for the communication of the speaking-impaired, in military applications, or in extremely noisy conditions.
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Honarmandi Shandiz, Amin
AU  - Tóth, László
ED  - Fujita, Hamido
ED  - Fournier-Viger, Philippe
ED  - Ali, Moonis
ED  - Wang, Yinglin
TI  - Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks
T2  - Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence
PB  - Springer Netherlands
CY  - Cham
SN  - 9783031085307
T3  - Lecture Notes in Computer Science, ISSN 0302-9743
PY  - 2022
SP  - 265
EP  - 274
PG  - 10
DO  - 10.1007/978-3-031-08530-7_22
UR  - https://m2.mtmt.hu/api/publication/33096595
ID  - 33096595
LA  - English
DB  - MTMT
ER  -

TY  - JOUR
AU  - Kálmán, János
AU  - Devanand, Davangere P.
AU  - Gosztolya, Gábor
AU  - Balogh, Réka
AU  - Imre, Nóra
AU  - Tóth, László
AU  - Hoffmann, Ildikó
AU  - Kovács, Ildikó
AU  - Vincze, Veronika
AU  - Pákáski, Magdolna
TI  - Temporal speech parameters detect mild cognitive impairment in different languages: validation and comparison of the Speech-GAP Test® in English and Hungarian
JF  - CURRENT ALZHEIMER RESEARCH
J2  - CURR ALZHEIMER RES
VL  - 19
PY  - 2022
IS  - 5
SP  - 373
EP  - 386
PG  - 14
SN  - 1567-2050
DO  - 10.2174/1567205019666220418155130
UR  - https://m2.mtmt.hu/api/publication/32841964
ID  - 32841964
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Gosztolya, Gábor
AU  - Tóth, László
AU  - Svindt, Veronika
AU  - Bóna, Judit
AU  - Hoffmann, Ildikó
ED  - The Institute of Electrical, and Electronics Engineers
TI  - Using Acoustic Deep Neural Network Embeddings to Detect Multiple Sclerosis From Speech
T2  - ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB  - IEEE
CY  - Piscataway (NJ)
SN  - 9781665405409
PY  - 2022
SP  - 6927
EP  - 6931
PG  - 5
DO  - 10.1109/ICASSP43922.2022.9746856
UR  - https://m2.mtmt.hu/api/publication/32800392
ID  - 32800392
LA  - English
DB  - MTMT
ER  -

TY  - CHAP
AU  - Kiss-Vetráb, Mercedes
AU  - José Vicente, Egas López
AU  - Balogh, Réka
AU  - Imre, Nóra
AU  - Hoffmann, Ildikó
AU  - Tóth, László
AU  - Pákáski, Magdolna
AU  - Kálmán, János
AU  - Gosztolya, Gábor
ED  - The Institute of Electrical, and Electronics Engineers
TI  - Using Spectral Sequence-to-Sequence Autoencoders to Assess Mild Cognitive Impairment
T2  - ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB  - IEEE
CY  - Piscataway (NJ)
SN  - 9781665405409
PY  - 2022
SP  - 6467
EP  - 6471
PG  - 5
DO  - 10.1109/ICASSP43922.2022.9746148
UR  - https://m2.mtmt.hu/api/publication/32800358
ID  - 32800358
LA  - English
DB  - MTMT
ER  -

TY  - JOUR
AU  - José Vicente, Egas López
AU  - Balogh, Réka
AU  - Imre, Nóra
AU  - Hoffmann, Ildikó
AU  - Szabó, Martina Katalin
AU  - Tóth, László
AU  - Pákáski, Magdolna
AU  - Kálmán, János
AU  - Gosztolya, Gábor
TI  - Automatic screening of mild cognitive impairment and Alzheimer’s disease by means of posterior-thresholding hesitation representation
JF  - COMPUTER SPEECH AND LANGUAGE
J2  - COMPUT SPEECH LANG
VL  - 75
PY  - 2022
PG  - 13
SN  - 0885-2308
DO  - 10.1016/j.csl.2022.101377
UR  - https://m2.mtmt.hu/api/publication/32761562
ID  - 32761562
LA  - English
DB  - MTMT
ER  -

TY  - JOUR
AU  - Imre, Nóra
AU  - Balogh, Réka
AU  - Gosztolya, Gábor
AU  - Tóth, László
AU  - Hoffmann, Ildikó
AU  - Várkonyi, Tamás
AU  - Lengyel, Csaba Attila
AU  - Pákáski, Magdolna
AU  - Kálmán, János
TI  - Temporal Speech Parameters Indicate Early Cognitive Decline in Elderly Patients With Type 2 Diabetes Mellitus
JF  - ALZHEIMER DISEASE & ASSOCIATED DISORDERS
J2  - ALZ DIS ASSOC DIS
VL  - 36
PY  - 2022
IS  - 2
SP  - 148
EP  - 155
PG  - 8
SN  - 0893-0341
DO  - 10.1097/WAD.0000000000000492
UR  - https://m2.mtmt.hu/api/publication/32749289
ID  - 32749289
AB  - The earliest signs of cognitive decline include deficits in temporal (time-based) speech characteristics. Type 2 diabetes mellitus (T2DM) patients are more prone to mild cognitive impairment (MCI). The aim of this study was to compare the temporal speech characteristics of elderly (above 50 y) T2DM patients with age-matched nondiabetic subjects. A total of 160 individuals were screened, 100 of whom were eligible (T2DM: n=51; nondiabetic: n=49). Participants were classified either as having healthy cognition (HC) or showing signs of MCI. Speech recordings were collected through a phone call. Based on automatic speech recognition, 15 temporal parameters were calculated. The HC with T2DM group showed significantly shorter utterance length, higher duration rate of silent pause and total pause, and higher average duration of silent pause and total pause compared with the HC without T2DM group. Regarding the MCI participants, parameters were similar between the T2DM and the nondiabetic subgroups. Temporal speech characteristics of T2DM patients showed early signs of altered cognitive functioning, whereas neuropsychological tests did not detect deterioration. This method is useful for identifying the T2DM patients most at risk for manifest MCI, and could serve as a remote cognitive screening tool.
LA  - English
DB  - MTMT
ER  -