TY  - CHAP
AU  - Tóth, László
AU  - Honarmandi Shandiz, Amin
AU  - Gosztolya, Gábor
AU  - Csapó, Tamás Gábor
TI  - Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
T2  - Proceedings of the 24th International Speech Communication Association, Interspeech 2023
PB  - International Speech Communication Association (ISCA)
T3  - INTERSPEECH, ISSN 2308-457X ; 2023-August.
PY  - 2023
SP  - 1169
EP  - 1173
PG  - 5
DO  - 10.21437/Interspeech.2023-1607
UR  - https://m2.mtmt.hu/api/publication/34067268
ID  - 34067268
N1  - Institute of Informatics, University of Szeged, Hungary
            ELRN-SZTE Research Group on Artificial Intelligence, Szeged, Hungary
            Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
AB  - Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a quick switch between users troublesome. Even for the same speaker, these models perform poorly cross-session, i.e. after dismounting and re-mounting the recording equipment. To aid quick speaker and session adaptation of ultrasound tongue imaging-based SSI models, we extend our deep networks with a spatial transformer network (STN) module, capable of performing an affine transformation on the input images. Although the STN part takes up only about 10% of the network, our experiments show that adapting just the STN module may reduce the MSE by 88% on average, compared to retraining the whole network. The improvement is even larger (around 92%) when adapting the network to different recording sessions from the same speaker.
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Bóna, Judit
AU  - Gosztolya, Gábor
AU  - Hoffmann, Ildikó
AU  - Klivényi, Péter
AU  - Tóth, Alinka
AU  - Svindt, Veronika
AU  - Tóth, László
AU  - Lőrincz, András
ED  - Dobrić, Arnalda
ED  - Liker, Marko
TI  - Temporal variables of speech in Parkinson’s Disease in three spontaneous speaking tasks
T2  - Book of Abstracts : The 11th scientific conference with international participation Speech Research, Faculty of Humanities and Social Sciences, Zagreb, Croatia, December 8 - 10 2022
PB  - Hrvatsko filološko društvo
CY  - Zagreb
SN  - 9789532961935
PY  - 2022
SP  - 28
EP  - 29
PG  - 2
UR  - https://m2.mtmt.hu/api/publication/33576741
ID  - 33576741
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Pap, Gergely
AU  - Ádám, Krisztián
AU  - Györgypál, Zoltán
AU  - Tóth, László
AU  - Hegedűs, Zoltán
ED  - Anon, A
TI  - Depthwise Convolutions using Physicochemical Features of DNA for Transcription Factor Binding Site Classification
TS  - Physicochemical Features for DNA-Protein Classification with Depthwise Convolutions
T2  - ICAAI '22: Proceedings of the 6th International Conference on Advances in Artificial Intelligence
PB  - Association for Computing Machinery (ACM)
CY  - New York, New York
SN  - 9781450396943
PY  - 2022
SP  - 15
EP  - 21
PG  - 7
DO  - 10.1145/3571560.3571563
UR  - https://m2.mtmt.hu/api/publication/33563542
ID  - 33563542
LA  - English
DB  - MTMT
ER  - 

TY  - JOUR
AU  - Csapó, Tamás Gábor
AU  - Gosztolya, Gábor
AU  - Tóth, László
AU  - Honarmandi Shandiz, Amin
AU  - Markó, Alexandra
TI  - Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping
JF  - SENSORS
J2  - SENSORS-BASEL
VL  - 22
PY  - 2022
IS  - 22
PG  - 13
SN  - 1424-8220
DO  - 10.3390/s22228601
UR  - https://m2.mtmt.hu/api/publication/33220025
ID  - 33220025
N1  - Funding Agency and Grant Number: European Commission [2019-2.1.2-NEMZ-2020-00012]; National Research, Development and Innovation Office of Hungary [FK 142163]; Bolyai Janos Research Fellowship of the Hungarian Academy of Sciences; New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund [UNKP-22-5-BME-316]; Hungarian Ministry of Innovation and Technology NRDI Office [TKP2021-NVA-09]; Artificial Intelligence National Laboratory [RRF-2.3.1-21-2022-00004]
            Funding text: T.G. Csapo's research was partly supported by the APH-ALARM project (contract 2019-2.1.2-NEMZ-2020-00012) funded by the European Commission and the National Research, Development and Innovation Office of Hungary (FK 142163 grant), by the Bolyai Janos Research Fellowship of the Hungarian Academy of Sciences and the UNKP-22-5-BME-316 New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund. The work of G. Gosztolya and L. Toth was also supported by the Hungarian Ministry of Innovation and Technology NRDI Office (grant TKP2021-NVA-09) and by the Artificial Intelligence National Laboratory (RRF-2.3.1-21-2022-00004).
AB  - Within speech processing, articulatory-to-acoustic mapping (AAM) methods can apply ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used, which provide a wedge-shaped visual image. However, this process is optimized for the visual inspection of the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, it is now possible to gain access to the raw scanline data (i.e., ultrasound echo return) without any internal post-processing. In this study, we compared the raw scanline representation with the wedge-shaped processed UTI as the input for the residual network applied for AAM, and we also investigated the optimal size of the input image. We found no significant differences between the performance attained using the raw data and the wedge-shaped image extrapolated from it. We found the optimal pixel size to be 64 × 43 in the case of the raw scanline input, and 64 × 64 when transformed to a wedge. Therefore, it is not necessary to use the full original 64 × 842 pixel raw scanline; a smaller image is enough. This allows for the building of smaller networks, and will be beneficial for the development of session and speaker-independent methods for practical applications. AAM systems have the target application of a “silent speech interface”, which could be helpful for the communication of the speaking-impaired, in military applications, or in extremely noisy conditions.
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Honarmandi Shandiz, Amin
AU  - Tóth, László
ED  - Fujita, Hamido
ED  - Fournier-Viger, Philippe
ED  - Ali, Moonis
ED  - Wang, Yinglin
TI  - Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks
T2  - Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence
PB  - Springer Netherlands
CY  - Cham
SN  - 9783031085307
T3  - Lecture Notes in Computer Science, ISSN 0302-9743
PY  - 2022
SP  - 265
EP  - 274
PG  - 10
DO  - 10.1007/978-3-031-08530-7_22
UR  - https://m2.mtmt.hu/api/publication/33096595
ID  - 33096595
LA  - English
DB  - MTMT
ER  - 

TY  - JOUR
AU  - Kálmán, János
AU  - Devanand, Davangere P.
AU  - Gosztolya, Gábor
AU  - Balogh, Réka
AU  - Imre, Nóra
AU  - Tóth, László
AU  - Hoffmann, Ildikó
AU  - Kovács, Ildikó
AU  - Vincze, Veronika
AU  - Pákáski, Magdolna
TI  - Temporal speech parameters detect mild cognitive impairment in different languages: validation and comparison of the Speech-GAP Test® in English and Hungarian
JF  - CURRENT ALZHEIMER RESEARCH
J2  - CURR ALZHEIMER RES
VL  - 19
PY  - 2022
IS  - 5
SP  - 373
EP  - 386
PG  - 14
SN  - 1567-2050
DO  - 10.2174/1567205019666220418155130
UR  - https://m2.mtmt.hu/api/publication/32841964
ID  - 32841964
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Gosztolya, Gábor
AU  - Tóth, László
AU  - Svindt, Veronika
AU  - Bóna, Judit
AU  - Hoffmann, Ildikó
ED  - The Institute of Electrical and Electronics Engineers
TI  - Using Acoustic Deep Neural Network Embeddings to Detect Multiple Sclerosis From Speech
T2  - ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB  - IEEE
CY  - Piscataway (NJ)
SN  - 9781665405409
PY  - 2022
SP  - 6927
EP  - 6931
PG  - 5
DO  - 10.1109/ICASSP43922.2022.9746856
UR  - https://m2.mtmt.hu/api/publication/32800392
ID  - 32800392
LA  - English
DB  - MTMT
ER  - 

TY  - CHAP
AU  - Kiss-Vetráb, Mercedes
AU  - Egas López, José Vicente
AU  - Balogh, Réka
AU  - Imre, Nóra
AU  - Hoffmann, Ildikó
AU  - Tóth, László
AU  - Pákáski, Magdolna
AU  - Kálmán, János
AU  - Gosztolya, Gábor
ED  - The Institute of Electrical and Electronics Engineers
TI  - Using Spectral Sequence-to-Sequence Autoencoders to Assess Mild Cognitive Impairment
T2  - ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB  - IEEE
CY  - Piscataway (NJ)
SN  - 9781665405409
PY  - 2022
SP  - 6467
EP  - 6471
PG  - 5
DO  - 10.1109/ICASSP43922.2022.9746148
UR  - https://m2.mtmt.hu/api/publication/32800358
ID  - 32800358
LA  - English
DB  - MTMT
ER  - 

TY  - JOUR
AU  - Egas López, José Vicente
AU  - Balogh, Réka
AU  - Imre, Nóra
AU  - Hoffmann, Ildikó
AU  - Szabó, Martina Katalin
AU  - Tóth, László
AU  - Pákáski, Magdolna
AU  - Kálmán, János
AU  - Gosztolya, Gábor
TI  - Automatic screening of mild cognitive impairment and Alzheimer’s disease by means of posterior-thresholding hesitation representation
JF  - COMPUTER SPEECH AND LANGUAGE
J2  - COMPUT SPEECH LANG
VL  - 75
PY  - 2022
PG  - 13
SN  - 0885-2308
DO  - 10.1016/j.csl.2022.101377
UR  - https://m2.mtmt.hu/api/publication/32761562
ID  - 32761562
LA  - English
DB  - MTMT
ER  - 

TY  - JOUR
AU  - Imre, Nóra
AU  - Balogh, Réka
AU  - Gosztolya, Gábor
AU  - Tóth, László
AU  - Hoffmann, Ildikó
AU  - Várkonyi, Tamás
AU  - Lengyel, Csaba Attila
AU  - Pákáski, Magdolna
AU  - Kálmán, János
TI  - Temporal Speech Parameters Indicate Early Cognitive Decline in Elderly Patients With Type 2 Diabetes Mellitus
JF  - ALZHEIMER DISEASE & ASSOCIATED DISORDERS
J2  - ALZ DIS ASSOC DIS
VL  - 36
PY  - 2022
IS  - 2
SP  - 148
EP  - 155
PG  - 8
SN  - 0893-0341
DO  - 10.1097/WAD.0000000000000492
UR  - https://m2.mtmt.hu/api/publication/32749289
ID  - 32749289
AB  - The earliest signs of cognitive decline include deficits in temporal (time-based) speech characteristics. Type 2 diabetes mellitus (T2DM) patients are more prone to mild cognitive impairment (MCI). The aim of this study was to compare the temporal speech characteristics of elderly (above 50 y) T2DM patients with age-matched nondiabetic subjects. A total of 160 individuals were screened, 100 of whom were eligible (T2DM: n=51; nondiabetic: n=49). Participants were classified either as having healthy cognition (HC) or showing signs of MCI. Speech recordings were collected through a phone call. Based on automatic speech recognition, 15 temporal parameters were calculated. The HC with T2DM group showed significantly shorter utterance length, higher duration rate of silent pause and total pause, and higher average duration of silent pause and total pause compared with the HC without T2DM group. Regarding the MCI participants, parameters were similar between the T2DM and the nondiabetic subgroups. Temporal speech characteristics of T2DM patients showed early signs of altered cognitive functioning, whereas neuropsychological tests did not detect deterioration. This method is useful for identifying the T2DM patients most at risk for manifest MCI, and could serve as a remote cognitive screening tool.
LA  - English
DB  - MTMT
ER  -