MTMT (Magyar Tudományos Művek Tára, Hungarian Scientific Bibliography)
Phone recognition with hierarchical convolutional deep maxout networks
Tóth, László (author), MTA-SZTE Mesterséges Intelligencia Kutatócsoport (MTA-SZTE Research Group on Artificial Intelligence), SZTE / TTIK / ITCS
Scientific research article (journal article), in English
Published in: EURASIP Journal on Audio, Speech, and Music Processing (ISSN 1687-4714, 1687-4722), 2015 (1), Paper 25, 13 p.
SJR Scopus - Electrical and Electronic Engineering: Q2
Identifiers:
MTMT: 2963597
DOI: 10.1186/s13636-015-0068-3
WoS: 000360835700001
Scopus: 84941270134
SZTE Publicatio: 5976
Subject areas: Physics
Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) on both low-resource and large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10–15% relative improvement in the word error rate of large-vocabulary recognition tasks over fully connected deep networks. Here, we explore two refinements to CNNs that have not been pursued by other authors. First, the CNN papers published so far have used sigmoid or rectified linear (ReLU) neurons. We experiment with the recently proposed maxout activation function, which has been shown to outperform the rectifier activation function in fully connected DNNs. We show that the pooling operation of CNNs and the maxout function are closely related, so the two technologies can be readily combined to build convolutional maxout networks. Second, we propose to turn the CNN into a hierarchical model. The origins of this approach go back to the era of shallow nets, where the idea of stacking two networks on top of each other was relatively well known. We extend this method by fusing the two networks into one joint deep model with many hidden layers and a special structure. We show that the hierarchical modelling approach allows the network to exploit an expanded input context and thereby reduce its error rate. In experiments on the Texas Instruments Massachusetts Institute of Technology (TIMIT) phone recognition task, we find that a CNN built from maxout units yields a relative phone error rate reduction of about 4.3% over ReLU CNNs. Applying the hierarchical modelling scheme to this CNN results in a further relative phone error rate reduction of 5.5%. Using dropout training, the lowest error rate we obtain on TIMIT is 16.5%, which was the best reported result at the time of writing.
Besides experimenting on TIMIT, we also evaluate our best models on a low-resource large-vocabulary task, and we find that all the proposed modelling improvements consistently give better results on this larger database as well. © 2015, Tóth.
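The abstract says that maxout and CNN pooling are closely related. The sketch below illustrates one reading of that claim. It is minimal, pure Python, and not code from the paper. Maxout takes a maximum over groups of linear units, which here means groups of feature maps along the channel axis. Max-pooling takes a maximum over neighbouring positions along the frequency axis. Applying one after the other is therefore the same as a single maximum over a block of filters × positions. The function names, the non-overlapping pooling, and the `fmaps[filter][position]` layout are illustrative assumptions. They are not the paper's actual layer configuration.

```python
from typing import List, Sequence


def maxout(z: Sequence[float], group_size: int) -> List[float]:
    """Maxout activation: keep the maximum of each group of `group_size`
    consecutive linear pre-activations."""
    if len(z) % group_size != 0:
        raise ValueError("len(z) must be a multiple of group_size")
    return [max(z[i:i + group_size]) for i in range(0, len(z), group_size)]


def max_pool(x: Sequence[float], pool_size: int) -> List[float]:
    """Non-overlapping 1-D max-pooling, e.g. along the frequency axis.
    Note that it is the same operation as maxout, applied to a different axis."""
    if len(x) % pool_size != 0:
        raise ValueError("len(x) must be a multiple of pool_size")
    return [max(x[i:i + pool_size]) for i in range(0, len(x), pool_size)]


def conv_maxout_layer(fmaps: List[List[float]], group_size: int,
                      pool_size: int) -> List[List[float]]:
    """Hypothetical convolutional maxout step.
    fmaps[f][t] is the linear output of filter f at frequency position t.
    First apply maxout across groups of filters at each position, then max-pool
    each merged map along frequency."""
    if len(fmaps) % group_size != 0:
        raise ValueError("number of filters must be a multiple of group_size")
    out = []
    for g in range(0, len(fmaps), group_size):
        group = fmaps[g:g + group_size]
        # Maxout over the filters of this group, position by position.
        merged = [max(vals) for vals in zip(*group)]
        out.append(max_pool(merged, pool_size))
    return out


def block_max(fmaps: List[List[float]], group_size: int,
              pool_size: int) -> List[List[float]]:
    """The same result computed as one max over each group_size x pool_size block."""
    n_pos = len(fmaps[0])
    return [
        [max(fmaps[f][t]
             for f in range(g, g + group_size)
             for t in range(s, s + pool_size))
         for s in range(0, n_pos, pool_size)]
        for g in range(0, len(fmaps), group_size)
    ]


if __name__ == "__main__":
    # Four filters over four frequency positions (made-up numbers).
    fmaps = [[0.1, -0.5, 0.9, 0.2],
             [0.4, 0.3, -1.0, 0.0],
             [-0.2, 0.8, 0.1, 0.6],
             [0.5, -0.3, 0.2, -0.4]]
    print(conv_maxout_layer(fmaps, group_size=2, pool_size=2))  # [[0.4, 0.9], [0.8, 0.6]]
    print(block_max(fmaps, group_size=2, pool_size=2))          # [[0.4, 0.9], [0.8, 0.6]]
```

Both operations are maxima, so their order does not matter and they fuse into one block-wise maximum. That is why the two techniques combine so easily. In a real network the `fmaps` values would be the outputs of learned convolution filters rather than fixed numbers.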
Citing works: 39