Dino Oglic, Z. Cvetković, Peter Sollich
23 June 2019

Bayesian Parznets for Robust Speech Recognition in the Waveform Domain

We propose a novel family of band-pass filters for efficient spectral decomposition of signals. Previous work has established the effectiveness of representations based on static band-pass filtering of speech signals (e.g., mel-frequency cepstral coefficients and the deep scattering spectrum). A potential shortcoming of these approaches is that the parameters specifying such a representation are fixed a priori rather than learned from the available data. To address this limitation, we propose a family of filters defined via cosine modulations of Parzen windows, where the modulation frequency models the center of a spectral band-pass filter and the length of the Parzen window is inversely proportional to the filter's bandwidth. We propose to learn these filters as part of a multilayer convolutional operator using stochastic variational inference based on Gaussian dropout posteriors and sparsity-inducing priors. Such a prior leads to an intractable integral in the Kullback-Leibler divergence term, for which we propose an effective approximation based on Gauss-Hermite quadrature. Our empirical results demonstrate that modulation filter learning can be statistically significantly more effective than static band-pass filtering on continuous speech recognition from raw speech. This is also the first work to achieve state-of-the-art results on speech recognition using variational inference.
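The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact filter parametrization or the integrand of the Kullback-Leibler term, so the function names (`parzen_window`, `cosine_modulated_parzen`, `gaussian_expectation`), the discrete window formula, the tap count, and the 16 kHz sample rate are all assumptions chosen for illustration. The first part builds a band-pass filter as a cosine-modulated Parzen window; the second shows the generic Gauss-Hermite rule for approximating an expectation under a Gaussian, which is the kind of integral the abstract refers to.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def parzen_window(num_taps):
    """Piecewise-cubic Parzen window sampled at num_taps points (assumed form)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0   # sample offsets centred on zero
    u = np.abs(n) / (num_taps / 2.0)                 # normalised distance, 0 <= u < 1
    inner = 1.0 - 6.0 * u**2 * (1.0 - u)             # branch used for u <= 0.5
    outer = 2.0 * (1.0 - u) ** 3                     # branch used for u > 0.5
    return np.where(u <= 0.5, inner, outer)


def cosine_modulated_parzen(center_hz, num_taps, sample_rate):
    """Band-pass filter taps: cos(2*pi*f*t) multiplied by a Parzen window."""
    t = (np.arange(num_taps) - (num_taps - 1) / 2.0) / sample_rate
    return np.cos(2.0 * np.pi * center_hz * t) * parzen_window(num_taps)


def gaussian_expectation(f, mu, sigma, order=20):
    """Approximate E[f(X)] for X ~ N(mu, sigma^2) with Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(order)                # rule for the weight exp(-x^2)
    values = f(mu + np.sqrt(2.0) * sigma * nodes)    # change of variables to N(mu, sigma^2)
    return np.sum(weights * values) / np.sqrt(np.pi)


# Hypothetical settings: a 401-tap filter centred at 1 kHz for 16 kHz audio.
h = cosine_modulated_parzen(1000.0, 401, 16000)
spectrum = np.abs(np.fft.rfft(h, n=4096))
freqs = np.fft.rfftfreq(4096, d=1.0 / 16000)
peak_hz = freqs[np.argmax(spectrum)]                 # location of the pass-band peak

# Sanity check of the quadrature: analytically E[X^2] = mu^2 + sigma^2.
second_moment = gaussian_expectation(lambda x: x**2, 1.0, 2.0)
```

In this sketch the modulation frequency sets where the pass band sits, and a longer window (more taps) gives a narrower band, matching the inverse relation between window length and bandwidth described above. In a learned version, the center frequency and window length would be trainable parameters rather than fixed arguments.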

