Publications (55)

E. Turajlić, O. Bozanovic

This paper investigates the use of neural networks and MFCC coefficients for automatic text-dependent speaker verification in security systems. We have aimed to optimize the classification performance in terms of learning strategy and neural network architecture, as well as to establish the optimal choice of parameters for the voice signature, where the derivatives of the MFCC coefficients are also considered. The performance evaluation is conducted on a database of 600 waveforms of the Serbo-Croatian utterance “lozinka” (meaning “password”).
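The abstract does not say how the MFCC derivatives are computed. As a minimal sketch, assuming the common regression-based delta formula and a hypothetical window of two frames on each side, a voice signature made of static, first-derivative and second-derivative coefficients could be assembled as follows. The function names and the `width` parameter are illustrative and not taken from the paper.

```python
def delta(frames, width=2):
    """Regression-based time derivative of a feature sequence.

    frames: list of per-frame feature vectors (lists of floats).
    width:  frames used on each side of the current frame (assumed value).
    """
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Repeat the first/last frame at the sequence edges.
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def voice_signature(mfcc):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [c + a + b for c, a, b in zip(mfcc, d1, d2)]
```

Whether the second derivative helps verification is exactly the kind of parameter choice the paper sets out to evaluate, so the sketch keeps it easy to drop.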

E. Turajlić, O. Bozanovic

Sophisticated voice source estimation techniques, such as the closed-phase pitch-synchronous inverse filtering method, rely on accurate estimates of glottal closure instants (GCIs). In this paper, a group delay approach to GCI estimation is presented. Specifically, the average group delay and the energy-weighted group delay measures are discussed in detail. We propose an improvement to the implementation of the group delay measures, whereby translation-invariant thresholding is used to remove aspiration noise and other disturbances from the LPC residue instead of the standard 2nd-order Butterworth low-pass filter. The performance of the two group delay measures, with and without the proposed improvement, is evaluated for a range of fixed and pitch-synchronous group delay window lengths. The results show that the optimal GCI estimation performance is achieved with the energy-weighted group delay measure, translation-invariant thresholding of the LPC residue and a window length equal to exactly one pitch period.
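The abstract does not give the definition of the energy-weighted group delay. A minimal sketch, assuming the usual definition in which the measure reduces in the time domain to the centre of energy of the windowed residue minus the window midpoint, is shown below. GCI candidates are then taken at the negative-going zero crossings. The function names are illustrative, the residue is assumed to be already cleaned (for example by thresholding), and an odd window length is assumed so that the midpoint falls on a sample.

```python
def energy_weighted_group_delay(residual, win_len):
    """Centre of energy of each sliding window, relative to the window midpoint."""
    half = (win_len - 1) / 2.0
    out = []
    for start in range(len(residual) - win_len + 1):
        frame = residual[start:start + win_len]
        energy = sum(v * v for v in frame)
        if energy == 0.0:
            # Silent window: no information, report zero delay.
            out.append(0.0)
            continue
        centroid = sum(m * v * v for m, v in enumerate(frame)) / energy
        out.append(centroid - half)
    return out


def gci_candidates(residual, win_len):
    """Sample indices where the measure crosses zero going downwards."""
    d = energy_weighted_group_delay(residual, win_len)
    half = (win_len - 1) // 2
    return [n + half for n in range(1, len(d)) if d[n - 1] > 0.0 >= d[n]]
```

On an idealised residue consisting of isolated impulses, the crossing occurs when an impulse sits at the centre of the window, which is why the window midpoint is added back to obtain the GCI location.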

E. Turajlić, O. Bozanovic

This paper proposes a novel Finite Impulse Response adaptive filter. The proposed algorithm is named the Intelligent Bee Colony (IBC) algorithm. It takes some features from the Artificial Bee Colony algorithm and combines them with elements from classical gradient-based adaptive filter theory to produce an adaptive filter that is characterized by a very fast convergence rate. The IBC algorithm is also a robust solution that performs the global minimum search with high levels of accuracy. The performance of the IBC algorithm is investigated in the context of adaptive channel equalization. A set of experiments is designed to compare its performance with that of established adaptive filters, specifically the Least Mean Square, Variable Step Size and Recursive Least Square filters. The results demonstrate the effectiveness of the proposed method.
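The abstract does not describe the IBC update rule, so it is not reproduced here. For orientation only, the following is a minimal sketch of the Least Mean Square baseline named above, applied to channel equalization with a known training sequence. The tap count, the step size and the function name are assumptions chosen for illustration.

```python
def lms_equalize(received, desired, n_taps=11, step=0.02):
    """Train an FIR equalizer with the standard LMS update.

    received: channel output samples.
    desired:  training symbols aligned with the received samples.
    Returns the final tap weights and the per-sample error history.
    """
    weights = [0.0] * n_taps
    errors = []
    for n in range(n_taps - 1, len(received)):
        # Most recent sample first: x[n], x[n-1], ..., x[n-n_taps+1].
        window = received[n - n_taps + 1:n + 1][::-1]
        output = sum(w * x for w, x in zip(weights, window))
        error = desired[n] - output
        # Gradient step on the instantaneous squared error.
        weights = [w + step * error * x for w, x in zip(weights, window)]
        errors.append(error)
    return weights, errors
```

The error history gives the learning curve, which is the usual basis for comparing convergence rates across adaptive filters.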

E. Turajlić, S. Vaseghi

This paper presents a comparative study of the temporal structure of the glottal flow derivative estimates in relation to an idealized view of voice source realizations as defined by the Liljencrants-Fant (LF) model. Specifically, we endeavor to ascertain the extent to which the LF model can be used to represent the glottal flow derivative estimates obtained via closed-phase pitch-synchronous inverse filtering of recorded speech. The study includes several phonation types and two examples of voice pathology. The study has established the following. Due to its limited degrees of freedom, the LF model is only capable of adequately representing the “coarse” glottal pulse structure. The “fine” structural elements can constitute a considerable part of a glottal flow derivative realization, and we have presented evidence that they contain information related to voice individuality. In addition, we have shown that LF parameters do not always accurately portray significant events in the vocal fold dynamics.

The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remains as natural-sounding and intelligible as natural speech. In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate.

D. Rentzos, S. Vaseghi, E. Turajlić, Qin Yan, Ching-Hsiang Ho

The paper presents a voice conversion method based on analysis and transformation of the characteristics that define a speaker's voice. Voice characteristic features are grouped into three main categories: (a) the spectral features at formants; (b) the pitch and intonation pattern; (c) the glottal pulse shape. Modelling and transformation methods for each group of voice features are outlined. The spectral features at formants are modelled using a two-dimensional phoneme-dependent HMM. Subband frequency warping is used for spectrum transformation, where the subbands are centred on estimates of formant trajectories. The F0 contour, extracted from autocorrelation-based pitchmarks, is used for modelling the pitch and intonation patterns of speech. A PSOLA-based method is used for transformation of pitch, intonation patterns and speaking rate. Finally, a method based on deconvolution of the vocal tract is used for modelling and mapping of the glottal pulse. The experimental results present illustrations of transformations of the various characteristics and perceptual evaluations.

Qin Yan, S. Vaseghi, D. Rentzos, Ching-Hsiang Ho, E. Turajlić

This paper presents an analysis of the acoustic correlates of the differences between British, Australian and American English accents. The structures of the differences that characterise accents in speech can be divided into two parts: (a) phonetic differences; and (b) acoustic differences. The focus of this paper is on the analysis of acoustic correlates of accents including formants and their trajectories, pitch trajectory, pitch accent, pitch nucleus, duration and speaking rate. The acoustics of accents are modelled and estimated using 2D HMMs of formants and a model of pitch such as the rise/fall/connect (RFC) model. The differences between the accents are discussed. The Australian accent has a lower 1st formant (F1) but a higher 2nd formant (F2) compared to British and American. The 2nd formant in speech is considered the most sensitive to accent identity. British speakers have the largest pitch frequency range and the largest initial pitch rise and final pitch fall rates in utterances. The Australian accent exhibits significant elongation of vowels and the lowest speaking rate compared to the other two accents. The differences in acoustic correlates across accents are used to morph the accent of a source speaker towards a target accent.

D. Rentzos, S. Vaseghi, Qin Yan, Ching-Hsiang Ho, E. Turajlić

This paper explores the estimation and mapping of probability models of formant parameter vectors for voice conversion. The formant parameter vectors consist of the frequency, bandwidth and intensity of resonance at formants. Formant parameters are derived from the coefficients of a linear prediction (LP) model of speech. The formant distributions are modelled with phoneme-dependent two-dimensional hidden Markov models with state Gaussian mixture densities. The HMMs are subsequently used for re-estimation of the formant trajectories of speech. Two alternative methods are explored for voice morphing. The first is a non-uniform frequency warping method and the second is based on spectral mapping via rotation of the formant vectors of the source towards those of the target. Both methods transform all formant parameters (frequency, bandwidth and intensity). In addition, the factors that affect the selection of the warping ratios for the mapping function are presented. Experimental evaluation of voice morphing examples is presented.
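The abstract does not spell out how formant parameters are obtained from the LP coefficients. A minimal sketch, assuming the textbook relation between a pole of the LP model and a resonance, is given below: the pole angle gives the formant frequency and the pole radius gives the 3 dB bandwidth. The poles are assumed to have been found already by a polynomial root finder. The function names are illustrative, and formant intensity is not covered.

```python
import cmath
import math


def pole_to_formant(pole, sample_rate):
    """Map a complex LP pole to (frequency in Hz, bandwidth in Hz)."""
    radius, angle = cmath.polar(pole)
    frequency = angle * sample_rate / (2.0 * math.pi)
    bandwidth = -math.log(radius) * sample_rate / math.pi
    return frequency, bandwidth


def formant_to_pole(frequency, bandwidth, sample_rate):
    """Inverse mapping: build the upper-half-plane pole of a resonance."""
    radius = math.exp(-math.pi * bandwidth / sample_rate)
    angle = 2.0 * math.pi * frequency / sample_rate
    return cmath.rect(radius, angle)
```

In practice only poles in the upper half-plane with a plausible bandwidth would be kept as formant candidates. That selection step is left out of the sketch.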

Qin Yan, S. Vaseghi, Ching-Hsiang Ho, D. Rentzos, E. Turajlić

The differences between the formant trajectories of British and broad Australian English accents are analysed and used for accent conversion. An improved formant model based on linear prediction (LP) feature analysis and a 2-D hidden Markov model (HMM) of formants is employed for estimation of the formant trajectories of vowels and diphthongs. A comparative analysis of the formant values, the formant trajectories and the formant target points of British and broad Australian accents is presented. A method for ranking the contribution of formants to accent identity is proposed whereby formants are ranked according to the normalised distances between formants across accents. The first two formants are considered more sensitive to accents than other formants. Finally, a set of experiments on accent conversion is presented to transform the broad Australian accent of a speaker to the British Received Pronunciation (RP) accent by formant mapping and prosody modification. Perceptual evaluations of accent conversion results illustrate that besides prosodic correlates such as pitch and duration, formants also play an important role in conveying accents.

E. Turajlić, D. Rentzos, S. Vaseghi, Ching-Hsiang Ho

This paper explores methods of estimation and mapping of parametric formant-based models for voice transformation. The main focus is the transformation of the parameters of a model of the vocal tract of a source speaker to those of a target speaker. The vocal tract parameters are represented with the linear prediction (LP) model coefficients and the associated formant frequencies, bandwidths, intensities and their temporal trajectories. Two methods are explored for vocal tract (formant) mapping. The first method is based on non-uniform frequency warping and the second is based on pole rotation. Both methods transform all parameters of the formants (frequency, bandwidth and intensity). In addition, the factors that affect the selection of the warping ratios for the mapping functions are presented. Experimental evaluation of voice morphing based on parametric models is presented.
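The abstract gives no details of the pole rotation method. As a minimal sketch of the general idea only, assuming each formant is represented by one upper-half-plane pole of the LP model, a pole can be rotated to a new angle while keeping its radius, and the LP polynomial can then be rebuilt from the moved poles and their conjugates. The `freq_ratio` parameter and the function names are hypothetical. The paper's method also maps bandwidth and intensity, which this sketch does not attempt.

```python
import cmath
import math


def rotate_pole(pole, freq_ratio):
    """Scale the angle (formant frequency) of an upper-half-plane pole."""
    radius, angle = cmath.polar(pole)
    # Keep the rotated pole below the Nyquist frequency.
    new_angle = min(angle * freq_ratio, math.pi)
    return cmath.rect(radius, new_angle)


def lp_coefficients(poles):
    """Expand prod (1 - p z^-1)(1 - conj(p) z^-1) into real coefficients."""
    coeffs = [1.0 + 0j]
    for p in poles:
        for root in (p, p.conjugate()):
            # Multiply the running polynomial by (1 - root z^-1).
            coeffs = [a - root * b
                      for a, b in zip(coeffs + [0j], [0j] + coeffs)]
    return [c.real for c in coeffs]
```

Because the radius is left unchanged, the bandwidth of the moved resonance stays the same in this simplified version.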
