Associate Professor, Linköping University
In the literature, auditory attention is explored through neural speech tracking, primarily entailing modeling and analyzing electroencephalography (EEG) responses to natural speech via linear filtering. Our study takes a novel approach, introducing an enhanced coherence estimation technique to assess the strength of neural speech tracking. This enables effective discrimination between attended and ignored speech. To mitigate the impact of colored noise in EEG, we address two biases: overall coherence-level bias and spectral peak-shifting bias. In a listening study involving 32 participants with hearing impairment, tasked with attending to competing talkers in background noise, our coherence-based method effectively discerns EEG representations of attended and ignored speech. We comprehensively analyze frequency bands, individual frequencies, and EEG channels. The delta, theta, and alpha bands are shown to be the most important frequency bands, and the central channels the most important EEG channels. Lastly, we showcase coherence differences across different noise reduction settings implemented in hearing aids (HAs), underscoring our method's potential to objectively assess auditory attention and enhance HA efficacy.
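The central quantity in the abstract above, coherence between the speech envelope and EEG as a measure of neural speech tracking, can be sketched on synthetic data. This is a minimal illustration using Welch-based coherence from SciPy; the sampling rate, band limits, and signal construction are assumptions for demonstration, not the study's pipeline (which additionally corrects the two coherence biases mentioned).

```python
# Sketch: magnitude-squared coherence between a speech envelope and one EEG
# channel as a proxy for neural speech tracking strength. All signals and
# parameters here are synthetic/illustrative.
import numpy as np
from scipy.signal import coherence

fs = 250                       # assumed EEG sampling rate (Hz)
rng = np.random.default_rng(0)
n = fs * 60                    # one minute of data

envelope = rng.standard_normal(n)
# "Attended" EEG contains the envelope plus noise; "ignored" is noise only.
eeg_attended = 0.5 * envelope + rng.standard_normal(n)
eeg_ignored = rng.standard_normal(n)

f, coh_att = coherence(envelope, eeg_attended, fs=fs, nperseg=fs * 2)
_, coh_ign = coherence(envelope, eeg_ignored, fs=fs, nperseg=fs * 2)

# Compare mean coherence in the low-frequency range (1-8 Hz), where
# speech tracking is typically strongest.
band = (f >= 1) & (f <= 8)
print(coh_att[band].mean(), coh_ign[band].mean())
```

With signals like these, the attended condition yields clearly higher band-averaged coherence than the ignored one, which is the contrast the study exploits.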
Effective preprocessing of electroencephalography (EEG) data is fundamental for deriving meaningful insights. Independent component analysis (ICA) serves as an important step in this process by aiming to eliminate undesirable artifacts from EEG data. However, the decision on which and how many components to remove remains somewhat arbitrary, despite the availability of both automatic and manual artifact rejection methods based on ICA. This study investigates the influence of different ICA-based artifact rejection strategies on EEG-based auditory attention decoding (AAD) analysis. We employ multiple ICA-based artifact rejection approaches, ranging from manual to automatic versions, and assess their effects on conventional AAD methods. The comparison aims to uncover potential variations in analysis results due to different artifact rejection choices within pipelines, and whether such variations differ across different AAD methods. Although our study found no large differences in the performance of linear AAD models between artifact rejection methods, two exceptions emerged. When predicting EEG responses, the manual artifact rejection method appeared to perform better in frontal channel groups. Conversely, when reconstructing speech envelopes from EEG, applying no artifact rejection outperformed the other approaches.
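The ICA-based artifact rejection step compared above can be sketched as follows. This is a hedged, self-contained illustration using scikit-learn's FastICA on synthetic data with an injected blink-like artifact; the study's actual pipelines (manual and automatic rejection on real EEG) differ, and all names and parameters here are illustrative.

```python
# Minimal sketch of ICA-based artifact rejection: decompose multichannel data,
# drop the component most correlated with a reference artifact, back-project.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n_samples, n_channels = 5000, 8
neural = rng.standard_normal((n_samples, n_channels))      # stand-in "brain" data
blink = np.sin(np.linspace(0, 40 * np.pi, n_samples))      # artifact source
mixing = 2.0 * rng.standard_normal(n_channels)
eeg = neural + np.outer(blink, mixing)                     # artifact mixed into all channels

ica = FastICA(n_components=n_channels, random_state=0, max_iter=1000)
sources = ica.fit_transform(eeg)                           # (samples, components)

# Identify the component most correlated with the artifact reference, zero it,
# and reconstruct the cleaned channel data.
corrs = [abs(np.corrcoef(sources[:, k], blink)[0, 1]) for k in range(n_channels)]
bad = int(np.argmax(corrs))
sources[:, bad] = 0.0
eeg_clean = ica.inverse_transform(sources)

raw_corr = np.mean([abs(np.corrcoef(eeg[:, c], blink)[0, 1]) for c in range(n_channels)])
clean_corr = np.mean([abs(np.corrcoef(eeg_clean[:, c], blink)[0, 1]) for c in range(n_channels)])
print(raw_corr, clean_corr)
```

The "which and how many components to remove" question the abstract raises corresponds to the `bad` selection step: here it is a single automatic correlation threshold, but in practice it is exactly the manual-versus-automatic choice under study.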
Objective. This study aims to investigate the potential of speech-reception-threshold (SRT) estimation through electroencephalography (EEG)-based envelope reconstruction techniques with continuous speech stimuli. Additionally, we aim to investigate the influence of the stimulus signal-to-noise ratio (SNR) on the temporal response function (TRF). Methods. The analysis included data from twenty young normal-hearing participants, who listened to audiobook excerpts with varying background noise levels while data were recorded from 66 scalp EEG electrodes. A linear decoder was trained to reconstruct the speech envelope from the EEG data. The reconstruction accuracy was calculated as the Pearson's correlation between the reconstructed envelope and the actual speech envelope. An SRT estimate (SRTneuro) was obtained as the midpoint of a sigmoid function fitted to the reconstruction-accuracy-vs-SNR data. Additionally, the TRF was estimated at each SNR level, and a statistical analysis was conducted to reveal significant effects of SNR levels on the latencies and amplitudes of the most prominent components. Results. The behaviorally determined SRT (SRTbeh) ranged from -6.0 to -3.28 dB with a mean of -5.35 dB. The SRTneuro was within 2 dB of the SRTbeh for 15 out of 20 participants (75%) and within 3 dB of the SRTbeh for all participants (100%). The investigation of the TRFs showed that the latency of P2 decreased significantly and the magnitude of the N1 amplitude increased significantly with increasing SNR. Conclusion. The behavioral speech-reception threshold (SRTbeh) of young normal-hearing listeners can be accurately estimated from EEG responses to continuous speech stimuli. In this study, the electrophysiological speech-reception threshold (SRTneuro) was derived from a sigmoid fit to the envelope reconstruction accuracy vs. SNR.
The reconstruction accuracy of the envelope and the TRF features were all affected by changes in SNR, suggesting that the reconstruction accuracy of the envelope and the TRF features are related to the same underlying neural process. Highlights: Speech reception threshold (SRT) can be estimated from EEG using continuous speech; SRT is estimated using a sigmoid fit on envelope reconstruction accuracy; the latency of P2 in the temporal response function decreases with increasing SNR; the magnitude of N1 in the temporal response function increases with increasing SNR.
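The key procedural step in this abstract, reading off SRTneuro as the midpoint of a sigmoid fitted to reconstruction accuracy versus SNR, can be sketched with SciPy. The accuracy values below are invented for illustration; only the fitting-and-midpoint logic mirrors the description.

```python
# Sketch: fit a four-parameter sigmoid to reconstruction-accuracy-vs-SNR
# points and take the midpoint as the SRT estimate. Data are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(snr, midpoint, slope, floor, ceil):
    return floor + (ceil - floor) / (1.0 + np.exp(-slope * (snr - midpoint)))

snr = np.array([-12.0, -9.0, -6.0, -3.0, 0.0, 3.0])        # dB, made up
accuracy = np.array([0.02, 0.03, 0.08, 0.14, 0.17, 0.18])  # Pearson r, made up

p0 = [-5.0, 1.0, 0.02, 0.18]                               # initial guess
params, _ = curve_fit(sigmoid, snr, accuracy, p0=p0, maxfev=10000)
srt_neuro = params[0]                                      # midpoint = SRTneuro
print(srt_neuro)
```

For these toy points the midpoint lands roughly where accuracy crosses halfway between its floor and ceiling, i.e. between the -6 and -3 dB conditions.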
Objective. This study develops a deep learning (DL) method for fast auditory attention decoding (AAD) using electroencephalography (EEG) from listeners with hearing impairment (HI). It addresses three classification tasks: differentiating noise from speech-in-noise, classifying the direction of attended speech (left vs. right) and identifying the activation status of hearing aid noise reduction algorithms (OFF vs. ON). These tasks contribute to our understanding of how hearing technology influences auditory processing in the hearing-impaired population. Approach. Deep convolutional neural network (DCNN) models were designed for each task. Two training strategies were employed to clarify the impact of data splitting on AAD tasks: inter-trial, where the testing set used classification windows from trials that the training set had not seen, and intra-trial, where the testing set used unseen classification windows from trials where other segments were seen during training. The models were evaluated on EEG data from 31 participants with HI, listening to competing talkers amidst background noise. Main results. Using 1 s classification windows, DCNN models achieved accuracy (ACC) of 69.8%, 73.3% and 82.9% and area-under-curve (AUC) of 77.2%, 80.6% and 92.1% for the three tasks, respectively, with the inter-trial strategy. With the intra-trial strategy, they achieved ACC of 87.9%, 80.1% and 97.5%, along with AUC of 94.6%, 89.1%, and 99.8%. Our DCNN models show good performance on short 1 s EEG samples, making them suitable for real-world applications. Conclusion. Our DCNN models successfully addressed three tasks with short 1 s EEG windows from participants with HI, showcasing their potential. While the inter-trial strategy demonstrated promise for assessing AAD, the intra-trial approach yielded inflated results, underscoring the important role of proper data splitting in EEG-based AAD tasks. Significance.
Our findings showcase the promising potential of EEG-based tools for assessing auditory attention in clinical contexts and advancing hearing technology, while also promoting further exploration of alternative DL architectures and their potential constraints.
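The inter-trial versus intra-trial distinction that drives the inflated-results finding above can be sketched concretely. This is an illustrative split on dummy window indices using scikit-learn; trial counts and window counts are assumptions, not the study's.

```python
# Sketch: the two data-splitting strategies. Inter-trial keeps whole trials
# out of training; intra-trial splits windows at random so trials leak
# across the train/test boundary.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

n_trials, windows_per_trial = 20, 50
n_windows = n_trials * windows_per_trial
X = np.arange(n_windows)                       # stand-ins for 1 s EEG windows
trial_id = np.repeat(np.arange(n_trials), windows_per_trial)

# Inter-trial: test windows come only from trials unseen during training.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, groups=trial_id))
shared_trials = set(trial_id[train_idx]) & set(trial_id[test_idx])

# Intra-trial: windows split at random, so train and test share trials.
tr_idx, te_idx = train_test_split(np.arange(n_windows), test_size=0.2, random_state=0)
shared_intra = set(trial_id[tr_idx]) & set(trial_id[te_idx])
print(len(shared_trials), len(shared_intra))
```

The group-aware split shares zero trials between sets, while the naive split shares nearly all of them, which is why intra-trial accuracies overstate generalization.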
Clusters of neurons generate electrical signals which propagate in all directions through brain tissue, skull, and scalp of different conductivity. Measuring these signals with electroencephalography (EEG) sensors placed on the scalp results in noisy data. This can severely impact estimation tasks such as source localization and temporal response function (TRF) estimation. We hypothesize that some of the noise is due to a Wiener-structured signal propagation with both linear and nonlinear components. We have developed a simple nonlinearity detection and compensation method for EEG data analysis and utilize a model for estimating source-level (SL) TRFs for evaluation. Our results indicate that the nonlinearity compensation method produces more precise and synchronized SL TRFs compared to the original EEG data.
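One simple way to compensate a memoryless, monotone nonlinearity of the kind a Wiener structure implies (linear filtering followed by a static nonlinearity) is rank-based Gaussianization. The sketch below is an illustration of that general idea on synthetic data, not the paper's specific detection-and-compensation method; the distortion and signal here are assumed.

```python
# Hedged sketch: if the recorded signal is a static monotone distortion of an
# approximately Gaussian linear response, mapping empirical ranks through the
# normal quantile function undoes the distortion (up to scale).
import numpy as np
from scipy.stats import norm, pearsonr

rng = np.random.default_rng(2)
linear_part = rng.standard_normal(10000)        # stand-in linear EEG response
measured = np.tanh(1.5 * linear_part)           # unknown monotone distortion

# Gaussianize: ranks -> standard normal quantiles.
ranks = np.argsort(np.argsort(measured))
compensated = norm.ppf((ranks + 0.5) / len(measured))

r_before = pearsonr(measured, linear_part)[0]
r_after = pearsonr(compensated, linear_part)[0]
print(r_before, r_after)
```

Because the distortion is monotone, the rank transform recovers an almost perfectly linear relationship with the underlying response, which is the sense in which compensation can "re-synchronize" downstream estimates.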
The auditory brainstem response (ABR) is a measure of subcortical activity in response to auditory stimuli. The wave V peak of the ABR depends on stimulus intensity level, and has been widely used for clinical hearing assessment. Conventional ABR estimation methods average electroencephalography (EEG) responses to short, unnatural stimuli such as clicks. Recent work has moved towards more ecologically relevant continuous speech stimuli using linear deconvolution models called temporal response functions (TRFs). Investigating whether the TRF waveform changes with stimulus intensity is a crucial step towards the use of natural speech stimuli for hearing assessments involving subcortical responses. Here, we develop methods to estimate level-dependent subcortical TRFs using EEG data collected from 21 participants listening to continuous speech presented at 4 different intensity levels. We find that level-dependent changes can be detected in the wave V peak of the subcortical TRF for almost all participants, and are consistent with level-dependent changes in click-ABR wave V. We also investigate the most suitable peripheral auditory model to generate predictors for level-dependent subcortical TRFs and find that simple gammatone filterbanks perform the best. Additionally, around 6 minutes of data may be sufficient for detecting level-dependent effects and wave V peaks above the noise floor for speech segments with higher intensity. Finally, we show a proof-of-concept that level-dependent subcortical TRFs can be detected even for the inherent intensity fluctuations in natural continuous speech. Significance statement: Subcortical EEG responses to sound depend on the stimulus intensity level and provide a window into the early human auditory pathway. However, current methods detect responses using unnatural transient stimuli such as clicks or chirps.
We develop methods for detecting level-dependent responses to continuous speech stimuli, which are more ecologically relevant and may provide several advantages over transient stimuli. Critically, we find consistent patterns of level-dependent subcortical responses to continuous speech at the individual level that are directly comparable to those seen for conventional responses to click stimuli. Our work lays the foundation for the use of subcortical responses to natural speech stimuli in future applications such as clinical hearing assessment and hearing assistive technology.
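The TRF machinery underlying this and the preceding abstracts is linear deconvolution of EEG against a lagged stimulus predictor, typically solved with ridge regression. Below is a minimal sketch on synthetic data; the sampling rate, lag range, and regularization strength are illustrative choices, not the study's settings.

```python
# Sketch: ridge-regularized TRF estimation (the standard lagged-regression
# formulation). A known "true" TRF is convolved with a stimulus, noise is
# added, and the TRF is recovered from the lagged design matrix.
import numpy as np

def lag_matrix(stim, n_lags):
    """Design matrix whose k-th column is the stimulus delayed by k samples."""
    X = np.zeros((len(stim), n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[: len(stim) - k]
    return X

rng = np.random.default_rng(3)
fs, n_lags = 100, 30                          # 100 Hz; lags 0-290 ms (illustrative)
stim = rng.standard_normal(fs * 120)          # 2 minutes of "stimulus"
true_trf = np.exp(-np.arange(n_lags) / 5.0) * np.sin(np.arange(n_lags) / 2.0)

X = lag_matrix(stim, n_lags)
eeg = X @ true_trf + rng.standard_normal(len(stim))   # noisy "EEG" channel

lam = 1.0                                     # ridge parameter (illustrative)
trf_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)
```

For subcortical TRFs as in the abstract, the raw stimulus column would be replaced by a peripheral-model predictor (e.g. gammatone filterbank output), but the estimator itself is unchanged.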
In the literature, auditory attention is explored through neural speech tracking, primarily entailing modeling and analyzing electroencephalography (EEG) responses to natural speech via linear filtering. Our study takes a novel approach, introducing an enhanced coherence estimation technique that employs multitapers to assess the strength of neural speech tracking. This enables effective discrimination between attended and ignored speech. To mitigate the impact of colored noise in EEG, we address two biases: overall coherence-level bias and spectral peak-shifting bias. In a listening study involving 32 participants with hearing impairment, tasked with attending to competing talkers in background noise, our coherence-based method effectively discerns EEG representations of attended and ignored speech. We comprehensively analyze frequency bands, individual frequencies, and EEG channels. The delta, theta, and alpha bands are shown to be the most important frequency bands, and the central channels the most important EEG channels. Lastly, we showcase coherence differences across different noise reduction settings implemented in hearing aids, underscoring our method's potential to objectively assess auditory attention and enhance hearing aid efficacy.
Objective. This paper presents a novel domain adaptation (DA) framework to enhance the accuracy of electroencephalography (EEG)-based auditory attention classification, specifically for classifying the direction (left or right) of attended speech. The framework aims to improve performance for subjects with initially low classification accuracy, overcoming challenges posed by instrumental and human factors. Limited dataset size, variations in EEG data quality due to factors such as noise, electrode misplacement, or inter-subject differences, and the need for generalization across different trials, conditions and subjects necessitate the use of DA methods. By leveraging DA methods, the framework can learn from one EEG dataset and adapt to another, potentially resulting in more reliable and robust classification models. Approach. This paper focuses on investigating a DA method, based on parallel transport, for addressing the auditory attention classification problem. The EEG data utilized in this study originates from an experiment where subjects were instructed to selectively attend to one of two spatially separated voices presented simultaneously. Main results. Significant improvement in classification accuracy was observed when poor data from one subject was transported to the domain of good data from different subjects, as compared to the baseline. The mean classification accuracy for subjects with poor data increased from 45.84% to 67.92%. Specifically, the highest achieved classification accuracy for one subject reached 83.33%, a substantial increase from the baseline accuracy of 43.33%. Significance. The findings of our study demonstrate the improved classification performance achieved through the implementation of DA methods. This brings us a step closer to leveraging EEG in neuro-steered hearing devices.
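The recentering idea behind parallel-transport domain adaptation for EEG covariance features can be sketched as follows. For symmetric positive-definite matrices, the map Gamma = (B A^{-1})^{1/2} satisfies Gamma A Gamma^T = B, so transporting all source-domain covariances by Gamma aligns their reference with the target's. This sketch uses arithmetic means as reference matrices for brevity; the actual method operates with Riemannian (geometric) means, and all data here are synthetic.

```python
# Hedged sketch: transport source-domain covariance matrices so that their
# reference matrix coincides with the target domain's reference.
import numpy as np
from scipy.linalg import sqrtm, inv

rng = np.random.default_rng(4)

def random_spd(d, scale):
    """Random symmetric positive-definite matrix (toy per-trial covariance)."""
    M = rng.standard_normal((d, d))
    return scale * (M @ M.T) + np.eye(d)

d = 4
source = [random_spd(d, 0.2) for _ in range(10)]   # "poor" domain covariances
target = [random_spd(d, 1.0) for _ in range(10)]   # "good" domain covariances

A = np.mean(source, axis=0)                        # source reference (arithmetic mean)
B = np.mean(target, axis=0)                        # target reference
Gamma = np.real(sqrtm(B @ inv(A)))                 # transport map

transported = [Gamma @ C @ Gamma.T for C in source]
err = np.linalg.norm(Gamma @ A @ Gamma.T - B)      # reference alignment error
print(err)
```

After transport, a classifier trained on the target domain sees source covariances centered at the same reference, which is the mechanism behind the accuracy gains reported above.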
The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, brainstem responses to continuous speech presented via earphones have recently been detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical responses to continuous speech could be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16 minutes of data were sufficient for the TRFs of all participants to show clear wave V peaks for both earphone and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent in both earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16 minutes) to achieve clear wave V peaks compared to earphone TRFs (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
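The simpler of the two peripheral-processing predictors mentioned above, stimulus rectification, can be sketched in a few lines. The toy waveform below is an assumption for illustration; the study's comparison against auditory-nerve-model predictors (which it found superior) is beyond this sketch.

```python
# Sketch: half-wave rectification of a speech waveform as a crude predictor
# of peripheral auditory processing for subcortical TRF estimation.
import numpy as np

fs = 44100
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t) * np.exp(-t)   # toy "speech" waveform

rectified = np.maximum(speech, 0.0)                 # half-wave rectification
```

The rectified predictor is non-negative, mimicking the one-sided firing behavior of auditory nerve fibers; it would replace the raw stimulus column in the TRF regression.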