Publications
Conference paper: Lightburn L, De Sena E, Moore AH, et al., 2017,
Improving the perceptual quality of ideal binary masked speech
, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149
It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
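The idea of using a binary mask as prior information rather than as a hard 0/1 gain can be pictured with a minimal sketch. Everything below is an illustrative assumption (the function name, the Wiener-style gain, the mask-reliability parameter, and the spectral floor); it is not the paper's actual enhancer.

```python
import numpy as np

def gain_with_mask_prior(snr_post, mask, p_mask_correct=0.9, floor=0.1):
    """Illustrative sketch: treat a binary mask as a speech-presence
    prior inside a simple Wiener-style enhancer instead of applying
    it directly as a 0/1 time-frequency gain.

    snr_post       : a posteriori SNR per time-frequency bin
    mask           : binary mask (1 = speech-dominated bin)
    p_mask_correct : assumed reliability of the mask (hypothetical)
    """
    # Soften the hard mask into a speech-presence probability
    p_speech = np.where(mask == 1, p_mask_correct, 1.0 - p_mask_correct)
    # Crude a priori SNR estimate and the corresponding Wiener gain
    snr_prior = np.maximum(snr_post - 1.0, 1e-3)
    wiener = snr_prior / (1.0 + snr_prior)
    # Weight the gain by the speech-presence probability, keeping a
    # spectral floor so masked-out bins are attenuated, not zeroed
    return np.maximum(p_speech * wiener, floor)
```

Unlike direct binary masking, bins the mask rejects keep a residual gain, which is the kind of soft behaviour that avoids the musical-noise artefacts of a hard mask.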
Journal article: Ciganovic N, Wolde-Kidan A, Reichenbach JDT, 2017,
Hair bundles of cochlear outer hair cells are shaped to minimize their fluid-dynamic resistance
, Scientific Reports, Vol: 7, ISSN: 2045-2322
The mammalian sense of hearing relies on two types of sensory cells: inner hair cells transmit the auditory stimulus to the brain, while outer hair cells mechanically modulate the stimulus through active feedback. Stimulation of a hair cell is mediated by displacements of its mechanosensitive hair bundle which protrudes from the apical surface of the cell into a narrow fluid-filled space between reticular lamina and tectorial membrane. While hair bundles of inner hair cells are of linear shape, those of outer hair cells exhibit a distinctive V-shape. The biophysical rationale behind this morphology, however, remains unknown. Here we use analytical and computational methods to study the fluid flow across rows of differently shaped hair bundles. We find that rows of V-shaped hair bundles have a considerably reduced resistance to crossflow, and that the biologically observed shapes of hair bundles of outer hair cells are near-optimal in this regard. This observation accords with the function of outer hair cells and lends support to the recent hypothesis that inner hair cells are stimulated by a net flow, in addition to the well-established shear flow that arises from shearing between the reticular lamina and the tectorial membrane.
Conference paper: Picinali L, Wallin A, Levtov Y, et al., 2017,
Comparative perceptual evaluation between different methods for implementing Reverberation in a binaural context
, AES 2017, Publisher: Audio Engineering Society
Reverberation has always been considered of primary importance in order to improve the realism, externalisation and immersiveness of binaurally spatialised sounds. Different techniques exist for implementing reverberation in a binaural context, each with a different level of computational complexity and spatial accuracy. A perceptual study has been performed in order to compare the realism and localization accuracy achieved using five different binaural reverberation techniques. These included multichannel Ambisonic-based, stereo and mono reverberation methods. A custom web-based application has been developed implementing the testing procedures and allowing participants to take the test remotely. Initial results with 54 participants show that no major difference in terms of perceived level of realism and spatialisation accuracy could be found between four of the five proposed reverberation methods, suggesting that a high level of complexity in the reverberation process does not always correspond to improved perceptual attributes.
Journal article: Doire CSJ, Brookes DM, Naylor PA, 2017,
Robust and efficient Bayesian adaptive psychometric function estimation
, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966
The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
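The look one-ahead, entropy-based idea can be sketched with a grid posterior over (threshold, slope): update the posterior after each response and pick the probe SNR that minimizes the expected posterior entropy. The logistic PF shape, the grid, and all function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def logistic_pf(snr, threshold, slope, guess=0.5, lapse=0.02):
    """Probability of a correct response at probe level `snr` for a
    logistic psychometric function with guess and lapse rates."""
    p = 1.0 / (1.0 + np.exp(-slope * (snr - threshold)))
    return guess + (1.0 - guess - lapse) * p

def bayes_update(prior, thresholds, slopes, snr, correct):
    """One Bayesian update of a grid posterior over (threshold, slope)
    after a correct/incorrect response at probe level `snr`."""
    T, S = np.meshgrid(thresholds, slopes, indexing="ij")
    p = logistic_pf(snr, T, S)
    post = prior * (p if correct else 1.0 - p)
    return post / post.sum()

def next_probe_min_entropy(prior, thresholds, slopes, candidates):
    """Look one-ahead probe selection: choose the candidate SNR that
    minimizes the expected entropy of the updated posterior."""
    T, S = np.meshgrid(thresholds, slopes, indexing="ij")
    best, best_h = None, np.inf
    for snr in candidates:
        p = logistic_pf(snr, T, S)
        h = 0.0
        for correct in (True, False):
            post = bayes_update(prior, thresholds, slopes, snr, correct)
            # Probability of this outcome under the current posterior
            p_outcome = np.sum(prior * (p if correct else 1.0 - p))
            h += p_outcome * -np.sum(post * np.log(post + 1e-12))
        if h < best_h:
            best, best_h = snr, h
    return best
```

A correct response makes low thresholds more likely, so the posterior mean shifts downwards; the entropy criterion then concentrates probes where they are most informative.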
Conference paper: Pinero G, Naylor PA, 2017,
Channel Estimation for Crosstalk Cancellation in Wireless Acoustic Networks
, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149
Conference paper: Javed HA, Cauchi B, Doclo S, et al., 2017,
Measuring, Modelling and Predicting Perceived Reverberation
, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149
Conference paper: Forte AE, Etard O, Reichenbach J, 2017,
Complex Auditory-brainstem Response to the Fundamental Frequency of Continuous Natural Speech
, ARO 2017
Book: Jarrett DP, Habets EAP, Naylor PA, 2017,
Theory and Applications of Spherical Microphone Array Processing
, Publisher: Springer-Verlag Berlin, ISBN: 978-3-319-42209-1
Conference paper: Evers C, Moore A, Naylor P, 2016,
Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465
Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.
Journal article: Moore AH, Evers C, Naylor PA, 2016,
Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors
, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290
Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.
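The core of the pseudo-intensity-vector idea fits in a few lines, assuming the zeroth- and first-order eigenbeam (B-format-like) signals are already available: the time-averaged product of the omnidirectional component with each dipole component points towards the source. This is a minimal sketch of the single-source PIV case only; the function name and plain averaging are assumptions, and the SSPIV variant adds subspace processing not shown here.

```python
import numpy as np

def pseudo_intensity_doa(w, x, y, z):
    """Estimate a single direction of arrival from zeroth-order (w)
    and first-order dipole (x, y, z) eigenbeam signals by averaging
    the instantaneous pseudo-intensity vector.

    w, x, y, z : 1-D arrays of samples (real or complex STFT bins).
    Returns a unit vector pointing towards the estimated source.
    """
    # Instantaneous pseudo-intensity along each Cartesian axis
    ix = np.real(np.conj(w) * x)
    iy = np.real(np.conj(w) * y)
    iz = np.real(np.conj(w) * z)
    # Time-average, then normalize to a unit direction vector
    v = np.array([ix.mean(), iy.mean(), iz.mean()])
    return v / np.linalg.norm(v)
```

For a single plane wave the dipole signals are the omni signal scaled by the direction cosines, so the averaged vector recovers the source direction exactly; interference and noise perturb it, which is what motivates the subspace extension.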
Conference paper: Xue W, Brookes M, Naylor PA, 2016,
Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization
, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465
Journal article: Warren RL, Ramamoorthy S, Ciganovic N, et al., 2016,
Minimal basilar membrane motion in low-frequency hearing
, Proceedings of the National Academy of Sciences of the United States of America, Vol: 113, Pages: E4304-E4310, ISSN: 1091-6490
Low-frequency hearing is critically important for speech and music perception, but no mechanical measurements have previously been available from inner ears with intact low-frequency parts. These regions of the cochlea may function in ways different from the extensively studied high-frequency regions, where the sensory outer hair cells produce force that greatly increases the sound-evoked vibrations of the basilar membrane. We used laser interferometry in vitro and optical coherence tomography in vivo to study the low-frequency part of the guinea pig cochlea, and found that sound stimulation caused motion of a minimal portion of the basilar membrane. Outside the region of peak movement, an exponential decline in motion amplitude occurred across the basilar membrane. The moving region had different dependence on stimulus frequency than the vibrations measured near the mechanosensitive stereocilia. This behavior differs substantially from the behavior found in the extensively studied high-frequency regions of the cochlea.
Journal article: Reichenbach CS, Braiman C, Schiff ND, et al., 2016,
The auditory-brainstem response to continuous, non-repetitive speech is modulated by the speech envelope and reflects speech processing
, Frontiers in Computational Neuroscience, Vol: 10, ISSN: 1662-5188
The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the auditory brainstem response is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function.
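One way to picture the measure described (the brainstem response at the fundamental frequency, modulated by the speech envelope, evaluated at a characteristic latency) is a normalized correlation between the recording and an envelope-weighted fundamental-frequency reference. This is a loose sketch only; the function name, weighting, and normalization are assumptions, not the published measure.

```python
import numpy as np

def envelope_modulated_f0_response(eeg, f0_ref, envelope, latency_samples):
    """Hypothetical sketch: correlate the recorded response, shifted by
    a candidate brainstem latency, with the fundamental-frequency
    reference of the speech weighted by the speech envelope.

    eeg             : recorded scalp signal
    f0_ref          : reference waveform at the speech fundamental
    envelope        : speech envelope (same sampling rate)
    latency_samples : candidate response latency in samples
    """
    shifted = eeg[latency_samples:]
    n = min(len(shifted), len(f0_ref), len(envelope))
    ref = envelope[:n] * f0_ref[:n]  # envelope-modulated reference
    return np.dot(shifted[:n], ref) / (
        np.linalg.norm(shifted[:n]) * np.linalg.norm(ref) + 1e-12)
```

Scanning the latency and reading off the peak of this correlation would locate the characteristic response latency in the same spirit as the abstract describes.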
Conference paper: Evers C, Moore A, Naylor P, 2016,
Towards Informative Path Planning for Acoustic SLAM
, DAGA 2016
Acoustic scene mapping is a challenging task as microphone arrays can often localize sound sources only in terms of their directions. Spatial diversity can be exploited constructively to infer source-sensor range when using microphone arrays installed on moving platforms, such as robots. As the absolute location of a moving robot is often unknown in practice, Acoustic Simultaneous Localization And Mapping (a-SLAM) is required in order to localize the moving robot's positions and jointly map the sound sources. Using a novel a-SLAM approach, this paper investigates the impact of the choice of robot paths on source mapping accuracy. Simulation results demonstrate that a-SLAM performance can be improved by informatively planning robot paths.
Conference paper: Picinali L, Gerino A, Bernareggi C, et al., 2015,
Towards Large Scale Evaluation of Novel Sonification Techniques for Non Visual Shape Exploration
, ACM SIGACCESS Conference on Computers & Accessibility, Publisher: ACM, Pages: 13-21
Conference paper: Hu M, Sharma D, Doclo S, et al., 2015,
Speaker change detection and speaker diarization using spatial information
, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference paper: Moore AH, Naylor PA, Skoglund J, 2014,
An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech
, European Signal Processing Conference, ISSN: 2219-5491
Journal article: Goodman DF, Benichoux V, Brette R, 2013,
Decoding neural responses to temporal cues for sound localization
, eLife, Vol: 2, ISSN: 2050-084X
The activity of sensory neural populations carries information about the environment. This may be extracted from neural activity using different strategies. In the auditory brainstem, a recent theory proposes that sound location in the horizontal plane is decoded from the relative summed activity of two populations in each hemisphere, whereas earlier theories hypothesized that the location was decoded from the identity of the most active cells. We tested the performance of various decoders of neural responses in increasingly complex acoustical situations, including spectrum variations, noise, and sound diffraction. We demonstrate that there is insufficient information in the pooled activity of each hemisphere to estimate sound direction in a reliable way consistent with behavior, whereas robust estimates can be obtained from neural activity by taking into account the heterogeneous tuning of cells. These estimates can still be obtained when only contralateral neural responses are used, consistently with unilateral lesion studies. DOI: http://dx.doi.org/10.7554/eLife.01312.001.
Conference paper: Goodman DFM, Brette R, 2010,
Learning to localise sounds with spiking neural networks
To localise the source of a sound, we use location-specific properties of the signals received at the two ears caused by the asymmetric filtering of the original sound by our head and pinnae, the head-related transfer functions (HRTFs). These HRTFs change throughout an organism's lifetime, during development for example, and so the required neural circuitry cannot be entirely hardwired. Since HRTFs are not directly accessible from perceptual experience, they can only be inferred from filtered sounds. We present a spiking neural network model of sound localisation based on extracting location-specific synchrony patterns, and a simple supervised algorithm to learn the mapping between synchrony patterns and locations from a set of example sounds, with no previous knowledge of HRTFs. After learning, our model was able to accurately localise new sounds in both azimuth and elevation, including the difficult task of distinguishing sounds coming from the front and back.
Journal article: Goodman DF, Brette R, 2010,
Spike-timing-based computation in sound localization
, PLOS Computational Biology, Vol: 6, ISSN: 1553-734X
Spike timing is precise in the auditory system and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model which exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing to extract spatial information about sources independently of the source signal.
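For contrast, the "simple time differences" that the abstract says the model goes beyond are the classic interaural time difference (ITD) cue, which can be estimated by cross-correlating the two ear signals over physiologically plausible lags. The helper below is a generic textbook sketch of that baseline cue, not the paper's spiking model.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=1e-3):
    """Classic cross-correlation ITD estimate: the lag (in seconds) by
    which the right-ear signal is delayed relative to the left-ear
    signal, searched over lags up to `max_itd`."""
    max_lag = int(max_itd * fs)
    core = slice(max_lag, len(left) - max_lag)  # avoid wrap-around edges
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(left[core], np.roll(right, -lag)[core]) for lag in lags]
    return lags[int(np.argmax(scores))] / fs
```

The limitation the paper addresses is visible here: this estimator collapses the binaural signals to a single delay and discards the source-independent synchrony structure that the spiking model exploits.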
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.