Publications
Conference paper: Lightburn L, De Sena E, Moore AH, et al., 2017,
Improving the perceptual quality of ideal binary masked speech
, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149
It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
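The idea of using a binary mask as prior information rather than as a hard 0/1 gain can be pictured with a minimal sketch. Everything below is an illustrative assumption (the function name, the Wiener-style gain, the mask-reliability parameter, and the spectral floor); it is not the paper's actual enhancer.

```python
import numpy as np

def gain_with_mask_prior(snr_post, mask, p_mask_correct=0.9, floor=0.1):
    """Illustrative sketch: treat a binary mask as a speech-presence
    prior inside a simple Wiener-style enhancer instead of applying
    it directly as a 0/1 time-frequency gain.

    snr_post       : a posteriori SNR per time-frequency bin
    mask           : binary mask (1 = speech-dominated bin)
    p_mask_correct : assumed reliability of the mask (hypothetical)
    """
    # Soften the hard mask into a speech-presence probability
    p_speech = np.where(mask == 1, p_mask_correct, 1.0 - p_mask_correct)
    # Crude a priori SNR estimate and the corresponding Wiener gain
    snr_prior = np.maximum(snr_post - 1.0, 1e-3)
    wiener = snr_prior / (1.0 + snr_prior)
    # Weight the gain by the speech-presence probability, keeping a
    # spectral floor so masked-out bins are attenuated, not zeroed
    return np.maximum(p_speech * wiener, floor)
```

Unlike direct binary masking, bins the mask rejects keep a residual gain, which is the kind of soft behaviour that avoids the musical-noise artefacts of a hard mask.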
Journal article: Ciganovic N, Wolde-Kidan A, Reichenbach JDT, 2017,
Hair bundles of cochlear outer hair cells are shaped to minimize their fluid-dynamic resistance
, Scientific Reports, Vol: 7, ISSN: 2045-2322
The mammalian sense of hearing relies on two types of sensory cells: inner hair cells transmit the auditory stimulus to the brain, while outer hair cells mechanically modulate the stimulus through active feedback. Stimulation of a hair cell is mediated by displacements of its mechanosensitive hair bundle which protrudes from the apical surface of the cell into a narrow fluid-filled space between reticular lamina and tectorial membrane. While hair bundles of inner hair cells are of linear shape, those of outer hair cells exhibit a distinctive V-shape. The biophysical rationale behind this morphology, however, remains unknown. Here we use analytical and computational methods to study the fluid flow across rows of differently shaped hair bundles. We find that rows of V-shaped hair bundles have a considerably reduced resistance to crossflow, and that the biologically observed shapes of hair bundles of outer hair cells are near-optimal in this regard. This observation accords with the function of outer hair cells and lends support to the recent hypothesis that inner hair cells are stimulated by a net flow, in addition to the well-established shear flow that arises from shearing between the reticular lamina and the tectorial membrane.
Conference paper: Picinali L, Wallin A, Levtov Y, et al., 2017,
Comparative perceptual evaluation between different methods for implementing Reverberation in a binaural context
, AES 2017, Publisher: Audio Engineering Society
Reverberation has always been considered of primary importance in order to improve the realism, externalisation and immersiveness of binaurally spatialised sounds. Different techniques exist for implementing reverberation in a binaural context, each with a different level of computational complexity and spatial accuracy. A perceptual study has been performed in order to compare the realism and localization accuracy achieved using five different binaural reverberation techniques. These included multichannel Ambisonic-based, stereo and mono reverberation methods. A custom web-based application has been developed implementing the testing procedures and allowing participants to take the test remotely. Initial results with 54 participants show that no major difference in terms of perceived level of realism and spatialisation accuracy could be found between four of the five proposed reverberation methods, suggesting that a high level of complexity in the reverberation process does not always correspond to improved perceptual attributes.
Journal article: Doire CSJ, Brookes DM, Naylor PA, 2017,
Robust and efficient Bayesian adaptive psychometric function estimation
, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966
The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
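The look one-ahead, entropy-based idea can be sketched with a grid posterior over (threshold, slope): update the posterior after each response and pick the probe SNR that minimizes the expected posterior entropy. The logistic PF shape, the grid, and all function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def logistic_pf(snr, threshold, slope, guess=0.5, lapse=0.02):
    """Probability of a correct response at probe level `snr` for a
    logistic psychometric function with guess and lapse rates."""
    p = 1.0 / (1.0 + np.exp(-slope * (snr - threshold)))
    return guess + (1.0 - guess - lapse) * p

def bayes_update(prior, thresholds, slopes, snr, correct):
    """One Bayesian update of a grid posterior over (threshold, slope)
    after a correct/incorrect response at probe level `snr`."""
    T, S = np.meshgrid(thresholds, slopes, indexing="ij")
    p = logistic_pf(snr, T, S)
    post = prior * (p if correct else 1.0 - p)
    return post / post.sum()

def next_probe_min_entropy(prior, thresholds, slopes, candidates):
    """Look one-ahead probe selection: choose the candidate SNR that
    minimizes the expected entropy of the updated posterior."""
    T, S = np.meshgrid(thresholds, slopes, indexing="ij")
    best, best_h = None, np.inf
    for snr in candidates:
        p = logistic_pf(snr, T, S)
        h = 0.0
        for correct in (True, False):
            post = bayes_update(prior, thresholds, slopes, snr, correct)
            # Probability of this outcome under the current posterior
            p_outcome = np.sum(prior * (p if correct else 1.0 - p))
            h += p_outcome * -np.sum(post * np.log(post + 1e-12))
        if h < best_h:
            best, best_h = snr, h
    return best
```

A correct response makes low thresholds more likely, so the posterior mean shifts downwards; the entropy criterion then concentrates probes where they are most informative.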
Conference paper: Pinero G, Naylor PA, 2017,
Channel Estimation for Crosstalk Cancellation in Wireless Acoustic Networks
, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149
Conference paper: Javed HA, Cauchi B, Doclo S, et al., 2017,
Measuring, Modelling and Predicting Perceived Reverberation
, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149
Conference paper: Forte AE, Etard O, Reichenbach J, 2017,
Complex Auditory-brainstem Response to the Fundamental Frequency of Continuous Natural Speech
, ARO 2017
Book: Jarrett DP, Habets EAP, Naylor PA, 2017,
Theory and Applications of Spherical Microphone Array Processing
, Publisher: Springer-Verlag Berlin, ISBN: 978-3-319-42209-1
Conference paper: Evers C, Moore A, Naylor P, 2016,
Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465
Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.
Journal article: Moore AH, Evers C, Naylor PA, 2016,
Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors
, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290
Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.
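The core of the pseudo-intensity-vector idea fits in a few lines, assuming the zeroth- and first-order eigenbeam (B-format-like) signals are already available: the time-averaged product of the omnidirectional component with each dipole component points towards the source. This is a minimal sketch of the single-source PIV case only; the function name and plain averaging are assumptions, and the SSPIV variant adds subspace processing not shown here.

```python
import numpy as np

def pseudo_intensity_doa(w, x, y, z):
    """Estimate a single direction of arrival from zeroth-order (w)
    and first-order dipole (x, y, z) eigenbeam signals by averaging
    the instantaneous pseudo-intensity vector.

    w, x, y, z : 1-D arrays of samples (real or complex STFT bins).
    Returns a unit vector pointing towards the estimated source.
    """
    # Instantaneous pseudo-intensity along each Cartesian axis
    ix = np.real(np.conj(w) * x)
    iy = np.real(np.conj(w) * y)
    iz = np.real(np.conj(w) * z)
    # Time-average, then normalize to a unit direction vector
    v = np.array([ix.mean(), iy.mean(), iz.mean()])
    return v / np.linalg.norm(v)
```

For a single plane wave the dipole signals are the omni signal scaled by the direction cosines, so the averaged vector recovers the source direction exactly; interference and noise perturb it, which is what motivates the subspace extension.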
Conference paper: Xue W, Brookes M, Naylor PA, 2016,
Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization
, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465
Journal article: Warren RL, Ramamoorthy S, Ciganovic N, et al., 2016,
Minimal basilar membrane motion in low-frequency hearing
, Proceedings of the National Academy of Sciences of the United States of America, Vol: 113, Pages: E4304-E4310, ISSN: 1091-6490
Low-frequency hearing is critically important for speech and music perception, but no mechanical measurements have previously been available from inner ears with intact low-frequency parts. These regions of the cochlea may function in ways different from the extensively studied high-frequency regions, where the sensory outer hair cells produce force that greatly increases the sound-evoked vibrations of the basilar membrane. We used laser interferometry in vitro and optical coherence tomography in vivo to study the low-frequency part of the guinea pig cochlea, and found that sound stimulation caused motion of a minimal portion of the basilar membrane. Outside the region of peak movement, an exponential decline in motion amplitude occurred across the basilar membrane. The moving region had different dependence on stimulus frequency than the vibrations measured near the mechanosensitive stereocilia. This behavior differs substantially from the behavior found in the extensively studied high-frequency regions of the cochlea.
Journal article: Reichenbach CS, Braiman C, Schiff ND, et al., 2016,
The auditory-brainstem response to continuous, non-repetitive speech is modulated by the speech envelope and reflects speech processing
, Frontiers in Computational Neuroscience, Vol: 10, ISSN: 1662-5188
The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the auditory brainstem response is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function.
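One way to picture the measure described (the brainstem response at the fundamental frequency, modulated by the speech envelope, evaluated at a characteristic latency) is a normalized correlation between the recording and an envelope-weighted fundamental-frequency reference. This is a loose sketch only; the function name, weighting, and normalization are assumptions, not the published measure.

```python
import numpy as np

def envelope_modulated_f0_response(eeg, f0_ref, envelope, latency_samples):
    """Hypothetical sketch: correlate the recorded response, shifted by
    a candidate brainstem latency, with the fundamental-frequency
    reference of the speech weighted by the speech envelope.

    eeg             : recorded scalp signal
    f0_ref          : reference waveform at the speech fundamental
    envelope        : speech envelope (same sampling rate)
    latency_samples : candidate response latency in samples
    """
    shifted = eeg[latency_samples:]
    n = min(len(shifted), len(f0_ref), len(envelope))
    ref = envelope[:n] * f0_ref[:n]  # envelope-modulated reference
    return np.dot(shifted[:n], ref) / (
        np.linalg.norm(shifted[:n]) * np.linalg.norm(ref) + 1e-12)
```

Scanning the latency and reading off the peak of this correlation would locate the characteristic response latency in the same spirit as the abstract describes.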
Conference paper: Evers C, Moore A, Naylor P, 2016,
Towards Informative Path Planning for Acoustic SLAM
, DAGA 2016
Acoustic scene mapping is a challenging task as microphone arrays can often localize sound sources only in terms of their directions. Spatial diversity can be exploited constructively to infer source-sensor range when using microphone arrays installed on moving platforms, such as robots. As the absolute location of a moving robot is often unknown in practice, Acoustic Simultaneous Localization And Mapping (a-SLAM) is required in order to localize the moving robot's positions and jointly map the sound sources. Using a novel a-SLAM approach, this paper investigates the impact of the choice of robot paths on source mapping accuracy. Simulation results demonstrate that a-SLAM performance can be improved by informatively planning robot paths.
Conference paper: Picinali L, Gerino A, Bernareggi C, et al., 2015,
Towards Large Scale Evaluation of Novel Sonification Techniques for Non Visual Shape Exploration
, ACM SIGACCESS Conference on Computers & Accessibility, Publisher: ACM, Pages: 13-21
Conference paper: Hu M, Sharma D, Doclo S, et al., 2015,
Speaker change detection and speaker diarization using spatial information
, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference paper: Moore AH, Naylor PA, Skoglund J, 2014,
An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech
, European Signal Processing Conference, ISSN: 2219-5491
Journal article: Goodman DF, Benichoux V, Brette R, 2013,
Decoding neural responses to temporal cues for sound localization
, eLife, Vol: 2, ISSN: 2050-084X
The activity of sensory neural populations carries information about the environment. This may be extracted from neural activity using different strategies. In the auditory brainstem, a recent theory proposes that sound location in the horizontal plane is decoded from the relative summed activity of two populations in each hemisphere, whereas earlier theories hypothesized that the location was decoded from the identity of the most active cells. We tested the performance of various decoders of neural responses in increasingly complex acoustical situations, including spectrum variations, noise, and sound diffraction. We demonstrate that there is insufficient information in the pooled activity of each hemisphere to estimate sound direction in a reliable way consistent with behavior, whereas robust estimates can be obtained from neural activity by taking into account the heterogeneous tuning of cells. These estimates can still be obtained when only contralateral neural responses are used, consistently with unilateral lesion studies. DOI: http://dx.doi.org/10.7554/eLife.01312.001.
Conference paper: Goodman DFM, Brette R, 2010,
Learning to localise sounds with spiking neural networks
To localise the source of a sound, we use location-specific properties of the signals received at the two ears caused by the asymmetric filtering of the original sound by our head and pinnae, the head-related transfer functions (HRTFs). These HRTFs change throughout an organism's lifetime, during development for example, and so the required neural circuitry cannot be entirely hardwired. Since HRTFs are not directly accessible from perceptual experience, they can only be inferred from filtered sounds. We present a spiking neural network model of sound localisation based on extracting location-specific synchrony patterns, and a simple supervised algorithm to learn the mapping between synchrony patterns and locations from a set of example sounds, with no previous knowledge of HRTFs. After learning, our model was able to accurately localise new sounds in both azimuth and elevation, including the difficult task of distinguishing sounds coming from the front and back.
Journal article: Goodman DF, Brette R, 2010,
Spike-timing-based computation in sound localization
, PLOS Computational Biology, Vol: 6, ISSN: 1553-734X
Spike timing is precise in the auditory system and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model which exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing to extract spatial information about sources independently of the source signal.
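For contrast, the "simple time differences" that the abstract says the model goes beyond are the classic interaural time difference (ITD) cue, which can be estimated by cross-correlating the two ear signals over physiologically plausible lags. The helper below is a generic textbook sketch of that baseline cue, not the paper's spiking model.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=1e-3):
    """Classic cross-correlation ITD estimate: the lag (in seconds) by
    which the right-ear signal is delayed relative to the left-ear
    signal, searched over lags up to `max_itd`."""
    max_lag = int(max_itd * fs)
    core = slice(max_lag, len(left) - max_lag)  # avoid wrap-around edges
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(left[core], np.roll(right, -lag)[core]) for lag in lags]
    return lags[int(np.argmax(scores))] / fs
```

The limitation the paper addresses is visible here: this estimator collapses the binaural signals to a single delay and discards the source-independent synchrony structure that the spiking model exploits.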
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.