New York, NY, October 15, 2013 — The 135th Audio Engineering Society Convention (Thursday, October 17, through Sunday, October 20, 2013, at the Javits Center in New York City) features the presentation of the annual AES “Best Peer-Reviewed Paper Award” and “Best Student Paper Award” distinctions, honoring outstanding achievement in academic papers presented at the convention. The awards are being presented by Brett Leonard and Tae Hong Park, 135th AES Convention Papers Co-chairs.
This year’s “Best Peer-Reviewed Paper Award” distinctions were presented to:
- Esben Skovenborg and Thomas Lund (both of TC Electronic, Risskov, Denmark), for their paper “Level-Normalization of Feature Films Using Loudness vs Speech.”
- Yoshito Sonoda and Toshiyuki Nakamiya (both of Tokai University, Kumamoto, Japan), for their paper “Proposal of Optical Wave Microphone and Physical Mechanism of Sound Detection.”
This year’s “Best Student Paper Award” distinctions were presented to:
- David Romblom, with co-authors Richard King and Catherine Guastavino (all of McGill University and the Centre for Interdisciplinary Research in Music Media and Technology [CIRMMT], Montreal, Quebec, Canada), for their paper “A Perceptual Evaluation of Room Effect Methods for Multichannel Spatial Audio.”
- Teemu Koski, with co-authors Ville Sivonen and Ville Pulkki (all of the Technical University of Denmark), for their paper “Measuring Speech Intelligibility in Noisy Environments Reproduced with Parametric Spatial Audio.”
Abstract for “Level-Normalization of Feature Films Using Loudness vs Speech” (Convention Paper 8983):
We present an empirical study of the differences between level-normalization of feature films using the two dominant methods: loudness normalization and speech (“dialog”) normalization. The soundtracks of 35 recent “blockbuster” DVDs were analyzed using both methods. The difference in normalization level was up to 14 dB, with an average of 5.5 dB. For all films the loudness method provided the lower normalization level and hence the greater headroom. Comparison of automatic speech measurement with manual measurement of dialog anchors shows a typical difference of 4.5 dB, with the automatic measurement producing the higher level. When the speech classifier was used to process rather than measure the films, a listening test suggested that the automatic measure is positively biased because it sometimes fails to distinguish between “normal speech” and speech combined with “action” sounds. Finally, the DialNorm values encoded in the AC-3 streams on the DVDs were compared with both the automatically and the manually measured speech levels and were found to match neither well.
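As a rough illustration of why the choice of measurement changes the normalization level and the headroom, the sketch below assumes a generic scheme in which each film is gained so that its measured level lands on a common target. The target, the two measured levels, and the peak value are hypothetical numbers chosen for illustration, not data from the study, and the function names are invented for this example.

```python
def normalization_gain_db(measured_db: float, target_db: float) -> float:
    """Gain (dB) that moves a programme from its measured level to the target level."""
    return target_db - measured_db


def peak_headroom_db(peak_dbfs: float, gain_db: float) -> float:
    """Headroom (dB) remaining below 0 dBFS once the gain has been applied."""
    return 0.0 - (peak_dbfs + gain_db)


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to the linear amplitude factor applied to the samples."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical film; all numbers are illustrative, not measurements from the paper.
TARGET_DB = -24.0     # assumed common target level
LOUDNESS_DB = -18.0   # whole-programme loudness measurement
SPEECH_DB = -22.0     # speech-only (dialog) measurement
PEAK_DBFS = -1.0      # programme peak level

g_loud = normalization_gain_db(LOUDNESS_DB, TARGET_DB)    # -6.0 dB
g_speech = normalization_gain_db(SPEECH_DB, TARGET_DB)    # -2.0 dB
print(g_speech - g_loud)                                  # 4.0 dB apart
print(peak_headroom_db(PEAK_DBFS, g_loud))                # 7.0 dB headroom
print(peak_headroom_db(PEAK_DBFS, g_speech))              # 3.0 dB headroom
```

Because both methods aim at the same target in this model, the gap between their gains equals the gap between their measurements. Whichever method measures the film as louder attenuates it more and leaves more peak headroom, which is consistent with the abstract's observation that the loudness method left the greater headroom.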
Abstract for “Proposal of Optical Wave Microphone and Physical Mechanism of Sound Detection” (Convention Paper 8924):
An optical wave microphone with no diaphragm, which uses wave optics and a laser beam to detect sound, can measure sound without disturbing the sound field. The theoretical equation for this measurement can be derived from the optical diffraction integral combined with optical phase-modulation theory, but the physical meaning of the phenomenon is not clear from the mathematical derivation alone. In this paper the physical meaning is considered in relation to wave-optical processes. Furthermore, the spatial sampling theorem is applied to the interaction between a laser beam with a small radius and a sound wave with a long wavelength, showing that wavenumber resolution is lost in this case and that the spatial position of the maximum intensity peak of the optical diffraction pattern generated by a sound wave is independent of the sound frequency. This property can be used to detect complex tones composed of different frequencies with a single photodetector. Finally, the method is compared with conventional Raman-Nath diffraction of light by ultrasonic waves.
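A quick back-of-envelope check of the “small beam radius, long acoustic wavelength” regime can be made by comparing the acoustic wavenumber k = 2πf/c with the beam radius w. The sketch below assumes c = 343 m/s and a hypothetical 1 mm beam radius, and uses the common Fourier rule of thumb that an aperture of size w resolves wavenumbers only on the order of 1/w. It is an illustration of the regime, not the paper's derivation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def acoustic_wavenumber(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Acoustic wavenumber k = 2*pi*f / c, in rad/m."""
    return 2.0 * math.pi * freq_hz / c


def kw_product(freq_hz: float, beam_radius_m: float) -> float:
    """Dimensionless k*w: sound wavenumber times laser beam radius.

    A beam of radius w resolves wavenumbers only on the order of 1/w, so
    k*w much smaller than 1 means the beam spans too little of one acoustic
    wavelength to tell different sound wavenumbers apart.
    """
    return acoustic_wavenumber(freq_hz) * beam_radius_m


BEAM_RADIUS_M = 1e-3  # hypothetical 1 mm beam radius
for f in (100.0, 1_000.0, 10_000.0):
    print(f"{f:>8.0f} Hz: k*w = {kw_product(f, BEAM_RADIUS_M):.4f}")
```

For a millimetre-scale beam the product stays small across these frequencies (roughly 0.18 even at 10 kHz), so in this rough model the beam cannot separate the sound wavenumbers.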
Abstract for “A Perceptual Evaluation of Recording, Rendering, and Reproduction Techniques for Multichannel Spatial Audio” (Convention Paper 9004):
The objective of this project is to perceptually evaluate the relative merits of two different spatial audio recording and rendering techniques within the context of two different multichannel reproduction systems. The two recording and rendering techniques are “natural,” using main microphone arrays, and “virtual,” using spot microphones, panning, and simulated acoustic delay. The two reproduction systems are the 3/2 system (5.1 surround) and a 12/2 system, in which the frontal L/C/R triplet is replaced by a 12-loudspeaker linear array. The perceptual attributes of multichannel spatial audio have been established by previous authors. In this study, magnitude ratings of selected spatial audio attributes are presented for the above treatments, and the results are discussed.
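The study crosses two recording and rendering techniques with two reproduction systems, which gives a 2 × 2 set of treatments. The sketch below enumerates those treatments and averages magnitude ratings per treatment and attribute. The labels, the tuple layout of a rating, and the `mean_ratings` helper are hypothetical choices for illustration; the paper's own analysis may differ.

```python
from collections import defaultdict
from itertools import product
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# The two factors described in the abstract; labels are illustrative.
TECHNIQUES = ("natural", "virtual")   # main arrays vs spot mics/panning/simulated delay
SYSTEMS = ("3/2", "12/2")             # 5.1 surround vs 12-loudspeaker frontal array

# Every technique is heard on every reproduction system: 2 x 2 = 4 treatments.
TREATMENTS: List[Tuple[str, str]] = list(product(TECHNIQUES, SYSTEMS))

# Hypothetical record layout: (technique, system, attribute, magnitude rating)
Rating = Tuple[str, str, str, float]


def mean_ratings(ratings: Iterable[Rating]) -> Dict[Tuple[str, str, str], float]:
    """Average the magnitude ratings for each (technique, system, attribute) cell."""
    cells: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for technique, system, attribute, score in ratings:
        cells[(technique, system, attribute)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


for treatment in TREATMENTS:
    print(treatment)
```

Keeping the attribute as part of the cell key lets each selected spatial attribute be compared across all four treatments separately.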
Abstract for “Measuring Speech Intelligibility in Noisy Environments Reproduced with Parametric Spatial Audio” (Convention Paper 8952):
This work introduces a method for speech intelligibility testing in reproduced sound scenes. The proposed method uses background sound scenes augmented by target speech sources and reproduced over a multichannel loudspeaker setup with time-frequency domain parametric spatial audio techniques. Subjective listening tests were performed to validate the proposed method: speech recognition thresholds (SRT) in noise were measured in a reference sound scene and in a room where the reference was reproduced by a loudspeaker setup. The listening tests showed that, for normal-hearing test subjects, the method provides speech intelligibility nearly identical to that of the real-life reference when a nine-loudspeaker reproduction setup is used in anechoic conditions (<0.3 dB error in SRT). Owing to its flexible technical requirements, the method is potentially applicable in clinical environments.
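The abstract does not say which adaptive procedure was used to find the SRTs. As a generic, hypothetical sketch, a simple 1-up/1-down track lowers the signal-to-noise ratio (SNR) after each correct response and raises it after each error, so it converges on the SNR that gives 50% intelligibility. The step size, trial counts, starting SNR, and the simulated listener are assumptions for illustration, and `adaptive_srt` and `srt_error_db` are invented names.

```python
from statistics import mean
from typing import Callable, List


def adaptive_srt(respond: Callable[[float], bool], start_snr_db: float = 0.0,
                 step_db: float = 2.0, n_trials: int = 20, n_average: int = 10) -> float:
    """Hypothetical 1-up/1-down SRT track converging on 50% intelligibility.

    `respond(snr)` returns True if the sentence presented at `snr` dB was
    repeated correctly. The SRT estimate is the mean SNR of the last
    `n_average` trials.
    """
    snr = start_snr_db
    track: List[float] = []
    for _ in range(n_trials):
        track.append(snr)
        if respond(snr):
            snr -= step_db   # correct: make the task harder
        else:
            snr += step_db   # wrong: make the task easier
    return mean(track[-n_average:])


def srt_error_db(srt_reproduced_db: float, srt_reference_db: float) -> float:
    """Absolute SRT difference between reproduced and reference scenes (dB)."""
    return abs(srt_reproduced_db - srt_reference_db)


# Simulated listener: always correct at or above -7 dB SNR, always wrong below it.
print(adaptive_srt(lambda snr: snr >= -7.0))  # -7.0
```

With this deterministic listener the track settles into alternating -6 and -8 dB trials, so the estimate is -7.0 dB. Running such a track in both the reference scene and the reproduced scene and passing the two estimates to `srt_error_db` yields the kind of SRT error the abstract reports (below 0.3 dB for the nine-loudspeaker anechoic setup).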