Jiaqi Su will present her General Exam "Perceptually-motivated environment-specific speech enhancement" on Tuesday, May 21, 2019 at 3pm in CS 302.

The members of her committee are Adam Finkelstein (adviser), Olga Russakovsky, and Szymon Rusinkiewicz.

Everyone is invited to attend her talk, and faculty wishing to remain for the oral exam that follows are welcome to do so. Her abstract and reading list follow below.

Title: Perceptually-motivated environment-specific speech enhancement

Abstract: People make "in-the-wild" recordings every day using devices such as phones and laptops. However, many factors in a typical environment can diminish the quality of an audio recording, including noise, reverberation, and undesirable equalization. We propose a deep learning approach to enhance speech recordings made in a specific environment. A single neural network that operates directly on the waveform learns to ameliorate several types of recording artifacts. The method relies on a new perceptually-motivated loss function that combines an adversarial loss with spectrogram features. This loss function aligns better with human perception of audio quality than commonly used sample-level losses, and it enables learning on weakly aligned recording pairs. Both subjective and objective evaluations show that the proposed approach improves on state-of-the-art baseline methods.
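The abstract does not spell out the exact form of this combined loss. For readers unfamiliar with the idea, here is a minimal PyTorch sketch of one plausible instantiation, assuming a least-squares adversarial term and an L1 distance between log-magnitude spectrograms; the names spectrogram_loss, generator_loss, disc, and the weight lambda_spec are illustrative, not taken from the work itself:

    import torch
    import torch.nn.functional as F

    def spectrogram_loss(enhanced, clean, n_fft=1024, hop=256):
        # L1 distance between log-magnitude spectrograms of two waveform
        # batches; log compression roughly tracks perceived loudness.
        window = torch.hann_window(n_fft, device=enhanced.device)
        def mag(x):
            return torch.stft(x, n_fft, hop_length=hop, window=window,
                              return_complex=True).abs()
        return F.l1_loss(torch.log1p(mag(enhanced)), torch.log1p(mag(clean)))

    def generator_loss(disc, enhanced, clean, lambda_spec=100.0):
        # Least-squares GAN term (an assumption; the abstract does not name
        # the GAN variant) plus the weighted spectrogram-feature term.
        adv = torch.mean((disc(enhanced) - 1.0) ** 2)
        return adv + lambda_spec * spectrogram_loss(enhanced, clean)

Here disc stands for any discriminator network mapping a waveform batch to realness scores, and lambda_spec trades off the adversarial and spectrogram terms; both are hypothetical placeholders that would need tuning in practice.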

Papers:
  1. Van Den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior AW, Kavukcuoglu K. WaveNet: A generative model for raw audio. SSW. 2016 Sep 13;125.
  2. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In Advances in neural information processing systems 2014 (pp. 2672-2680).
  3. Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR 2016.
  4. Mysore GJ. Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech?—a dataset, insights, and challenges. IEEE Signal Processing Letters. 2015 Aug;22(8):1006-10.
  5. Pascual S, Bonafonte A, Serrà J. SEGAN: Speech Enhancement Generative Adversarial Network. Proc. Interspeech 2017. 2017:3642-6.
  6. Rethage D, Pons J, Serra X. A wavenet for speech denoising. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018 Apr 15 (pp. 5069-5073). IEEE.
  7. Germain FG, Mysore GJ, Fujioka T. Equalization matching of speech recordings in real-world environments. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016 Mar 20 (pp. 609-613). IEEE.
  8. Kinoshita K, Delcroix M, Gannot S, Habets EA, Haeb-Umbach R, Kellermann W, Leutnant V, Maas R, Nakatani T, Raj B, Sehr A. A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research. EURASIP Journal on Advances in Signal Processing. 2016 Dec;2016(1):7.
  9. Hershey JR, Chen Z, Le Roux J, Watanabe S. Deep clustering: Discriminative embeddings for segmentation and separation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016 Mar 20 (pp. 31-35). IEEE.
  10. Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (pp. 586-595).
  11. Hoffer E, Ailon N. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition 2015 Oct 12 (pp. 84-92). Springer, Cham.
Textbook:
Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016 Nov 10. http://www.deeplearningbook.org