Creating a Better WFH Experience: Removing Noise from Audio

Schuyler Tilney-Volk, Semir Shafi, Marco Mora-Mendoza

Semantic Scholar (2020)

Abstract
As a result of the COVID-19 pandemic, non-essential workers have been working from home for the past several months. Zoom, Microsoft Teams, and Webex have quickly replaced in-person meetings. However, background noise such as playing children, television, typing, coughing, breathing, and chewing can degrade the quality of a video conference call with colleagues. Until recently, most systems have used filters from electrical and sound engineering to limit the range of transmitted frequencies. Our goal is to develop a tool that uses deep learning to remove noise in real time. Even after COVID-19, this problem will likely remain pertinent as companies continue to integrate work-from-home solutions into their work culture.

1 Task Description and Background

Noise filtering has typically been accomplished with tools and filters that stem from electrical engineering and signal processing. As digital software systems have become more nuanced, these classical filters are sometimes unable to meet the modern requirements of new noise filtering applications, because they operate by eliminating all sound within a certain frequency range. This proves problematic in contemporary voice filtering: if two people speak at the same time with voices of roughly the same frequency, a traditional filter must either pass both voices or eliminate both. Deep learning can recognize more complex, non-linear patterns such as undesired background noise, and has therefore become a recent topic of study in voice filtering and background noise removal. In 2012, Maas et al. introduced an RNN model to remove noise from the input to speech recognition software [5]. They conclude that both multiple hidden layers and temporally recurrent connections are important for performing well on both the training data and unseen noise.
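To make the limitation of classical filters concrete, the sketch below applies a standard Butterworth band-pass filter with SciPy. The cutoff frequencies and test tones are illustrative choices, not values from the paper: out-of-band noise (60 Hz hum) is removed, but any interfering sound that shares the voice band would pass through untouched.

```python
import numpy as np
from scipy.signal import butter, lfilter

def bandpass(signal, fs, low_hz=300.0, high_hz=3400.0, order=4):
    """Classical band-pass filter: keeps the (telephone-style) voice band
    and attenuates everything outside it, regardless of who or what is
    producing the sound inside the band."""
    nyq = 0.5 * fs
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return lfilter(b, a, signal)

fs = 16000                                # 16 kHz sample rate, 1 second
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 440 * t)       # in-band tone: survives the filter
hum = np.sin(2 * np.pi * 60 * t)          # out-of-band mains hum: removed
filtered = bandpass(voice + hum, fs)
```

A second speaker at, say, 450 Hz would survive this filter just as well as the 440 Hz "voice", which is exactly the failure mode that motivates a learned denoiser.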
Kim and Smaragdis (2015) developed a fine-tuning scheme to further improve the performance of an already-trained Denoising AutoEncoder (DAE) in the context of semi-supervised audio source separation [4]. Park and Lee (2017) attempt to remove babble noise to increase the intelligibility of human speech for hearing aids [6]. They propose a convolutional neural network, specifically the Redundant Convolutional Encoder-Decoder (R-CED). This network can be 12 times smaller than a recurrent neural network and still achieve better performance, indicating its usefulness in embedded systems. Grais et al. (2017) used a similar approach of denoising autoencoders with convolutional neural networks [3]; their goal, however, was to separate out all individual sources of noise in audio clips. They used three metrics to measure success: SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio), and SAR (signal-to-artifact ratio). Germain et al. (2018) took a novel approach, still convolutional, but trained with deep feature losses [2]. This loss compares internal feature activations against those of a separate network trained for acoustic environment detection and domestic audio tagging. Our project was inspired by the 2017 paper by Park and Lee suggesting a convolutional neural network for the audio denoising problem. We follow their preprocessing description on our datasets and use their Cascaded R-CED convolutional neural network.

CS230: Deep Learning, Spring 2020, Stanford University, CA. (LaTeX template borrowed from NIPS 2017.)
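The R-CED idea above can be sketched in a few lines of PyTorch. This is a minimal illustration, not Park and Lee's exact configuration: the layer widths, kernel size, and frame count are assumed values. The key properties it does reproduce are that the network is fully convolutional with no pooling (so the output keeps the input's frequency resolution) and that filter counts grow and then shrink symmetrically, keeping the model small.

```python
import torch
import torch.nn as nn

class RCEDSketch(nn.Module):
    """Sketch of an R-CED-style denoiser: a stack of noisy spectrogram
    frames goes in, an estimate of one clean frame comes out. Layer
    sizes here are illustrative, not the published architecture."""
    def __init__(self, n_frames=8):
        super().__init__()
        # Filter counts grow then shrink; convolutions run along the
        # frequency axis, with the stacked frames as input channels.
        chans = [n_frames, 12, 16, 20, 16, 12]
        blocks = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv1d(cin, cout, kernel_size=9, padding=4),
                       nn.BatchNorm1d(cout), nn.ReLU()]
        # Final layer projects back to a single clean spectrogram frame.
        blocks += [nn.Conv1d(chans[-1], 1, kernel_size=9, padding=4)]
        self.net = nn.Sequential(*blocks)

    def forward(self, x):       # x: (batch, n_frames, n_freq_bins)
        return self.net(x)      # (batch, 1, n_freq_bins)

model = RCEDSketch()
noisy = torch.randn(4, 8, 129)  # 4 examples, 8 frames, 129 freq bins
clean_est = model(noisy)
```

With these assumed widths the model has on the order of ten thousand parameters, which is the size regime that makes this family of networks attractive for embedded, real-time use.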