Anurag Kumar

Research Lead and Scientist, Meta

anuragkr [AT] ieee [DOT] org


I am a research lead and scientist at Meta Research. My broad research interests include Deep Learning, Audio/Speech Processing and Multimodal Learning. My research often focuses on weakly supervised, self-supervised and unsupervised learning methods for different domains and problems.

Before joining Meta, I completed my PhD at the School of Computer Science at Carnegie Mellon University in 2018, where I was advised by Prof. Bhiksha Raj. My PhD thesis, Acoustic Intelligence in Machines, introduced weakly labeled learning of sounds, which has since played a crucial role in scaling sound event detection and classification. I obtained my undergraduate degree in Electrical Engineering from the Indian Institute of Technology (IIT), Kanpur in 2013.

Some of my recent work has focused on Scene Understanding and Generation (audio-only and multimodal) [Neurips-2023, CVPR-2023, CVPR-2022, IJCAI-2020, ICML-2020]; Speech Enhancement (single-channel, multi-channel, audio-visual) [ICASSP-2023, IEEE JSTSP-2022, ICASSP-2022, ICASSP-2021, ASRU-2021]; and Deep Learning based Speech Assessment (Quality and Intelligibility) [ICASSP-2023, Interspeech-2022, Neurips-2021]. Check out my Google Scholar for a complete list of my published work in various areas.

I regularly serve AI/Speech conferences (Neurips, ICML, ICASSP, Interspeech, ICLR, to mention a few) and journals (IEEE TASLP, IEEE SPL, IEEE TSP, Neural Networks, TMLR) in various roles, including Organizer, Reviewer, Program Committee Member and Guest Editor.


Selected Publications

Google Scholar lists all of my publications.
indicates equal contribution.

Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal and Spatial Information

Orel Ben Zaken, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Advances in Neural Information Processing Systems (Neurips), 2023.

Torchaudio-Squim: Reference-Less Speech Quality and Intelligibility Measures in Torchaudio

Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

Egocentric Audio-Visual Object Localization

Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing

Efthymios Tzinis, Yossi Adi, Vamsi K Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

IEEE Journal of Selected Topics in Signal Processing, 2022.

NORESQA--A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

Advances in Neural Information Processing Systems (Neurips), 2021.

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition

Anurag Kumar, Vamsi Krishna Ithapu

International Conference on Machine Learning (ICML), 2020.

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

Haytham Fayek, Anurag Kumar

International Joint Conference on Artificial Intelligence (IJCAI), 2020.

Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes

Anurag Kumar, Maksim Khadkevich, Christian Fügen

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

Audio Event Detection using Weakly Labeled Data

Anurag Kumar, Bhiksha Raj

ACM International Conference on Multimedia (ACM MM), 2016.


This website uses the website design and template by Martin Saveski.