Anurag Kumar







I am currently a Research Scientist at Facebook Research, more specifically at Facebook Reality Labs. Before joining Facebook in Dec. 2018, I was a PhD student at the Language Technologies Institute (LTI) in the School of Computer Science, Carnegie Mellon University.

I was advised by Prof. Bhiksha Raj, who leads the Machine Learning and Signal Processing group. I joined CMU in Fall 2013; before that, I spent five wonderful years at the Indian Institute of Technology (IIT) Kanpur for my undergraduate studies. I obtained a B.Tech-M.Tech integrated dual degree in Electrical Engineering from IIT Kanpur in 2013.

I defended my PhD thesis, Acoustic Intelligence In Machines, on Sep. 27, 2018. Here is the presentation deck from my defense talk.

During my PhD, I interned at Facebook Research and Microsoft Research. At Facebook (summer 2017), I interned with Christian Fuegen in the Speech and Audio Team. At Microsoft, I interned with Dinei Florencio in the Multimedia, Interaction, and Communication (MIC) group.



My broad research interests are Machine Learning, Deep Learning, Audio and Speech Processing, and Multimodal Learning. From a machine learning perspective, my focus is often on reducing the supervision required for learning: weakly supervised, semi-supervised, and self-supervised learning.

My PhD research introduced weakly labeled learning of sounds, which has since become a major area of research in the sound understanding community and has played a crucial role in scaling sound event detection and classification.

Most of the time, I design, develop, and explore these methods for audio, speech, and multimodal problems. These problems include Sound Understanding (Sound Event Recognition and more), Speech Enhancement, Speech Separation, and Audio-Visual Scene Understanding.



I regularly serve as a reviewer and program committee member for top-tier conferences and journals in the areas of Machine Learning, Audio and Speech Processing, and Multimedia. The conferences and journals I have served are listed here.

Conferences (PC Member/Reviewer):
  • International Conference on Machine Learning (ICML)
  • Neural Information Processing Systems (NeurIPS)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
  • IEEE International Conference on Multimedia and Expo (ICME)
  • IEEE Global Conference on Signal and Information Processing (GlobalSIP)
  • ACM Conference on Human Factors in Computing Systems (CHI)
Journals (Reviewer):
  • IEEE Transactions on Audio, Speech and Language Processing (IEEE TASLP)
  • Neural Networks (NN)
  • IEEE Transactions on Multimedia (IEEE TMM)
  • IEEE Transactions on Signal Processing (IEEE TSP)
  • IEEE Signal Processing Letters (IEEE SPL)
  • EURASIP Journal on Audio, Speech, and Music Processing (EURASIP JASMP)
  • IEEE Transactions on Emerging Topics in Computational Intelligence (IEEE TETCI)




I have had my results for a long time, but I do not yet know how I am to arrive at them. - Carl Friedrich Gauss