I previously worked with Hynek Hermansky on distortion-invariant feature design for acoustic models.
For my Masters (by research) I worked in the Speech and Vision Lab at IIIT Hyderabad with Kishore Prahallad, on efficient back-off strategies for high-quality speech synthesis.
2016
Far-field ASR without parallel data
Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Submitted to Interspeech, 2016
Abstract
In far-field speech recognition systems, training acoustic models with alignments generated from parallel close-talk microphone data provides significant improvements. However, it is not practical to assume the availability of large corpora of parallel close-talk microphone data for training. In this paper we explore methods to reduce the performance gap between far-field ASR systems trained with alignments from distant microphone data and those trained with alignments from parallel close-talk microphone data. These methods include the use of a lattice-free sequence objective function that tolerates minor mis-alignment errors, and the use of data selection techniques to discard badly aligned data. We present results on single distant microphone and multiple distant microphone scenarios of the AMI LVCSR task, and identify prominent causes of alignment errors in AMI data.
@inproceedings{peddinti2016ami,
author = {Peddinti, Vijayaditya and Manohar, Vimal and Wang, Yiming and Povey, Daniel and Khudanpur, Sanjeev},
title = {Far-field ASR without parallel data},
booktitle = {Submitted to Interspeech}
}
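As a rough illustration of the data-selection step mentioned in the abstract, the sketch below keeps utterances whose forced alignment looks trustworthy. The selection statistic (average per-frame acoustic log-likelihood of the alignment) and the threshold are hypothetical stand-ins, not necessarily the criterion used in the paper.

# Hypothetical data-selection pass over alignment scores; illustrative only.
def select_utterances(alignment_scores, threshold=-8.0):
    """alignment_scores: dict utt_id -> (total_loglike, num_frames)."""
    kept = []
    for utt, (total_loglike, num_frames) in alignment_scores.items():
        # Keep the utterance if its alignment scores better than the threshold
        # on a per-frame basis; badly aligned data tends to score much lower.
        if num_frames and total_loglike / num_frames > threshold:
            kept.append(utt)
    return kept

scores = {"utt1": (-600.0, 100), "utt2": (-1500.0, 100)}   # toy numbers
print(select_utterances(scores))                            # -> ['utt1']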
Purely sequence-trained neural networks for ASR based on lattice-free MMI
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Yiming Wang, Xingyu Na and Sanjeev Khudanpur
Submitted to Interspeech, 2016
Abstract
In this paper we describe a method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training. We use the lattice-free version of the maximum mutual information (MMI) criterion. To make its computation feasible we use a phone n-gram language model in place of the word language model. To further reduce its space and time complexity we compute the objective function using neural network outputs at one third the standard frame rate. These changes enable us to perform the computation for the forward-backward algorithm on GPUs. Further, the reduced output frame rate also provides a significant speed-up during decoding. We present results on 5 different LVCSR tasks with training data ranging from 100 to 2100 hours. Models trained with this lattice-free MMI criterion provide a relative word error rate reduction of ~15% over those trained with the cross-entropy objective function, and ~8% over those trained with cross-entropy and sMBR objective functions. A further relative reduction of ~2.5% can be obtained by fine-tuning these models with the word-lattice based sMBR objective function.
@inproceedings{povey2016,
author = {Povey, Daniel and Peddinti, Vijayaditya and Galvez, Daniel and Ghahremani, Pegah and Manohar, Vimal and Wang, Yiming and Na, Xingyu and Khudanpur, Sanjeev},
title = {Purely sequence-trained neural networks for ASR based on lattice-free MMI},
booktitle = {Submitted to Interspeech}
}
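For reference, the MMI objective maximized in this line of work has the standard form below; this is the textbook formulation, not an excerpt from the paper. The acoustic scale kappa and the replacement of the word language model by a phone n-gram denominator model are the choices described in the abstract.

% MMI objective over training utterances u with acoustics O_u and reference
% word sequences W_u.  In the lattice-free setting the denominator sum runs
% over all sequences accepted by a phone n-gram "denominator graph" rather
% than over word lattices.
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u} \log
    \frac{p_{\theta}(O_u \mid W_u)^{\kappa}\, P(W_u)}
         {\sum_{W} p_{\theta}(O_u \mid W)^{\kappa}\, P(W)}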
2015
Winner of the IARPA ASpIRE challenge [press announcement]
Reverberation robust acoustic modeling using i-vectors with time delay neural networks
Vijayaditya Peddinti, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur
Proceedings of Interspeech, 2015
Abstract
In reverberant environments there are long-term interactions between speech and corrupting sources. In this paper a time delay neural network (TDNN) architecture, capable of learning long-term temporal relationships and translation-invariant representations, is used for reverberation-robust acoustic modeling. Further, iVectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate. By sub-sampling the outputs at TDNN layers across time steps, training time is reduced. Using a parallel training algorithm we show that the TDNN can be trained on ~5500 hours of speech data in 3 days using up to 32 GPUs. The TDNN is shown to provide results competitive with state-of-the-art systems in the IARPA ASpIRE challenge, with 27.7% WER on the dev test set.
@inproceedings{peddinti2015reverb,
author = {Peddinti, Vijayaditya and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
title = {Reverberation robust acoustic modeling using i-vectors with time delay neural networks},
booktitle = {Proceedings of Interspeech}
}
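A minimal PyTorch sketch (not the Kaldi implementation) of the sub-sampled TDNN idea described above: splicing a small set of time offsets at each layer is equivalent to a dilated 1-D convolution, so higher layers see wide temporal context while each layer only computes a couple of offsets. Layer sizes and offsets here are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class TDNNLayer(nn.Module):
    """Splice frames at relative offsets {0, offset} and apply affine + ReLU.

    With kernel_size=2 and dilation=offset, the convolution combines frames
    t and t+offset, which is how sub-sampled TDNN splicing is commonly realised.
    """
    def __init__(self, in_dim, out_dim, offset):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=2, dilation=offset)
        self.relu = nn.ReLU()

    def forward(self, x):          # x: (batch, feat_dim, time)
        return self.relu(self.conv(x))

# Illustrative stack: dense splicing at the input, then increasingly wide
# offsets.  With no padding, each TDNNLayer shortens the time axis by `offset`.
tdnn = nn.Sequential(
    nn.Conv1d(40, 256, kernel_size=5),   # splice 5 consecutive input frames
    nn.ReLU(),
    TDNNLayer(256, 256, offset=3),
    TDNNLayer(256, 256, offset=6),
    TDNNLayer(256, 256, offset=9),
)

feats = torch.randn(8, 40, 200)          # (batch, mel bins, frames)
out = tdnn(feats)
print(out.shape)                         # time axis shrinks by the total context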
Audio Augmentation for Speech Recognition
Tom Ko, Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur
Proceedings of Interspeech, 2015
Abstract
Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. The proposed technique has a low implementation cost, making it easy to adopt. We present results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios. An average relative improvement of 4.3% was observed across the 4 tasks.
@inproceedings{ko2015augmentation,
author = {Ko, Tom and Peddinti, Vijayaditya and Povey, Daniel and Khudanpur, Sanjeev},
title = {Audio Augmentation for Speech Recognition},
booktitle = {Proceedings of Interspeech}
}
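A minimal sketch of the speed-perturbation recipe described above, shelling out to the sox command-line tool (similar in spirit to how the Kaldi recipes implement it); the file names are placeholders. sox's speed effect resamples the signal, so both duration and pitch change by the given factor.

import subprocess

def speed_perturb(wav_in, wav_out, factor):
    """Write a speed-perturbed copy of wav_in using sox's `speed` effect."""
    subprocess.run(
        ["sox", wav_in, wav_out, "speed", str(factor)],
        check=True,
    )

# Produce the three training copies: 0.9x, 1.0x (the original) and 1.1x.
for factor in (0.9, 1.1):
    speed_perturb("utt0001.wav", f"utt0001_sp{factor}.wav", factor)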
Best paper award
A time delay neural network architecture for efficient modeling of long temporal contexts
Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur
Proceedings of Interspeech, 2015
[abstract] [bib]
Abstract
Recurrent neural network architectures have been shown to efficiently model long term temporal dependencies between acoustic events. However, the training time of recurrent networks is higher than that of feed-forward networks due to the sequential nature of the learning algorithm. In this paper we propose a time delay neural network architecture which models long term temporal dependencies with training times comparable to standard feed-forward DNNs. The network uses sub-sampling to reduce computation during training. On the Switchboard task we show a relative improvement of 6% over the baseline DNN model. We present results on several LVCSR tasks with training data ranging from 3 to 1800 hours to show the effectiveness of the TDNN architecture in learning wider temporal dependencies in both small and large data scenarios.
@inproceedings{peddinti2015multisplice,
author = {Peddinti, Vijayaditya and Povey, Daniel and Khudanpur, Sanjeev},
title = {A time delay neural network architecture for efficient modeling of long temporal contexts},
booktitle = {Proceedings of Interspeech},
publisher = {ISCA}
}
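As a worked example of how per-layer splicing offsets in such a TDNN compose into the network's total temporal context: the left/right context of the whole network is simply the sum of the most negative/most positive offsets over the layers. The offsets below are illustrative, close to but not necessarily the exact configuration in the paper.

# Per-layer splicing offsets (illustrative).  Each layer looks at frames
# t + o, for every offset o in its list, relative to its own input.
layer_offsets = [
    [-2, -1, 0, 1, 2],   # layer 1: dense splicing of 5 frames
    [-1, 2],             # layer 2
    [-3, 3],             # layer 3
    [-7, 2],             # layer 4
]

left_context = sum(min(offsets) for offsets in layer_offsets)    # -13
right_context = sum(max(offsets) for offsets in layer_offsets)   # +9
print(f"total input context: [{left_context}, +{right_context}] frames")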
2014
Deep Scattering Spectrum with deep neural networks
Vijayaditya Peddinti, Tara N. Sainath, S. Maymon, Bhuvana Ramabhadran, David Nahamoo and Vaibhava Goel
Proceedings of ICASSP, 2014
Abstract
State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher resolution information, while ensuring time-warp stability, through the cascaded application of the wavelet-modulus operator. The first order scatter is equivalent to log-mel features, and standard CNN modeling techniques can directly be used with these features. However, the higher order scatter, which preserves the higher resolution information, presents new challenges in modeling. This paper explores how to effectively use DSS features with CNN acoustic models. Specifically, we identify the normalization, neural network topology and regularization techniques needed to model the higher order scatter effectively. The use of these higher order scatter features, in conjunction with CNNs, results in a relative improvement of 7% compared to log-mel features on TIMIT, providing a phonetic error rate (PER) of 17.4%, one of the lowest reported PERs to date on this task.
@inproceedings{peddinti2014,
author = {Peddinti, Vijayaditya and Sainath, Tara N. and Maymon, S. and Ramabhadran, Bhuvana and Nahamoo, David and Goel, Vaibhava},
title = {Deep Scattering Spectrum with deep neural networks},
booktitle = {Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on},
pages = {210-214}
}
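The first- and second-order scattering coefficients referred to above have a compact standard form; this is the general deep scattering spectrum definition, not a formula taken from the paper. First-order scatter is a wavelet-modulus followed by low-pass averaging, which is why it is comparable to log-mel features, and second-order scatter re-decomposes the modulus envelopes to recover the finer structure that the averaging removes.

% Deep scattering spectrum: cascaded wavelet-modulus operators followed by a
% low-pass filter \phi; \psi_{\lambda_1}, \psi_{\lambda_2} are band-pass
% wavelets at scales \lambda_1, \lambda_2.
S_1 x(t, \lambda_1)            = \bigl( |x \ast \psi_{\lambda_1}| \ast \phi \bigr)(t)
S_2 x(t, \lambda_1, \lambda_2) = \bigl( \bigl| |x \ast \psi_{\lambda_1}| \ast \psi_{\lambda_2} \bigr| \ast \phi \bigr)(t)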
Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise
Thomas Schatz, Vijayaditya Peddinti, Yuan Cao, Francis Bach, Hynek Hermansky and Emmanuel Dupoux
Proceedings of Interspeech, 2014
@inproceedings{schatz-peddinti-cao-bach-hermansky-dupoux:is2014c,
author = {Schatz, Thomas and Peddinti, Vijayaditya and Cao, Yuan and Bach, Francis and Hermansky, Hynek and Dupoux, Emmanuel},
title = {Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise},
booktitle = {Proc. of INTERSPEECH}
}
Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks
Tara N Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran and David Nahamoo
Proceedings of Interspeech, 2014
Abstract
Log-mel filterbank features, which are commonly used features for CNNs, can remove higher-resolution information from the speech signal. A novel technique, known as Deep Scattering Spectrum (DSS), addresses this issue and looks to preserve this information. DSS features have shown promise on TIMIT, both for classification and recognition. In this paper, we extend the use of DSS features to LVCSR tasks. First, we explore the optimal multi-resolution time and frequency scattering operations for LVCSR tasks. Next, we explore techniques to reduce the dimension of the DSS features. We also incorporate speaker adaptation techniques into the DSS features. Results on 50 and 430 hour English Broadcast News tasks show that the DSS features provide a 4-7% relative improvement in WER over log-mel features, within a state-of-the-art CNN framework which incorporates speaker adaptation and sequence training. Finally, we show that DSS features are similar to multi-resolution log-mel + MFCCs, and that similar improvements can be obtained with this representation.
@inproceedings{sainath2014deep,
author = {Sainath, Tara N. and Peddinti, Vijayaditya and Kingsbury, Brian and Fousek, Petr and Ramabhadran, Bhuvana and Nahamoo, David},
title = {Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks},
booktitle = {Proceedings of Interspeech},
publisher = {ISCA},
url = {http://ttic.uchicago.edu/~haotang/speech/IS140389.pdf}
}
2013
Evaluating speech features with the Minimal-Pair ABX task: Analysis of the classical MFC/PLP pipeline
Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky and Emmanuel Dupoux
Proceedings of Interspeech, 2013
@inproceedings{schatz-peddinti-bach-jansen-hermansky-dupoux:is2013,
author = {Schatz, Thomas and Peddinti, Vijayaditya and Bach, Francis and Jansen, Aren and Hermansky, Hynek and Dupoux, Emmanuel},
title = {Evaluating speech features with the Minimal-Pair ABX task: Analysis of the classical MFC/PLP pipeline},
booktitle = {Proc. INTERSPEECH}
}
A Summary Of The 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard Rose, Michael Seltzer, Pascal Clark, Ian Mcgraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Borschinger, Justin Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-Ying Lee, Keith Levin, Atta Norouzain, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz and Samuel Thomas
Proceedings of ICASSP, 2013
@inproceedings{jansen-dupoux-goldwater-johnson-khudanpur-church-feldman-hermansky-metze-rose-seltzer-clark-mcgraw-varadarajan-bennett-borschinger-chiu-dunbar-fourtassi-harwath-lee-levin-norouzain-peddinti-richardson-schatz-thomas:icassp2013,
author = {Jansen, Aren and Dupoux, Emmanuel and Goldwater, Sharon and Johnson, Mark and Khudanpur, Sanjeev and Church, Kenneth and Feldman, Naomi and Hermansky, Hynek and Metze, Florian and Rose, Richard and Seltzer, Michael and Clark, Pascal and Mcgraw, Ian and Varadarajan, Balakrishnan and Bennett, Erin and Borschinger, Benjamin and Chiu, Justin and Dunbar, Ewan and Fourtassi, Abdellah and Harwath, David and Lee, Chia-Ying and Levin, Keith and Norouzain, Atta and Peddinti, Vijayaditya and Richardson, Rachael and Schatz, Thomas and Thomas, Samuel},
title = {A Summary Of The 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition},
booktitle = {Proc. ICASSP},
address = {Vancouver, Canada}
}
Mean Temporal Distance: Predicting ASR Error from Temporal Properties of Speech Signal
Hynek Hermansky, Ehsan Variani and Vijayaditya Peddinti
Proceedings of ICASSP, 2013
@inproceedings{hermansky-variani-peddinti:icassp2013,
author = {Hermansky, Hynek and Variani, Ehsan and Peddinti, Vijayaditya},
title = {Mean Temporal Distance: Predicting ASR Error from Temporal Properties of Speech Signal},
booktitle = {Proc. ICASSP},
address = {Vancouver, Canada}
}
Filter-Bank Optimization for Frequency Domain Linear Prediction
Vijayaditya Peddinti and Hynek Hermansky
Proceedings of ICASSP, 2013
Abstract
The sub-band Frequency Domain Linear Prediction (FDLP) technique estimates autoregressive models of the Hilbert envelopes of sub-band signals from windowed segments of the discrete cosine transform (DCT) of a speech signal. The shapes of the windows and their positions on the cosine transform of the signal determine the implied filtering of the signal. Thus, the choice of shape, position and number of these windows can be critical for the performance of the FDLP technique. So far, we have used Gaussian or rectangular windows. In this paper asymmetric, cochlear-like filters are studied. Further, a frequency differentiation operation, which introduces an additional set of parameters describing the local spectral slope in each frequency sub-band, is introduced to increase the robustness of the sub-band envelopes in noise. The performance gains achieved by these changes are reported in a variety of additive noise conditions, with an average relative improvement of 8.04% in phoneme recognition accuracy.
@inproceedings{peddinti2013filterbank,
author = {Peddinti, Vijayaditya and Hermansky, Hynek},
title = {Filter-Bank Optimization for Frequency Domain Linear Prediction},
booktitle = {Proceedings of ICASSP},
address = {Vancouver, Canada},
publisher = {IEEE},
pages = {7102 - 7106}
}
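A rough numpy/scipy sketch of the core FDLP step the abstract refers to: take the DCT of a speech segment, window the slice of DCT coefficients corresponding to one sub-band, fit an autoregressive model to it, and read the sub-band Hilbert envelope off the all-pole spectrum. The window shape and placement (the quantities optimized in the paper) are reduced here to a plain rectangular slice, purely for illustration.

import numpy as np
from scipy.fftpack import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(segment, band, num_bands, order=20, num_points=512):
    """Approximate the Hilbert envelope of one sub-band of `segment` via FDLP.

    segment   : 1-D speech signal (one analysis window, e.g. ~1 s)
    band      : index of the sub-band to analyse
    num_bands : number of equal-width rectangular bands (a crude stand-in for
                the Gaussian / cochlear-like windows studied in the paper)
    """
    c = dct(segment, type=2, norm='ortho')               # DCT of the signal
    band_size = len(c) // num_bands
    sub = c[band * band_size:(band + 1) * band_size]      # rectangular "window"

    # Autocorrelation-method linear prediction on the DCT coefficients.
    r = np.correlate(sub, sub, mode='full')[len(sub) - 1:]
    ar = solve_toeplitz(r[:order], r[1:order + 1])         # AR coefficients
    gain = r[0] - np.dot(ar, r[1:order + 1])               # prediction error power
    poly = np.concatenate(([1.0], -ar))                    # A(z) = 1 - sum a_k z^-k

    # The all-pole model's power spectrum, sampled on a dense grid, approximates
    # the sub-band Hilbert envelope across the original segment.
    spec = np.fft.rfft(poly, n=2 * num_points)
    return gain / np.abs(spec[:num_points]) ** 2

envelope = fdlp_envelope(np.random.randn(8000), band=3, num_bands=16)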
2011
Significance of vowel epenthesis in Telugu text-to-speech synthesis
Vijayaditya Peddinti and Kishore Prahallad
Proceedings of ICASSP, 2011
Abstract
Unit selection synthesis inventories have coverage issues, which lead to missing syllable or diphone units. In the conventional back-off strategy of substituting the missing unit with approximate unit(s), the rules for approximate matching are hard to derive. In this paper we propose a back-off strategy for Telugu TTS systems emulating native speaker intuition. It uses reduced vowel insertion in complex consonant clusters to replace missing units. The inserted vowel identity is determined using a rule-set adapted from L2 (second language) acquisition research in Telugu, reducing the effort required in preparing the rule-set. Subjective evaluations show that the proposed back-off method performs better than the conventional methods.
@inproceedings{peddinti2011,
author = {Peddinti, Vijayaditya and Prahallad, Kishore},
title = {Significance of vowel epenthesis in Telugu text-to-speech synthesis},
booktitle = {Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on},
pages = {5348-5351}
}
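A toy sketch of the back-off idea in the abstract, with a hypothetical unit inventory and a made-up epenthetic-vowel rule (the paper's rule-set is adapted from Telugu L2-acquisition research and is not reproduced here): when a consonant-cluster unit is missing from the inventory, a reduced vowel is inserted into the cluster and synthesis falls back to smaller units that do exist.

# Hypothetical unit inventory and epenthesis rule, for illustration only.
INVENTORY = {"ka", "ra", "kra", "ta", "pa"}      # units the voice actually has
EPENTHETIC_VOWEL = {"default": "a*"}              # "a*" marks a reduced vowel

def back_off(cluster_unit):
    """Realise a possibly-missing consonant-cluster unit with available units.

    If the unit exists, use it directly; otherwise insert the reduced vowel
    after the leading consonant and return the resulting smaller units.
    """
    if cluster_unit in INVENTORY:
        return [cluster_unit]
    vowel = EPENTHETIC_VOWEL["default"]
    consonants = list(cluster_unit.rstrip("a"))   # crude cluster split, toy only
    nucleus = cluster_unit[len(consonants):] or "a"
    return [consonants[0] + vowel] + [c + nucleus for c in consonants[1:]]

print(back_off("kra"))   # -> ['kra']        (unit exists, no back-off needed)
print(back_off("tra"))   # -> ['ta*', 'ra']  (reduced vowel inserted in the cluster)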
Exploiting Phone-Class Specific Landmarks for Refinement of Segment Boundaries in TTS Databases
Vijayaditya Peddinti and Kishore Prahallad
Proceedings of Interspeech, 2011
Abstract
High-accuracy speech segmentation methods invariably depend on manually labelled data. However, under-resourced languages do not have the annotated speech corpora required to train these segmenters. In this paper we propose a boundary refinement technique which uses knowledge of phone-class specific sub-band energy events, in place of manual labels, to guide the refinement process. The use of this knowledge enables proper placement of boundaries in regions with multiple spectral discontinuities in close proximity. It also helps in the correction of large alignment errors. The proposed refinement technique places 82% of boundaries within 20 ms of the actual boundary. Combining the proposed technique with an iterative isolated HMM training technique boosts this accuracy to 89%, without the use of any manually labelled data.
@inproceedings{peddinti2011exploiting,
author = {Peddinti, Vijayaditya and Prahallad, Kishore},
title = {Exploiting Phone-Class Specific Landmarks for Refinement of Segment Boundaries in TTS Databases},
booktitle = {Proceedings of Interspeech 2011}
}
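A schematic numpy sketch of the refinement idea (the actual phone-class specific landmark definitions from the paper are not reproduced; the sub-band choice and search window below are hypothetical): around each initial HMM boundary, look for the largest jump in the sub-band energy expected to change at that phone-class transition and snap the boundary to it.

import numpy as np

def refine_boundary(subband_energy, initial_frame, search_radius=10):
    """Snap an initial boundary to the strongest nearby sub-band energy event.

    subband_energy : frame-level energy in the sub-band expected to change at
                     this phone-class transition (hypothetical choice)
    initial_frame  : boundary frame from the forced alignment
    search_radius  : how many frames around the initial boundary to search
    """
    lo = max(initial_frame - search_radius, 1)
    hi = min(initial_frame + search_radius, len(subband_energy) - 1)
    # Landmark = frame with the largest absolute energy change in the window.
    deriv = np.abs(np.diff(subband_energy[lo - 1:hi + 1]))
    return lo + int(np.argmax(deriv))

# Toy example: a step in sub-band energy a few frames after the HMM boundary.
energy = np.concatenate([np.full(50, 1.0), np.full(50, 6.0)])
print(refine_boundary(energy, initial_frame=46))   # -> 50 (snapped to the step)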