Geonmin Kim

Research Engineer at Clova Speech, Naver


I have been a research engineer at Clova Speech since March 2020, where I develop speech recognition systems for adverse conditions.

In 2020, I received my Ph.D. (thesis: Generalization of neural network on unseen acoustic environment and sentence for spoken dialog system) from the School of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST) under the supervision of Prof. Soo-Young Lee. In 2012, I received my B.S. from KAIST, with a major in Electrical Engineering and a minor in Mathematical Science.

Research Interests

Dialog Systems, Speech Recognition, Neural Networks, and Machine Learning


(J: journal, C: conference, W: workshop, A: arXiv preprint, T: Ph.D. thesis)
(* authors contributed equally)


[A] Semi-supervised Disentanglement with Independent Vector Variational Autoencoders

[T] Generalization of Neural Network on Unseen Acoustic Environment and Sentence for Spoken Dialog System


[J4] Style-Controlled Synthesis of Clothing Segments for Fashion Image Manipulation

[J3] Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition


[A1] A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

[J2] Rescoring of N-best Hypotheses using Top-down Selective Attention for Automatic Speech Recognition


[W4] A Deep Chatbot for QA and Chitchat

[C3] Compositional Sentence Representation from Character within Large Context Text

[W3] Fusing Aligned and Non-Aligned Face Information for Automatic Affect Recognition in the Wild: A Deep Learning Approach

[J1] Deep CNNs Along the Time Axis With Intermap Pooling for Robustness to Spectral Variations

[C2] Active Learning for Large-scale Object Classification: from Exploration to Exploitation

[W2] Spoken Sentence Embedding from Character by Jointly Learning Character-level Compositional Word Model and RNN Sentence Encoder

[W1] Learning Tonotopically Organized Auditory Feature-map from Speech by an Intermap Pooling Layer in a Deep CNN

[C1] Implement Real-time Polyphonic Pitch Detection and Feedback System for the Melodic Instrument Player

Honors and Awards

Work Experience

(L: leader, M: member)

Computational NeuroSystem Laboratory, KAIST (Graduate student, Mar. 2013 - Dec. 2019)

Speech recognition

Speech enhancement

Natural language generation

Sony Interactive Entertainment America (Intern, Oct. 2012 - Feb. 2013)