Geonmin Kim

Description

I have been a research engineer at Clova Speech since March 2020, where I develop speech recognition systems for adverse conditions.

In 2020, I received a Ph.D. (Thesis: Generalization of neural network on unseen acoustic environment and sentence for spoken dialog system) from the School of Electrical Engineering at Korea Advanced Institute of Science and Technology (KAIST) under the supervision of Prof. Soo-Young Lee. In 2012, I received a B.S. from the Department of Electrical Engineering (major) and Mathematical Science (minor) at KAIST.

Research Interests

Dialog System, Speech Recognition, Neural Networks and Machine Learning

Publications

(J: journal, C: conference, W: workshop, A: arXiv preprint, T: Ph.D. thesis)
(* authors contributed equally)

2020

[A2] Semi-supervised Disentanglement with Independent Vector Variational Autoencoders

Bo-Kyeong Kim, Seungjin Kim, Geonmin Kim, Soo-Young Lee
Under review
[paper]

[T] Generalization of Neural Network on Unseen Acoustic Environment and Sentence for Spoken Dialog System

Ph.D. Thesis, Korea Advanced Institute of Science and Technology
(advised by Prof. Daeshik Kim and Prof. Soo-Young Lee)
[paper]

2019

[J4] Style-Controlled Synthesis of Clothing Segments for Fashion Image Manipulation

Bo-Kyeong Kim, Geonmin Kim, Soo-Young Lee
IEEE TMM 2019
[paper]

[J3] Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Geonmin Kim, Hwaran Lee, Bo-Kyeong Kim, Sang-Hoon Oh, Soo-Young Lee
IEEE SPL 2019
[paper] [code]

2018

[A1] A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

Azam Rabiee, Geonmin Kim, Tae-Ho Kim, Soo-Young Lee
arXiv 2018
[paper]

[J2] Rescoring of N-best Hypotheses using Top-down Selective Attention for Automatic Speech Recognition

Ho-Gyeong Kim, Hwaran Lee, Geonmin Kim, Sang-Hoon Oh, Soo-Young Lee
IEEE SPL 2018
[paper]

~2017

[W4] A Deep Chatbot for QA and Chitchat

Geonmin Kim*, Hwaran Lee*, CheongAn Lee, Eunmi Hong, Byunggeun Kim, Soo-Young Lee (team kAIb)
NIPS competition track workshop 2017
[paper] [code] [poster]

[C3] Compositional Sentence Representation from Character within Large Context Text

Geonmin Kim, Hwaran Lee, Bo-Kyeong Kim, Soo-Young Lee
ICONIP 2017
[paper]

[W3] Fusing Aligned and Non-Aligned Face Information for Automatic Affect Recognition in the Wild: A Deep Learning Approach

Bo-Kyeong Kim, Suh-Yeon Dong, Jihyeon Roh, Geonmin Kim, Soo-Young Lee
CVPR workshop, 2016
[paper]

[J1] Deep CNNs Along the Time Axis With Intermap Pooling for Robustness to Spectral Variations

Hwaran Lee, Geonmin Kim, Ho-Gyeong Kim, Sang-Hoon Oh, Soo-Young Lee
IEEE SPL, 2016
[paper]

[C2] Active Learning for Large-scale Object Classification: from Exploration to Exploitation

Ho-Gyeong Kim, Jihyeon Roh, Hwaran Lee, Geonmin Kim, Soo-Young Lee
HAI 2015
[paper]

[W2] Spoken Sentence Embedding from Character by Jointly Learning Character-level Compositional Word Model and RNN Sentence Encoder

Geonmin Kim, Hwaran Lee, Jaemyung Yu, Soo-Young Lee
NBNI 2015 (only abstract)
[paper]

[W1] Learning Tonotopically Organized Auditory Feature-map from Speech by an Intermap Pooling Layer in a Deep CNN

Hwaran Lee, Geonmin Kim, Jihyeon Roh, Soo-Young Lee
NBNI 2015 (only abstract)
[paper]

[C1] Implement Real-time Polyphonic Pitch Detection and Feedback System for the Melodic Instrument Player

Geonmin Kim, Chang-Hyun Kim, Soo-Young Lee
ICONIP 2012
[paper]

Honors and Awards

Ranked 3rd, ConvAI challenge, NIPS 2017 Competition Track Workshop, 2017
Best Paper Award, HAI, 2015
Qualcomm Innovation Award, 2015
title: Active Learning for Large-scale Object Classification: from Exploration to Exploitation
BK21 Plus Financial Support for Graduates Long Term Training, July 2014
Visiting student at BioPOETS lab, UC Berkeley, U.S., July - August 2014
KAIST Graduate Scholarship, Mar. 2013 - Feb. 2020

Work Experience

(L: leader, M: member)

Computational NeuroSystem Laboratory, KAIST (Graduate student, Mar. 2013 - Dec. 2019)

Speech recognition

Semi-supervised continuous speech recognition (2016, L)
End-to-end continuous speech recognition (2015, L)
Acoustic model for Korean Syllable (2013-2014, L)
for spontaneous spoken dialog system for language learning, Electronics and Telecommunications Research Institute (ETRI)

Speech enhancement

Location-robust blind source extraction (2018, M)
for free-running embedded speech recognition technology for natural language dialogue with robots, Korea Evaluation Institute of Industrial Technology (KEIT)
Unpaired speech enhancement (2017, L)
for spontaneous spoken dialog system for language learning, Electronics and Telecommunications Research Institute (ETRI)

Natural language generation

Article-based question-answering and chitchat bot (2017, L)
for emotional intelligence technology to infer human emotion and carry on dialogue accordingly, Institute for Information & Communication Technology Promotion (IITP)

Sony Interactive Entertainment America (Intern, Oct. 2012 - Feb. 2013)

Multiple keyword spotting in audio