Chaerin Kong
I am a problem-driven AI researcher who approaches AI as a means of solving real-world problems.
I studied multimodal representation learning and generative modeling at Seoul National University under the supervision of Prof. Nojun Kwak.
Currently, I am working on lifting image generation models to a commercial level, in terms of both quality and efficiency, so that they can be readily used in real-life applications.
I was extremely privileged to work with talented colleagues at Twelve Labs and NAVER.
Email / CV / Google Scholar / Github
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
Donghoon Han, Seunghyeon Seo, Donghyeon Jeon, Jiho Jang, Chaerin Kong, Nojun Kwak
NeurIPS 2023 Workshop on Advancing Neural Network Training (Oral)
arXiv
We explore means of accelerating ViT inference by concatenating abstract visual tokens from multiple images along dim=1 and processing them in a single forward pass.
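A minimal sketch of the core batching trick, assuming a toy PyTorch encoder (the tensor shapes and module choices below are illustrative, not the paper's implementation):

    import torch
    import torch.nn as nn

    embed_dim, tokens_per_image = 192, 16
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
        num_layers=2,
    )

    # Hypothetical abstract tokens for 4 images; in the paper these come from
    # an abstraction stage that compresses each image, not raw patch embeddings.
    image_tokens = [torch.randn(1, tokens_per_image, embed_dim) for _ in range(4)]

    multiplexed = torch.cat(image_tokens, dim=1)  # (1, 4*16, 192): one long sequence
    out = encoder(multiplexed)                    # one forward pass serves all 4 images
    print(out.shape)                              # torch.Size([1, 64, 192])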
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion
Seungwoo Lee, Chaerin Kong, Donghyeon Jeon, Nojun Kwak
CVPR 2023 Workshop on AI for Content Creation
arXiv
We introduce a simple framework for audio-aligned text-to-video synthesis that employs an off-the-shelf text-to-image diffusion model.
Analyzing Multimodal Objectives through the Lens of Generative Diffusion Guidance
Chaerin Kong, Nojun Kwak
ICLR 2023 Workshop on Multimodal Representation Learning (Spotlight)
arXiv / poster
We study the semantic information encoded in widely used Vision-Language objectives (e.g., contrastive, captioning) by using each as diffusion guidance and inspecting the visualized images.
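As a rough illustration of what "using an objective as diffusion guidance" means, here is a classifier-guidance-style sketch in PyTorch; eps_model, clip_image_encoder, and text_emb are hypothetical stand-ins, not the paper's API:

    import torch

    def vl_guided_eps(eps_model, clip_image_encoder, x_t, t, text_emb, scale=5.0):
        # Score the noisy sample under a vision-language objective.
        x = x_t.detach().requires_grad_(True)
        img_emb = clip_image_encoder(x)
        score = torch.cosine_similarity(img_emb, text_emb, dim=-1).sum()
        # The gradient of the score steers the denoising direction.
        grad = torch.autograd.grad(score, x)[0]
        return eps_model(x_t, t) - scale * grad

Swapping the contrastive score for, e.g., a captioning likelihood and inspecting the resulting samples is the kind of comparison the paper makes.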
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang*, Chaerin Kong*, Donghyeon Jeon, Seonhoon Kim, Nojun Kwak
AAAI 2023 (Oral)
arXiv / poster
We train a modality-agnostic Vision-Language model, OneR, and investigate intriguing properties of a unified V-L representation.
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
Chaerin Kong, Donghyeon Jeon, Ohjoon Kwon, Nojun Kwak
WACV 2023
arXiv / poster
We propose a mask-free fashion attribute editing framework that employs a pretrained diffuser and an efficiently finetuned guidance model.
Self-distilled Self-supervised Representation Learning
Jiho Jang, Seonhoon Kim*, KiYoon Yoo*, Chaerin Kong, Jangho Kim, Nojun Kwak
WACV 2023
By encouraging self-distillation from lower layers to upper layers in traditional SSL frameworks (e.g., SimCLR, MoCo, BYOL), we improve representation quality, as confirmed by performance on various downstream tasks.
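A minimal sketch of such a self-distillation term, assuming hypothetical lower_feat/final_feat projections (not the released code):

    import torch.nn.functional as F

    def self_distill_loss(lower_feat, final_feat):
        # Pooled, projected output of an earlier block acts as the student...
        lower = F.normalize(lower_feat, dim=-1)
        # ...and is pulled toward the stop-gradient final representation.
        final = F.normalize(final_feat.detach(), dim=-1)
        return -(lower * final).sum(dim=-1).mean()  # maximize cosine similarity

This auxiliary term would be added on top of the base SSL objective.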
Towards Efficient Neural Scene Graphs by Learning Consistency Fields
Yeji Song, Chaerin Kong, Seo Young Lee, Nojun Kwak, Joonseok Lee
BMVC 2022
code / arXiv
Neural Scene Graphs can be rendered more efficiently and in a more controllable manner by learning the consistency field of a given scene.
Few-shot Image Generation with Mixup-based Distance Learning
Chaerin Kong, Jeesoo Kim, Donghoon Han, Nojun Kwak
ECCV 2022
code / arXiv / poster
Instead of directly combating memorization in few-shot (n<100) image synthesis, we propose latent space smoothing regularizations that empower the generator to produce a diverse (perceptually continuous) set of samples.
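A simplified PyTorch sketch of the flavor of this regularization; G (generator) and feat (perceptual feature extractor) are assumed placeholders, and the paper's actual distance losses differ in detail:

    import torch
    import torch.nn.functional as F

    def mixup_distance_loss(G, feat, z_a, z_b):
        lam = torch.rand(())                 # random interpolation coefficient
        z_mix = lam * z_a + (1 - lam) * z_b  # mixup in latent space
        f_a, f_b, f_mix = feat(G(z_a)), feat(G(z_b)), feat(G(z_mix))
        d = torch.stack([(f_mix - f_a).norm(), (f_mix - f_b).norm()])
        # Relative image-space distances should mirror the mixup ratio,
        # discouraging many latents from collapsing to one memorized sample.
        return F.mse_loss(F.softmax(-d, dim=0), torch.stack([lam, 1 - lam]))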
The template is from Jon Barron. Thank you for sharing!