Chaerin Kong
I collect data, train models and measure their effectiveness in solving real-world problems.
I studied multimodal representation learning and generative modeling at Seoul National
University under the supervision of Prof. Nojun Kwak.
Currently I am working on large-scale video-language model training and agent applications at Twelve Labs.
Previously, I worked on AI-driven sleep interpretation (sleep sounds to human-readable
reports) at ASLEEP and on high-precision commercial image
generation at NXN Labs.
Email / CV / Google Scholar / Github
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
Donghoon Han,
Seunghyeun Seo,
Donghyeon Jeon,
Jiho Jang,
Chaerin Kong,
Nojun Kwak
NeurIPS 2023 Workshop on Advancing Neural Network Training (Oral)
arXiv
We explore means to accelerate ViT inference by concatenating abstracted visual tokens from multiple
images along the sequence dimension (dim=1) and processing them in a single forward pass.
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion
Seungwoo Lee,
Chaerin Kong,
Donghyeon Jeon,
Nojun Kwak
CVPR 2023 Workshop on AI for Content Creation
arXiv
We introduce a simple framework for audio-aligned text-to-video synthesis that employs an
off-the-shelf text-to-image diffusion model.
Analyzing Multimodal Objectives through the Lens of Generative Diffusion Guidance
Chaerin Kong,
Nojun Kwak
ICLR 2023 Workshop on Multimodal Representation Learning (Spotlight)
arXiv / poster
We study the semantic information encoded in widely used Vision-Language objectives (e.g.,
contrastive, captioning) by using each as diffusion guidance and inspecting the generated images.
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang*,
Chaerin Kong*,
Donghyeon Jeon,
Seonhoon Kim,
Nojun Kwak
AAAI 2023 (Oral)
arXiv / poster
We train a modality-agnostic Vision-Language model, OneR, and investigate intriguing properties of a
unified V-L representation.
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
Chaerin Kong,
Donghyeon Jeon,
Ohjoon Kwon,
Nojun Kwak
WACV 2023
arXiv / poster
We propose a mask-free fashion attribute editing framework that employs a pretrained diffuser and an
efficiently finetuned guidance model.
Self-distilled Self-supervised Representation Learning
Jiho Jang,
Seonhoon Kim*,
KiYoon Yoo*,
Chaerin Kong,
Jangho Kim,
Nojun Kwak
WACV 2023
By encouraging self-distillation from lower to upper layers in traditional SSL frameworks
(e.g., SimCLR, MoCo, BYOL), we improve representation quality, as confirmed by performance
on various downstream tasks.
Towards Efficient Neural Scene Graphs by Learning Consistency Fields
Yeji Song,
Chaerin Kong,
Seo Young Lee,
Nojun Kwak,
Joonseok Lee
BMVC 2022
code / arXiv
Neural Scene Graphs can
be rendered more efficiently and in a more controllable manner by learning the consistency field of
a given scene.
Few-shot Image Generation with Mixup-based Distance Learning
Chaerin Kong,
Jeesoo Kim,
Donghoon Han,
Nojun Kwak
ECCV 2022
code / arXiv / poster
Instead of directly combating memorization in few-shot (n<100) image synthesis, we propose latent
space smoothing regularizations that empower the generator to produce a diverse (perceptually
continuous) set of samples.
The template is from Jon Barron. Thank you for sharing!