seil@kang:~$whoami
seil
seil@kang:~$cat /etc/motd
I am a Ph.D. student in Computer Science at Yonsei University.
# research focus
building better multimodal AI systems by rigorously understanding how large-scale transformers internally represent and process cross-modal information, and applying those insights to improve both model performance and the experience of downstream domain experts.
# currently
language models, vision-language models, and the interpretability of models and agentic systems.
seil@kang:~$ls ~/links
- cv: resume.pdf
- scholar: scholar.google.com/citations
- github: @seilk
- linkedin: in/seil-kang
- x: @seil3331
seil@kang:~$cat ~/opensource-contributions
- Zero-dependency GPU allocation enforcement across multiple SSH nodes, with an interactive TUI and policy-based violation detection (sketched below).
- Contributed a pull request that was merged into the opencode main branch.
- Ongoing contributor to Openclaw since v2026.2.22.
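# a minimal sketch of the policy-check idea behind the GPU enforcement tool above,
# assuming passwordless SSH and nvidia-smi on each node; the file name, host names,
# and policy value are hypothetical, and the actual tool's internals may differ.
seil@kang:~$cat ~/sketches/gpu_policy_check.py
import subprocess

NODES = ["node-01", "node-02"]   # SSH hosts to audit (illustrative names)
MAX_GPUS_PER_USER = 2            # example policy, not the tool's real default

def gpu_users(node: str) -> dict[str, set[str]]:
    """Map each user to the GPU UUIDs they occupy on a node, via nvidia-smi over SSH."""
    # List (gpu_uuid, pid) pairs for all compute processes on the node.
    out = subprocess.run(
        ["ssh", node, "nvidia-smi",
         "--query-compute-apps=gpu_uuid,pid", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    usage: dict[str, set[str]] = {}
    for line in out.splitlines():
        if not line.strip():
            continue
        uuid, pid = [field.strip() for field in line.split(",")]
        # Resolve the owning user of each GPU process with ps.
        user = subprocess.run(
            ["ssh", node, "ps", "-o", "user=", "-p", pid],
            capture_output=True, text=True,
        ).stdout.strip()
        if user:
            usage.setdefault(user, set()).add(uuid)
    return usage

# Flag users holding more GPUs than the policy allows.
for node in NODES:
    for user, gpus in gpu_users(node).items():
        if len(gpus) > MAX_GPUS_PER_USER:
            print(f"[{node}] {user}: {len(gpus)} GPUs (limit {MAX_GPUS_PER_USER})")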
seil@kang:~$ls -la ~/publications/
Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them
Real-Time Visual Attribution Streaming in Thinking Model
Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers
ViKey: Enhancing Temporal Understanding in Videos via Visual Prompting
Rare Text Semantics Were Always There in Your Diffusion Transformer
Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models
Neuron-Level Approach for Multi-Hop Reasoning in Large Vision-Language Models
Your Large Vision Language Model Only Needs A Few Attention Heads for Visual Grounding
See What You Are Told: Visual Attention Sink in Large Multimodal Models
FALCON: Frequency Adjoint Link with CONtinuous Density Mask for Fast Single Image Dehazing
WoLF: Wide-scope Large Language Model Framework for CXR Understanding
CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation