Seoul National University

university

Verified

https://www.snu.ac.kr/

Recent Activity

mbkim authored a paper about 23 hours ago

Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR

mbkim submitted a paper 8 days ago

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

mbkim authored a paper 9 days ago

LifeTox: Unveiling Implicit Toxicity in Life Advice

Papers

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

jfdkjjs

authored a paper 4 months ago

On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models

Paper • 2510.09008 • Published Oct 10, 2025 • 16

given131

authored 2 papers 9 months ago

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models

Paper • 2407.00369 • Published Jun 29, 2024

Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation

Paper • 2505.18842 • Published May 24, 2025 • 36

jusjinuk

authored 4 papers 9 months ago

kiyoonyoo

authored a paper about 1 year ago

Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models

Paper • 2412.11423 • Published Dec 16, 2024 • 2

beomi

posted an update over 1 year ago

# PyTorch == 2.5.0 Breaks Transformers' SDPAttention!

If you encounter "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.", you can work around it like this:

torch.backends.cuda.enable_cudnn_sdp(False)

However, this gives up the performance gain from PyTorch 2.5.

A fix has been merged at https://github.com/pytorch/pytorch/pull/138587 (not really a "fix": it turns cuDNN SDPA off by default), but it has not been released yet, so you would need to install PyTorch directly from source.

Fastest workaround for now: pip install "torch<2.5"

Ref: https://github.com/huggingface/diffusers/issues/9704#issuecomment-2422585273
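
As a rough sketch, the workaround above could be applied only where it is needed. The helper names `needs_cudnn_sdp_workaround` and `apply_workaround` are hypothetical, and the check assumes that only the exact 2.5.0 release is affected, as the post title states.

```python
import importlib.util


def needs_cudnn_sdp_workaround(version: str) -> bool:
    """Return True for the 2.5.0 release that the post reports as affected."""
    # Drop local build tags such as "+cu121" before comparing.
    base = version.split("+")[0]
    return base == "2.5.0"


def apply_workaround() -> bool:
    """Disable cuDNN SDPA if an affected PyTorch is installed.

    Returns True only when the toggle was actually applied.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed; nothing to do.

    import torch

    if not needs_cudnn_sdp_workaround(torch.__version__):
        return False

    # Older builds may not expose this toggle, so look it up defensively.
    toggle = getattr(torch.backends.cuda, "enable_cudnn_sdp", None)
    if toggle is None:
        return False

    toggle(False)
    return True


if __name__ == "__main__":
    print("workaround applied:", apply_workaround())
```

Calling `apply_workaround()` once at program start, before any model is loaded, keeps the toggle out of the way on unaffected versions.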

beomi

posted an update almost 2 years ago

#TPU #PyTorch #Jax

When you're trying to use PyTorch or JAX on TPU:

for v2/v3/v4:
use tpu-ubuntu2204-base

for v5p:
use v2-alpha-tpuv5

for v5e:
use v2-alpha-tpuv5-lite

You must use these base images for the system to 'boot'.

Previously used tpu-vm-v4-pt-1.13 images might seem to start the VM, but SSH connections do not work.

I thought it was a firewall issue and spent a lot of time on it before realizing it was a problem with the boot image 🥲

https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
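
The runtime mapping listed in this post can be captured as a small lookup, sketched below. The function names and the example VM name, zone, and accelerator type are hypothetical, and the `gcloud` flags shown (`--zone`, `--accelerator-type`, `--version`) are assumed from the usual TPU VM creation command; check the linked documentation before relying on them.

```python
# Runtime versions exactly as listed in the post, keyed by TPU generation.
TPU_RUNTIMES = {
    "v2": "tpu-ubuntu2204-base",
    "v3": "tpu-ubuntu2204-base",
    "v4": "tpu-ubuntu2204-base",
    "v5p": "v2-alpha-tpuv5",
    "v5e": "v2-alpha-tpuv5-lite",
}


def runtime_for(generation: str) -> str:
    """Look up the base image for a TPU generation such as "v4" or "v5e"."""
    key = generation.strip().lower()
    if key not in TPU_RUNTIMES:
        raise ValueError(f"no runtime listed for TPU generation {generation!r}")
    return TPU_RUNTIMES[key]


def create_command(name: str, zone: str, accelerator_type: str, generation: str) -> list[str]:
    """Build an argv list for creating a TPU VM with the matching runtime."""
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "create", name,
        f"--zone={zone}",
        f"--accelerator-type={accelerator_type}",
        f"--version={runtime_for(generation)}",
    ]


if __name__ == "__main__":
    # Hypothetical example values; substitute your own project settings.
    print(" ".join(create_command("my-tpu", "us-central2-b", "v4-8", "v4")))
```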

beomi

posted an update almost 2 years ago

🚀 **InfiniTransformer, Gemma/Llama3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **Two types of implementation available:** an attention-layer-only implementation and a model- and training-wise implementation.
- **Fixed (segment-dependent) memory usage:** Enables training larger models on longer sequences without the memory overhead typical of standard Transformer implementations.
- **Infinite context capability:** Train with unprecedented sequence lengths; imagine handling sequences of up to 1 million tokens on standard hardware!
- You could train Gemma-2B at a 1M sequence length with a 2K segment size on a single H100 GPU.
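
To make the segmentation figures above concrete, here is a minimal sketch of how a long sequence splits into fixed-size segments. It is not code from the repository: the function names are hypothetical, and reading "1M" as 1,048,576 tokens and "2K" as 2,048 tokens is an assumption.

```python
def segment_count(sequence_length: int, segment_size: int) -> int:
    """Number of segments needed to cover the sequence (last one may be short)."""
    if sequence_length <= 0 or segment_size <= 0:
        raise ValueError("sequence_length and segment_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (sequence_length + segment_size - 1) // segment_size


def iter_segment_bounds(sequence_length: int, segment_size: int):
    """Yield (start, end) index pairs, one per segment, in order."""
    for start in range(0, sequence_length, segment_size):
        yield start, min(start + segment_size, sequence_length)


if __name__ == "__main__":
    # Assumed reading of "1M sequence length with 2K segmentation size".
    print(segment_count(1_048_576, 2_048))
```

Only one segment is processed at a time, which is why memory use depends on the segment size rather than on the full sequence length.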

## **Try InfiniTransformer**

1. **Clone the repository:**

   git clone https://github.com/Beomi/InfiniTransformer

2. **Install necessary tools:**

   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers

3. **Dive Deep into Custom Training:**
- Train with extensive sequence lengths using scripts such as ./train.gemma.infini.noclm.1Mseq.sh.

For more details, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

ps. Training loss plot is here 😉