Nitish Pandey
nitishpandey04
AI & ML interests
LLMs, Translation
Organizations
Optimization
Distributed Inference
Quantization
- Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
  Paper • 2504.04823 • Published • 31
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 10
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 11
- The case for 4-bit precision: k-bit Inference Scaling Laws
  Paper • 2212.09720 • Published • 3
WOW
Papers that made me go wow!
Reading List
- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 87
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 83
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 29
Architecture
Classic Reinforcement Learning
Solved classic RL environments