Andrey Bochkov

AI & ML interests

None yet

Recent Activity

reacted to sergiopaniego's post with 🔥 about 18 hours ago

New REPL environment in OpenEnv available! ✨ Used in the Recursive Language Models (RLM) paper by Alex Zhang. Ready for inference and post-training using trajectories. Handles long contexts:
> Run Python code in a sandbox
> Make recursive calls to LMs
> Explore data programmatically
> Return final result
Docs: https://meta-pytorch.org/OpenEnv/environments/repl/
Inference script: https://github.com/meta-pytorch/OpenEnv/blob/main/examples/repl_oolong_simple.py

posted an update about 22 hours ago

Curious reproducible fact: I trained a GPT-like decoder-only Transformer where the entire input embedding table is frozen and reduced to a 16-D binary token-ID code (0/1). This is NOT 16-bit quantization. Key details:
- vocab_size = 65536, n_embed = 16 (2^16 = 65536 unique IDs)
- deterministic expansion 16 → d_model=1024 via repeat_interleave (scale=64)
- full embedding table is published (embeddings.txt) for auditability
Repro note + verification script: https://huggingface.co/blog/Bochkov/emergent-semantics-beyond-token-embeddings
Model repo: https://huggingface.co/Bochkov/emergent-semantics-model-16-bit-269m
License: Apache-2.0
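A minimal pure-Python sketch of the input lookup described in the post, assuming each token's code is simply the 16 binary digits of its ID (most-significant bit first) and that the repeat_interleave step repeats each bit 64 consecutive times. The bit order and the names token_code, expand and embed are hypothetical illustrations; the published embeddings.txt remains the authoritative table.

```python
from __future__ import annotations

# Hypothetical sketch: frozen 16-bit binary token-ID code expanded to d_model=1024.
VOCAB_SIZE = 65536   # 2**16 unique token IDs
N_EMBED = 16         # bits per token code
SCALE = 64           # repeat factor, so 16 * 64 = 1024
D_MODEL = N_EMBED * SCALE

assert 2 ** N_EMBED == VOCAB_SIZE


def token_code(token_id: int) -> list[int]:
    """Return the 16-bit 0/1 code of token_id.

    Assumption: bits are ordered most-significant first; the real table may differ.
    """
    if not 0 <= token_id < VOCAB_SIZE:
        raise ValueError(f"token_id out of range: {token_id}")
    return [(token_id >> (N_EMBED - 1 - i)) & 1 for i in range(N_EMBED)]


def expand(code: list[int], scale: int = SCALE) -> list[int]:
    """Repeat each element `scale` times consecutively (repeat_interleave-style)."""
    return [bit for bit in code for _ in range(scale)]


def embed(token_id: int) -> list[int]:
    """Deterministic, untrained input vector of length D_MODEL for token_id."""
    return expand(token_code(token_id))


if __name__ == "__main__":
    vec = embed(5)
    print(len(vec))        # length of the expanded vector
    print(token_code(5))   # 16-bit code of ID 5
```

Because every ID below 2^16 has a distinct 16-bit pattern, the frozen table keeps all tokens distinguishable without any training; under these assumptions each 1024-dim input vector is just 16 blocks of 64 identical 0/1 values.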

upvoted a paper 4 days ago

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Organizations

None yet

Bochkov's datasets

None public yet