TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Paper: arXiv:2305.07759
A compact GPT-style language model trained from scratch on the TinyStories dataset, designed for generating simple, coherent stories suitable for children.
This is a small-scale, GPT-style transformer language model; its full specifications are summarized in the table further down.
Training Hyperparameters:
Training Infrastructure:
```bash
pip install torch tiktoken huggingface_hub
```
```python
import json
import torch
import tiktoken
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="config.json")

# Load tokenizer
enc = tiktoken.get_encoding("gpt2")

# Load model (you'll need the model.py file)
from model import GPT, GPTConfig

with open(config_path, 'r') as f:
    config_dict = json.load(f)

config = GPTConfig(
    vocab_size=config_dict["vocab_size"],
    block_size=config_dict["block_size"],
    n_layer=config_dict["n_layer"],
    n_head=config_dict["n_head"],
    n_embd=config_dict["n_embd"],
    dropout=0.0,  # set to 0 for inference
    bias=config_dict["bias"],
)

model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Generate text
def generate_story(prompt, max_tokens=200, temperature=1.0):
    context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)
    with torch.no_grad():
        generated = model.generate(
            context,
            max_new_tokens=max_tokens,
            temperature=temperature,
        )
    return enc.decode(generated.squeeze().tolist())

# Example usage
story = generate_story("Once upon a time there was a pumpkin.")
print(story)
```
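The import of `GPT` and `GPTConfig` above assumes a local copy of `model.py`. If you only have the pip packages installed, one option is to fetch `model.py` from the repository as well and add its cache folder to the import path; a minimal sketch (the filename matches the file listing further down this card):

```python
import os
import sys
from huggingface_hub import hf_hub_download

# Download model.py from the repo and make its cache directory importable.
model_py_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="model.py")
sys.path.insert(0, os.path.dirname(model_py_path))

from model import GPT, GPTConfig  # now importable without a manual download
```

Example generations: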
Input: "Once upon a time there was a pumpkin."
Once upon a time there was a pumpkin. The pumpkin was very big and orange.
It lived in a garden with many other vegetables. One day, a little girl
named Lucy came to visit the garden. She saw the big pumpkin and smiled...
Input: "A little girl went to the woods"
A little girl went to the woods and saw some big, colorful flowers. She
picked one up and smelled it. It smelled very nice. Then she heard a sound
behind a tree. It was a small bunny rabbit...
Input: "In a magical kingdom far away"
In a magical kingdom far away, there lived a kind princess. She had long,
beautiful hair and a pretty dress. Every day, she would help the people
in her kingdom and make them happy...
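These outputs are sampled, so the exact text varies from run to run. The `temperature` argument of the `generate_story` helper defined above controls how conservative or varied the sampling is; a quick sketch:

```python
# Lower temperature -> safer, more repetitive text;
# higher temperature -> more varied (and occasionally less coherent) stories.
for temp in (0.7, 1.0, 1.3):
    print(f"--- temperature={temp} ---")
    print(generate_story("A little girl went to the woods", max_tokens=100, temperature=temp))
```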
| Specification | Value |
|---|---|
| Parameters | ~22M |
| Architecture | GPT (Decoder-only Transformer) |
| Context Length | 128 tokens |
| Vocabulary Size | 50,257 tokens |
| Model Size | ~87 MB |
| Inference Speed | ~50-100 tokens/sec (CPU) |
| Memory Usage | ~200MB (inference) |
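The parameter count and checkpoint size above are mutually consistent: ~22M fp32 parameters at 4 bytes each come to roughly 87-88 MB. Once the model is loaded as shown earlier, both figures can be checked directly; a minimal sketch:

```python
# Sanity-check the figures in the table above against the loaded model.
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")              # expected: ~22M
print(f"fp32 weight size: {n_params * 4 / 1e6:.0f} MB")  # expected: ~87 MB
```

The repository contains the following files: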
```
├── README.md            # This documentation
├── config.json          # Model configuration
├── pytorch_model.bin    # Model weights (87MB)
├── model.py             # Model architecture code
├── requirements.txt     # Python dependencies
└── example_usage.py     # Usage examples
```
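Rather than downloading files one at a time with `hf_hub_download`, the whole repository can be fetched in a single call; a minimal sketch using `snapshot_download` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Pull every file listed above (weights, config, model.py, examples) into one local folder.
local_dir = snapshot_download(repo_id="abhilash88/tinystories-slm-gpt")
print(local_dir)  # cache directory containing config.json, pytorch_model.bin, model.py, ...
```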
The model was trained with careful monitoring of training dynamics.
The model was evaluated on the TinyStories validation set.
This model is designed for generating short, simple children's stories and for experimentation with small, from-scratch language models.
```bibtex
@misc{tinystories-slm-2025,
  title={TinyStories Small Language Model},
  author={Abhilash},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/abhilash88/tinystories-slm-gpt}
}
```
This model is released under the MIT License. See the LICENSE file for details.
For questions, issues, or collaboration, please open a discussion on the Hugging Face model page.
Model trained and released on July 31, 2025