TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Paper: arXiv:2305.07759
A compact GPT-style language model trained from scratch on the TinyStories dataset, designed for generating simple, coherent stories suitable for children.
This is a small-scale, GPT-style transformer language model; its full specifications are summarized in the table further down.
Training Hyperparameters:
Training Infrastructure:
```bash
pip install torch tiktoken huggingface_hub
```
```python
import json
import torch
import tiktoken
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="config.json")

# Load tokenizer
enc = tiktoken.get_encoding("gpt2")

# Load model (you'll need the model.py file)
from model import GPT, GPTConfig

with open(config_path, 'r') as f:
    config_dict = json.load(f)

config = GPTConfig(
    vocab_size=config_dict["vocab_size"],
    block_size=config_dict["block_size"],
    n_layer=config_dict["n_layer"],
    n_head=config_dict["n_head"],
    n_embd=config_dict["n_embd"],
    dropout=0.0,  # set to 0 for inference
    bias=config_dict["bias"],
)

model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Generate text
def generate_story(prompt, max_tokens=200, temperature=1.0):
    context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)
    with torch.no_grad():
        generated = model.generate(
            context,
            max_new_tokens=max_tokens,
            temperature=temperature,
        )
    return enc.decode(generated.squeeze().tolist())

# Example usage
story = generate_story("Once upon a time there was a pumpkin.")
print(story)
```
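The import of `GPT` and `GPTConfig` above assumes a local copy of `model.py`. If you only have the pip packages installed, one option is to fetch `model.py` from the repository as well and add its cache folder to the import path; a minimal sketch (the filename matches the file listing further down this card):

```python
import os
import sys
from huggingface_hub import hf_hub_download

# Download model.py from the repo and make its cache directory importable.
model_py_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="model.py")
sys.path.insert(0, os.path.dirname(model_py_path))

from model import GPT, GPTConfig  # now importable without a manual download
```

Example generations: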
Input: "Once upon a time there was a pumpkin."
Once upon a time there was a pumpkin. The pumpkin was very big and orange.
It lived in a garden with many other vegetables. One day, a little girl
named Lucy came to visit the garden. She saw the big pumpkin and smiled...
Input: "A little girl went to the woods"
A little girl went to the woods and saw some big, colorful flowers. She
picked one up and smelled it. It smelled very nice. Then she heard a sound
behind a tree. It was a small bunny rabbit...
Input: "In a magical kingdom far away"
In a magical kingdom far away, there lived a kind princess. She had long,
beautiful hair and a pretty dress. Every day, she would help the people
in her kingdom and make them happy...
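These outputs are sampled, so the exact text varies from run to run. The `temperature` argument of the `generate_story` helper defined above controls how conservative or varied the sampling is; a quick sketch:

```python
# Lower temperature -> safer, more repetitive text;
# higher temperature -> more varied (and occasionally less coherent) stories.
for temp in (0.7, 1.0, 1.3):
    print(f"--- temperature={temp} ---")
    print(generate_story("A little girl went to the woods", max_tokens=100, temperature=temp))
```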
| Specification | Value |
|---|---|
| Parameters | ~22M |
| Architecture | GPT (Decoder-only Transformer) |
| Context Length | 128 tokens |
| Vocabulary Size | 50,257 tokens |
| Model Size | ~87 MB |
| Inference Speed | ~50-100 tokens/sec (CPU) |
| Memory Usage | ~200MB (inference) |
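The parameter count and checkpoint size above are mutually consistent: ~22M fp32 parameters at 4 bytes each come to roughly 87-88 MB. Once the model is loaded as shown earlier, both figures can be checked directly; a minimal sketch:

```python
# Sanity-check the figures in the table above against the loaded model.
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")              # expected: ~22M
print(f"fp32 weight size: {n_params * 4 / 1e6:.0f} MB")  # expected: ~87 MB
```

The repository contains the following files: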
```
├── README.md            # This documentation
├── config.json          # Model configuration
├── pytorch_model.bin    # Model weights (87MB)
├── model.py             # Model architecture code
├── requirements.txt     # Python dependencies
└── example_usage.py     # Usage examples
```
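Rather than downloading files one at a time with `hf_hub_download`, the whole repository can be fetched in a single call; a minimal sketch using `snapshot_download` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Pull every file listed above (weights, config, model.py, examples) into one local folder.
local_dir = snapshot_download(repo_id="abhilash88/tinystories-slm-gpt")
print(local_dir)  # cache directory containing config.json, pytorch_model.bin, model.py, ...
```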
The model was trained with careful monitoring of training dynamics.
The model was evaluated on the TinyStories validation set.
This model is designed for generating short, simple children's stories and for experimentation with small, from-scratch language models.
```bibtex
@misc{tinystories-slm-2025,
  title={TinyStories Small Language Model},
  author={Abhilash},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/abhilash88/tinystories-slm-gpt}
}
```
This model is released under the MIT License. See the LICENSE file for details.
For questions, issues, or collaboration, please open a discussion on the Hugging Face model page.
Model trained and released on July 31, 2025