Historical Page Classifier (MobileNetV2)
A lightweight image classifier for detecting illustrated pages in historical documents. Optimized for browser deployment via transformers.js.
Why This Exists
Finding illustrations in digitized books typically means manually scrolling through hundreds of pages. This model automates that task.
At about 2.5 MB, it is small enough to run directly in your browser: no server, no GPU, and no sending images anywhere.
Trained on pages from Internet Archive books that were hand-labeled in 2022.
Part of small-models-for-glam, a collection of efficient models for galleries, libraries, archives, and museums (GLAM).
Model Description
- Architecture: MobileNetV2 fine-tuned for binary classification
- Task: Classify historical document pages as "illustrated" or "not-illustrated"
- Model Size: ~2.5 MB (quantized ONNX)
- Accuracy: 95.09% on the test set
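As a rough plausibility check on the ~2.5 MB figure, the sketch below estimates how much space the weights need. It assumes a standard width-1.0 MobileNetV2 backbone (the torchvision version has 2,223,872 parameters once its 1000-class ImageNet head is removed) plus a new two-class linear head. The card does not state the exact variant, so treat the numbers as approximate.

```python
# Assumption: width-1.0 MobileNetV2 backbone; torchvision's version has
# 2,223,872 parameters after removing the 1000-class ImageNet classifier.
BACKBONE_PARAMS = 2_223_872
FEATURES = 1280  # length of MobileNetV2's final pooled feature vector
NUM_LABELS = 2   # illustrated / not-illustrated

head_params = FEATURES * NUM_LABELS + NUM_LABELS  # linear weights + biases
total_params = BACKBONE_PARAMS + head_params

# Storage for the weights alone, ignoring graph metadata in the ONNX file.
for name, bytes_per_weight in [("float32", 4), ("int8 (quantized)", 1)]:
    size_mb = total_params * bytes_per_weight / 1e6
    print(f"{name}: ~{size_mb:.1f} MB of weights")
```

Under these assumptions, roughly 2.2 MB of 8-bit weights plus graph metadata would be in line with the quantized file size, while an unquantized float32 export would be around four times larger.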
Usage
Browser (transformers.js)
```javascript
import { pipeline } from "@huggingface/transformers";

const classifier = await pipeline(
  "image-classification",
  "small-models-for-glam/historical-illustration-detector",
);

// Classify an image by URL
const result = await classifier("https://example.com/page.jpg");
console.log(result);
// Example output: [{ label: "illustrated", score: 0.95 }]
```
Python (transformers + optimum)
```python
from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoImageProcessor
from PIL import Image

model_id = "small-models-for-glam/historical-illustration-detector"
model = ORTModelForImageClassification.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)

# Scans are often grayscale; MobileNetV2 takes 3-channel input, so convert to RGB
image = Image.open("page.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)

predicted_class = outputs.logits.argmax(-1).item()
# 0 = not-illustrated, 1 = illustrated
print(model.config.id2label[predicted_class])
```
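The Python snippet returns only the argmax class. If you also want a confidence score in the same `{label, score}` shape that transformers.js prints, apply a softmax to the logits. The sketch below does this in plain Python; the helper names are hypothetical, and the label mapping is copied from the Labels table.

```python
import math

# Label mapping copied from the Labels table in this card.
ID2LABEL = {0: "not-illustrated", 1: "illustrated"}


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_predictions(logits):
    """Return [{'label': ..., 'score': ...}] sorted by score, highest first."""
    probs = softmax(logits)
    preds = [{"label": ID2LABEL[i], "score": p} for i, p in enumerate(probs)]
    return sorted(preds, key=lambda d: d["score"], reverse=True)


# Hypothetical logits for one page, e.g. outputs.logits[0].tolist()
print(logits_to_predictions([-1.2, 1.8]))
```

For a two-class head, the "illustrated" score is equivalent to a sigmoid of the logit difference, so thresholding it is a natural next step.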
Labels
| Label ID | Label |
|---|---|
| 0 | not-illustrated |
| 1 | illustrated |
Training Data
Trained on the ImageIN/ImageIn_annotations dataset:
- 1,896 historical book pages
- 466 illustrated (25%), 1,430 not-illustrated (75%)
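With a 25/75 split, a model that always answers "not-illustrated" would already be right about 75% of the time, which is useful context for the accuracy figure in the Performance table. The sketch works through that arithmetic and also shows inverse-frequency ("balanced") class weights, the same formula as scikit-learn's `class_weight="balanced"`. The card does not say whether training used class weights, so that part is purely illustrative.

```python
# Class counts from the Training Data section of this card.
counts = {"not-illustrated": 1430, "illustrated": 466}
total = sum(counts.values())  # 1,896 pages

# Share of each class in the labeled data.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(shares.values())

# "Balanced" inverse-frequency weights: total / (num_classes * class_count).
# Hypothetical: the card does not say whether training used class weights.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: share={shares[label]:.2%}, weight={weights[label]:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

If the test set follows a similar split, 95.09% accuracy sits roughly 20 points above a majority-class baseline of about 75.4%.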
What counts as "illustrated"?
Positive (illustrated):
- Engravings, woodcuts, lithographs
- Photographs
- Maps, diagrams, charts
- Scientific illustrations
Negative (not-illustrated):
- Plain text pages
- Decorative drop caps, borders
- Printer's devices
- Tables (structured text)
Performance
| Metric | Value |
|---|---|
| Accuracy | 95.09% |
| F1 (illustrated) | 0.92 |
| Inference time (CPU, per image) | ~20 ms |
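Accuracy and F1 measure different things on imbalanced data: F1 for the illustrated class ignores true negatives, so the many correctly rejected text pages raise accuracy but not F1. The sketch below computes both from confusion-matrix counts. The `Confusion` class is a hypothetical helper, and the counts in the example are toy numbers, not this model's results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, with "illustrated" as the positive class."""
    tp: int  # illustrated pages predicted illustrated
    fp: int  # not-illustrated pages predicted illustrated
    fn: int  # illustrated pages predicted not-illustrated
    tn: int  # not-illustrated pages predicted not-illustrated

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def f1(self) -> float:
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN);
        # true negatives do not appear at all.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Toy counts for illustration only; these are not this model's results.
toy = Confusion(tp=8, fp=2, fn=2, tn=88)
print(f"accuracy={toy.accuracy():.2%}, F1 (illustrated)={toy.f1():.2f}")
```

In this toy case accuracy is 96% while F1 is only 0.80, which is why an illustrated-class F1 below overall accuracy is expected with a 25/75 split.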
Intended Use
- IIIF manifest processing
- Digital library workflows
- Historical document analysis
- Research and scholarship
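For IIIF manifest processing, the first step is usually to collect one image URL per canvas. The sketch below assumes a IIIF Presentation API 3.0 manifest and, when an Image API service is present, requests a downscaled copy (`!512,512` asks for the largest image that fits within 512 by 512 pixels) to cut download time. The function name, the 512-pixel default, and the example manifest are assumptions; 2.x manifests use a different layout ("sequences"/"canvases"/"images") and are not handled.

```python
import json


def page_image_urls(manifest: dict, max_size: int = 512) -> list:
    """Collect one image URL per painting annotation in a IIIF Presentation 3.0 manifest."""
    urls = []
    for canvas in manifest.get("items", []):  # canvases
        for page in canvas.get("items", []):  # annotation pages
            for anno in page.get("items", []):  # annotations
                if anno.get("motivation", "painting") != "painting":
                    continue
                body = anno.get("body", {})
                if isinstance(body, list):  # keep the first body if several are given
                    body = body[0] if body else {}
                service = (body.get("service") or [{}])[0]
                base = service.get("id") or service.get("@id")
                if base:
                    # Image API URL: {base}/{region}/{size}/{rotation}/{quality}.{format}
                    urls.append(f"{base}/full/!{max_size},{max_size}/0/default.jpg")
                elif "id" in body:
                    urls.append(body["id"])  # no service: fall back to the static image
    return urls


# Hypothetical two-page manifest (normally fetched from the institution's server).
manifest_text = """
{"items": [
  {"items": [{"items": [{"motivation": "painting",
     "body": {"id": "https://example.org/iiif/p1/full/max/0/default.jpg",
              "service": [{"id": "https://example.org/iiif/p1", "type": "ImageService3"}]}}]}]},
  {"items": [{"items": [{"motivation": "painting",
     "body": {"id": "https://example.org/p2.jpg"}}]}]}
]}
"""
for url in page_image_urls(json.loads(manifest_text)):
    print(url)
```

Each returned URL can then be passed to the classifier, for example `await classifier(url)` in the browser.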
Limitations
- Trained primarily on Western historical books
- May struggle with edge cases, such as mostly-text pages with a small illustration
- Not designed for modern documents
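One practical way to handle these edge cases is to auto-accept only confident predictions and send the rest to a person. The sketch below buckets pages by their "illustrated" score; the 0.8 and 0.2 thresholds and the `triage` helper are hypothetical and should be tuned against a hand-checked sample from your own collection.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical thresholds; tune them on a hand-checked sample of your collection.
ACCEPT = 0.8  # at or above: treat the page as illustrated
REJECT = 0.2  # at or below: treat the page as not-illustrated


def triage(
    scores: Iterable[Tuple[str, float]],
    accept: float = ACCEPT,
    reject: float = REJECT,
) -> Dict[str, List[str]]:
    """Split (page_id, illustrated_score) pairs into accept / reject / review buckets."""
    if not reject < accept:
        raise ValueError("reject threshold must be below accept threshold")
    buckets: Dict[str, List[str]] = {"illustrated": [], "not-illustrated": [], "review": []}
    for page_id, score in scores:
        if score >= accept:
            buckets["illustrated"].append(page_id)
        elif score <= reject:
            buckets["not-illustrated"].append(page_id)
        else:
            buckets["review"].append(page_id)  # uncertain: send to a person
    return buckets


# Hypothetical "illustrated" scores for four pages.
pages = [("p001", 0.97), ("p002", 0.03), ("p003", 0.55), ("p004", 0.81)]
print(triage(pages))
```

Widening the gap between the two thresholds sends more pages to review but lowers the chance that a small illustration is silently missed.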
Development
Built with assistance from Claude Code (Anthropic). The training pipeline, ONNX export, quantization, and browser demo were developed in a single session, which was possible because the labeled dataset already existed from 2022.
Citation
If you use this model, please cite:
```bibtex
@misc{historical-page-classifier,
  author    = {van Strien, Daniel},
  title     = {Historical Page Classifier},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/small-models-for-glam/historical-illustration-detector}
}
```