Historical Page Classifier (MobileNetV2)

A lightweight image classifier for detecting illustrated pages in historical documents. Optimized for browser deployment via transformers.js.

Why This Exists

Finding illustrations in digitized books typically means manually scrolling through hundreds of pages. This model automates that task.

At about 2.5 MB, it's small enough to run directly in your browser - no server, no GPU, and no images leave your machine.

Trained on a hand-labeled dataset from 2022, built from Internet Archive books.

Part of small-models-for-glam - a collection of efficient models for galleries, libraries, archives, and museums.

Model Description

Architecture: MobileNetV2 fine-tuned for binary classification
Task: Classify historical document pages as "illustrated" or "not-illustrated"
Model Size: ~2.5 MB (quantized ONNX)
Accuracy: 95.09% on test set

Usage

Browser (transformers.js)

import { pipeline } from "@huggingface/transformers";

const classifier = await pipeline("image-classification", "small-models-for-glam/historical-illustration-detector");

// Classify an image
const result = await classifier("https://example.com/page.jpg");
console.log(result);
// Example output: [{ label: "illustrated", score: 0.95 }]

Python (transformers + optimum)

from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoImageProcessor
from PIL import Image

model = ORTModelForImageClassification.from_pretrained("small-models-for-glam/historical-illustration-detector")
processor = AutoImageProcessor.from_pretrained("small-models-for-glam/historical-illustration-detector")

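# Convert to RGB so grayscale or palette scans have the three channels the model expects
image = Image.open("page.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")  # "pt" tensors require PyTorch to be installed
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
# 0 = not-illustrated, 1 = illustrated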

Labels

Label ID	Label
0	not-illustrated
1	illustrated
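
The argmax in the Python example yields only a class ID. As a minimal sketch, the two raw logits can also be turned into a label and a confidence value using the mapping in the table above. The helper name `logits_to_prediction` and the sample logits are hypothetical and not part of the model's API.

```python
import math

# Label mapping copied from the table above.
ID2LABEL = {0: "not-illustrated", 1: "illustrated"}


def logits_to_prediction(logits):
    """Return (label, probability) for one page from its two raw logits."""
    # Softmax with max-subtraction for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits, for illustration only.
label, prob = logits_to_prediction([-1.2, 2.3])
print(label, round(prob, 3))
```

With the Python example above, the input would be something like `outputs.logits[0].tolist()`.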

Training Data

Trained on ImageIN/ImageIn_annotations:

1,896 historical book pages
466 illustrated (25%), 1,430 not-illustrated (75%)
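
Because the classes are imbalanced, it helps to know what a trivial classifier would score. The short calculation below uses only the counts listed above. It assumes the test split has roughly the same class balance as the full dataset, which this card does not state.

```python
# Counts taken from the dataset summary above.
illustrated = 466
not_illustrated = 1430
total = illustrated + not_illustrated

share_illustrated = illustrated / total
# Accuracy of a trivial classifier that always answers "not-illustrated".
majority_baseline = not_illustrated / total

print(f"total pages: {total}")
print(f"illustrated share: {share_illustrated:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Under that assumption, always predicting "not-illustrated" would be right about three times in four, so accuracy figures are best read against that baseline.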

What counts as "illustrated"?

Positive (illustrated):

Engravings, woodcuts, lithographs
Photographs
Maps, diagrams, charts
Scientific illustrations

Negative (not-illustrated):

Plain text pages
Decorative drop caps, borders
Printer's devices
Tables (structured text)

Examples

Illustrated	Not Illustrated
Classical engraving	Plain text page

Performance

Metric	Value
Accuracy	95.09%
F1 (illustrated)	0.92
Inference time (CPU)	~20 ms
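
For readers who want to compute the same kinds of metrics on their own labeled pages, here is a minimal sketch of how accuracy and F1 for the "illustrated" class follow from confusion-matrix counts. The function name `binary_metrics` and the counts in the demo are made up for illustration; they are not the model's actual test-set confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class.

    tp/fp/fn/tn are counts where "positive" means "illustrated".
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to show the calculation.
demo = binary_metrics(tp=40, fp=5, fn=5, tn=150)
print({name: round(value, 3) for name, value in demo.items()})
```

With imbalanced classes, F1 on the minority "illustrated" class is usually the more informative of the two numbers.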

Intended Use

IIIF manifest processing
Digital library workflows
Historical document analysis
Research and scholarship
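
As a rough sketch of the IIIF use case, the code below walks a IIIF Presentation 2.x manifest (already parsed into a Python dict), collects one image URL per canvas, and keeps the canvases a classifier marks as illustrated. The function names, the example URLs, and the stand-in classifier are all hypothetical. Presentation 3.0 manifests use a different layout and are not handled here.

```python
def canvas_image_urls(manifest):
    """Yield (canvas_label, image_url) pairs from a IIIF Presentation 2.x manifest."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    yield canvas.get("label", ""), url


def illustrated_canvases(manifest, classify):
    """Return labels of canvases that `classify` marks as "illustrated".

    `classify` is any callable mapping an image URL to a label string,
    for example a thin wrapper around the model from the Usage section.
    """
    return [label for label, url in canvas_image_urls(manifest)
            if classify(url) == "illustrated"]


# Tiny hand-written manifest and a stand-in classifier, for illustration only.
demo_manifest = {
    "sequences": [{
        "canvases": [
            {"label": "p. 1", "images": [
                {"resource": {"@id": "https://example.org/iiif/p1/full/full/0/default.jpg"}}]},
            {"label": "p. 2", "images": [
                {"resource": {"@id": "https://example.org/iiif/p2/full/full/0/default.jpg"}}]},
        ]
    }]
}


def fake_classify(url):
    """Stand-in for the real model: pretends only page 2 is illustrated."""
    return "illustrated" if "/p2/" in url else "not-illustrated"


print(illustrated_canvases(demo_manifest, fake_classify))
```

Passing the classifier in as a callable keeps the manifest handling independent of whether inference runs through optimum, a remote service, or something else.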

Limitations

Trained primarily on Western historical books
May struggle with edge cases, such as pages that combine text with a small illustration
Not designed for modern documents
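
One way to work around the edge cases listed above is to send low-confidence pages to a person instead of trusting the top label. The sketch below shows the idea. The `triage` function and the 0.3 and 0.7 thresholds are hypothetical and would need tuning on your own collection.

```python
def triage(prob_illustrated, low=0.3, high=0.7):
    """Map the model's probability for "illustrated" to an action.

    Pages scoring between `low` and `high` are flagged for manual review.
    """
    if not 0.0 <= prob_illustrated <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if prob_illustrated >= high:
        return "illustrated"
    if prob_illustrated <= low:
        return "not-illustrated"
    return "review"


# Made-up scores for three pages.
for score in (0.95, 0.55, 0.05):
    print(score, triage(score))
```

Widening the band between the two thresholds sends more pages to review and leaves fewer automatic decisions.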

Development

Built with assistance from Claude Code (Anthropic). The training pipeline, ONNX export, quantization, and browser demo were developed in a single session - enabled by the pre-existing labeled dataset from 2022.

Citation

If you use this model, please cite:

@misc{historical-page-classifier,
  author = {van Strien, Daniel},
  title = {Historical Page Classifier},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/small-models-for-glam/historical-illustration-detector}
}
