mt-dspec-legislation-en-cy

English-to-Welsh translation model specialised for the legislation domain, built using Marian NMT.

Installation

pip install sentencepiece transformers torch

Usage

import transformers

# Load the tokenizer and the Marian model from the Hugging Face Hub
model_id = "techiaith/mt-dspec-legislation-en-cy"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Wrap them in a translation pipeline (English -> Welsh)
translate = transformers.pipeline("translation", model=model, tokenizer=tokenizer)

# The pipeline returns a list with one dict per input text
result = translate(
    "The Curriculum and Assessment (Wales) Act 2021 established "
    "the Curriculum for Wales."
)
print(result[0]["translation_text"])
# Sefydlodd Deddf Cwricwlwm ac Asesu (Cymru) 2021 y Cwricwlwm i Gymru.

Training Data

  • UK Government Legislation data
  • OPUS-cy-en corpus
  • Cofnod y Cynulliad (Welsh Assembly Records)
  • Cofion Techiaith Cymru

Evaluation

Metric       Score
SacreBLEU    65.51
CER           0.28
WER           0.39
chrF         74.69
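The card does not say how CER and WER were computed. Under the usual definitions, both are an edit (Levenshtein) distance divided by the length of the reference: over words for WER and over characters for CER, so lower is better. The sketch below assumes those standard definitions. The helper names (`edit_distance`, `wer`, `cer`) and the example hypothesis, a deliberately wrong translation made up for illustration, are hypothetical and not part of this model or its evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


reference = "Sefydlodd Deddf Cwricwlwm ac Asesu (Cymru) 2021 y Cwricwlwm i Gymru."
# Hypothetical system output with one dropped word and one mutated word
hypothesis = "Sefydlodd Deddf Cwricwlwm ac Asesu (Cymru) 2021 Gwricwlwm i Gymru."
print(f"WER: {wer(reference, hypothesis):.3f}")  # WER: 0.182 (2 edits / 11 words)
print(f"CER: {cer(reference, hypothesis):.3f}")  # CER: 0.044 (3 edits / 68 chars)
```

Toolkits such as jiwer may normalise text (case, punctuation) before scoring, and corpus-level scores are usually total edits divided by total reference length rather than an average of per-sentence scores. Either choice can change the numbers.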

Version History

2026-02-26: Re-converted with a weight-tying fix. The previous version required transformers<=4.30.2 because of issue #26271; this version works with all transformers versions.


License

Apache 2.0

Model Details

Format: Safetensors
Model size: 69.8M params
Tensor type: F16