Improve dataset hygiene for literary corpora

#11
by jbakerx - opened

For Russian Gutenberg and other sources: strip editorial notes, footnotes, chapter headers, and license boilerplate.
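To make the request concrete, here is a rough sketch of what such a cleaning pass could look like. It is not the dataset's actual pipeline. The function name `clean_text`, the regular expressions, and the Russian note labels (`Сноска`, `Примечание`, `Прим.`) are all assumptions for illustration. The `*** START OF ...` / `*** END OF ...` delimiters follow the convention commonly seen in Project Gutenberg plain-text files, and other sources would need their own patterns.

```python
import re

# Assumed Project Gutenberg delimiters; other sources need their own patterns.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M
)
# Hypothetical labels for bracketed footnotes and editorial notes.
NOTE_BLOCK_RE = re.compile(r"\[(?:Footnote|Сноска|Примечание|Прим\.)[^\]]*\]", re.I)
# Inline footnote markers such as [1]; may also hit legitimate bracketed numbers.
FOOTNOTE_MARK_RE = re.compile(r"\[\d+\]")
# Stand-alone chapter header lines, e.g. "ГЛАВА I" or "Chapter 12".
CHAPTER_RE = re.compile(
    r"^[ \t]*(?:ГЛАВА|Глава|CHAPTER|Chapter)[ \t]+[\dIVXLCDM]+\.?[ \t]*$", re.M
)


def clean_text(raw: str) -> str:
    """Strip license boilerplate, notes, footnotes, and chapter headers."""
    # Drop everything up to and including the start delimiter, if present.
    start = START_RE.search(raw)
    if start:
        raw = raw[start.end():]
    # Drop the end delimiter and everything after it, if present.
    end = END_RE.search(raw)
    if end:
        raw = raw[:end.start()]
    raw = NOTE_BLOCK_RE.sub("", raw)
    raw = FOOTNOTE_MARK_RE.sub("", raw)
    raw = CHAPTER_RE.sub("", raw)
    # Collapse the blank-line runs left behind by the removals.
    raw = re.sub(r"\n{3,}", "\n\n", raw)
    return raw.strip()


sample = (
    "Title page and license preamble\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "ГЛАВА I\n"
    "\n"
    "Он вошёл в комнату[1] и сел.\n"
    "[Примечание: текст сверен по изданию.]\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "License text\n"
)
print(clean_text(sample))
```

Regex-only cleaning like this is deliberately conservative and will miss source-specific layouts, so spot-checking a sample of cleaned files per source is probably still needed.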
