Longtexts

A collection of long literary texts for computational linguistics research.

To the best of my knowledge, the texts are in the public domain. They were obtained from Project Gutenberg, Wikisource, Royallib and lib.ru and preprocessed to fit specific research purposes:

Copyright notices were removed from the files
Author and translator notes were removed
Tables of contents and any indices were removed, except for the table of contents of Don Quixote
Any links to illustrations were removed
In the Russian version of War and Peace, any non-Russian text was replaced with its Russian translation
The Etymology section was removed from Moby-Dick or, The Whale wherever it appeared, because some language versions lack it
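
As an illustration of the first preprocessing step, the sketch below shows one way licence and copyright text could be stripped from a Project Gutenberg plain-text file. It is not the procedure actually used for this corpus. It assumes the file wraps the book body in the usual `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ... ***` marker lines, and the function name `strip_gutenberg_boilerplate` is hypothetical.

```python
import re

# Assumption: Project Gutenberg plain-text files usually delimit the book body
# with marker lines of this shape; files from other sources will not match.
_START = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
_END = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END markers.

    If a marker is missing, the corresponding end of the text is kept as-is,
    so a file without markers is returned unchanged (apart from trimming).
    """
    start = _START.search(text)
    begin = start.end() if start else 0
    # Look for the END marker only after the START marker.
    end = _END.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()
```

Texts taken from Wikisource, Royallib or lib.ru would need their own rules, and the remaining steps (notes, tables of contents, illustration links) generally require per-book inspection rather than a single pattern.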

Notice and take down policy

Notice: Should you consider that the data contains material that is owned by you and should therefore not be reproduced here, please:

Clearly identify yourself, with detailed contact data such as the email address at which you can be contacted.
Clearly identify the copyrighted work claimed to be infringed.
Clearly identify the material that is claimed to be infringing and information reasonably sufficient to allow us to locate the material.

Send the request to nickm@ntrlab.com

Take down: I will comply with legitimate requests by removing the affected sources from the corpus.

Folders and files in the repository:

Critique of Pure Reason
Don Quijote de la Mancha
Moby-Dick_ or, The Whale
The Adventures of Tom Sawyer
The Iliad
The Republic
War and Peace
README.md

