Gutenburg_Text

Downloads from Project Gutenberg

This collection of texts was pulled from Project Gutenberg's Top 100 Authors Yesterday list. Only English texts were pulled when other languages were also available, and indexes were deleted. The list changes frequently on Project Gutenberg's website, so the authors here will not match the current list.

The texts were pulled using this Python package, which makes it very easy to download ebooks from Project Gutenberg. Because pulling from Project Gutenberg requires some specific installs, a virtual environment is recommended when using this library.

Issues

There are some known issues with this dataset, listed here. Take a look for yourself, and feel free to add any others you spot.

Dataset (1.7 GB)

There are 98 authors in this dataset; two of the original 100 were just directories of indexes. Each author folder contains a different number of .txt files. Each .txt file has a header and footer added by Project Gutenberg, which need to be removed before doing any type of analysis.
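As a minimal sketch of removing the header and footer, the snippet below keeps only the text between Project Gutenberg's start and end marker lines. It assumes each file contains single-line markers of the form `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ... ***`; older files may word these differently, so spot-check a few outputs. The function names and the UTF-8 encoding are illustrative assumptions, not part of this repository.

```python
import re
from pathlib import Path

# Assumed marker format (not guaranteed for every file in the dataset):
#   *** START OF THE PROJECT GUTENBERG EBOOK <TITLE> ***
#   *** END OF THE PROJECT GUTENBERG EBOOK <TITLE> ***
START_RE = re.compile(
    r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(text)
    begin = start.end() if start else 0
    end = END_RE.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


def load_author_texts(author_dir):
    """Read every .txt file in one author folder and strip the boilerplate.

    Returns a dict mapping file name to cleaned text.
    """
    texts = {}
    for path in sorted(Path(author_dir).glob("*.txt")):
        # errors="replace" avoids crashing on files that are not valid UTF-8.
        raw = path.read_text(encoding="utf-8", errors="replace")
        texts[path.name] = strip_gutenberg_boilerplate(raw)
    return texts
```

For example, `load_author_texts("Austen, Jane")` would return the cleaned text of each file in that author's folder, keyed by file name.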

authors.xls: a spreadsheet of metadata about each author. Some authors have accent marks in their names, and some programs override the encoding of a CSV file. The encoding is preserved in .xls (Excel format). Python and R can read Excel files just as they can CSV files.
- Name: str The author's name
- Nationality: str The author's nationality
- Gender: str The author's gender
- genre: str The genre the author belongs to
  - These are broad categories about the authors
- original language: str The author's original language
  - If not English, then the text was translated from that original language
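The spreadsheet can be loaded in Python roughly as follows. This is a sketch that assumes pandas is installed together with xlrd, which pandas uses to read legacy .xls files. The column names are taken from the list above; their exact spelling and capitalization in the file are an assumption, so the check ignores case. The helper names are hypothetical.

```python
# Column names as listed in this README. Their exact spelling in the
# spreadsheet is an assumption, so the comparison below ignores case.
EXPECTED_COLUMNS = ["Name", "Nationality", "Gender", "genre", "original language"]


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = {str(c).strip().lower() for c in columns}
    return [c for c in EXPECTED_COLUMNS if c.lower() not in present]


def load_authors(path="authors.xls"):
    """Load the author metadata into a pandas DataFrame.

    Raises ValueError if any of the expected columns is missing.
    """
    # Imported lazily so missing_columns() works without pandas installed.
    import pandas as pd

    authors = pd.read_excel(path)
    missing = missing_columns(authors.columns)
    if missing:
        raise ValueError(f"{path} is missing expected columns: {missing}")
    return authors
```

Once loaded, the DataFrame can be filtered or grouped by any of the columns, for example to select only authors whose original language is not English.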

If you are looking for ideas on analyzing text, check out our web tutorials.

Alcott, Louisa May
Alger, Horatio, Jr.
Andersen, H. C. (Hans Christian)
Austen, Jane
Balzac, Honoré de
Barrie, J. M. (James Matthew)
Baum, L. Frank (Lyman Frank)
Blackwood, Algernon
Brontë, Charlotte
Buckley, Theodore Alois
Bunin, Ivan Alekseevich
Burnett, Frances Hodgson
Burroughs, Edgar Rice
Burton, Richard Francis, Sir
Carroll, Lewis
Cervantes Saavedra, Miguel de
Chekhov, Anton Pavlovich
Chesterton, G. K. (Gilbert Keith)
Christie, Agatha
Collins, Wilkie
Conrad, Joseph
Dante Alighieri
Darwin, Charles
Defoe, Daniel
Dickens, Charles
Doré, Gustave
Dostoyevsky, Fyodor
Doyle, Arthur Conan
Dumas, Alexandre
Emshwiller, Ed
Franklin, Benjamin
Garnett, Constance
Gibbon, Edward
Goethe, Johann Wolfgang von
Goldfrap, John Henry
Grimm, Jacob
Grimm, Wilhelm
Hapgood, Isabel Florence
Hardy, Thomas
Haslett, Elmer
Hawthorne, Nathaniel
Henty, G. A. (George Alfred)
Hesse, Hermann
Homer
Hoskins, Gayle Porter
Hugo, Victor
Ibsen, Henrik
Irving, Washington
James, Henry
Jowett, Benjamin
Joyce, James
Kafka, Franz
Kelley, Leo P.
Kemble, E. W. (Edward Windsor)
Kipling, Rudyard
Lang, Andrew
Lockhart, Caroline
London, Jack
Maude, Aylmer
Maude, Louise
Maupassant, Guy de
Melville, Herman
Milton, John
Montgomery, L. M. (Lucy Maud)
Morley, Henry
Nesbit, E. (Edith)
Nietzsche, Friedrich Wilhelm
Orban, Paul
Ormsby, John
Plato
Poe, Edgar Allan
Pope, Alexander
Potter, Beatrix
Russell, Bertrand
Scott, Walter
Shakespeare, William
Shaw, Bernard
Shelley, Mary Wollstonecraft
Smith, Geoffrey Bache
Stevenson, Robert Louis
Stoker, Bram
Swift, Jonathan
Thoreau, Henry David
Tolkien, J. R. R. (John Ronald Reuel)
Tolstoy, Leo, graf
Townsend, F. H. (Frederick Henry)
Trollope, Anthony
Twain, Mark
Verne, Jules
Voltaire
Weir, Harrison
Wells, H. G. (Herbert George)
Wharton, Edith
Wilde, Oscar
Wodehouse, P. G. (Pelham Grenville)
Wrenn, Charles L. (Charles Lewis)
Wyllie, David
Zola, Émile
.DS_Store
README.md

rcdm-uga/Gutenberg_Text

Folders and files

Latest commit

History

Repository files navigation

Gutenburg_Text

Issues

Dataset (1.7Gigabytes)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Packages