
Project Gutenberg ebooks

Project Gutenberg is one of the oldest open data projects on the web. Founded in 1971, it relies on volunteers to transcribe out-of-copyright books, which it makes available as text files to download.

There are now over 42,000 books in its collection, including classics by authors such as Shakespeare, Dickens and Arthur Conan Doyle.

The catalogue index of the books is available in RDF format. There is also an RSS feed for newly-added books.
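As a rough illustration of how the feed of newly-added books might be read, here is a minimal Python sketch. It assumes the feed follows the usual RSS 2.0 layout (items carrying a title and a link). The sample feed text, the example.org URLs and the function name `list_new_books` are all made up for this example and are not taken from Project Gutenberg.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for illustration only.
# The real feed's fields and URLs may differ.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed of new books</title>
    <item>
      <title>An Example Book by Example Author</title>
      <link>http://example.org/ebooks/1</link>
    </item>
    <item>
      <title>Another Example Book</title>
      <link>http://example.org/ebooks/2</link>
    </item>
  </channel>
</rss>
"""


def list_new_books(rss_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    books = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        books.append((title, link))
    return books


if __name__ == "__main__":
    for title, link in list_new_books(SAMPLE_RSS):
        print(title, link)
```

To use this against the live feed, the sample string would be replaced with the downloaded feed text.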

Metadata for each book is expressed using the Dublin Core schema, and includes fields such as title, language, author(s), publication date and subject.
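To show what reading Dublin Core metadata out of an RDF file can look like, here is a minimal Python sketch using only the standard library. The RDF and Dublin Core Terms namespace URIs are the standard ones, but the sample record, its layout and the function name `parse_records` are assumptions made for illustration. The actual catalogue files may nest or name their elements differently.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core Terms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical sample record, invented for illustration only.
# The real catalogue's structure may differ.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/ebooks/1">
    <dcterms:title>An Example Book</dcterms:title>
    <dcterms:creator>Example Author</dcterms:creator>
    <dcterms:language>en</dcterms:language>
    <dcterms:issued>2001-01-01</dcterms:issued>
    <dcterms:subject>Fiction</dcterms:subject>
    <dcterms:subject>Examples</dcterms:subject>
  </rdf:Description>
</rdf:RDF>
"""


def parse_records(rdf_text):
    """Return one dict of Dublin Core fields per rdf:Description element."""
    root = ET.fromstring(rdf_text)
    records = []
    for desc in root.findall("rdf:Description", NS):
        records.append({
            # Attributes need the namespace in {uri}name form.
            "id": desc.get("{%s}about" % NS["rdf"]),
            "title": desc.findtext("dcterms:title", default=None, namespaces=NS),
            "language": desc.findtext("dcterms:language", default=None, namespaces=NS),
            "issued": desc.findtext("dcterms:issued", default=None, namespaces=NS),
            # A book can have several authors and subjects, so collect lists.
            "creators": [e.text for e in desc.findall("dcterms:creator", NS)],
            "subjects": [e.text for e in desc.findall("dcterms:subject", NS)],
        })
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE_RDF):
        print(record["title"], record["creators"], record["subjects"])
```

Authors and subjects are collected as lists because a single book can carry more than one of each. Single-valued fields fall back to `None` when absent.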

Dataset Size

42,000+ individual books available.


Most of the books are out of copyright in the USA. Their copyright status in other jurisdictions may vary.

Contact information

See the Project Gutenberg contact information page.
