Sam Wilson has been offering pull requests on Thyrza by George Gissing. It has been very helpful to get feedback on our nascent documentation (see Tiago's thread about the wiki).
Questions have come up regarding file naming, both past and future. Here is a quick review (that I ought to put in the new
documentation repo):
(<book_id> in the following examples stands for the numerical book ID given to each work by Project Gutenberg. It shows up at the end of the GitHub repo name.)
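As a rough illustration, the ID could be read off the end of a repo name along these lines. This is only a sketch: the function name `book_id_from_repo` and the example repo name are made up, and the only assumption is that the ID is the run of digits at the very end of the name.

```python
import re

def book_id_from_repo(repo_name):
    """Return the trailing numeric book ID of a repo name, or None if there is none."""
    # Assumption: the ID is simply the digits at the very end of the name;
    # whatever separates it from the title is ignored.
    match = re.search(r"(\d+)$", repo_name)
    return int(match.group(1)) if match else None

# Hypothetical repo name, for illustration only.
print(book_id_from_repo("Some-Title_12345"))
```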
Project Gutenberg file layout:
+ '<book_id>.txt' - the canonical source file for a book
+ '<book_id>_h/' - (optional) a directory containing the HTML version of the book
+ '<book_id>_8.txt' - (optional) an ASCII-only edition of the book file
+ 'pg<book_id>.rdf' - an RDF metadata file from PG (added to the book folder by us, created by PG)
+ 'old/' - (optional) PG's old editions of released books
GITenberg files created on upload:
+ 'README.rst' - a Readme file for each repo with a simple intro
+ 'LICENSE' - a copy of the PG license/footer
+ 'CONTRIBUTING.rst' - instructions on how to contribute to this github repo
Files we will be adding:
+ '<book_id>.asciidoc' - a book file that has been converted to asciidoc
+ 'metadata.yml' - a yaml file serialization of available metadata for the book
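To make the layout above concrete, here is a small sketch that sorts a file path into one of the three groups listed. The group labels ("pg", "gitenberg", "planned"), the `classify` helper, and the sample ID 12345 are all hypothetical; the patterns are taken directly from the file names in the lists above.

```python
import re

# Each entry pairs a group label with a filename pattern from the lists above.
# "{id}" is a placeholder that gets swapped for the book ID before matching.
LAYOUT = [
    ("pg", r"{id}\.txt"),
    ("pg", r"{id}_h/.*"),
    ("pg", r"{id}_8\.txt"),
    ("pg", r"pg{id}\.rdf"),
    ("pg", r"old/.*"),
    ("gitenberg", r"README\.rst"),
    ("gitenberg", r"LICENSE"),
    ("gitenberg", r"CONTRIBUTING\.rst"),
    ("planned", r"{id}\.asciidoc"),
    ("planned", r"metadata\.yml"),
]

def classify(path, book_id):
    """Return the group a repo-relative path belongs to, or "unknown"."""
    for group, pattern in LAYOUT:
        # The whole path must match, so "12345.txt.bak" is not mistaken for "12345.txt".
        if re.fullmatch(pattern.replace("{id}", str(book_id)), path):
            return group
    return "unknown"

# Hypothetical book ID and paths, for illustration only.
for path in ["12345.txt", "12345_h/index.html", "README.rst", "12345.asciidoc", "notes.md"]:
    print(path, "->", classify(path, 12345))
```

A table of patterns like this keeps the naming rules in one place, so if the answers to the questions below change the layout, only the list needs editing.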
The questions that Sam (and later @rdhyee) have brought up are:
+ When creating the .asciidoc file, do we delete the original .txt file?
+ Should we keep PG's 'old/' directory of book editions? They are now stored in git history and can be recalled.
+ If we have finished asciidoc conversion of a book, should we keep the '<book_id>_h' folder of html?
Thoughts/opinions on these questions (or others related to filenames/structure)?
--Seth