This folder contains all the editable content for the site, in XML files, along with associated audio, images, etc. Although the content files have the extension .xml, they are actually XHTML5 documents rooted on the div element. The idea is that they remain as simple as possible to edit; the surrounding site stuff (banner, menu, footer etc.) is only added at build time.
These JSON files are stored in site/js/search, and there are (currently) over 12,000 of them. The process works like this:

1. Each XML file in the content directory is tokenized.
2. Each token is stemmed using a Python 3 implementation of the Porter Stemming Algorithm.
3. A JSON file is created for each unique stemmed token. It is named with the token plus the extension .json, and stored in site/js/search/.
4. That file is populated with one object for each document which contains the token. Each object holds the title of the document and the number of occurrences of the token in that document.

When a search is launched from the search page on the site, this is what happens:

1. The search string is tokenized.
2. Each token is stemmed using a JavaScript implementation of Porter Stemming.
3. The browser requests a JSON file for each stemmed token.
4. When it has received all the available files (or rejections for files which don't exist), it compiles the results into a list of documents.
5. It sorts that list based on two criteria: how many of the search terms appear in the document, and how many instances of the search terms appear in the document.

The objective is to prioritize documents which contain as many of the search terms as possible, and then rank those by how many instances they actually have.
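The indexing and ranking steps described above can be sketched in Python as follows. This is only an illustration, not the site's actual build code: the stem() function is a lower-casing placeholder standing in for the Porter stemmer, the documents are invented plain-text samples rather than real XML content files, and the field names "token", "title" and "count" are assumptions about what each JSON object might look like.

```python
import json
import re
from collections import Counter, defaultdict


def stem(token):
    # Placeholder only: the real process uses the Porter Stemming Algorithm.
    return token.lower()


def tokenize(text):
    # Split on anything that is not a letter or digit.
    return re.findall(r"[A-Za-z0-9]+", text)


def build_index(docs):
    """Map each stemmed token to a list of per-document entries.

    docs maps a document title to its text. Each list corresponds to the
    contents of one <token>.json file.
    """
    index = defaultdict(list)
    for title, text in docs.items():
        counts = Counter(stem(t) for t in tokenize(text))
        for token, n in counts.items():
            index[token].append({"token": token, "title": title, "count": n})
    return index


def index_file_contents(index, token):
    # What would be written to <token>.json (hypothetical layout).
    return json.dumps(index.get(token, []))


def search(index, query):
    """Rank titles by number of distinct terms matched, then by total hits."""
    terms_matched = Counter()
    total_hits = Counter()
    for token in {stem(t) for t in tokenize(query)}:
        # A token with no index file simply contributes no entries.
        for entry in index.get(token, []):
            terms_matched[entry["title"]] += 1
            total_hits[entry["title"]] += entry["count"]
    return sorted(
        terms_matched,
        key=lambda t: (-terms_matched[t], -total_hits[t], t),
    )


# Invented sample documents, for illustration only.
docs = {
    "Ode to a Nightingale": "nightingale sings and the nightingale flies",
    "Letter to Fanny": "a letter about a nightingale and a house",
    "Wentworth House": "the house the house the house",
}
index = build_index(docs)
results = search(index, "nightingale house")
# results == ['Letter to Fanny', 'Wentworth House', 'Ode to a Nightingale']
```

The letter ranks first because it contains both search terms; the other two documents each match one term and are ordered by how many instances they contain.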
Now give it a conformant filename. Filenames should use ONLY letters, numbers, and underscores; never include punctuation, spaces, quotation marks or any similar character. Filenames should follow the camelback pattern, starting with a lower-case letter and marking each word boundary with an upper-case letter; file extensions should always be lower-case:

    wentworthHouse.jpg
    makeKeatsGreatAgain.jpg
    haydonByWNicholson_c1820.jpg

You will notice that there are lots of images in the folders which do not follow these rules. These are evil and will be renamed when there is time to make the changes.
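A quick way to check a filename against these rules is a regular expression. The sketch below is a hypothetical helper, not part of the site's build process; it checks the permitted characters, the lower-case first letter and the lower-case extension, but it cannot tell whether word boundaries have really been marked with upper-case letters.

```python
import re

# Lower-case first letter, then letters, digits or underscores,
# followed by a single dot and a lower-case extension.
CONFORMANT = re.compile(r"[a-z][A-Za-z0-9_]*\.[a-z0-9]+")


def is_conformant(filename):
    """Return True if the filename follows the naming rules above."""
    return CONFORMANT.fullmatch(filename) is not None


for name in ["wentworthHouse.jpg", "Wentworth House.jpg", "wentworthHouse.JPG"]:
    print(name, is_conformant(name))
```

Only the first of the three names passes: the second starts with an upper-case letter and contains a space, and the third has an upper-case extension.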