Here I propose the following:
1. a new database schema for text and audio data.
2. a simple URL schema to access that data.
This is a first attempt. Please comment or amend it as you see fit.
For the database, I propose the following schema:
- Text is the abstract concept of a text, such as the Mahabharata.
- Edition is a specific version of a text, such as the Mahabharata critical edition.
- Section is the unit of organization: sargas, kāṇḍas, etc.
- Block is the unit of content: verses, paragraphs, etc. It has two types: TextBlock and AudioBlock.
Relationships:
- A Text has one or more Editions.
- An Edition has one or more Sections.
- A Section has one or more Blocks.
Rough schemas:
- Text:
  - id: int
  - slug: str
  - title: str
  - language: str (Sanskrit, Hindi, English, ...)
  - type: TextType (an enum)
  - default_edition_id: foreign_key(Edition.id)
- Edition:
  - id: int
  - slug: str
  - title: str
  - language: str
  - structure: JSON string (see notes below)
  - text_id: foreign_key(Text.id)
- Section:
  - id: int
  - slug: str
  - title: str
  - edition_id: foreign_key(Edition.id)
- Block:
  - id: int
  - slug: str
  - title: str
  - type: BlockType (an enum)
    - a block with TextBlockType has the extra text column "content" which stores an XML blob.
    - a block with AudioBlockType has the extra string column "media" which stores a UUID referring to an item on our media server.
  - edition_id: foreign_key(Edition.id)
  - section_id: foreign_key(Section.id)
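To make the rough schemas above concrete, here is a minimal sketch using Python's built-in sqlite3 module. Several details are assumptions rather than part of the proposal: the plural table names, the uniqueness constraints, the 'text'/'audio' enum values, and storing both block types in one table with nullable "content" and "media" columns.

```python
import sqlite3

# Hypothetical DDL for the four tables. Constraints beyond the listed
# columns (UNIQUE, NOT NULL, CHECK) are assumptions for illustration.
SCHEMA = """
CREATE TABLE texts (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL DEFAULT '',
    language TEXT NOT NULL,
    type TEXT NOT NULL,                       -- TextType, stored as a string
    default_edition_id INTEGER REFERENCES editions(id)
);
CREATE TABLE editions (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL,
    title TEXT NOT NULL DEFAULT '',
    language TEXT NOT NULL,
    structure TEXT NOT NULL DEFAULT '[]',     -- JSON string
    text_id INTEGER NOT NULL REFERENCES texts(id),
    UNIQUE (text_id, slug)
);
CREATE TABLE sections (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL,
    title TEXT NOT NULL DEFAULT '',
    edition_id INTEGER NOT NULL REFERENCES editions(id),
    UNIQUE (edition_id, slug)
);
CREATE TABLE blocks (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL,
    title TEXT NOT NULL DEFAULT '',
    type TEXT NOT NULL CHECK (type IN ('text', 'audio')),  -- BlockType
    content TEXT,                             -- XML blob, text blocks only
    media TEXT,                               -- media UUID, audio blocks only
    edition_id INTEGER NOT NULL REFERENCES editions(id),
    section_id INTEGER NOT NULL REFERENCES sections(id),
    UNIQUE (edition_id, slug)
);
"""


def create_schema(conn):
    """Create all four tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)
```

Text.default_edition_id is left nullable in this sketch so that a Text row can be inserted before its first Edition exists.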
Notes on edge cases:
- Many texts will have just one edition or just one section. Even so, we will store them in this schema.
- Some texts have hierarchical sections (e.g. the Ramayana). Rather than model this hierarchy relationally, we store a JSON blob in Edition.structure that arranges the sections hierarchically as needed. This way, we avoid having to represent "grand-sections" or "great-grand-sections" in our database.
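As an illustration of the Edition.structure idea, here is one possible shape for the JSON blob and a helper that walks it. The field names ("slug", "title", "children"), the placeholder titles, and the helper name leaf_slugs are all hypothetical; the proposal does not fix a format.

```python
import json

# Hypothetical structure: a list of nodes, each with optional children.
EXAMPLE_STRUCTURE = json.dumps([
    {"slug": "1", "title": "Book 1", "children": [
        {"slug": "1.1", "title": "Chapter 1", "children": []},
        {"slug": "1.2", "title": "Chapter 2", "children": []},
    ]},
    {"slug": "2", "title": "Book 2", "children": [
        {"slug": "2.1", "title": "Chapter 1", "children": []},
    ]},
])


def leaf_slugs(structure_json):
    """Return the slugs of the leaf sections in reading order."""
    def walk(nodes):
        for node in nodes:
            children = node.get("children", [])
            if children:
                # Inner node: only its descendants are real Section rows.
                yield from walk(children)
            else:
                yield node["slug"]
    return list(walk(json.loads(structure_json)))
```

In this sketch only the leaves would correspond to rows in the Section table; the inner nodes exist purely for navigation.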
For URLs, I propose the following schema:
Some notation: $text means a text's slug, $block means a block's slug, etc. (A slug is a human-readable ID suitable for use in a URL.) Using this notation, I suggest addressing the data above as follows:
A text: /texts/$text
An edition: /texts/$text:$edition
A section: /texts/$text:$edition/$section
A block: /texts/$text:$edition/$block
$section and $block use a simple numbering scheme:
- Sections are numbered in order: 1, 2, 3, ...
- For hierarchical sections, we use 1.1, 1.2, 1.3, ... 2.1, 2.2, 2.3, ...
- Blocks are numbered according to their section:
  - Generally, we use $section.1, $section.2, ...
  - For header (atha ...) and footer (iti ...) elements, we can use @header and @footer
  - For paragraphs with no clear numbering, we use $section.1a, $section.1b, ... where "1" is the slug of the previous verse. If no such verse exists, use $section.a, $section.b, ...
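The block numbering rules above can be sketched as a small function. This is an illustration only: the "verse" and "paragraph" labels and the function name are hypothetical, header and footer elements are left out, and the sketch assumes at most 26 unnumbered paragraphs in a row.

```python
import string


def assign_block_slugs(section, kinds):
    """Assign slugs to the blocks of one section, in order.

    `section` is the section slug (e.g. "1" or "1.2"); `kinds` is a list
    of "verse" or "paragraph" labels, one per block.
    """
    slugs = []
    verse = 0    # number of the most recent verse; 0 means none yet
    letter = 0   # index of the next letter suffix after that verse
    for kind in kinds:
        if kind == "verse":
            verse += 1
            letter = 0
            slugs.append(f"{section}.{verse}")
        elif kind == "paragraph":
            if letter >= len(string.ascii_lowercase):
                raise ValueError("more than 26 unnumbered paragraphs in a row")
            suffix = string.ascii_lowercase[letter]
            letter += 1
            # Before the first verse there is no number to attach to.
            prefix = str(verse) if verse else ""
            slugs.append(f"{section}.{prefix}{suffix}")
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    return slugs
```

For example, a section "1" that opens with a prose paragraph, then a verse, two more paragraphs, and another verse would get the slugs 1.a, 1.1, 1.1a, 1.1b, 1.2.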
Notes:
- If ":$edition" is omitted, we will use the text's default edition.
Using the Mahabharata as an example:
/texts/mahabharata (points to Text)
/texts/mahabharata:bori-1966 (points to Edition)
/texts/mahabharata/1.1 (points to Section 1.1 using the default edition)
/texts/mahabharata:bori-1966/1.1 (points to Section 1.1 using the specified edition)
/texts/mahabharata:bori-1966/1.1.1 (points to Block 1.1.1 using the specified edition)
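A minimal sketch of how a path like the ones above could be split into its parts. The names TextRef and parse_text_url are hypothetical. Note that the parser cannot tell a section slug from a block slug on its own (1.1 is a section here, 1.1.1 a block), so this sketch returns the final segment as-is and assumes a later lookup against the edition decides which it is.

```python
from typing import NamedTuple, Optional


class TextRef(NamedTuple):
    text: str                # the text's slug
    edition: Optional[str]   # None means "use the text's default edition"
    ref: Optional[str]       # section or block slug, if present


def parse_text_url(path):
    """Parse /texts/$text[:$edition][/$ref] into a TextRef."""
    parts = path.strip("/").split("/")
    if len(parts) not in (2, 3) or parts[0] != "texts" or not all(parts):
        raise ValueError(f"not a text URL: {path!r}")
    # The edition, if any, follows the first ":" in the text segment.
    text, sep, edition = parts[1].partition(":")
    if not text or (sep and not edition):
        raise ValueError(f"malformed text segment: {parts[1]!r}")
    ref = parts[2] if len(parts) == 3 else None
    return TextRef(text, edition or None, ref)
```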
By using "," and "-", we can specify multiple blocks at once:
/texts/ramayanam:baroda-1960/1.1.1-1.1.10 (first 10 verses of the Ramayana, Baroda edition)
/texts/meghadutam/1.1,1.3 (first and third verses of the Meghaduta, default edition)
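The "," and "-" syntax could be parsed along these lines. This is a sketch with a hypothetical function name; it only splits the final URL segment into (start, end) pairs and leaves expanding a range into actual blocks to a database lookup. It also assumes that section and block slugs never contain "," or "-" themselves.

```python
def parse_block_spec(spec):
    """Split a spec such as "1.1.1-1.1.10" or "1.1,1.3" into (start, end) pairs.

    A single slug is returned as a pair whose start and end are equal.
    """
    ranges = []
    for part in spec.split(","):
        start, sep, end = part.partition("-")
        if not start or (sep and not end):
            raise ValueError(f"malformed block spec: {spec!r}")
        ranges.append((start, end if sep else start))
    return ranges
```

Edition slugs such as bori-1966 do contain "-", but they live in the $text:$edition segment, so they never reach this function.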