In this third message, I'll talk about how projects are converted to published texts.
A project, which typically represents a single book, can contain an unlimited number of texts within it, and these texts can also have complex relationships with each other. A simple example is an anthology that contains dozens or hundreds of smaller texts. A more complex example is a book that contains both a main text and its translation and commentary.
While we could create one project per text, doing so would be confusing and could lead to duplicated work. So, we need some way to publish multiple texts from a single project.
The way we do this on Ambuda is by defining a Publish config that declares basic metadata about the text and how to extract its content from the project. An example is below:
Most of the fields here are straightforward: title is the display title, slug is the title used in the URL, and author, genre, and language mean what you would expect. The small green + lets us quickly define a new genre or author. (Under the hood, these are stored as separate database rows and bound to the text with a foreign key relation.) Parent slug is specific to translations and commentaries, and we may move away from it in the future.
Filter is the interesting part. It uses a simple query language for extracting blocks of text from a project. For ease of implementation, queries in this language are s-expressions with simple logical operators. Examples:
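As a rough illustration of the fields described above, a Publish config could be modeled as a small record. This is only a sketch, not Ambuda's actual schema: the class name `PublishConfig`, the field names, and the placeholder values are all assumptions, and author and genre are plain strings here rather than foreign keys to their own rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublishConfig:
    """Hypothetical record for one published text within a project."""

    title: str      # display title
    slug: str       # title used in the URL
    author: str     # in a real schema, likely a foreign key to an author row
    genre: str      # in a real schema, likely a foreign key to a genre row
    language: str
    filter: str     # s-expression query that selects the text's blocks
    parent_slug: Optional[str] = None  # only for translations and commentaries


# Placeholder values, for illustration only.
main_text = PublishConfig(
    title="Example Text",
    slug="example-text",
    author="Example Author",
    genre="Example Genre",
    language="sa",
    filter="(image 5 15)",
)

# A commentary points back at its parent text through parent_slug.
commentary = PublishConfig(
    title="Example Commentary",
    slug="example-commentary",
    author="Example Commentator",
    genre="Example Genre",
    language="sa",
    filter="(label commentary)",
    parent_slug=main_text.slug,
)
```

One project can then carry a list of such configs, one per text to publish.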
(image 5 15) # Match all blocks from page image 5 to page image 15 inclusive
(image 5 15:foo) # Match all blocks from page image 5 to page image 15 inclusive (ending at the first block with label `foo`)
(tag p) # Match all blocks representing paragraphs
(label foo) # Match all blocks marked with the label "foo"
(or (image 5 15) (label foo)) # Match all blocks with (image 5 15) OR (label foo)
(and (image 5 15) (label foo)) # Match all blocks with (image 5 15) AND (label foo)
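To make the filter semantics concrete, here is a minimal sketch of how such queries could be parsed and evaluated. It is not Ambuda's implementation: the `Block` class, its fields, and the `parse` and `matches` functions are hypothetical names, and the sketch handles only plain image numbers, not label boundaries like `15:foo`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical block of proofread text."""

    image: int                   # page image the block appears on
    tag: str                     # structural tag, e.g. "p" for paragraph
    label: Optional[str] = None  # optional user-assigned label


def parse(query: str):
    """Parse an s-expression into nested lists of string atoms."""
    tokens = query.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after query")
    return expr


def matches(expr, block: Block) -> bool:
    """Return True if the block satisfies the parsed query."""
    if not isinstance(expr, list) or not expr:
        raise ValueError(f"expected an operator expression, got: {expr!r}")
    op, *args = expr
    if op in ("image", "tag", "label") and not args:
        raise ValueError(f"operator {op!r} needs an argument")
    if op == "image":
        # One argument matches a single image; two match an inclusive range.
        low, high = int(args[0]), int(args[-1])
        return low <= block.image <= high
    if op == "tag":
        return block.tag == args[0]
    if op == "label":
        return block.label == args[0]
    if op == "or":
        return any(matches(arg, block) for arg in args)
    if op == "and":
        return all(matches(arg, block) for arg in args)
    raise ValueError(f"unknown operator: {op}")


blocks = [
    Block(image=4, tag="p"),
    Block(image=5, tag="p"),
    Block(image=15, tag="h", label="foo"),
    Block(image=16, tag="p", label="foo"),
]
query = parse("(and (image 5 15) (label foo))")
selected = [b for b in blocks if matches(query, b)]
```

Because the query is already a tree, evaluation is a short recursive function, which is the "ease of implementation" that s-expressions buy. The parser also rejects a query with a missing closing parenthesis instead of silently accepting it.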
We are refining the system as we go so that we can publish texts more easily and pleasantly. For example, an earlier version of this query language did not support the label field when defining boundaries within images, so we ended up with complex queries like:
(or (and (image 42) (label PRAN)) (image 43) (and (image 44) (label PRAN)))
Adding an optional label-based boundary makes the language more expressive and the intent easier to understand, so for this text, we can simply define (image 67:GOPI_START 71).
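A label-based boundary like this could be evaluated roughly as follows. This is a sketch under assumptions, not Ambuda's code: `Block`, `parse_bound`, and `select_range` are hypothetical names, blocks are assumed to arrive in reading order, and both boundaries are treated as inclusive (the range starts at the first block with the start label and ends at the first block with the end label).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical block of proofread text, in reading order."""

    image: int
    label: Optional[str] = None
    text: str = ""


def parse_bound(spec: str) -> Tuple[int, Optional[str]]:
    """Split "67:GOPI_START" into (67, "GOPI_START") and "71" into (71, None)."""
    number, _, label = spec.partition(":")
    return int(number), (label or None)


def select_range(blocks: List[Block], start: str, end: str) -> List[Block]:
    """Select blocks between two image bounds, each optionally refined by a label."""
    start_image, start_label = parse_bound(start)
    end_image, end_label = parse_bound(end)

    selected = []
    # Without a start label, the range opens at the first block in range.
    started = start_label is None
    for block in blocks:
        if block.image < start_image or block.image > end_image:
            continue
        if not started:
            # Assumption: skip blocks until the start label appears on the start image.
            if block.image == start_image and block.label == start_label:
                started = True
            else:
                continue
        selected.append(block)
        # Assumption: the block carrying the end label is the last one included.
        if end_label is not None and block.image == end_image and block.label == end_label:
            break
    return selected


blocks = [
    Block(66, None, "previous text"),
    Block(67, None, "end of previous text"),
    Block(67, "GOPI_START", "first block"),
    Block(67, None, "second block"),
    Block(70, None, "middle block"),
    Block(71, None, "last block"),
    Block(72, None, "next text"),
]
# Equivalent of (image 67:GOPI_START 71).
selected = select_range(blocks, "67:GOPI_START", "71")
```

Whether the real boundaries are inclusive or exclusive is a design decision this sketch only guesses at. The point is that one bound with a label replaces the nested and/or combination needed before.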
Arun