OpenSiddur Demo on Sefaria

Brett Lockspeiser

Jun 14, 2012, 1:41:29 PM

to opensid...@googlegroups.com, sefar...@googlegroups.com

Hi all,

Here are some more concrete thoughts on steps towards using Sefaria's source sheet builder to demo the experience of custom Siddur building using OpenSiddur's texts.

I've put up some info about the Sefaria API at www.sefaria.org/developers . The GET section is only tangentially relevant, but can give some picture of how Sefaria works under the hood. The POST section describes what needs to happen to get new texts into Sefaria.

To get a working demo we need:

1. A named list of Siddur components to work with (this list can start small for a minimal initial prototype).
2. For each component, if it does not already exist in Sefaria (i.e., not Tanach and Mishna), a description of its structure. Components will need to be small enough that they have a uniform structure. I imagine most components at this size will only have a single level of structure, so this description may just end up looking like ["Line"] (if a component is composed of lines) or ["Blessing"] (if it's composed of blessings) or whatever the appropriate name is. These descriptions will need to be posted to /api/index/ or entered manually in the site.
3. Each text will then need to be transformed into an array of strings and posted to /api/texts/.
4. Once this is done, the Source Sheet builder can be used as normal to add, order, title and comment on all these components, making it possible to build and save a complete Siddur. As discussed, I would not recommend trying to store a whole Siddur in one sheet (which is loaded as one page), but rather break it up into sections. A static table of contents can be made linking to each individual sheet. That way also a list of Siddurim and a nicely formatted table of contents for each Siddur could be hosted on opensiddur.org and only link out to sefaria.org for the contents.
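
As a rough sketch of steps 2 and 3, the snippet below turns a plain-text component into a structure description and an array of strings. The function name `buildComponent`, the one-string-per-line splitting rule, and the field names `title`, `sectionNames` and `text` are assumptions for illustration; the actual payloads for /api/index/ and /api/texts/ are the ones described at www.sefaria.org/developers.

```javascript
// Hypothetical helper: not part of Sefaria or Open Siddur.
function buildComponent(title, sectionName, rawText) {
  // One string per non-empty line; trimming removes stray whitespace.
  const text = rawText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  return {
    // Single level of structure, e.g. ["Line"] or ["Blessing"].
    index: { title: title, sectionNames: [sectionName] },
    text: text,
  };
}

const component = buildComponent(
  "Mourner's Kaddish",
  "Line",
  "First line of the text\n\nSecond line of the text\n"
);
console.log(JSON.stringify(component, null, 2));
```

Nothing is posted anywhere here; the point is only that a component with a single level of structure reduces to a name for that level plus a flat list of strings.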

I think the hard part here is 1 and 2 -- figuring out how to fit the complex, annotated structures that you have into Sefaria's simplistic data model. Details will be lost. For example, "Mourner's Kaddish" is an appropriate component. Is it clear in general how it should be segmented? Handling instructional texts like "recited by the congregation" is tricky as well, as I don't think it ought to be included in the text of the Kaddish itself given there's no way to differentiate it. A current workaround is to build such instructions into the Siddur as 'comments', e.g., include Kaddish lines 1-3, then comment "Congregation reads and Mourner responds", then include Kaddish lines 4-8.
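
The comment workaround can be pictured as an ordered list of sheet entries in which comments sit between ranges of lines. The entry shape (`type`, `from`, `to`, `text`) and the `renderSheet` helper are invented for illustration and do not reflect how Sefaria actually stores sheets.

```javascript
// Placeholder text: eight numbered lines standing in for the Kaddish.
const kaddish = Array.from({ length: 8 }, (_, i) => "Kaddish line " + (i + 1));

// Hypothetical sheet layout: sources and comments are just items in order.
const sheet = [
  { type: "source", from: 1, to: 3 },
  { type: "comment", text: "Congregation reads and Mourner responds" },
  { type: "source", from: 4, to: 8 },
];

// Flatten the sheet into display lines; line numbers are 1-based and inclusive.
function renderSheet(entries, lines) {
  const out = [];
  for (const entry of entries) {
    if (entry.type === "comment") {
      out.push("[" + entry.text + "]");
    } else {
      out.push(...lines.slice(entry.from - 1, entry.to));
    }
  }
  return out;
}

const rendered = renderSheet(sheet, kaddish);
console.log(rendered.join("\n"));
```

The comment carries no link to the lines around it; its meaning comes only from where it sits in the sequence.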

In any case, you guys are the Siddur experts, so love to hear what you think. I believe this proposal is doable, but I think it will take some real work to coordinate and I am sure we will run into difficult cases with the text and then missing features / bugs in the source sheet builder. If we want to move forward I suggest we move right into steps (1) & (2) maybe in a Google Doc / Spreadsheet to start looking at specifics.

Thanks,

Brett

Efraim Feinstein

Jun 14, 2012, 2:48:22 PM

to sefar...@googlegroups.com, opensid...@googlegroups.com

Hi, and thanks for posting!
(and Hi, Sefaria-dev, I'm Efraim, lead developer over at Open Siddur)

I have a half-written email in my drafts folder about this that I never got around to completing.

On 06/14/2012 01:41 PM, Brett Lockspeiser wrote:

Hi all,

Here are some more concrete thoughts on steps towards using Sefaria's source sheet builder to demo the experience of custom Siddur building using OpenSiddur's texts.

I've put up some info about the Sefaria API at www.sefaria.org/developers . The GET section is only tangentially relevant, but can give some picture of how Sefaria works under the hood. The POST section describes what needs to happen to get new texts into Sefaria.

Ideally, I'd like texts to be able to be transferred bidirectionally.

What Sefaria offers *right now* is a great UI (and a great test-bed UI, even if it's not feature-complete from the Open Siddur sense).

What Open Siddur offers is a server system intended to handle very complex texts. Open Siddur internally stores in XML, which means that our documents do not necessarily have uniform structure. Fortunately, in the first pass, you can almost *map* an XML document onto uniform structure. Some of the issues are below.

I'm attaching a current schema documentation snapshot, which should at least give you an idea of what elements exist and what they can contain. Unfortunately, the documentation is not completely filled in (particularly the human-readable part). If anyone intends to work on this, I can make some first-pass documentation a priority. The docs up at wiki.jewishliturgy.org are out of date since I just reviewed and finalized the schema.

I'm also attaching 5 other files:
- an annotation document (this is textual annotation, there's also conceptual annotation, but I don't have any ready examples)
- a bibliographic record
- Psalms 1 (which demonstrates both a stream of text and multiple hierarchies)
- the entire book of Psalms (which demonstrates a resource that is just combined resources)
- a contributor record

Apologies for the large attachments.

I know you already have a Tanach, but it's our demo text too. :-)

To get a working demo we need:

A named list of Siddur components to work with (this list can start small for a minimal initial prototype)

I *think* that what you call a "component", I've been calling a "resource" using XML database terminology. Everyone else calls it a "file". The key features are demonstrated in the 2 Psalms examples:

- the header has enough information to figure out the source and who's responsible for activity, eg, transcription. There's also a revision history, but that doesn't really exist until the documents get in the db.

- a stream of text (conveniently called a streamText) made up of small segments which should be "minimal units of meaning" (say, 1-5 words). This can be mapped onto a Sefaria structure with a few caveats, like:
-- you can't preserve word identity unless the text is canonized
-- kri/ktiv (which is internal in the segment), spelling regularization (the word "Yerushalayim" comes to mind, though, I don't think I marked it up in my Tanach conversion), corrections of a transcription (not relevant to Sefaria? Should it be?), divine name markup (useful, for example, if you want to create a document that is not sheimot), incidental transliteration (useful if you want to regularize transliteration across an entire published siddur).

- the concurrent hierarchic layers section: Psalms is actually quite regular, so its concurrencies (paragraphs--which are not marked up here!, verses, and line groups/lines) don't do funky things like cross boundaries, but be aware that it is entirely possible.

There are also specialized resources like annotation documents, conditional documents (describe things like "times when this prayer is said" including inline documentation), bibliographic documents (example attached), contributor documents. Another relevant document (which I don't have a ready example of) is a translation document, which, instead of having a streamText, has a parallelText element that looks like:
<parallelText>
  <parallelGrp>
    <ptr n="original" xml:lang="he" target="/data/original/My_Original_Text#range(se_5,se_7)"/>
    <ptr n="parallel" xml:lang="en" target="/data/translation/en/My_Translation/My_Translated_Text#range(se_9,se_15)"/>
  </parallelGrp>
</parallelText>

The complication here with respect to Sefaria's data model is that translation can align at any level down to the segment.
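
To make the alignment concrete, here is a small sketch that splits a `target` value like the ones above into a resource path and a segment range. The `parseRangeTarget` name and the regular expression are assumptions for illustration; the sketch handles only the `#range(start,end)` form shown here, not pointer syntax in general.

```javascript
// Hypothetical helper: splits "path#range(a,b)" into its three parts.
function parseRangeTarget(target) {
  const match = /^([^#]+)#range\(([^,()]+),([^,()]+)\)$/.exec(target);
  if (match === null) {
    return null; // not in the range form this sketch understands
  }
  return { resource: match[1], start: match[2].trim(), end: match[3].trim() };
}

const original = parseRangeTarget("/data/original/My_Original_Text#range(se_5,se_7)");
const parallel = parseRangeTarget(
  "/data/translation/en/My_Translation/My_Translated_Text#range(se_9,se_15)"
);
console.log(original, parallel);
```

Each group pairs a run of original segments with a run of translated segments that may differ in length, which is what a per-line parallel array cannot express directly.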

For each component, if it does not already exist in Sefaria (i.e., not Tanach and Mishna), a description of its structure. Components will need to be small enough that they have a uniform structure. I imagine most components at this size will only have a single level of structure, so this description may just end up looking like ["Line"] (if a component is composed of lines) or ["Blessing"] (if it's composed of blessings) or whatever the appropriate name is. These descriptions will need to be posted to /api/index/ or entered manually in the site.

We don't differentiate document types like "blessing." Text is text. It may be *annotated* as a blessing, if, for example, we wanted to allow searches over all blessings.

Each text will then need to be transformed into an array of strings and posted to /api/texts/

I don't think this is currently possible without some loss of information. However, that may not be such a bad thing. Some kind of structured encoding is probably better than none. Many of the actual siddurim we have were contributed in the form of MS Word documents, and would require an outlay of effort to make them workable in anyone's database.

Once this is done, the Source Sheet builder can be used as normal to add, order, title and comment on all these components, making it possible to build and save a complete Siddur. As discussed, I would not recommend trying to store a whole Siddur in one sheet (which is loaded as one page), but rather break it up into sections.

Can a source sheet include other source sheets? That's basically Open Siddur's global data model (where s/source sheet/resource/g).

A static table of contents can be made linking to each individual sheet. That way also a list of Siddurim and a nicely formatted table of contents for each Siddur could be hosted on opensiddur.org and only link out to sefaria.org for the contents.

As a first database-able pass, I think this would be great, as it would resolve one major problem we have that I mentioned above: getting data database-ready in any form. My only worry is too much loss of information, requiring a second encoding pass. If we can resolve that on Sefaria's end (or come up with a hack), that would be even better.

One issue we have with our data as it stands now is that much of it is not proofread. One goal I've had for a long time is to have a Wikisource-style transcription editor where documents could be proofread (and their public domain nature could be proved) against page images.

I think the hard part here is 1 and 2 -- figuring out how to fit the complex, annotated structures that you have into Sefaria's simplistic data model. Details will be lost. For example, "Mourner's Kaddish" is an appropriate component. Is it clear in general how it should be segmented?

It depends on the context. The title, "Mourner's Kaddish" is a heading (I think you have those). "Mourner's Kaddish" itself is a resource. In the *best* case of the XML data model, "Kaddish" is a resource and the variant types of Kaddish (and the nusach-based textual variants) just have different associated conditionals.

Handling instructional texts like "recited by the congregation" is tricky as well, as I don't think it ought to be included in the text of the Kaddish itself given there's no way to differentiate it.

Instructions (like "On Shabbat, say" or "Recited by the congregation") are both treated as annotations (human readable) and conditionals (so a computer could parse not to include the text, say, when it is not Shabbat).

A current workaround is to build such instructions into the Siddur as 'comments', e.g., include Kaddish lines 1-3, then comment "Congregation reads and Mourner responds", then include Kaddish lines 4-8.

How does one tell what a comment applies to? What if there is overlap between regions with instructions?
Example: Ya'aleh v'Yavo:

"On Rosh Chodesh, Yom Tov and Chol Hamoed, add:" (applies to the whole thing)
"On Rosh Chodesh -- " (applies to one line...)

In Open Siddur, an annotation can link anything with an xml:id (down to a word), though, by convention, I'd rather not link further down than a segment if I can help it.
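
The overlap problem can be sketched with annotations that point at ranges of segments. The segment numbering, the `annotations` records, and the `annotationsFor` helper are all made up for illustration; Open Siddur links annotations by xml:id, which this toy example replaces with integer positions.

```javascript
// Hypothetical ranges: the first instruction covers the whole insertion,
// the second covers a single segment inside it.
const annotations = [
  { text: "On Rosh Chodesh, Yom Tov and Chol Hamoed, add:", from: 1, to: 10 },
  { text: "On Rosh Chodesh --", from: 4, to: 4 },
];

// Return every instruction whose range covers the given segment.
function annotationsFor(segment, list) {
  return list
    .filter((a) => a.from <= segment && segment <= a.to)
    .map((a) => a.text);
}

const atFour = annotationsFor(4, annotations);
const atSeven = annotationsFor(7, annotations);
console.log(atFour);  // both instructions apply to segment 4
console.log(atSeven); // only the outer instruction applies to segment 7
```

A flat sequence of comments cannot express that segment 4 is covered twice, which is the gap the question above points at.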

In any case, you guys are the Siddur experts, so love to hear what you think. I believe this proposal is doable, but I think it will take some real work to coordinate and I am sure we will run into difficult cases with the text and then missing features / bugs in the source sheet builder. If we want to move forward I suggest we move right into steps (1) & (2) maybe in a Google Doc / Spreadsheet to start looking at specifics.

In the end, I would also like to have a UI for Open Siddur; I see Sefaria as a good start there too. My ideal would be to be able to use Open Siddur API calls on Sefaria's UI (understanding the caveat about the different data models). I don't really have the facility yet to determine whether that's possible or it would be just as much work fitting square pegs into circular holes as writing a UI from scratch. Thoughts on that?

Thanks again!

-- 
---
Efraim Feinstein
Lead Developer
Open Siddur Project
http://opensiddur.net
http://wiki.jewishliturgy.org

jlptei.doc.html

Notes from the Westminster Leningrad Codex.xml

The Westminster Leningrad Codex.xml

תהלים א.xml

תהלים.xml

Christopher.Kimball.xml

Brett Lockspeiser

Jun 15, 2012, 4:00:24 PM

to sefar...@googlegroups.com, opensid...@googlegroups.com

Hi,

My understanding from Aharon was that you're interested in achieving a milestone of having a minimal working demo that people can actually use. If that's the case, I think it's important to not try to accomplish everything. I (and the sefaria code) still don't understand all the details of JLPTEI, so the easier step is to create a transform which simplifies the data. If you write this transform once, the same code can sit live on top of your API to allow sefaria frontend to talk to your backend as a next step, but as mentioned before, this requires branching the code.

We don't differentiate document types like "blessing." Text is text. It may be *annotated* as a blessing, if, for example, we wanted to allow searches over all blessings.

From Sefaria's point of view this is just a label and only affects presentation. You could have one resource which has no structure and is just represented as a single string. Here's an example of all the data we'd need for one resource (and any data beyond this is not currently supported):

{
    title: "Mourner's Kaddish",
    language: "en",
    sectionNames: ["Line"],
    text: [
        "Glorified and sanctified be God's great name throughout the world which He has created according to His will.",
        "May He establish His kingdom in your lifetime and during your days, and within the life of the entire House of Israel, speedily and soon; and say, Amen.",
        "May His great name be blessed forever and to all eternity.",
        "Blessed and praised, glorified and exalted, extolled and honored, adored and lauded be the name of the Holy One, blessed be He, beyond all the blessings and hymns, praises and consolations that are ever spoken in the world; and say, Amen.",
        "May there be abundant peace from heaven, and life, for us and for all Israel; and say, Amen.",
        "He who creates peace in His celestial heights, may He create peace for us and for all Israel; and say, Amen."
    ]
}

Each text will then need to be transformed into an array of strings and posted to /api/texts/
I don't think this is currently possible without some loss of information.

I'm sure this will require loss of information.

Can a source sheet include other source sheets? That's basically Open Siddur's global data model (where s/source sheet/resource/g).

Not currently, but it's pretty easy to add. Sub-sources basically function this way now.

A static table of contents can be made linking to each individual sheet. That way also a list of Siddurim and a nicely formatted table of contents for each Siddur could be hosted on opensiddur.org and only link out to sefaria.org for the contents.

As a first database-able pass, I think this would be great, as it would resolve one major problem we have that I mentioned above: getting data database-ready in any form. My only worry is too much loss of information, requiring a second encoding pass. If we can resolve that on Sefaria's end (or come up with a hack), that would be even better.

I don't think I'm the right person to write code to transform JLPTEI. If you can write a JLPTEI parser in JavaScript, it could live in the Sefaria client.

It depends on the context. The title, "Mourner's Kaddish" is a heading (I think you have those). "Mourner's Kaddish" itself is a resource. In the *best* case of the XML data model, "Kaddish" is a resource and the variant types of Kaddish (and the nusach-based textual variants) just have different associated conditionals.

For the purposes of this proposal, no conditionals would be supported. Kaddish and Mourner's Kaddish would need to be treated as separate resources.

How does one tell what a comment applies to?

In a source sheet as they stand today, comments are just lines of text placed in some order. There's no semantics to it.

In the end, I would also like to have a UI for Open Siddur; I see Sefaria as a good start there too. My ideal would be to be able to use Open Siddur API calls on Sefaria's UI (understanding the caveat about the different data models). I don't really have the facility yet to determine whether that's possible or it would be just as much work fitting square pegs into circular holes as writing a UI from scratch. Thoughts on that?

I think the proposal I outlined is just to get a proof of concept for custom Siddur building. Having the current UI talk to your backend doesn't really achieve anything in and of itself, because the UI can't do anything with the richness of your data anyway. But it could proceed gradually. A JSON output of your API could include features that are simply ignored by the client until support is built in for them. Alternatively, if your API sticks with JLPTEI, I imagine you will want to have a JavaScript parser to build any kind of interactive UI anyway.
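
The "ignored until supported" idea might look like the sketch below, where the client copies only the fields it knows about and silently drops the rest. The `SUPPORTED_FIELDS` list, the `pickSupported` helper, and the `conditionals` field are assumptions for illustration; only `title`, `language`, `sectionNames` and `text` come from the example record above.

```javascript
// Fields the client currently understands (assumed list for this sketch).
const SUPPORTED_FIELDS = ["title", "language", "sectionNames", "text"];

// Copy only the supported fields; anything else is dropped without error.
function pickSupported(record) {
  const result = {};
  for (const field of SUPPORTED_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      result[field] = record[field];
    }
  }
  return result;
}

const fromApi = {
  title: "Mourner's Kaddish",
  language: "en",
  sectionNames: ["Line"],
  text: ["May His great name be blessed forever and to all eternity."],
  conditionals: [{ when: "shabbat" }], // hypothetical richer field, ignored for now
};

const forClient = pickSupported(fromApi);
console.log(Object.keys(forClient));
```

Adding support for a new feature later would then mean extending the list and the rendering code, without changing what the API emits.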

In any case, like I said before - if you want to move forward I suggest we go straight into looking at a minimal list of resources needed for a proof of concept.

Thanks,

Brett

Efraim Feinstein

Jun 15, 2012, 6:08:40 PM

to sefar...@googlegroups.com, Brett Lockspeiser, opensid...@googlegroups.com

Hi,

I was trying to catalog what data structures there are (trying to answer the question).

Of course, you're right in:

On 06/15/2012 04:00 PM, Brett Lockspeiser wrote:

My understanding from Aharon was that you're interested in achieving a milestone of having a minimal working demo that people can actually use. If that's the case, I think it's important to not try to accomplish everything.

Yes, that's why I emphasized a few times that "good enough" is better than nothing.

I (and the sefaria code) still don't understand all the details of JLPTEI, so the easier step is to create a transform which simplifies the data.

Understood. The issue I was working out by listing all the possible data structures is that Sefaria has the UI (and the ability to get texts entered quickly), which means that the primary challenge is not Open Siddur->Sefaria, which is trivial. Instead, it's Sefaria->Open Siddur. That means, we need to map the data structures in such a way that they can be *upconverted* (and re-downconverted) reproducibly.
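
One way to picture a reproducible round trip is to keep a sidecar map from each flattened line back to the segments it came from. Everything in this sketch (the `segments` records, `downconvert`, `upconvert`, and the space-joining rule) is an assumption for illustration, not Open Siddur or Sefaria code, and it only round-trips while the line text is left unedited.

```javascript
// Hypothetical segments; "line" says which flattened string each one lands in.
// Line numbers are assumed to be contiguous and to start at 0.
const segments = [
  { id: "se_1", line: 0, text: "May His great name" },
  { id: "se_2", line: 0, text: "be blessed" },
  { id: "se_3", line: 1, text: "forever and to all eternity." },
];

// Flatten segments into an array of strings plus a sidecar of per-line parts.
function downconvert(segs) {
  const sidecar = [];
  for (const seg of segs) {
    if (sidecar[seg.line] === undefined) {
      sidecar[seg.line] = [];
    }
    sidecar[seg.line].push({ id: seg.id, text: seg.text });
  }
  const text = sidecar.map((parts) => parts.map((p) => p.text).join(" "));
  return { text: text, sidecar: sidecar };
}

// Rebuild the segment list from the sidecar alone.
function upconvert(flat) {
  const segs = [];
  flat.sidecar.forEach((parts, line) => {
    for (const part of parts) {
      segs.push({ id: part.id, line: line, text: part.text });
    }
  });
  return segs;
}

const flat = downconvert(segments);
const restored = upconvert(flat);
console.log(flat.text, restored);
```

The harder case is text entered or edited in the flat form, where no sidecar exists yet and the segmentation has to be created rather than restored.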

If you write this transform once, the same code can sit live on top of your API to allow sefaria frontend to talk to your backend as a next step, but as mentioned before, this requires branching the code.

One primary question is where the glue layer should be. There are 3 possible places I can think of:
1. In Sefaria: I don't know (rather, didn't look) at how abstracted Sefaria's UI is with respect to what gets sent out by API. If it's abstracted, then the glue layer would be in Sefaria's equivalent of read() or write() calls. I think this is what you suggested below by a transform in Javascript?

2. Between Sefaria and Open Siddur: Basically, a separate gateway server that calls both APIs. Might be the most difficult way to do it since it requires syncing against 2 sets of APIs.

3. In Open Siddur: In eXist-db, JSON<->XML is possible (though I'm not 100% sure if all JSON structures can map), and you can then use XQuery or XSLT to do the transforms. The implementation question here would be who would call the glue API? Do we fork Sefaria and run the UI calling the Open Siddur server, essentially implementing a minimal Sefaria API in Open Siddur?
This would be made a *lot* easier if Sefaria's UI code is actually implementation-independent. From what I've read, it sounds like it should be (since you're implementing a UI over an API). Is it?
I also don't have a sense yet of how much of the API would have to be implemented.

The question then resolves into the minimal data model mapping question.

We don't differentiate document types like "blessing." Text is text. It may be *annotated* as a blessing, if, for example, we wanted to allow searches over all blessings.

From Sefaria's point of view this is just a label and only affects presentation.

OK, that's fine.

You could have one resource which has no structure and is just represented as a single string. Here's an example of all the data we'd need for one resource (and any data beyond this is not currently supported):

{
    title: "Mourner's Kaddish",
    language: "en",
    sectionNames: ["Line"],
    text: [
        "Glorified and sanctified be God's great name throughout the world which He has created according to His will.",
        "May He establish His kingdom in your lifetime and during your days, and within the life of the entire House of Israel, speedily and soon; and say, Amen.",
        "May His great name be blessed forever and to all eternity.",
        "Blessed and praised, glorified and exalted, extolled and honored, adored and lauded be the name of the Holy One, blessed be He, beyond all the blessings and hymns, praises and consolations that are ever spoken in the world; and say, Amen.",
        "May there be abundant peace from heaven, and life, for us and for all Israel; and say, Amen.",
        "He who creates peace in His celestial heights, may He create peace for us and for all Israel; and say, Amen."
    ]
}

Now I understand this a bit better. That almost maps directly into a streamText. Although, I would also ask why, if you can have Book>Chapter>Verse, you can't have Resource>Paragraph>Line? Or can you? Just having a paragraph hierarchy (or a choosable Resource>Line Group>Verse Line hierarchy) would work wonders over a flat file. Again, if a flat file is all you can serve, a flat file is better than nothing.

There's a bit more metadata that I would like to make sure is retained (where the data came from and who is editing it), but those are details and I think Sefaria does have a rudimentary sourcing mechanism. There is something in there that looks like one.

Each text will then need to be transformed into an array of strings and posted to /api/texts/

I don't think this is currently possible without some loss of information.

I'm sure this will require loss of information.

OK, then the mapping challenge is to figure out how to retain as much as possible without making the Sefaria UI impossible to use. There, the devil is in the details.

Can a source sheet include other source sheets? That's basically Open Siddur's global data model (where s/source sheet/resource/g).

Not currently, but it's pretty easy to add. Sub-sources basically function this way now.

What's a sub-source? (example?)

A static table of contents can be made linking to each individual sheet. That way also a list of Siddurim and a nicely formatted table of contents for each Siddur could be hosted on opensiddur.org and only link out to sefaria.org for the contents.

As a first database-able pass, I think this would be great, as it would resolve one major problem we have that I mentioned above: getting data database-ready in any form. My only worry is too much loss of information, requiring a second encoding pass. If we can resolve that on Sefaria's end (or come up with a hack), that would be even better.

I don't think I'm the right person to write code to transform JLPTEI. If you can write a JLPTEI parser in JavaScript, it could live in the Sefaria client.

Once I understand how to map the data structures, it might be easier to write the transform in XSLT. This gets down to where the glue layer should live. (And, also, who's coding it.)

It depends on the context. The title, "Mourner's Kaddish" is a heading (I think you have those). "Mourner's Kaddish" itself is a resource. In the *best* case of the XML data model, "Kaddish" is a resource and the variant types of Kaddish (and the nusach-based textual variants) just have different associated conditionals.

For the purposes of this proposal, no conditionals would be supported. Kaddish and Mourner's Kaddish would need to be treated as separate resources.

OK. That's similar to some of the other conversion models we're looking at (STML, MediaWiki)

How does one tell what a comment applies to?

In a source sheet as they stand today, comments are just lines of text placed in some order. There's no semantics to it.

I'm not sure I understand this. The comment is somehow linked to what it comments on, right?

In the end, I would also like to have a UI for Open Siddur; I see Sefaria as a good start there too. My ideal would be to be able to use Open Siddur API calls on Sefaria's UI (understanding the caveat about the different data models). I don't really have the facility yet to determine whether that's possible or it would be just as much work fitting square pegs into circular holes as writing a UI from scratch. Thoughts on that?

I think the proposal I outlined is just to get a proof of concept for custom Siddur building. Having the current UI talk to your backend doesn't really achieve anything in and of itself, because the UI can't do anything with the richness of your data anyway. But it could proceed gradually. A JSON output of your API could include features that are simply ignored by the client until support is built in for them. Alternatively, if your API sticks with JLPTEI, I imagine you will want to have a JavaScript parser to build any kind of interactive UI anyway.

There are a number of ways to do it I have been considering:
1. Work on flattened JLPTEI files in XForms (requiring one rather easy bidirectional transform and XML on both client and server side).
2. Work on combined hierarchies in HTML using a modified existing HTML editor (allows for WYSIWYG siddur editing, which is a great feature, but also requires complete bidirectional XML<->HTML transforms, which I POCed in v.0.4.1)
3. Work in a de-novo Javascript environment
3a. in XML
3b. in XML->JS structures [this is essentially what Sefaria would be]

In any case, like I said before - if you want to move forward I suggest we go straight into looking at a minimal list of resources needed for a proof of concept.

Agreed.

Shabbat shalom,

Efraim Feinstein

Jun 15, 2012, 6:46:48 PM

to sefar...@googlegroups.com, Brett Lockspeiser, opensid...@googlegroups.com

PS If we do go with the idea (and I still don't have a sense for whether
it's the easiest way to go -- or the way where I can get some volunteer
help doing it) of implementing the Sefaria API over Open Siddur's
server, it would be really helpful to have an API test suite so the work
can be checked.
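
A first step toward such a test suite could be a shape check that any implementation's text record has to pass. The `checkTextRecord` function and the rules it enforces are assumptions drawn from the example record earlier in this thread, not from Sefaria's actual API contract.

```javascript
// Hypothetical contract check: returns a list of problems, empty if none.
function checkTextRecord(record) {
  const problems = [];
  if (typeof record.title !== "string" || record.title === "") {
    problems.push("title must be a non-empty string");
  }
  if (!Array.isArray(record.sectionNames) || record.sectionNames.length === 0) {
    problems.push("sectionNames must be a non-empty array");
  }
  if (!Array.isArray(record.text)) {
    problems.push("text must be an array");
  }
  return problems;
}

const good = checkTextRecord({ title: "Mourner's Kaddish", sectionNames: ["Line"], text: [] });
const bad = checkTextRecord({ title: "", text: "not an array" });
console.log(good, bad);
```

Running the same checks against both servers' responses would show where a reimplementation diverges, once the real contract is written down.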

Shabbat shalom,

Brett Lockspeiser

Jun 18, 2012, 4:20:14 PM

to opensid...@googlegroups.com, sefar...@googlegroups.com

I (and the sefaria code) still don't understand all the details of JLPTEI, so the easier step is to create a transform which simplifies the data.

Understood. The issue I was working out by listing all the possible data structures is that Sefaria has the UI (and the ability to get texts entered quickly), which means that the primary challenge is not Open Siddur->Sefaria, which is trivial. Instead, it's Sefaria->Open Siddur. That means, we need to map the data structures in such a way that they can be *upconverted* (and re-downconverted) reproducibly.

I'm not sure this makes sense to me. Putting your texts into Sefaria will involve data loss. It should not be considered an alternative datastore of your texts - it should just be considered a presentation level. If you want to get text out of sefaria, the API or just taking data dumps will let you do that, but why try to reconstruct lost data when you still have it in your db?

If you write this transform once, the same code can sit live on top of your API to allow sefaria frontend to talk to your backend as a next step, but as mentioned before, this requires branching the code.

The more I think through this branched model, the more complex it starts to seem. It's certainly possible, but I should make clear that I am not signing up to be the owner of a Sefaria branch for OpenSiddur (would love to, but am a little busy as you can imagine :) ). If it's just a matter of changing the target URLs I can host a different source sheet page for this (sefaria.org/opensiddur), but if this is going anywhere it will eventually be more than that.

One primary question is where the glue layer should be. There are 3 possible places I can think of:
1. In Sefaria: I don't know (rather, didn't look) at how abstracted Sefaria's UI is with respect to what gets sent out by API. If it's abstracted, then the glue layer would be in Sefaria's equivalent of read() or write() calls. I think this is what you suggested below by a transform in Javascript?

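Yes, I was imagining something that functions like:

$.get("http://www.jewishlitergy.org/path/to/Genesis.1", function(data) {
    // JLPTEIParser is hypothetical: a JavaScript parser for JLPTEI that does not exist yet
    var parsed = JLPTEIParser(data);
    // text() would return the simplified array of strings
    var text = parsed.text();
    console.log(text[0]); // "In the beginning God created the heaven and the earth."
});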

2. Between Sefaria and Open Siddur: Basically, a separate gateway server that calls both APIs. Might be the most difficult way to do it since it requires syncing against 2 sets of APIs.

3. In Open Siddur: In eXist-db, JSON<->XML is possible (though I'm not 100% sure if all JSON structures can map), and you can then use XQuery or XSLT to do the transforms. The implementation question here would be who would call the glue API? Do we fork Sefaria and run the UI calling the Open Siddur server, essentially implementing a minimal Sefaria API in Open Siddur?
This would be made a *lot* easier if Sefaria's UI code is actually implementation-independent. From what I've read, it sounds like it should be (since you're implementing a UI over an API). Is it?

I don't know exactly what this means. If you have an API that does exactly what our API does, you could just swap out host names. If you're hosting the app you then also have to deal with a backend for storing sheets themselves, and the Django auth system...

I also don't have a sense yet of how much of the API would have to be implemented.

Unfortunately, I think the answer is basically all of it minus POSTs to /api/texts and GET /api/

The more I write the more I think this isn't a good idea...If you want to use OS backend I think the only sane path is just take the JS/HTML/CSS of the source sheet builder independent of Sefaria and work on it until it speaks your own language.

You could have one resource which has no structure and is just represented as a single string. Here's an example of all the data we'd need for one resource (and any data beyond this is not currently supported):

{
  title: "Mourner's Kaddish",
  language: "en",
  sectionNames: ["Line"],
  text: [
    "Glorified and sanctified be God's great name throughout the world which He has created according to His will.",
    "May He establish His kingdom in your lifetime and during your days, and within the life of the entire House of Israel, speedily and soon; and say, Amen.",
    "May His great name be blessed forever and to all eternity.",
    "Blessed and praised, glorified and exalted, extolled and honored, adored and lauded be the name of the Holy One, blessed be He, beyond all the blessings and hymns, praises and consolations that are ever spoken in the world; and say, Amen.",
    "May there be abundant peace from heaven, and life, for us and for all Israel; and say, Amen.",
    "He who creates peace in His celestial heights, may He create peace for us and for all Israel; and say, Amen."
  ]
}
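A minimal sketch, in Python, of how a plain-text component might be turned into a record of this shape. The field names (title, language, sectionNames, text) come from the example above; the helper name `build_resource` and the rule of one segment per non-empty line are assumptions for illustration only, not part of Sefaria's API.

```python
import json


def build_resource(title, language, section_name, raw_text):
    """Wrap raw_text in a resource record, one segment per non-empty line.

    Hypothetical helper: the splitting rule is an assumption made for
    this sketch, not something Sefaria prescribes.
    """
    segments = [line.strip() for line in raw_text.splitlines() if line.strip()]
    return {
        "title": title,
        "language": language,
        "sectionNames": [section_name],
        "text": segments,
    }


raw = """
May there be abundant peace from heaven, and life, for us and for all Israel; and say, Amen.

He who creates peace in His celestial heights, may He create peace for us and for all Israel; and say, Amen.
"""

resource = build_resource("Mourner's Kaddish", "en", "Line", raw)
print(json.dumps(resource, indent=2))
```

How a real component should be segmented is exactly the transcription question raised earlier in the thread; the line-based split here only stands in for that decision.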

Now I understand this a bit better. That almost maps directly into a streamText. Although, I would also ask why, if you can have Book>Chapter>Verse, you can't have Resource>Paragraph>Line? Or can you? Just having a paragraph hierarchy (or a choosable Resource>Line Group>Verse Line hierarchy) would work wonders over a flat file. Again, if a flat file is all you can serve, a flat file is better than nothing.

These divisions apply to an individual text. So you can have Resource>Paragraph>Line (or Monster>Taco>Birdie for that matter) if the text is first divided into a number of units called "Resource" which are in turn divided into "Paragraphs". There are currently no higher-level text groups where you would say "Torah" is composed of Books>Chapters>Verses. We just say Bereishit is a text, it is composed of Chapter>Verse, and it belongs to a category called Torah. Not sure what you mean by "flat file". Texts are stored in nested groups (arrays of arrays of arrays of strings, say), but currently the API only serves them in chunks (chapter, daf, etc.) at the level of individual strings or arrays of strings.
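To make the nesting concrete, here is a rough Python sketch of a text whose sectionNames are ["Paragraph", "Line"], stored as an array of arrays of strings. The helpers `depth` and `get_chunk` and the sample strings are made up for illustration; they are not Sefaria functions.

```python
def depth(text):
    """Return how many levels of nesting sit above the strings."""
    levels = 0
    while isinstance(text, list):
        levels += 1
        text = text[0] if text else ""
    return levels


def get_chunk(text, address):
    """Follow 1-based section numbers down the nested arrays."""
    node = text
    for number in address:
        node = node[number - 1]
    return node


# A made-up text with two levels of structure.
section_names = ["Paragraph", "Line"]
text = [
    ["Paragraph 1, line 1", "Paragraph 1, line 2"],
    ["Paragraph 2, line 1"],
]

# The number of section names should match the nesting depth.
assert depth(text) == len(section_names)

print(get_chunk(text, [1]))     # a whole paragraph: a list of strings
print(get_chunk(text, [2, 1]))  # a single line: one string
```

A partial address returns an array of strings and a full address returns a single string, which are the two chunk sizes described above.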

There's a bit more metadata that I would like to make sure is retained (where the data came from and who is editing it), but those are details, and I think Sefaria does have a rudimentary sourcing mechanism. There is something in there that looks like one.

Yeah, I wrote this in a hurry and forgot that "versionTitle" and "versionSource" apply to each text version. There's also some more info in the text index doc (which applies to multiple versions), such as title variants and categories.

How does one tell what a comment applies to?

In a source sheet as they stand today, comments are just lines of text placed in some order. There's no semantics to it.

I'm not sure I understand this. The comment is somehow linked to what it comments on, right?

Re: sub-sources as well, take a look at: http://sefaria.org/sheets/6

A basic source sheet looks like this:

{
  title: "My Sheet",
  sources: [
    {ref: "Job 2:3"},
    {comment: "Hi, this is a comment"},
    {ref: "Genesis 3:4-6", subsources: [
      {ref: "Amos 2:3"},
      {comment: "This is a sub comment"}
    ]}
  ]
}

Comment objects just show some text in a particular place. Subsources, which can include comments, are displayed underneath a source and are indented.
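As a rough illustration of that display rule, the Python sketch below walks a sheet of this shape and indents subsources under their parent. The `render` function and the "[source]"/"[comment]" labels are invented for this example; the keys (title, sources, ref, comment, subsources) are the ones from the sheet above.

```python
def render(sources, indent=0):
    """Return the sheet's entries as indented lines, subsources nested."""
    lines = []
    for entry in sources:
        pad = "    " * indent
        if "ref" in entry:
            lines.append(f"{pad}[source] {entry['ref']}")
        elif "comment" in entry:
            lines.append(f"{pad}[comment] {entry['comment']}")
        # Subsources are shown under their parent, one level deeper.
        lines.extend(render(entry.get("subsources", []), indent + 1))
    return lines


sheet = {
    "title": "My Sheet",
    "sources": [
        {"ref": "Job 2:3"},
        {"comment": "This is a comment"},
        {"ref": "Genesis 3:4-6", "subsources": [
            {"ref": "Amos 2:3"},
            {"comment": "This is a sub comment"},
        ]},
    ],
}

print(sheet["title"])
print("\n".join(render(sheet["sources"])))
```

Note that a comment's only link to a source is its position in the list (or its place inside a subsources array), which matches the "no semantics" description earlier in the thread.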

OK, thanks for all the thoughts. If anyone is interested in following this as it develops, Efraim has started a doc here:

https://docs.google.com/document/d/1AJ91L5eiVqxkTa8qrkigTCpswDjraydC1tn_mjJo1q8/edit

Cheers,

Brett

-- 
---
Efraim Feinstein
Lead Developer
Open Siddur Project
http://opensiddur.net
http://wiki.jewishliturgy.org

Efraim Feinstein

Jun 18, 2012, 7:43:12 PM

to sefar...@googlegroups.com, Brett Lockspeiser, opensid...@googlegroups.com

Brett,

I think we're talking past each other a bit by delving too deeply into
the tech too quickly.

Before I respond any further, I should ask about the big picture in your
view:

If your proof-of-concept goal is completed, what would be the results?

Brett Lockspeiser

Jun 18, 2012, 8:00:15 PM

to Efraim Feinstein, sefar...@googlegroups.com, opensid...@googlegroups.com

Probably a good idea to step back here, thanks...

I am responding here to Aharon's request (as I understood it) to try to use Sefaria UI to achieve the OpenSiddur milestone of having a usable demo experience of custom Siddur building.

The result would be having something that you could send to your users and say, this is just an initial demo, but try it out and let us know what you think. This would not stand in place of a dedicated OpenSiddur UI, but it could give you more clarity about your design decisions and requirements for when you do build your own UI (which may or may not make sense to do as a fork of Sefaria client side code, this would be a step towards figuring that out).

So maybe I've misunderstood your interests. Is this something you're interested in or are you thinking about something else?

Thanks,
Brett

Efraim Feinstein

Jun 19, 2012, 12:59:28 PM

to Brett Lockspeiser, sefar...@googlegroups.com, opensid...@googlegroups.com

Hi,

On 06/18/2012 08:00 PM, Brett Lockspeiser wrote:
> Probably a good idea to step back here, thanks...
>
> I am responding here to Aharon's request (as I understood it) to try
> to use Sefaria UI to achieve the OpenSiddur milestone of having a
> usable demo experience of custom Siddur building.
>
> The result would be having something that you could send to your users
> and say, this is just an initial demo, but try it out and let us know
> what you think. This would not stand in place of a dedicated
> OpenSiddur UI, but it could give you more clarity about your design
> decisions and requirements for when you do build your own UI (which
> may or may not make sense to do as a fork of Sefaria client side code,
> this would be a step towards figuring that out).
>
> So maybe I've misunderstood your interests. Is this something you're
> interested in or are you thinking about something else?

I understand why we're talking past each other. The question I was
working on was "what would it take to use the Sefaria UI with an Open
Siddur backend as a testing UI for bidirectional editing that implements
a minimal set of concepts?" since what's killing me is the lack of UI/UI
coders and you have UI and UI coders. Obviously, the two problems
require very different types of answers/time investments.

The condition I would try to enforce would be that when we do extract
the data from a data dump, we're not missing anything essential. I think
the only issues we'll probably need to work out involve linkage to
metadata and more cultural issues (which may or may not be important -
there are ways we can make them less important - as I haven't wrapped my
head around how Sefaria stores variant texts of the same thing).

In terms of tech, if Sefaria can handle arrays of unstructured text,
then we'll just have to work with arrays of unstructured text. (When you
have a hammer, everything looks like a nail?)

It seems like it could be worthwhile (and, probably not much work).
Aharon -- I'll defer to you on what you think.

Aharon Varady

Jun 20, 2012, 3:25:46 PM

to opensid...@googlegroups.com, Brett Lockspeiser, sefar...@googlegroups.com

On Tue, Jun 19, 2012 at 7:59 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
> It seems like it could be worthwhile (and, probably not much work). Aharon -- I'll defer to you on what you think.

It's certainly worthwhile to try and get something working on Sefaria now, if it's not too much work. I'll be keen to learn what more is needed to get this going for both our projects.

Aharon

Efraim Feinstein

Jun 20, 2012, 3:29:15 PM

to opensid...@googlegroups.com, Aharon Varady, Brett Lockspeiser, sefar...@googlegroups.com

Hi,

I think most of the "work" will be in transcription guidelines/standards more than anything technical.

Brett asked for a "list of components" -- can you give us an example of what you mean by a "component?"

Brett Lockspeiser

Jun 21, 2012, 12:45:17 PM

to opensid...@googlegroups.com, Aharon Varady, sefar...@googlegroups.com

On Wed, Jun 20, 2012 at 12:29 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
> Hi,
>
> On 06/20/2012 03:25 PM, Aharon Varady wrote:
>> On Tue, Jun 19, 2012 at 7:59 PM, Efraim Feinstein <efraim.f...@gmail.com> wrote:
>>> It seems like it could be worthwhile (and, probably not much work). Aharon -- I'll defer to you on what you think.
>>
>> It's certainly worthwhile to try and get something working on Sefaria now, if it's not too much work. I'll be keen to learn what more is needed to get this going for both our projects.
>
> I think most of the "work" will be in transcription guidelines/standards more than anything technical.
>
> Brett asked for a "list of components" -- can you give us an example of what you mean by a "component?"

"Kaddish", "Mourner's Kaddish", "Ashrei", "Birchat haShachar" are all the kinds of things I mean by "components". These are the named "chunks" of text that a user could include in a Siddur (just as, in the source sheet builder, you name a source to include it). They'd need to have uniform structure, meaning they are either a single string of text or an array of strings. I'm assuming that this level of "chunking" is somewhat new to your data model, which is why I think having a list is the first step in checking that this is possible and will produce a desirable result.
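The "uniform structure" condition can be stated as a small check. This is only a sketch in Python: the function name `is_uniform_component` and the placeholder texts are hypothetical, and a real list of components would come from the Open Siddur transcriptions.

```python
def is_uniform_component(text):
    """True if text is a single string or a flat list of strings."""
    if isinstance(text, str):
        return True
    return isinstance(text, list) and all(isinstance(s, str) for s in text)


# Placeholder content only; these are not real transcriptions.
candidates = {
    "Ashrei": ["line 1", "line 2", "line 3"],     # flat list of lines
    "Short blessing": "a single string of text",  # one string
    "Mixed": ["line 1", ["nested", "lines"]],     # not uniform
}

for name, text in candidates.items():
    verdict = "ok" if is_uniform_component(text) else "needs splitting"
    print(name, "->", verdict)
```

A candidate that fails the check would need to be broken into smaller named components before it could be posted.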

I'm still a little unsure if we're on the same page here though... Could we take this conversation to Skype to make sure?

Thanks,

Brett

Marc Stober

Jun 21, 2012, 7:21:53 PM

to opensid...@googlegroups.com

Sorry to just jump in on this conversation in progress but here's my thought:

What you want is a UI that can pull texts out of the OpenSiddur XML database and let a web user build a siddur.

I do not think the goal for the "minimum viable product" needs to involve saving anything back to OpenSiddur. I know this is one of Efraim's goals, but simply letting a user save locally and print would be a start.

What do you think?

- Marc

--
marcs...@gmail.com ~ www.marcstober.com ~ twitter: marcstober

Efraim Feinstein

Jun 24, 2012, 7:22:26 PM

to opensid...@googlegroups.com, Marc Stober

Hi,

On 06/21/2012 07:21 PM, Marc Stober wrote:
> Sorry to just jump in on this conversation in progress but here's my
> thought:

Jumping into conversations is what we want. If we didn't want comments,
we'd hold them in private :-)

>
> What you want is a UI that can pull texts out of the OpenSiddur XML
> database and let a web user build a siddur.
>
> I do not think the goal for the "minimum viable product" needs to
> involve saving anything back to OpenSiddur. I know this is one of
> Efraim's goals, but simply letting a user save locally and print would
> be a start.
>
> What do you think?

We essentially have these interlinked problems:

0. Incomplete db/API implementation: there's a complete plan for solving
it and it's making progress. The db schemas for getting basic data into
the database are published. In about a week, all the APIs for getting
basic data into the database will be finished. *But...*

1. Aside from the Tanach, there's no data in the database. Thanks to all
of the people who transcribed texts (particularly Shmueli Gonzales,
Rallis Wiesenthal, and a host of others), we have a lot of data
available to us. Most of it is in word processor formats, which cannot
be directly imported into *any* database. Some of it is in PDFs with
extractable data (1917 JPS), and some of it is in PDFs where data
extraction is a waste of time. This is where I see Brett's plan helping
us: making completely unstructured data into incompletely structured
data. We can work with incompletely structured data! Fitzchak's effort
may also help with this. Your (Marc's) effort on the 1917 JPS will also
help. Dale's effort on the Singer siddur translation will help too.

2. No UI for users to get data out of the database: #1 is a problem for
this. In the database API plan, there are HTML outputs that could be
used directly (and 0.4.1 POC-ed it). A *very* simple read-only UI would
involve a full-text or title search of data leading to a list of texts
that contained it and some sample outputs. An even simpler one would
just present the user with all possible outputs and allow them to
select. So, on a technical level, I think #2 is relatively easy
(although difficult to make it pretty). #1 is hard and #1 is a showstopper.

3. An editing UI. This is hard, and really needs more developer effort
than I can put in simultaneously with finishing the server.
