Re: [Sefaria Project] Re: Large, small, raised letters, broken vav, joined kuf?


Aharon Varady

Feb 13, 2017, 11:28:09 PM
to sef...@googlegroups.com, Open Siddur Technical Discussion List, Efraim Feinstein
Seth, I think many interested folk will need to know that your text is in a standard and open exchange data format suitable for import into other projects. Could you provide links to the source of the data that is used for display at Hebrew wikisource and alhatorah.org?

Efraim, what do we need to import Seth's Miqra al pi hamesorah into the Open Siddur Project database? I believe that the copyrighted material included in it is shared under a copyleft license, the CC BY-SA.

cross-posting this to the opensiddur-tech discussion list.

Aharon


On Mon, Feb 13, 2017 at 10:21 PM, Seth (Avi) Kadish <skad...@gmail.com> wrote:
For an accurate digital version of the biblical text, one which fully documents all of these features and implements most of them (to the extent that Unicode currently allows), and in addition is superior for Jewish purposes to the digital source of the biblical texts currently used by Sefaria, see here:
https://en.wikisource.org/wiki/User:Dovi/Miqra_according_to_the_Mesorah

This text is open source and can be implemented within Sefaria. It has already been implemented as the base biblical text at Mikraot Gedolot AlHaTorah:
http://mg.alhatorah.org/

Avi


On Monday, February 13, 2017 at 2:07:10 PM UTC+2, Ephraim Damboritz wrote:
Hi,
Thanks for pointing these out. 

Some of these characters are not so easy to render digitally due to the limitations of both our digital source of the biblical texts and of the available digital character set (see: Unicode).  
For those that are available to us to fix, we will be looking into this issue more deeply. 

Thanks for your input. 


On Saturday, February 11, 2017 at 7:14:20 AM UTC+2, Isaac Mayer wrote:
Hi. Just noticed that the Sefaria Tanakh with Ta'amei Miqra text is missing a lot of unique scribal things. I can't find any large, small, or raised letters, and I can't find any joined kuf or broken vav. I attached screenshots of examples of each of these cobbled together from different online manuscripts, as well as the Sefaria text w/o them. Just wondering if someone's working on adding these. (The Ezra dots and the inverted nuns are there.)

Here's a website that lists most of them.

Thanks

--
You received this message because you are subscribed to the Google Groups "Sefaria Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sefaria+unsubscribe@googlegroups.com.
To post to this group, send email to sef...@googlegroups.com.
Visit this group at https://groups.google.com/group/sefaria.
For more options, visit https://groups.google.com/d/optout.



--
Aharon Varady, M.C.P., M.A.J.Ed.
Community Planner, Educator

Seth (Avi) Kadish

Feb 15, 2017, 12:18:04 AM
to Sefaria Project, opensid...@googlegroups.com, efr...@opensiddur.org
Hi, the source of the Tanakh that is displayed at Hebrew Wikisource and at alhatorah.org is found in the database at this link.
Kol tuv,
Avi



Marc Stober

Oct 23, 2019, 5:02:04 PM
to sef...@googlegroups.com, Seth (Avi) Kadish, opensid...@googlegroups.com, efr...@opensiddur.org
Hi all, I know I'm reviving a very old thread here! Is getting it from the Google Sheets currently the best way to use the source text of the Miqra al pi ha-Mesorah?

It feels like it would be nice to have the text rendered into something like CSV or XML with the templates processed, so that I don't have to re-code that. On the other hand, what is on Wikisource is fully processed into a display form that is not ideal for programmatic use either. I can picture something like this even taking the form of files on Github or something.

It goes without saying: much gratitude to Seth (Avi) for the work on this wonderful resource!

Kol tuv, Marc
Marc Stober
Jewish educator, cantorial student, and more...

Seth (Avi) Kadish

Oct 24, 2019, 1:03:45 AM
to opensiddur-tech
Hi, it would be great if the text could be rendered into such a form. The best way to get it there is from the Google Sheets spreadsheet, which divides the text into its various basic elements.

I imagine that the basic rendering work would be automatic. The one thing that would require some patience and careful work is deciding how to code the templates which render the special forms of certain letters or words (most do simple things like documenting variants, but some do more complicated things). Each template is really just a bit of coding, usually fairly simple coding, but that coding is currently in Mediawiki code, and probably needs to be in a more universal, widely understood format.

I would be happy to assist in terms of explaining the templates and their functions, how the text is currently structured, and what the requirements are for fully representing Masoretic text in Unicode.

Avi

Efraim Feinstein

Oct 24, 2019, 1:31:19 AM
to opensid...@googlegroups.com, Efraim Feinstein
On Mon, Feb 13, 2017 at 8:28 PM Aharon Varady <aharon...@gmail.com> wrote:
Seth, I think many interested folk will need to know that your text is in a standard and open exchange data format suitable for import into other projects. Could you provide links to the source of the data that is used for display at Hebrew wikisource and alhatorah.org?

Efraim, what do we need to import Seth's Miqra al pi hamesorah into the Open Siddur Project database? I believe that the copyrighted material included in it is shared under a copyleft license, the CC BY-SA.

Someone to write the code to import it. It's not difficult. It does take some time. I would be happy to work with anyone who wanted to do it. 

As of now, I'm trying to replace some ancient parts of the app with something workable so I can finally release the current develop branch that gets rid of the "segmentation" concept and makes it possible to describe to someone how to write TEI for Open Siddur without resorting to vague descriptions like "and you have to segment the text at the smallest unit of meaning".
 

Seth (Avi) Kadish

Oct 25, 2019, 2:48:43 AM
to opensiddur-tech
One more thing regarding this: It would be a good thing if, in the years to come, all of the "upkeep" for this text project were to be kept in one place, rather than in duplicate locations. Even though I don't have the skills to participate in the coding, I still make ongoing corrections to the Hebrew text when errors are found or where the documentation needs to be improved. Currently, I do that in the spreadsheet, and then make a parallel correction at Hebrew Wikisource. (I also send a note to mg.alhatorah.org so that it will be corrected there too.) Theoretically, the spreadsheet text could be periodically uploaded to Wikisource, and then all the corrections would happen automatically, and there would be only one place where manual corrections need to be made. But this requires a programmer to run a Mediawiki bot, and there is no one to do that currently, so whenever there is a correction I make it in both places simultaneously.

In light of this, it seems that if the project were coded as Marc proposes, maybe at Github, it would still be a good thing if updates could be made in a single place. Github might be a better platform for this than a spreadsheet, but it raises some questions:
1. Could I continue to make corrections to the Hebrew text in a coded project?
2. Might the coding make it more difficult to make textual corrections and updates in Hebrew, given that the Hebrew text will be embedded within a mass of code in Latin letters?
3. How might the number of places that a manual correction is made be reduced, ideally to one, and not raised to three?
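
On question 3, one possible approach is to treat a single export as authoritative and compute the list of corrections for every other copy from it. The following is only a minimal sketch in Python under stated assumptions: the two-column layout (verse reference, text) and the sample strings are hypothetical and do not reflect the actual structure of the spreadsheet.

```python
import csv
import io


def load_rows(csv_text):
    """Map each verse reference (first column) to its text (second column)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def list_corrections(old_csv, new_csv):
    """Return (reference, old_text, new_text) for each changed or added row.

    Rows that exist only in the old snapshot are not reported here.
    """
    old_rows = load_rows(old_csv)
    new_rows = load_rows(new_csv)
    changes = []
    for ref, new_text in new_rows.items():
        old_text = old_rows.get(ref)  # None when the row is new
        if old_text != new_text:
            changes.append((ref, old_text, new_text))
    return changes


# Placeholder strings stand in for the Hebrew verse text (hypothetical data).
OLD = "Gen 1:1,text A\nGen 1:2,text B\n"
NEW = "Gen 1:1,text A\nGen 1:2,text B corrected\n"

for ref, before, after in list_corrections(OLD, NEW):
    print(ref, repr(before), "->", repr(after))
```

A list like this could be what gets sent on to downstream copies, so that the manual correction itself is still made in only one place.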

Shabbat shalom,
Avi

Aharon Varady

Oct 25, 2019, 3:04:12 AM
to Open Siddur Technical Discussion List
I would certainly appreciate a note of any correction you make, as I've used your Miqra al Pi haMesora for Tzemah Yoreh's Supplementary Hypothesis color-coded parsing of the Masoretic text. Mainly I want to keep the parsing I made in a re-usable form. Currently, it's all up in opensiddur.org and in github.com/aharonium/opensiddur.org 

--
You received this message because you are subscribed to the Google Groups "opensiddur-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opensiddur-te...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/opensiddur-tech/52f6b50e-4f4d-45b6-8907-87304bb21d4b%40googlegroups.com.


--
Aharon Varady, M.C.P., M.A.J.Ed.
Community Planner, Educator
Pronouns: He/him/his

Amanda J. Rush

Oct 25, 2019, 4:07:44 AM
to opensid...@googlegroups.com

Is json an option? It wouldn't handle the templating of course, but is readable by a lot more stuff including humans.
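
To make the JSON idea concrete, here is a rough sketch in Python of what one verse record might look like. JSON would not render the templates, but it could record where they occur. The field names and the template name "example-template" are assumptions made up for illustration, not the project's real structure.

```python
import json

# Hypothetical record layout: the field names and the template name
# "example-template" are illustrative only.
verse = {
    "book": "Genesis",
    "chapter": 1,
    "verse": 1,
    "segments": [
        {"type": "text", "value": "בְּרֵאשִׁית"},
        {"type": "template", "name": "example-template", "args": ["ב"]},
    ],
}

# ensure_ascii=False keeps the Hebrew as readable UTF-8 rather than \uXXXX escapes.
encoded = json.dumps(verse, ensure_ascii=False, indent=2)
print(encoded)

# Round trip: decoding gives back the same structure.
decoded = json.loads(encoded)
```

With `ensure_ascii=False` the Hebrew stays legible to a human editor, which matters if corrections are to be made directly in the file.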


Amanda


Seth (Avi) Kadish

Oct 25, 2019, 5:14:21 AM
to opensiddur-tech
All corrections can be found, of course, in the history of revisions for the Google Sheet.

Plus there is now an organized list of corrections that have been made since 12 Elul 5759 (end of the summer). I got a lot of feedback this summer from someone who did a very careful check of Torah, Haftarot, and 5 Megillot, which resulted in improvements to the text. See for instance the expansion of documentation for Kohelet 12:4 that I updated this morning.

In terms of coding options, I don't know how any particular kind of code would "handle" the templates. But any code to be used must at the very least be able to mark and identify the templates. Otherwise, if the templates are not included, then what the code contains will not fully represent the features of the masoretic Bible.
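
As an illustration of what "mark and identify the templates" could mean in practice: Mediawiki template calls have the form `{{name|arg1|arg2}}` and may be nested. The sketch below, in Python, only locates top-level calls and splits out their names and arguments; it does not render them, and it ignores other wiki syntax such as triple-brace parameters. The template names in the sample are invented for the example.

```python
def split_call(inner):
    """Split 'name|a|b' on pipes that are not inside a nested template."""
    parts, current, depth, i = [], "", 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "}}"):
            depth += 1 if pair == "{{" else -1
            current += pair
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append(current)
            current = ""
            i += 1
        else:
            current += inner[i]
            i += 1
    parts.append(current)
    return parts[0].strip(), [p.strip() for p in parts[1:]]


def find_templates(text):
    """Return (start, end, name, args) for each top-level {{...}} call."""
    found = []
    depth = 0
    start = None
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                name, args = split_call(text[start + 2:i - 2])
                found.append((start, i, name, args))
        else:
            i += 1
    return found


# Invented template names, used only to exercise the scanner.
sample = "abc {{big|X}} def {{variant|{{small|y}}|z}}"
for start, end, name, args in find_templates(sample):
    print(sample[start:end], name, args)
```

Keeping the start and end offsets means the plain text around each template is preserved exactly, so a converter can carry every template along as a marked unit even before anyone decides how to render it.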

Marc Stober

Dec 2, 2019, 12:11:08 PM
to opensid...@googlegroups.com
Hi Avi,

Been a busy month but wanted to pick this thread back up.

Can you share with me the Mediawiki code that handles the templates? Is that online somewhere?

Yes, I think being able to update things in one place is a good goal, and having a way to incorporate those changes automatically into other projects is good, too. I'm not sure whether doing the editing as a Google Sheet or at Github makes more sense; either could probably work. I think at Github it would be done in data files, so I wouldn't worry too much about it being mixed with (Latin character) code.

In the meantime I wrote a script to get the data out of Google Sheets in CSV, which is a start to making it more programmatically accessible: https://github.com/marcstober/miqra-scripts/blob/master/downloadcsv.py. I notice that it doesn't include the links on the README sheet, but hopefully it includes everything else. This could either be part of a process that gets edits from the Google sheet into other systems, or could be used to migrate the "source of truth" elsewhere if you wanted to do that.
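
For readers who want a feel for this kind of download without opening the linked script, here is an independent minimal sketch in Python. It is not the linked script and makes no claim about how that script works. It assumes the spreadsheet is publicly readable and relies on the commonly used `export?format=csv&gid=...` URL pattern for Google Sheets; the spreadsheet ID is a placeholder.

```python
import csv
import io
import urllib.request
from urllib.parse import urlencode

SHEET_ID = "YOUR_SPREADSHEET_ID"  # placeholder, not the real spreadsheet ID


def export_url(sheet_id, gid=0):
    """Build a CSV export URL for one tab (gid) of a spreadsheet."""
    query = urlencode({"format": "csv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


def parse_csv(data):
    """Decode downloaded bytes as UTF-8 and return a list of rows."""
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))


def download_rows(sheet_id, gid=0):
    """Fetch one tab and parse it (needs network access and a readable sheet)."""
    with urllib.request.urlopen(export_url(sheet_id, gid)) as response:
        return parse_csv(response.read())
```

Calling `download_rows(SHEET_ID, gid)` once per tab would produce one list of rows per sheet. Cell contents such as hyperlinks are not part of a plain CSV export, which may be related to the missing README links noted above.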

Thanks, Marc


Seth (Avi) Kadish

Dec 7, 2019, 11:56:58 AM
to opensiddur-tech
Shavua tov Marc, and thanks so much!

There is a full list of templates, with direct links to their template pages, inside the spreadsheet itself (it's the first sheet on the right before "Torah"). When you go to a template page, simply click on "edit" to see the Mediawiki code for it. For the vast majority of them, the code is short and simple.

I'm open to suggestions about where the best long-term "home" for this material might be. I like the idea of a Hebrew data file where Latin-character-code doesn't mix things up. Do you know of an example at Github of something like that?

In terms of a script to get the data out of the Google Sheets, see Erel's old code here (which was used to upload the material at Wikisource). Maybe it will be useful to you in some way.

Avi