KJV in JSON

scruffian

Jun 18, 2010, 5:21:50 AM
to Open Scriptures
Hi All

I have a copy of the KJV bible in xml format, which I spent some time
converting to JSON. However I just realised there was an error in the
way I have done it and I have lost some data.

Before I go back and do this again, I just wondered if anyone already
has this, or has a better way of doing it than using find and replace
with regular expressions!

Thanks
Ben

Efraim Feinstein

Jun 18, 2010, 12:58:52 PM
to open-sc...@googlegroups.com
Hi,

On 06/18/2010 05:21 AM, scruffian wrote:
> I have a copy of the KJV bible in xml format, which I spent some time
> converting to JSON. However I just realised there was an error in the
> way I have done it and I have lost some data.
>

If your data starts as XML, you can convert it using:
- XSLT (Extensible Stylesheet Language Transformations)
- SAX (Simple API for XML)
- Iterating through nodes in a DOM
- Iterating using a different library (like Python's ElementTree)

Writing your own XML parser using regular expressions is reinventing the
wheel.
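As a rough illustration of the last option, here is a minimal sketch using Python's standard-library ElementTree. It walks the nodes recursively and keeps attributes, each element's own text, and its "tail" (the text after its closing tag, where spaces and punctuation live). The function name `element_to_dict`, the output keys and the sample fragment are illustrative choices, not an established format.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an ElementTree element into a plain dict.

    Attributes, the element's own text, its children (in order) and its
    tail (the text after its closing tag) are all kept, so the words and
    the spaces/punctuation between them are not thrown away.
    """
    node = {"tag": elem.tag}
    if elem.attrib:
        node["attrs"] = dict(elem.attrib)
    if elem.text:
        node["text"] = elem.text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    if elem.tail:
        node["tail"] = elem.tail
    return node


# A shortened, hypothetical fragment in the style of OSIS markup
SAMPLE = ('<p><w lemma="strong:H0430">God</w> '
          '<w lemma="strong:H0776">the earth</w>.</p>')

if __name__ == "__main__":
    tree = element_to_dict(ET.fromstring(SAMPLE))
    print(json.dumps(tree, indent=2))
```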

--
---
Efraim Feinstein
Lead Developer
Open Siddur Project
http://opensiddur.net
http://wiki.jewishliturgy.org

b...@boticca.com

Jun 18, 2010, 4:56:50 PM
to Open Scriptures
Thanks. My Bible is XML. I have no idea how to start with these
tools...! Any help would be appreciated.


Efraim Feinstein

Jun 18, 2010, 5:26:00 PM
to open-sc...@googlegroups.com
On 06/18/2010 04:56 PM, b...@boticca.com wrote:
> Thanks. My Bible is XML. I have no idea how to start with these
> tools...! Any help would be appreciated.
>

For XSLT (1.0), a good place to start is W3Schools
<http://www.w3schools.com/xsl/default.asp>. There are a number of XSLT
1.0 interpreters out there, and there are XSLT libraries that interface
directly with other languages. For standalone XSLT 2.0 or XSLT 2.0 as a
Java library, the best open source solution is Saxon HE.

For the others, what library to use and how to interface with it really
depends on what your favorite language is. Google is your friend.
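For the SAX route, a minimal Python sketch using the standard-library `xml.sax` module might look like the following. It assumes OSIS-style verse milestones (`<verse sID="..."/>` to open, `<verse eID="..."/>` to close) with `<w>` words between them; the class name `VerseHandler`, the output shape and the shortened Gen.1.1 excerpt are illustrative assumptions.

```python
import json
import xml.sax


class VerseHandler(xml.sax.ContentHandler):
    """Group <w> words into verses delimited by OSIS verse milestones.

    <verse sID="..."/> opens a verse and <verse eID="..."/> closes it.
    Both are empty elements, so the handler remembers which verse is open.
    """

    def __init__(self):
        super().__init__()
        self.verses = {}     # sID -> list of word dicts, in document order
        self.current = None  # sID of the verse being read, or None
        self.word = None     # word dict being filled while inside <w>

    def startElement(self, name, attrs):
        if name == "verse" and "sID" in attrs:
            self.current = attrs["sID"]
            self.verses[self.current] = []
        elif name == "verse" and "eID" in attrs:
            self.current = None
        elif name == "w" and self.current is not None:
            self.word = {"lemma": attrs.get("lemma", ""), "content": ""}
            if "morph" in attrs:
                self.word["morph"] = attrs["morph"]

    def characters(self, content):
        # A single text run may arrive in several chunks, so append.
        if self.word is not None:
            self.word["content"] += content

    def endElement(self, name):
        if name == "w" and self.word is not None:
            self.verses[self.current].append(self.word)
            self.word = None


# Shortened excerpt of Gen.1.1 wrapped in a root element
SAMPLE = (b'<chapter osisID="Gen.1"><verse osisID="Gen.1.1" sID="Gen.1.1"/>'
          b'<w lemma="strong:H07225">In the beginning</w> '
          b'<w lemma="strong:H0430">God</w> '
          b'<w morph="strongMorph:TH8804" lemma="strong:H0853 strong:H01254">created</w>'
          b'<verse eID="Gen.1.1"/></chapter>')

if __name__ == "__main__":
    handler = VerseHandler()
    xml.sax.parseString(SAMPLE, handler)
    print(json.dumps(handler.verses, indent=2))
```

Because SAX streams events instead of building a tree, the same handler could be given a whole file with `xml.sax.parse(path, handler)`. This sketch ignores the spaces and punctuation between words.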


Nigel Chapman

Jun 18, 2010, 8:40:17 PM
to open-sc...@googlegroups.com
Hi Efraim,

If you know PHP (or similar languages), SimpleXML is probably the
easiest approach. It turns the XML into a tree of nested objects that
you can loop through much like arrays.

http://php.net/manual/en/book.simplexml.php

Nigel.

> --
> You received this message because you are subscribed to the Google Groups
> "Open Scriptures" group.
> To post to this group, send email to open-sc...@googlegroups.com.
> To unsubscribe from this group, send email to
> open-scriptur...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/open-scriptures?hl=en.
>
>

David Troidl

Jun 19, 2010, 12:06:44 PM
to open-sc...@googlegroups.com
Hi Ben,

Yes, find and replace can be a great tool, but when I converted the
Strong's Hebrew dictionary to JSON, I used XSLT. It gives you somewhat
better control. Depending on the complexity of your text, it may not be
too difficult to put together a quick stylesheet.

Peace,

David

senihr

Jul 8, 2010, 9:05:47 PM
to Open Scriptures
Yes! I was about to do the same thing using Python and BeautifulSoup,
except I don't have the XML yet.

Race you. I'll PM you the results when I get done.

Cheers,
Jeff

Weston Ruter

Jul 8, 2010, 9:17:23 PM
to open-sc...@googlegroups.com, Nathan Smith
BTW, see CrossWire's KJV2006 project which has KJV in OSIS (XML) with Strong's numbers: http://crosswire.org/~dmsmith/kjv2006/

Nathan Smith (shortgoliath) mentioned he was going to look into making an importer to get the data into the Open Scriptures data models.

Weston

Patrick Altman

Jul 8, 2010, 9:30:40 PM
to open-sc...@googlegroups.com, open-sc...@googlegroups.com, Nathan Smith
Json > XML 

IMHO 

---
Patrick Altman


Weston Ruter

Jul 8, 2010, 10:34:16 PM
to Patrick Altman, Nathan Smith, open-sc...@googlegroups.com
Patrick: I meant that it's source data that can be converted to JSON, not that XML should be used instead.

But now I'm curious: how would you recommend representing a Biblical text like KJV in JSON? I can see how it could be done fairly logically by making each Token a JSON object and putting them into an array, but are you thinking of just storing each verse as a separate string? The difficulty comes when you have footnotes, paragraphs, sections, lines and other constructs that overlap verse boundaries. (XML has the same problem.)
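One possible way to handle such overlaps, sketched here as stand-off markup rather than any Open Scriptures format, is to keep a flat array of tokens and describe verses, paragraphs and notes as separate ranges of token indices, so they can overlap freely. The field names, the paragraph and the note below are hypothetical; the Strong's numbers for Gen.1.1 come from the KJV2006-style markup quoted in this thread, and those for Gen.1.2 are left out.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Token:
    text: str                                    # the word(s) as printed
    strongs: list = field(default_factory=list)  # e.g. ["H0853", "H01254"]
    after: str = " "                             # spacing/punctuation that follows


@dataclass
class Span:
    kind: str   # "verse", "paragraph", "note", ...
    ref: str    # identifier, e.g. "Gen.1.1"
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)


tokens = [
    Token("In the beginning", ["H07225"]),
    Token("God", ["H0430"]),
    Token("created", ["H0853", "H01254"]),
    Token("the heaven", ["H08064"]),
    Token("and", ["H0853"]),
    Token("the earth", ["H0776"], after=". "),
    # First words of Gen.1.2 only; Strong's numbers omitted in this sketch
    Token("And the earth"),
    Token("was"),
    Token("without form", after=", "),
    Token("and void", after=";"),
]

spans = [
    Span("verse", "Gen.1.1", 0, 6),
    Span("verse", "Gen.1.2", 6, 10),
    Span("paragraph", "p1", 0, 10),  # hypothetical paragraph
    Span("note", "n1", 4, 8),        # hypothetical note crossing a verse boundary
]


def render(span, toks):
    """Rebuild the printed text covered by a span."""
    return "".join(t.text + t.after for t in toks[span.start:span.end]).rstrip()


document = {"tokens": [asdict(t) for t in tokens],
            "spans": [asdict(s) for s in spans]}

if __name__ == "__main__":
    print(json.dumps(document)[:60])
    for span in spans:
        print(span.kind, span.ref, "->", render(span, tokens))
```

Each structure is just a pair of indices, so a note that starts in one verse and ends in the next needs no special handling; the trade-off is that every consumer has to resolve the ranges against the token array.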

Weston

Patrick Altman

Jul 8, 2010, 11:25:54 PM
to Weston Ruter, Nathan Smith, open-sc...@googlegroups.com
I really have no idea what I am talking about. Don't listen to me. ;-)

James Tauber

Jul 9, 2010, 12:43:01 AM
to open-sc...@googlegroups.com

On Jul 8, 2010, at 9:30 PM, Patrick Altman wrote:

> Json > XML

For object serialization, sure. For document markup, not so much.

James

Patrick Altman

Jul 9, 2010, 1:27:06 AM
to open-sc...@googlegroups.com
Good point and distinction. Up until this project, most of my experience has been in object-serialization land.

scruffian

Aug 4, 2010, 10:33:11 AM
to Open Scriptures
OK, I have been having a go at this, but there are some issues. I tried
converting using this converter: http://www.thomasfrank.se/xml_to_json.html.
For example, I started by converting Genesis 1:1. The XML from that
source is this:

<verse osisID="Gen.1.1" sID="Gen.1.1"/><w lemma="strong:H07225">In the beginning</w>
<w lemma="strong:H0430">God</w>
<w morph="strongMorph:TH8804" lemma="strong:H0853 strong:H01254">created</w>
<w lemma="strong:H08064">the heaven</w>
<w lemma="strong:H0853">and</w>
<w lemma="strong:H0776">the earth</w>.<verse eID="Gen.1.1"/>

but surely that should be

<verse osisID="Gen.1.1" sID="Gen.1.1">...</verse eID="Gen.1.1">

for a start.

These are the results:

{
  verse:{
    osisid:'Gen.1.1',
    sid:'Gen.1.1',
    w:[
      { lemma:'strong:H07225' },
      { lemma:'strong:H0430' },
      { morph:'strongMorph:TH8804', lemma:'strong:H0853 strong:H01254' },
      { lemma:'strong:H08064' },
      { lemma:'strong:H0853' },
      { lemma:'strong:H0776' }
    ]
  },
  eid:'Gen.1.1'
}

Obviously not that helpful! This raises a tough question: how can we
represent verse-level data in a JSON structure whilst keeping the
Strong's numbers, etc.?

How about something like this:


{
  verse:{
    osisid:'Gen.1.1',
    sid:'Gen.1.1',
    w:[
      { lemma:'strong:H07225', content:'In the beginning' },
      { lemma:'strong:H0430', content:'God' },
      { morph:'strongMorph:TH8804', lemma:'strong:H0853 strong:H01254', content:'created' },
      { lemma:'strong:H08064', content:'the heaven' },
      { lemma:'strong:H0853', content:'and' },
      { lemma:'strong:H0776', content:'the earth' }
    ]
  }
}
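Assuming the `<verse>` milestones and `<w>` elements are siblings, as in the Gen.1.1 sample above, a short ElementTree sketch could produce this proposed structure directly instead of going through a generic converter. The function name `verses_from_milestones` is hypothetical, and the keys simply mirror the proposal.

```python
import json
import xml.etree.ElementTree as ET


def verses_from_milestones(parent):
    """Build {"verse": {...}} objects from sID/eID verse milestones.

    Assumes the milestones and the <w> elements are direct children of
    the same parent, as in the Gen.1.1 sample; real files may nest
    them more deeply.
    """
    verses, current = [], None
    for child in parent:  # children come back in document order
        if child.tag == "verse" and "sID" in child.attrib:
            current = {"osisid": child.get("osisID"),
                       "sid": child.get("sID"), "w": []}
        elif child.tag == "verse" and "eID" in child.attrib:
            if current is not None:
                verses.append({"verse": current})
            current = None
        elif child.tag == "w" and current is not None:
            word = dict(child.attrib)                    # morph, lemma, ...
            word["content"] = "".join(child.itertext())  # all text inside <w>
            current["w"].append(word)
    return verses


SAMPLE = ('<chapter osisID="Gen.1">'
          '<verse osisID="Gen.1.1" sID="Gen.1.1"/>'
          '<w lemma="strong:H07225">In the beginning</w> '
          '<w lemma="strong:H0430">God</w> '
          '<w morph="strongMorph:TH8804" lemma="strong:H0853 strong:H01254">created</w> '
          '<w lemma="strong:H08064">the heaven</w> '
          '<w lemma="strong:H0853">and</w> '
          '<w lemma="strong:H0776">the earth</w>.'
          '<verse eID="Gen.1.1"/></chapter>')

if __name__ == "__main__":
    print(json.dumps(verses_from_milestones(ET.fromstring(SAMPLE)), indent=2))
```

As in the proposal, the spaces and the final "." are not stored anywhere in this structure.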


This helps, but there are still the questions of extra spaces,
punctuation, and words with more than one Strong's number...
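One way to address these points, sketched in Python under the assumption that each `lemma` attribute is a space-separated list of `strong:` values as in the Gen.1.1 sample: store the Strong's numbers as a list, and keep each word's ElementTree `tail` (the spaces or punctuation after it) so the verse text can be rebuilt exactly. The field names `strongs` and `after` and the helper functions are illustrative, and text appearing before the first `<w>` of a verse is not handled here.

```python
import json
import xml.etree.ElementTree as ET


def strongs_numbers(lemma):
    """'strong:H0853 strong:H01254' -> ['H0853', 'H01254']."""
    return [part.split(":", 1)[1] for part in lemma.split()
            if part.startswith("strong:")]


def word_entries(words):
    """Turn <w> elements into entries that keep spacing and punctuation."""
    entries = []
    for w in words:
        entry = {"content": "".join(w.itertext()),
                 "strongs": strongs_numbers(w.get("lemma", "")),
                 "after": w.tail or ""}  # text up to the next element
        if w.get("morph"):
            entry["morph"] = w.get("morph")
        entries.append(entry)
    return entries


def verse_text(entries):
    """Reassemble the verse exactly as printed."""
    return "".join(e["content"] + e["after"] for e in entries)


SAMPLE = ('<chapter osisID="Gen.1">'
          '<verse osisID="Gen.1.1" sID="Gen.1.1"/>'
          '<w lemma="strong:H07225">In the beginning</w> '
          '<w lemma="strong:H0430">God</w> '
          '<w morph="strongMorph:TH8804" lemma="strong:H0853 strong:H01254">created</w> '
          '<w lemma="strong:H08064">the heaven</w> '
          '<w lemma="strong:H0853">and</w> '
          '<w lemma="strong:H0776">the earth</w>.'
          '<verse eID="Gen.1.1"/></chapter>')

if __name__ == "__main__":
    entries = word_entries(ET.fromstring(SAMPLE).iter("w"))
    print(json.dumps(entries, indent=2))
    print(verse_text(entries))
```

Here the sample holds a single verse, so `iter("w")` is enough; a file with many verses would still need the words grouped by their verse milestones first.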

I'd be interested to hear your ideas.

Thanks
Ben

david...@aol.com

Aug 4, 2010, 1:59:45 PM
to open-sc...@googlegroups.com
Hi Ben,


On 8/4/2010 10:33 AM, scruffian wrote:
> Ok I have been having a go at this but there are some issues. I tried
> converting using this - http://www.thomasfrank.se/xml_to_json.html.
> For example I started converting Genesis 1 v 1. The XML from that
> source is this:
>
> <verse osisID="Gen.1.1" sID="Gen.1.1"/><w lemma="strong:H07225">In the beginning</w>
> <w lemma="strong:H0430">God</w>
> <w morph="strongMorph:TH8804" lemma="strong:H0853 strong:H01254">created</w>
> <w lemma="strong:H08064">the heaven</w>
> <w lemma="strong:H0853">and</w>
> <w lemma="strong:H0776">the earth</w>.<verse eID="Gen.1.1"/>
>
> but surely that should be
>
> <verse osisID="Gen.1.1" sID="Gen.1.1">...</verse eID="Gen.1.1">

No. Separate start and end tags, called milestones, are used because the chapter and verse structure doesn't always match the section and paragraph structure; that is the recommended way of marking up OSIS Bibles. Also, the XML specification doesn't allow anything like eID="Gen.1.1" in a closing tag: an end tag can contain only the element name.
Yes, there are a lot of questions just for chapter and verse markup. Then what if you also want paragraphs? I'm not sure what the answer is.
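Both points can be checked quickly with a parser. This Python sketch uses the standard-library ElementTree to show that milestones let a verse cross a (hypothetical) paragraph boundary, while a container `<verse>` that overlaps a `<p>`, or an attribute in an end tag, is rejected as not well-formed. The sample strings and the `is_well_formed` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the string parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Milestones: a hypothetical paragraph break inside a verse is fine.
MILESTONES = ('<div><p><verse osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning</p>'
              '<p>God created<verse eID="Gen.1.1"/></p></div>')

# Container element: the verse and the paragraph would have to overlap.
OVERLAPPING = ('<div><p><verse osisID="Gen.1.1">In the beginning</p>'
               '<p>God created</verse></p></div>')

# Attributes are not allowed in an end tag.
END_TAG_ATTR = ('<div><verse osisID="Gen.1.1" sID="Gen.1.1">In the beginning'
                '</verse eID="Gen.1.1"></div>')

if __name__ == "__main__":
    for name, text in [("milestones", MILESTONES),
                       ("overlapping", OVERLAPPING),
                       ("end tag attr", END_TAG_ATTR)]:
        print(name, is_well_formed(text))
```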

Peace,

David