How to convert boards.ie (SIOC-formatted) data into a database

shahzad

Oct 12, 2012, 3:52:48 AM
to icwsm...@googlegroups.com

Greetings

I have obtained the boards.ie dataset from ICWSM.

I want to use the thread and post text for text mining. For this, I need all threads, with their respective posts, in a database.

However, the data is in SIOC format.

Kindly suggest how to convert this SIOC-formatted data into a database such as MySQL or SQL Server.

Regards

Jodi Schneider

Oct 12, 2012, 7:11:33 AM
to icwsm...@googlegroups.com, sioc...@googlegroups.com
Hi Shahzad,

If your main need is to query the data, the best sort of database to use would be a triple store [1]. Triple stores are designed for RDF data such as that in SIOC. It is possible to convert from relational databases to RDF databases; there are a variety of tools, and the W3C RDB2RDF group webpage might be helpful in identifying a suitable mapping tool [2].
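SIOC data is RDF, i.e. a collection of (subject, predicate, object) triples, which is why a triple store can hold it as-is. Getting it into relational tables amounts to pivoting those triples: one row per subject, one column per predicate of interest. A minimal sketch in Python, assuming hypothetical triples with made-up example.org URIs and picking out only `sioc:has_container` and `sioc:content`:

```python
from collections import defaultdict

# Hypothetical triples shaped like SIOC data; the example.org URIs are made up.
SIOC = "http://rdfs.org/sioc/ns#"
triples = [
    ("http://example.org/post/1", SIOC + "has_container", "http://example.org/thread/10"),
    ("http://example.org/post/1", SIOC + "content", "First post text"),
    ("http://example.org/post/2", SIOC + "has_container", "http://example.org/thread/10"),
    ("http://example.org/post/2", SIOC + "content", "A reply"),
]

def pivot(triples, columns):
    """Group triples by subject; each subject becomes one row dict.

    `columns` maps a predicate URI to the column name it should fill.
    Predicates not listed in `columns` are ignored.
    """
    rows = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate in columns:
            rows[subject][columns[predicate]] = obj
    return [{"uri": subject, **props} for subject, props in rows.items()]

rows = pivot(triples, {SIOC + "has_container": "thread", SIOC + "content": "content"})
for row in rows:
    print(row)
```

This simple pivot keeps only the last value when a subject repeats a predicate; multi-valued properties (several topics per post, for example) would need a separate table.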

Documentation on SIOC may also be helpful:

Perhaps others will have additional advice for you?

Best,

Jodi


--
You received this message because you are subscribed to the Google Groups "icwsm-data" group.
To view this discussion on the web visit https://groups.google.com/d/msg/icwsm-data/-/xP5OG1-dvycJ.
To post to this group, send email to icwsm...@googlegroups.com.
To unsubscribe from this group, send email to icwsm-data+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/icwsm-data?hl=en.

shahzad

Oct 15, 2012, 12:26:19 AM
to icwsm...@googlegroups.com, sioc...@googlegroups.com, jschn...@pobox.com

Thank you for the prompt reply.

My need is not only to query the data but also to get the boards.ie thread/post data into a relational database (MySQL) or into MS Excel, because I have to compute text similarities among all threads and their respective posts. The posts must therefore be in a relational format, because I have to link each post to its thread with a foreign key.
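The thread/post relationship described here maps onto two tables, with each post carrying a foreign key to its thread. A minimal sketch using Python's built-in `sqlite3` as a stand-in for MySQL; the table and column names (`thread`, `post`, `thread_id`, `content`) are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE thread (
    thread_id INTEGER PRIMARY KEY,
    title     TEXT
);
CREATE TABLE post (
    post_id   INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES thread(thread_id),
    content   TEXT
);
""")

conn.execute("INSERT INTO thread VALUES (10, 'Example thread')")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, 10, "First post text"), (2, 10, "A reply")],
)

# Thread 99 does not exist, so the foreign key rejects this post.
try:
    conn.execute("INSERT INTO post VALUES (3, 99, 'Orphan post')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Pair each post with its thread, e.g. as input to a per-thread similarity computation.
pairs = conn.execute(
    "SELECT t.title, p.content FROM post p "
    "JOIN thread t ON p.thread_id = t.thread_id ORDER BY p.post_id"
).fetchall()
print(pairs)
```

In MySQL the same `REFERENCES` constraint is enforced only by storage engines that support foreign keys, such as InnoDB.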

Regards

Jodi Schneider

Jun 13, 2013, 4:51:37 AM
to Amendra Shrestha, sioc...@googlegroups.com, icwsm...@googlegroups.com
Amendra,

On Thu, Jun 13, 2013 at 9:48 AM, Amendra Shrestha <amendra...@gmail.com> wrote:
You can write your own parser, keep only the necessary tags from the RDF files, and store them in a relational database.
I also used the boards.ie data for my master's thesis. I parsed the SIOC-format files in Java using a SAX parser and stored the data in MySQL tables.

Perhaps you'd be willing to share the code? I'm sure others would find it useful.

-Jodi

Jodi Schneider

Jun 13, 2013, 6:57:00 AM
to Amendra Shrestha, shahzad, sioc...@googlegroups.com, icwsm...@googlegroups.com
Shahzad, if you're still interested in the boards.ie data, the code below should help you:
http://board-ie-parser.googlecode.com/svn/trunk/

Amendra parsed the boards.ie SIOC files into MySQL tables using Java and a SAX parser.
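The general approach (stream the SIOC RDF/XML with a SAX parser, keep only the needed elements, write rows to a database) can be sketched as follows. This is not Amendra's Java code but a Python analogue using the standard-library `xml.sax` and `sqlite3` (standing in for MySQL). It assumes each post appears as a `sioc:Post` element with an `rdf:about` URI, a `sioc:has_container` link to its thread, and a `sioc:content` body; the sample document and its example.org URIs are hypothetical, and the real boards.ie files may be laid out differently.

```python
import io
import sqlite3
import xml.sax
from xml.sax.handler import feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SIOC = "http://rdfs.org/sioc/ns#"

# Hypothetical sample shaped like a SIOC RDF/XML post; real files may differ.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <sioc:Post rdf:about="http://example.org/post/1">
    <sioc:has_container rdf:resource="http://example.org/thread/10"/>
    <sioc:content>First post text</sioc:content>
  </sioc:Post>
</rdf:RDF>"""

class PostHandler(xml.sax.ContentHandler):
    """Collect uri, container and content for every sioc:Post element."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self.current = None      # post being built, or None outside sioc:Post
        self.in_content = False
        self.text = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, names are (namespace URI, local name) tuples.
        if name == (SIOC, "Post"):
            self.current = {"uri": attrs.get((RDF, "about")), "container": None, "content": None}
        elif self.current is not None and name == (SIOC, "has_container"):
            self.current["container"] = attrs.get((RDF, "resource"))
        elif self.current is not None and name == (SIOC, "content"):
            self.in_content = True
            self.text = []

    def characters(self, content):
        if self.in_content:
            self.text.append(content)  # SAX may deliver text in several chunks

    def endElementNS(self, name, qname):
        if name == (SIOC, "content") and self.in_content:
            self.current["content"] = "".join(self.text)
            self.in_content = False
        elif name == (SIOC, "Post") and self.current is not None:
            self.posts.append(self.current)
            self.current = None

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = PostHandler()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(SAMPLE))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (uri TEXT PRIMARY KEY, container TEXT, content TEXT)")
conn.executemany("INSERT INTO post VALUES (:uri, :container, :content)", handler.posts)
rows = conn.execute("SELECT uri, container, content FROM post").fetchall()
print(rows)
```

Because SAX streams the input rather than building a tree in memory, memory use stays modest even across many files; to load a real dump, one could call `parser.parse(path)` once per file with the same handler and then insert `handler.posts`.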

Thanks, Amendra! It would also be great to get a link to your master's thesis when it's done.

-Jodi


On Thu, Jun 13, 2013 at 11:32 AM, Amendra Shrestha <aminsh...@gmail.com> wrote:
I have hosted the code on Google Code. You can find it at the following link:

http://board-ie-parser.googlecode.com/svn/trunk/

You will need to edit the database file so that it uses your own database name.
Thank you.