Is it possible to convert MARC to Dublin Core for each record?


Avishek Paroi

Mar 10, 2015, 8:22:20 AM
to pym...@googlegroups.com
Hi,
I am a newbie. I want to know whether it is possible to extract data from a MARC file and create Dublin Core metadata XML for each record. My MARC file has 102051 records.

Avishek Paroi

Mar 10, 2015, 11:13:07 AM
to pym...@googlegroups.com
I need it because I have to use it in DSpace. (Sorry for any grammatical mistakes.)

Becky Yoose

Mar 10, 2015, 2:26:05 PM
to pym...@googlegroups.com
Hello Avishek,

Not related to pymarc, but an alternative to explore: you might want to check out MarcEdit (http://marcedit.reeset.net/downloads). ME lets you turn MARC records into DC with the MARC Tools function.

There is an active community around ME if you run into any issues; you can sign up for their listserv at http://listserv.gmu.edu/cgi-bin/wa?A0=marcedit-l

Thanks,
Becky

---------------------------------
Becky Yoose
Discovery and Integrated Systems Librarian
Grinnell College



Jeremy Nelson

Mar 11, 2015, 12:12:18 PM
to pym...@googlegroups.com
Hi Avishek,
I created a Gist, https://gist.github.com/jermnelson/b1d908044f02032d2953, with a Python script you can run from the command line. It takes the file path to a MARC file, iterates through the file, generates MARC XML for each record, and then uses the Library of Congress MARCXML to RDF Simple Dublin Core XSLT (http://www.loc.gov/standards/marcxml/xslt/MARC21slim2RDFDC.xsl) to transform the MARC XML to DC RDF XML, which is saved to a local directory as a separate file per record. There are other MARC XML to DC stylesheets that might be closer to what you need with DSpace. You'll need to install lxml (http://lxml.de/) for XSLT processing.

As written, the script processes about 72 MARC records per second on my Windows workstation, so you should be able to convert your 102051 records in about half an hour (of course, your performance may vary depending on your computer's speed, memory, etc.).
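
The estimate follows from dividing the record count by the measured rate. A small sketch of that arithmetic, where the helper name `estimate_minutes` is made up for illustration:

```python
def estimate_minutes(record_count, records_per_second):
    """Return the expected run time in minutes at a steady processing rate."""
    return record_count / records_per_second / 60


# 102051 records at about 72 records per second
print("%.1f minutes" % estimate_minutes(102051, 72))  # prints "23.6 minutes"
```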

Let me know if you have problems. 

Jeremy Nelson
Metadata and Systems Librarian
Colorado College

Avishek Paroi

Mar 13, 2015, 6:04:06 AM
to pym...@googlegroups.com
Hi Jeremy Nelson,
Thanks for your help. I used your script and it works perfectly. I ran it in Python 2.7, where I had to make only one change in the code (the urllib import the script uses is not available in Python 2.7). It took 13 minutes. As you said, it is not the format I was looking for. I tried other XSLT stylesheets, but they did not work. I am including the structure below; please see if you can help me find the right XSLT for my needs, or tell me whether I have to change the generated DC XML file. I need to convert the .mrc file to this format so that I can import it into DSpace. Thanks again for your valuable help.

The format I am looking for (I couldn't upload the screenshot):

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<dublin_core>
<dcvalue element="contributor" qualifier="author">Sarkar,  Prof. S.</dcvalue>
<dcvalue element="contributor" qualifier="author">Basu,  Prof. Anupam</dcvalue>
<dcvalue element="identifier" qualifier="uri">http://nptel.ac.in/courses/106105077/</dcvalue>
<dcvalue element="description" qualifier="tableofcontents">1. Introduction to Artificial Intelligence; 2. State Space Search; 3. Informed Search; ...</dcvalue>
<dcvalue element="language" qualifier="iso">eng</dcvalue>
<dcvalue element="relation" qualifier="haspart">1. Introduction to Artificial Intelligence; 2. State Space Search; 3. Informed S</dcvalue>
<dcvalue element="source" qualifier="none">ndf</dcvalue>
<dcvalue element="subject" qualifier="none">Computer Science and Engineering</dcvalue>
<dcvalue element="subject" qualifier="none">Artificial Intelligence</dcvalue>
<dcvalue element="title" qualifier="none">Artificial Intelligence</dcvalue>
<dcvalue element="type" qualifier="none">Video</dcvalue>
<dcvalue element="format" qualifier="difficultylevel">default</dcvalue>
<dcvalue element="type" qualifier="typeoflearningmaterial">videoLecture</dcvalue>
<dcvalue element="relation" qualifier="ispartof" />
</dublin_core>
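
If no stylesheet produces this layout directly, one option is to post-process the generated DC XML file. The following is a minimal sketch using only the Python 3 standard library. It assumes the generated files contain elements in the Dublin Core 1.1 namespace (for example `dc:title`, `dc:creator`). The function names and the `DSPACE_MAP` table are hypothetical: the mapping is only a guess based on the sample above and would need adjusting to the qualifiers your DSpace instance expects.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical mapping from simple DC element names to (element, qualifier)
# pairs, modeled on the sample above. Anything not listed keeps its own
# name with qualifier "none".
DSPACE_MAP = {
    "creator": ("contributor", "author"),
    "language": ("language", "iso"),
}


def simple_dc_to_dspace(rdf_root):
    """Build a <dublin_core> element from every non-empty dc:* element found."""
    dublin_core = ET.Element("dublin_core")
    for node in rdf_root.iter():
        if not isinstance(node.tag, str) or not node.tag.startswith(DC_NS):
            continue  # skip the RDF wrapper elements
        text = (node.text or "").strip()
        if not text:
            continue  # skip empty values
        name = node.tag[len(DC_NS):]
        element, qualifier = DSPACE_MAP.get(name, (name, "none"))
        dcvalue = ET.SubElement(dublin_core, "dcvalue",
                                element=element, qualifier=qualifier)
        dcvalue.text = text
    return dublin_core


def write_dspace_file(rdf_path, out_path):
    """Convert one generated DC RDF file into a dublin_core XML file."""
    root = simple_dc_to_dspace(ET.parse(rdf_path).getroot())
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
```

Calling `write_dspace_file` once per generated file would give one `dublin_core` document per record. Qualifiers such as `tableofcontents` or `haspart` cannot be recovered from simple DC alone, so those would still need a mapping from the original MARC fields.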