Importing CSV file with special characters into DSpace.


Xiping Liu

Mar 10, 2016, 12:18:05 PM
to DSpace Technical Support
Hello everyone, 

A few months ago we started a project to clean up our electronic thesis and dissertation records in DSpace. We exported our data from DSpace as a CSV file, and after the cleanup we are ready to import the data back into DSpace. But we noticed that some of the names (those with accent marks and quotes) in our data are not showing correctly (I assume the encoding was not set correctly at the very beginning, right after the export). Since we have already done the cleanup in our file, it would be really painful to go back, re-export the file from DSpace (so we can set the encoding correctly this time), and redo all the editing. Is there any way we can correct the encoding after we import the data back into DSpace? Or are there any other suggestions for solving this problem? I have attached a small sample of our data.

Your help is greatly appreciated.

Xiping 


test1.csv

euler

Mar 10, 2016, 10:18:07 PM
to DSpace Technical Support
Hi Xiping,

My suggestion is to import or upload your (cleaned-up) CSV file into Google Sheets. I tested your sample data and the accent marks are retained (not 100%, though). See my attached screenshot. All you have to do now is find and replace all the � characters. After cleaning it up, download it as CSV and then import it back into DSpace. Note that once you have downloaded it as CSV, you should refrain from editing it in MS Excel because, in my experience, it will mess up your encoding again. It may seem a tedious task, but I would rather do it this way than start all over again.
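To see how many cells still need attention before doing the find and replace, something like the following might help. It is only a sketch: the function name, the sample rows, and the column names are made up for illustration, and it assumes the file has already been read into a string as UTF-8.

```python
import csv
import io

REPLACEMENT_CHAR = "\ufffd"  # the "�" shown where a character could not be decoded


def find_damaged_cells(csv_text):
    """Return (record_number, column_name, value) for every cell containing U+FFFD.

    Record numbers count the header as 1, so the first data record is 2.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for record_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if value and REPLACEMENT_CHAR in value:
                hits.append((record_number, column, value))
    return hits


# Hypothetical sample data, not taken from the attached test1.csv.
sample = 'id,dc.contributor.author\n1,"Nov\ufffdk, Jan"\n2,"Smith, Ann"\n'
for hit in find_damaged_cells(sample):
    print(hit)
```

The list it prints is the set of cells to fix by hand; it does not try to guess what the original character was.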

Hope this helps.

Good luck and best regards,
euler
accents.PNG

helix84

Mar 11, 2016, 5:34:30 AM
to Xiping Liu, DSpace Technical Support
You should be able to simply re-import the file with the correct
encoding and only changes will be applied. You will even see a list of
the changes.

Perhaps it will help you to know that import encoding may be specified
this way (the locale used must be already present/generated on your
system):

LC_ALL=sk_SK.UTF8 /dspace/bin/dspace import-metadata ...

I think it's only been tested with UTF-8 recently, but you can try and see.
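Before re-importing, it may be worth checking that the file really is valid UTF-8. A minimal sketch, assuming the file contents have been read as raw bytes; the function name and the sample names are hypothetical:

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Made-up example: the same text saved as UTF-8 and as Windows-1252.
good = "Šimková, Åsa".encode("utf-8")
bad = "Šimková, Åsa".encode("cp1252")
print(first_invalid_utf8_offset(good))
print(first_invalid_utf8_offset(bad))
```

For a real file, the bytes could come from `open(path, "rb").read()`. A result of None only means the bytes decode as UTF-8; it does not prove that every character is the one you intended.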


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
> --
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dspace-tech...@googlegroups.com.
> To post to this group, send email to dspac...@googlegroups.com.
> Visit this group at https://groups.google.com/group/dspace-tech.
> For more options, visit https://groups.google.com/d/optout.

Xiping Liu

Mar 15, 2016, 1:16:08 PM
to DSpace Technical Support
Hi euler,

Thanks for your suggestion. I tried uploading our cleaned CSV into Google Sheets and, as you said, most of the accent marks are retained. The problem is that, depending on where � appears, it represents different special characters: in the name field it can be Š or Å, and in the abstract field it represents ". So we can't do a one-step find and replace.

We ended up identifying all the names with special characters in our cleaned-up file (out of 941 records we found 36 with such names, so not too bad), putting the same list of names with the correct encoding in a second column, and doing a find and replace.

A colleague suggested using OpenOffice to open the CSV file right after exporting it from DSpace, so that the encoding is set correctly (for future reference). It's good to know that Google Sheets has the same function.
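For anyone who wants to script the same two-column find and replace, a rough sketch could look like this. The function names, the column name, and the sample rows are assumptions for illustration, and it assumes every row has the same fields as the header:

```python
import csv
import io


def load_name_map(map_csv_text):
    """Read a two-column CSV: damaged name, corrected name."""
    reader = csv.reader(io.StringIO(map_csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def fix_names(records_csv_text, name_map, column):
    """Return the records CSV with each damaged name in one column replaced."""
    reader = csv.DictReader(io.StringIO(records_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row.get(column) or ""
        # Plain substring replacement, like a spreadsheet find and replace.
        for damaged, corrected in name_map.items():
            value = value.replace(damaged, corrected)
        row[column] = value
        writer.writerow(row)
    return out.getvalue()


# Hypothetical data: one damaged name and its hand-corrected form.
records = 'id,dc.contributor.author\n1,"\ufffdimkov\ufffd, Jana"\n2,"Smith, Ann"\n'
name_map_csv = '"\ufffdimkov\ufffd, Jana","Šimková, Jana"\n'
fixed = fix_names(records, load_name_map(name_map_csv), "dc.contributor.author")
print(fixed)
```

Mapping whole names rather than single characters sidesteps the problem that one � can stand for different letters in different places.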

Thanks again for your help. 
Xiping

Ali Mansoor

Mar 28, 2016, 11:40:20 AM
to DSpace Technical Support
Dear,

Open the CSV file in Notepad, go to the Save As option, and in the encoding option at the bottom select UTF-8 and save. Upload that CSV file to DSpace again and see the changes. If that does not help, check the "Connector" element in your Tomcat server.xml file to see whether the encoding is set to UTF-8; if not, set it to UTF-8 and restart Tomcat.
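The Notepad "Save As" step can also be done in a script. This is only a sketch, and it assumes the file was originally saved as Windows-1252 (cp1252), which may not be true for your file; the function name is made up:

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    text = data.decode(source_encoding)
    return text.encode("utf-8")


# Made-up example: a name as it would be stored in a Windows-1252 file.
legacy_bytes = "Šimková".encode("cp1252")
utf8_bytes = transcode_to_utf8(legacy_bytes)
print(utf8_bytes.decode("utf-8"))
```

Note that this only helps while the original bytes are still intact. If a character has already been saved as � or ?, re-encoding cannot bring it back.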

Fahad Ubaid

Feb 11, 2020, 8:59:10 AM
to DSpace Technical Support
Asalam-o-Aliakum/ Greetings Ali,

I am a new user of DSpace and am facing a problem importing metadata from a CSV file into DSpace.

Whenever I import data from a CSV file into DSpace, it changes the field location; for example, the value of the author field goes into the date field, and so on. A picture is attached for reference.

Can you help me in this regard?

I would be very thankful for your help.
prob.png
new meta data.csv

Paul Münch

Feb 24, 2020, 3:57:11 AM
to dspac...@googlegroups.com
Hello,

Welcome to the DSpace community. Are you sure that you filled the metadata fields with the correct values? DSpace doesn't check which value is in which field. In the screenshot it looks like the values have shifted out of place. Maybe it is only a misplaced comma.
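A quick way to look for a stray or unquoted comma is to compare the number of fields in each row with the header. A minimal sketch; the function name, column names, and sample rows are made up for illustration:

```python
import csv
import io


def rows_with_wrong_field_count(csv_text):
    """Return (header_field_count, [(line_number, field_count), ...]) for bad rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        if len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return len(header), problems


# Hypothetical sample: the second record has an unquoted comma in the author name,
# so everything after it shifts one column to the right.
sample = (
    "id,dc.contributor.author,dc.date.issued\n"
    '1,"Khan, A.",2019\n'
    "2,Khan, A.,2020\n"
)
print(rows_with_wrong_field_count(sample))
```

Any row it reports has more or fewer fields than the header, which is what you would expect if an author value ends up in a date column.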

Kind regards,

Paul Münch

On 11.02.20 at 10:54, Fahad Ubaid wrote:

geneviev...@anu.edu.au

Feb 24, 2020, 5:10:27 PM
to DSpace Technical Support
Hi Fahad,

Are you using the "Import metadata" or the "Batch import" method? If you are using the Batch import method, I would suggest looking at your bte.xml; you will need to use the fields as defined, and in the order defined, there (this is the method where you select the type of input data). With Import metadata (where you just upload a CSV), I'm not sure what the problem is. Personally, for CSV imports I would recommend Import metadata over Batch import.

Regards,

Genevieve