[Dspace-tech] Importing XML records with unicode characters


Larry Hansard

Aug 24, 2015, 2:15:18 PM
to dspac...@lists.sourceforge.net
I'm trying to import XML records that contain Unicode characters. Here is an example of one of the errors:

java.sql.SQLException: ERROR: Unicode >= 0x10000 is not supported

at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131)
at org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505)
at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48)
at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:197)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233)
at org.dspace.storage.rdbms.DatabaseManager.execute(DatabaseManager.java:1004)
at org.dspace.storage.rdbms.DatabaseManager.update(DatabaseManager.java:482)
at org.dspace.content.Item.update(Item.java:1239)
at org.dspace.content.InstallItem.installItem(InstallItem.java:184)
at org.dspace.content.InstallItem.installItem(InstallItem.java:90)
at org.dspace.app.itemimport.ItemImport.addItem(ItemImport.java:476)
at org.dspace.app.itemimport.ItemImport.addItems(ItemImport.java:334)
at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:282)
java.sql.SQLException: ERROR: Unicode >= 0x10000 is not supported

Does anyone have a workaround for this problem?

Thanks -- Larry

Larry Hansard
Georgia Tech
Library Systems
404-894-4585


Christine Moulen

Aug 24, 2015, 2:15:19 PM
to Larry Hansard, dspac...@lists.sourceforge.net
I hope there's a better solution, but my workaround so far has been to
find the offending character and change it to something the importer will
accept. For example, change curly ("smart") quotation marks to regular
straight quote marks, and escape < and > as &lt; and &gt;.
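
The find-and-replace workaround above could be sketched in Python roughly
as follows. This is only an illustration, not part of DSpace: the function
names, the replacement table, and the "?" placeholder are made up for the
example, and it assumes (going by the error message) that the characters
being rejected are those at U+10000 and above. It works on metadata values
as plain text, not on raw XML markup.

```python
import re

# Assumption: characters at or above U+10000 are what the database rejects,
# as the error message in this thread suggests.
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")

# Illustrative replacements only; extend this table for your own data.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def find_offending(text):
    """Return (offset, 'U+XXXX') pairs for characters at or above U+10000."""
    return [(m.start(), "U+%04X" % ord(m.group()))
            for m in ASTRAL.finditer(text)]


def clean_value(text, placeholder="?"):
    """Swap curly quotes for straight ones and replace characters >= U+10000."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return ASTRAL.sub(placeholder, text)
```

Running find_offending over each metadata value first shows where the
problem characters are, so you can decide case by case what to substitute
before falling back on a blanket placeholder.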

If you have to, any character ought to be expressible as a numeric
character reference in this way. Quoting the W3C documentation at:
http://www.w3.org/TR/2003/NOTE-unicode-xml-20030613/

Characters are denoted using the notation used in the Unicode Standard,
i.e. an optional U+ followed by their hexadecimal number, using at least 4
digits, such as "U+1234" or "U+10FFFD". In XML or HTML this could be
expressed as "&#x1234;" or "&#x10FFFD;".

But I don't know if this is the only way to handle it.
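
For what it's worth, a minimal Python sketch of the "&#x...;" notation
quoted above might look like this. The function name is hypothetical. One
caveat: an XML parser turns a character reference back into the same
character when it reads the file, so this can help with file-encoding
problems but may not by itself get a character past the database.

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x%X;" % ord(ch)
        for ch in text
    )
```

For decimal references instead of hexadecimal ones, Python's built-in
text.encode("ascii", "xmlcharrefreplace") does the equivalent conversion
and returns bytes.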

Christine


Christine Moulen
Library Systems Manager
MIT Libraries, E25-131
617-253-0757, fax 617-253-8894
orb...@mit.edu

