Jargon/MySql metadata accentuated character update issue

18 views
Skip to first unread message

Pascal Heus

unread,
Jan 13, 2017, 12:37:06 PM1/13/17
to sup...@irods.org, irod...@googlegroups.com
All:

We're trying to solve an issue we're encountering when updating resource metadata through Jargon which contains certain accentuated characters.
This is with Jargon 4.0.2. / iRODS 4.1.7 / MySql 5.5.4.6 on Ubuntu, and having similar problem on a prod server with iRODS 4.1.9  / MySQL 5.1.73 on RHEL.

Basically we seem to be able to insert new metadata values such as "teèéêëst" without any problem, but we we try to update to for example "test", we see a Jargon SQL error in the logs (see below) whereby the "delete" operation fails as a like statements complains about  a mix of collations.

Now noted that Jargon is configured to use UTF-8 but the underlying MySql database icat database and tables have encoding set to latin1/ latin1_general_cs
We tried to:
(1) changed the icat / table settings to utf8mb4/  utf8mb4_unicode_ci (using alter table ... convert SQL command)
(2) changed the icat / table settings to utf8 /  utf8_general_ci (using alter table ... convert SQL command)
(3) set Jargon to ISO-8859-1
but these do not seem to solve the issue and resulted in other errors or discrepancies.
With utf8 / utf8_general_ci, some change seem to go through without any error, but the underlying database values are actualy not updated.

Is this a known issue? Is there a check list of "settings" you could provide us for MySQL (in particular) as well as icat, jargon, odbc, iRODS, etc.?
Something is misconfigured but can't seem to find away to fix it. Any suggestion would be appreciated.

best
*P

Java logs example:
2017-01-12 23:03:10,788 [TP-Processor7] ERROR org.irods.jargon.core.pub.CollectionAOImpl - jargon exception modifying AVU metadata
org.irods.jargon.core.exception.CatalogSQLException: Catalog SQL error
        at org.irods.jargon.core.connection.IRODSErrorScanner.checkSpecificCodesAndThrowIfExceptionLocated(IRODSErrorScanner.java:190)
        at org.irods.jargon.core.connection.IRODSErrorScanner.inspectAndThrowIfNeeded(IRODSErrorScanner.java:123)
        at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.processMessageInfoLessThanZero(AbstractIRODSMidLevelProtocol.java:1172)
        at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:663)
        at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.readMessage(AbstractIRODSMidLevelProtocol.java:629)
        at org.irods.jargon.core.connection.IRODSMidLevelProtocol.irodsFunction(IRODSMidLevelProtocol.java:235)
        at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.irodsFunction(AbstractIRODSMidLevelProtocol.java:174)
        at org.irods.jargon.core.connection.AbstractIRODSMidLevelProtocol.irodsFunction(AbstractIRODSMidLevelProtocol.java:571)
        at org.irods.jargon.core.pub.CollectionAOImpl.modifyAVUMetadata(CollectionAOImpl.java:732)
        at org.irods.jargon.core.pub.CollectionAOImpl.modifyAvuValueBasedOnGivenAttributeAndUnit(CollectionAOImpl.java:689)
    ...


iRODS logs
Jan 13 02:48:26 pid:28667 NOTICE: Warning, pending SQL at cllDisconnect, count: 6
Jan 13 02:48:26 pid:28667 NOTICE: Warning, pending SQL: begin ...
Jan 13 02:48:26 pid:28667 NOTICE: Warning, pending SQL: SET SESSION autocommit=0 ...
Jan 13 02:48:26 pid:28667 NOTICE: Warning, pending SQL: SET SESSION sql_mode='ANSI,ST ...
Jan 13 02:48:26 pid:28667 NOTICE: Warning, pending SQL: SET character_set_client = ut ...
Jan 13 02:48:26 pid:28667 NOTICE: Warning, pending SQL: SET character_set_results = u ...
Jan 13 02:48:26 pid:28667 NOTICE: Agent exiting with status = 0
Jan 13 02:48:26 pid:25427 NOTICE: Agent process 28667 exited with status 0
Jan 13 02:48:26 pid:25427 NOTICE: Agent process 28669 started for puser=rods and cuser=rods from 68.57.116.108
Jan 13 02:48:27 pid:28669 NOTICE: chlDeleteAVUMetadata cmlExecuteNoAnswerSql delete failure -819000
Jan 13 02:48:27 pid:28669 NOTICE: chlDeleteAVUMetadata cmlExecuteNoAnswerSql(rollback) succeeded
Jan 13 02:48:27 pid:28669 NOTICE: rsModAVUMetadata: rcModAVUMetadata failed
Jan 13 02:48:27 pid:28669 NOTICE: bindVar[1]=76120
Jan 13 02:48:27 pid:28669 NOTICE: bindVar[3]=t�st
Jan 13 02:48:27 pid:28669 NOTICE: bindVar[4]=
Jan 13 02:48:27 pid:28669 NOTICE: _cllExecSqlNoResult: SQLExecDirect error: -1 sql:delete from R_OBJT_METAMAP where object_id=? and meta_id IN (select meta_id from R_META_MAIN where meta_attr_name = ? and meta_attr_value = ? and (meta_attr_unit = ? or meta_attr_unit IS NULL) )
Jan 13 02:48:27 pid:28669 NOTICE: SQLSTATE: S1010
Jan 13 02:48:27 pid:28669 NOTICE: SQLCODE: 0
Jan 13 02:48:27 pid:28669 NOTICE: SQL Error message: [unixODBC][Driver Manager]Function sequence error
Jan 13 02:48:27 pid:28669 NOTICE: SQLSTATE: S1000
Jan 13 02:48:27 pid:28669 NOTICE: SQLCODE: 1267
Jan 13 02:48:27 pid:28669 NOTICE: SQL Error message: [unixODBC][MySQL][ODBC 5.1 Driver][mysqld-5.5.46-0ubuntu0.12.04.2]Illegal mix of collations (latin1_general_cs,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
Jan 13 02:48:27 pid:28669 NOTICE: chlDeleteAVUMetadata cmlExecuteNoAnswerSql delete failure -806000
Jan 13 02:48:27 pid:28669 NOTICE: chlModAVUMetadata cmlExecuteNoAnswerSql(rollback) succeeded
Jan 13 02:48:27 pid:28669 NOTICE: rsModAVUMetadata: rcModAVUMetadata failed
Jan 13 02:48:27 pid:28669 NOTICE: readAndProcClientMsg: received disconnect msg from client
Jan 13 02:48:27 pid:28669 NOTICE: Agent exiting with status = 0
Jan 13 02:48:27 pid:25427 NOTICE: Agent process 28669 exited with status 0
Jan 13 02:48:27 pid:25427 NOTICE: Agent process 28671 started for puser=rods and cuser=rods from 68.57.116.108
Jan 13 02:48:27 pid:28671 NOTICE: bindVar[1]=76120
Jan 13 02:48:27 pid:28671 NOTICE: bindVar[3]=t�st
Jan 13 02:48:27 pid:28671 NOTICE: bindVar[4]=
Jan 13 02:48:27 pid:28671 NOTICE: _cllExecSqlNoResult: SQLExecDirect error: -1 sql:delete from R_OBJT_METAMAP where object_id=? and meta_id IN (select meta_id from R_META_MAIN where meta_attr_name like ? and meta_attr_value like ? and (meta_attr_unit like ? or meta_attr_unit IS NULL) )
Jan 13 02:48:27 pid:28671 NOTICE: SQLSTATE: S1010
Jan 13 02:48:27 pid:28671 NOTICE: SQLCODE: 0
Jan 13 02:48:27 pid:28671 NOTICE: SQL Error message: [unixODBC][Driver Manager]Function sequence error
Jan 13 02:48:27 pid:28671 NOTICE: SQLSTATE: S1000
Jan 13 02:48:27 pid:28671 NOTICE: SQLCODE: 1267
Jan 13 02:48:27 pid:28671 NOTICE: SQL Error message: [unixODBC][MySQL][ODBC 5.1 Driver][mysqld-5.5.46-0ubuntu0.12.04.2]Illegal mix of collations (latin1_general_cs,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'
Jan 13 02:48:27 pid:28671 NOTICE: chlDeleteAVUMetadata cmlExecuteNoAnswerSql delete failure -806000
Jan 13 02:48:27 pid:28671 NOTICE: chlDeleteAVUMetadata cmlExecuteNoAnswerSql(rollback) succeeded
Jan 13 02:48:27 pid:28671 NOTICE: rsModAVUMetadata: rcModAVUMetadata failed
Jan 13 02:48:28 pid:28671 NOTICE: readAndProcClientMsg: received disconnect msg from client


--
The contents of this e-mail message and any attachments are confidential and are intended solely for addressee. Any unauthorized disclosure is strictly prohibited. The information may also be legally privileged. This transmission is sent in trust, for the sole purpose of delivery to the intended recipient. If you have received this transmission in error, any use, reproduction or dissemination of this transmission is strictly prohibited. If you are not the intended recipient, please immediately notify the sender by reply e-mail or phone and delete this message and its attachments, if any.

Conway, Mike

unread,
Jan 13, 2017, 12:56:58 PM1/13/17
to sup...@irods.org, irod...@googlegroups.com

Hi, this is something (like Oracle) that needs visibility in CI, or at minimum to have a test rig at RENCI to try out.  Jargon unit tests are currently run against Postgres, as they are not really meant to test iRODS itself (but can actually do a good job of it 😉).


I went back and verified that, save for some URL encoding utilities that by spec are meant to be UTF-8, all references to encoding are picked up from jargon.properties.  That should make it relatively easy to establish an environment with the proper char sets involved.


So I'd suggest we need to address this first by standing up a vm with iRODS + MySql set to consistent char sets and run the full suites, I can add appropriate test cases into the unit test suites to provide accentuated avu metadata operations.


Michael Conway

Senior Data Scientist - RENCI, iRODS Consortium

Research Assistant - DataNet Federation Consortium

michael...@unc.edu

http://www.linkedin.com/in/michaelcconway


UNC Chapel Hill, Chapel Hill, NC




From: Pascal Heus <pasca...@metadatatechnology.com> on behalf of Pascal Heus <pasca...@mtna.us>
Sent: Friday, January 13, 2017 12:37 PM
To: sup...@irods.org; irod...@googlegroups.com
Subject: [iROD-Chat:15976] Jargon/MySql metadata accentuated character update issue
 
--
--
"iRODS: the Integrated Rule-Oriented Data-management System; A community driven, open source, data grid software solution" https://www.irods.org
 
iROD-Chat: http://groups.google.com/group/iROD-Chat

---
You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pascal Heus

unread,
Feb 2, 2017, 8:44:23 AM2/2/17
to irod...@googlegroups.com
All:

This is a follow-up on the issue below reported a couple of weeks ago around not being able to update metadata properties containing accentuated characters (e.g. éèê) when using iRODS with MySql, and the MySql server/tables are configured with a Latin1 character set (which is the default).

We have published a simple Jargon Java project reproducing the problem with our server. This is available at:
https://bitbucket.org/mtnaus/irods-mysql-encoding-issue
Main source code also attached (but you'll need to checkout from Git repository to compile and run)

The errors we come down to the following errors being reported by MySql ODBC:

Illegal mix of collations (latin1_general_cs,IMPLICIT) and (utf8_general_ci,COERCIBLE)

An underlying issue we uncovered is that iRODS returns AVU values seem to be incorrectly encoded/represented. For example, éèê comes back as ���
This can also be somewhat seen with imeta as follows:
# Create a test collection
imkdir testcoll
imeta add -C testcoll testprop2 éèê
imeta ls -C testcoll
# --> this returns ???
# The value in the r_meta_main MySql table is however correct
The same is seen through Jargon which result in some commands to fail.

We have a platform that is expected to go in production shortly, and this has become a significant issue preventing the launch. We understand that the problem is likely the encoding discrepancy between MySql (Latin1) and iRODS/Jargon (UTF8), but reinstalling from scratch or upgrading is not really an option (populated database, platform certification, and other technical constraint). We also tried to convert the r_meta_main table to UTF-8 and this did not resolve the issue, rather introduced new ones, so we're not sure this is the solution.

We would appreciate any inputs/suggestion you might have as a potential resolution or work around. Let us know if you can reproduce or if you need additional information/clarifications on the issue.

best
*P
App.java
Reply all
Reply to author
Forward
0 new messages