SQLServer XML and UTF-16 charset issue reading xml as String

393 views
Skip to first unread message

Denis Miorandi

unread,
May 27, 2016, 6:28:05 AM5/27/16
to jOOQ User Group
Probably is something known that SqlServer store xml type as UTF-16. You can insert UTF-8 but select always extract UTF-16 xml field.
I've got a binding to convert query xml field into String, see below...
I would like this binder to generate String as new String(bytes,"UTF-16") instead of default, I suppose default is something like new String(bytes).
I want also to do this just on sqlserver but not on other databases, cause this binding is in my parent pom (all db kinds).

How can I achieve this? Is it possible manage two difference forcedType choosing by database engine? Or is is possible to write a binder that define
different constructor of String on different db engines? Other possibilities?


In my maven configuration I've got 

<customType>
<name>XmlStringBinding</name>
<type>java.lang.String</type>
<binding>it.clesius.jooq.bindings.XmlStringBinding</binding>
</customType>

<forcedType>
    <name>XmlStringBinding</name>
    <expression>(\w+\.)?(?!UDT_PREFIX|PROC_PREFIX)\w+(\.\w+)?</expression>
    <types>XML</types>
</forcedType>

and java code for binding is:

public class XmlStringBinding implements Binding<Object, String> {
	/**
	 * Generated UID
	 */
	private static final long serialVersionUID = 358789452467943117L;

	@Override
	public Converter<Object, String> converter() {
		return new Converter<Object, String>() {
			private static final long serialVersionUID = 5005180607894678030L;

			@Override
			public String from(Object databaseObject) {
				return null; // Not needed in the example
			}

			@Override
			public Object to(String userObject) {
				return null; // Not needed in the example
			}

			@Override
			public Class<Object> fromType() {
				return Object.class;
			}

			@Override
			public Class<String> toType() {
				return String.class;
			}
		};
	}

	@Override
	public void sql(BindingSQLContext<String> ctx) throws SQLException {
		ctx.render().sql("?");
	}

	@Override
	public void register(BindingRegisterContext<String> ctx) throws SQLException {
		ctx.statement().registerOutParameter(ctx.index(), Types.SQLXML);
	}

	@Override
	public void set(BindingSetStatementContext<String> ctx) throws SQLException {
		if (ctx.value() == null)
			ctx.statement().setObject(ctx.index(), null);
		else
			ctx.statement().setString(ctx.index(), ctx.value());
	}

	@Override
	public void set(BindingSetSQLOutputContext<String> ctx) throws SQLException {
		if (ctx.value() == null)
			ctx.output().writeObject(null);
		else
			ctx.output().writeString(ctx.value());
	}

	@Override
	public void get(BindingGetResultSetContext<String> ctx) throws SQLException {
		ctx.value(ctx.resultSet().getString(ctx.index()));
	}

	@Override
	public void get(BindingGetStatementContext<String> ctx) throws SQLException {
		ctx.value(ctx.statement().getString(ctx.index()));
	}

	@Override
	public void get(BindingGetSQLInputContext<String> ctx) throws SQLException {
		ctx.value(ctx.input().readString());
	}
}

Lukas Eder

unread,
May 27, 2016, 7:38:05 AM5/27/16
to jooq...@googlegroups.com
Hi Denis,

Interesting, I didn't know that about SQL Server. Doesn't it consult the encoding specified in the XML's leading processing instruction? Or the byte order mark, if any?

In any case, you're on the right track with jOOQ's Binding, that's where your logic will go. In particular, you'll have to implement your logic in all three get() methods for ResultSet, Statement, and SQLInput (although the latter is currently not used with SQL Server). Since all of these parameter objects (such as BindingGetResultSetContext) extend org.jooq.Scope, you can access the SQLDialect, or the SQLDialect family directly from it:


This will allow you to implement a single data type binding for XML types that can work with any database.

I hope this helps. If you have any further specific questions about this, please do not hesitate to ask, and I'll be very happy to help.

Best Regards,
Lukas

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Denis Miorandi

unread,
May 27, 2016, 9:31:59 AM5/27/16
to jOOQ User Group
Ok about your suggestion. If a binding is needed that could be a good general solution. In my case, I could get an SQLXml field an get stream from it with the right encoding...

About SqlServer, I give you some more info from my experience...
I've observed that any xml file that has passed via a sqlserver comes in UTF-16 Little Endian with BOM
, so that you have to take care about reading , if it's a file using UTF-16 encoding i.e. 
FileUtils.readFileToString(new File("src/test/resources/testUtf16.xml"),"UTF-16")
this take care to clean string from BOM, zero bytes (due to 2 bytes encoding), and convert right *grave and others special characters...

If you need insteadto insert into sqlserver via mgmtstudio an UTF-8 xml you need to remove 
<?xml version="1.0" encoding="utf-8"?>
part or to set encoding="utf-16" otherwise is refused.
Also sql server doesn't manage cdata sections in right way cause it remove cdata and convert it in an xml encoded section and is not symmetric on reading...
All in all sql server has not a good xml implementation, but I need it...

Lukas Eder

unread,
May 27, 2016, 11:06:20 AM5/27/16
to jooq...@googlegroups.com
Hi Denis,

Oh my. That looks tricky on the SQL Server side. Thanks for all that info. I'll try to stay clear of XML in the database, then :)

In any case, let me know if you run into any issues with the Binding implementation.

Cheers,
Lukas
Reply all
Reply to author
Forward
0 new messages