[ADVISORY] Protocol error when a query string contains Unicode surrogate code units


Olivier Michallat

Oct 7, 2019, 3:44:15 PM
to java-dri...@lists.datastax.com
Hi,

We have detected a rare but critical issue in the driver (JAVA-2475). We are working on a hotfix release (OSS 4.2.2 / DSE 2.2.2) that will be available in the next few days.

In the meantime, here's a detailed explanation and an easy workaround:

The issue manifests when a query string contains Unicode surrogate code units. These are code units in the range \uD800 to \uDBFF, which are used to represent supplementary characters in the UTF-16 encoding. The driver miscalculates the encoded size of those characters, which produces a corrupted protocol frame with the wrong message size. This can result in either:
  • a request timeout
  • or a server error mentioning a nonexistent protocol version, for example:
    com.datastax.oss.driver.api.core.servererrors.ProtocolError: Invalid or unsupported protocol version (0); supported versions are (3/v3, 4/v4, 5/v5-beta)
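
To illustrate how a size miscalculation of this kind can arise, here is a minimal sketch. It is not the driver's actual code: `SurrogateSizeDemo` and `naiveUtf8Size` are hypothetical names, and the naive method is only one plausible way to get the size wrong (sizing each UTF-16 code unit on its own instead of treating a surrogate pair as one character). The point is simply that the declared size and the number of bytes actually written can disagree.

```java
import java.nio.charset.StandardCharsets;

class SurrogateSizeDemo {

  // Hypothetical naive computation (NOT the driver's code): sizes each UTF-16
  // code unit independently, as if it were a standalone character.
  static int naiveUtf8Size(String s) {
    int size = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        size += 1;
      } else if (c < 0x800) {
        size += 2;
      } else {
        size += 3; // each surrogate code unit is counted as 3 bytes here
      }
    }
    return size;
  }

  // Number of bytes that the UTF-8 encoder really produces.
  static int actualUtf8Size(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String pizza = "\uD83C\uDF55"; // one supplementary character, U+1F355
    System.out.println("UTF-16 code units: " + pizza.length());
    System.out.println("code points:       " + pizza.codePointCount(0, pizza.length()));
    System.out.println("naive size:        " + naiveUtf8Size(pizza));
    System.out.println("actual UTF-8 size: " + actualUtf8Size(pizza));
  }
}
```

For the pizza character used in the examples in this thread, the naive method counts 6 bytes while the real encoding is 4 bytes. A frame header built from a wrong count like that no longer matches the body, which is consistent with the timeouts and bogus protocol versions described above.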


This only happens if the characters are directly in the query string (most likely because of inlined text literals). For example, the following requests are vulnerable:


// Simple statement with values inlined in the query string:
session.execute(
    SimpleStatement.newInstance("INSERT INTO test.foo (id,t) VALUES (0, '\uD83C\uDF55')"));


// Same with execute(String) shortcut:
session.execute("INSERT INTO test.foo (id,t) VALUES (0, '\uD83C\uDF55')");


// Built query with inlined literals:
session.execute(
    insertInto("test", "foo")
        .value("id", literal(0))
        .value("t", literal("\uD83C\uDF55"))
        .build());



However, if the characters appear in values that are provided separately from the query, you are safe. The following examples are NOT vulnerable:


// Simple statement with values provided separately:
session.execute(
    SimpleStatement.newInstance(
        "INSERT INTO test.foo (id,t) VALUES (?, ?)", 0, "\uD83C\uDF55"));


// Built query with bind markers, values provided separately:
session.execute(
    insertInto("test", "foo")
        .value("id", bindMarker())
        .value("t", bindMarker())
        .builder()
        .addPositionalValue(0)
        .addPositionalValue("\uD83C\uDF55")
        .build());

// Prepared statement
PreparedStatement pst = session.prepare("INSERT INTO test.foo (id,t) VALUES (?, ?)");
session.execute(pst.bind(0, "\uD83C\uDF55"));



Note that providing the values separately is a best practice anyway, as it avoids other issues like CQL injection. We also recommend prepared statements for queries that are executed often in your application (see the manual).
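
If you want to check whether existing query strings in your application are affected, a small check like the one below may help. This is only a sketch: `SurrogateCheck` and `containsSurrogates` are hypothetical names and not part of the driver API. It relies on the standard `Character.isSurrogate(char)`, which covers the full surrogate range \uD800 to \uDFFF.

```java
final class SurrogateCheck {

  private SurrogateCheck() {}

  // Returns true if the string contains at least one UTF-16 surrogate code unit.
  // Hypothetical helper, not part of the driver.
  static boolean containsSurrogates(String query) {
    for (int i = 0; i < query.length(); i++) {
      if (Character.isSurrogate(query.charAt(i))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String safe = "INSERT INTO test.foo (id,t) VALUES (?, ?)";
    String vulnerable = "INSERT INTO test.foo (id,t) VALUES (0, '\uD83C\uDF55')";
    System.out.println(containsSurrogates(safe));
    System.out.println(containsSurrogates(vulnerable));
  }
}
```

Such a check could be run over query strings before they are passed to `session.execute`, for example to log the statements that still need to be converted to bind markers.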


--

Olivier Michallat

Driver & tools engineer, DataStax

Olivier Michallat

Oct 7, 2019, 6:29:32 PM
to java-dri...@lists.datastax.com
Small correction: the range of surrogate characters is \uD800 to \uDFFF.

--

Olivier Michallat

Driver & tools engineer, DataStax

