Case insensitive sorting using createSubscribeQuery

53 views
Skip to first unread message

naser.zan...@gmail.com

unread,
Oct 1, 2018, 5:15:21 PM10/1/18
to ShareJS
I'm trying to subscribe to a query which is supposed to sort the result.

The following query sorts the result, but it's case sensitive.
connection.createSubscribeQuery(collection, { $sort: { title: 1 } });

How can I make it sort the result case insensitive?

Thanks in advance.

Rachael Stedman

unread,
Oct 2, 2018, 2:55:31 PM10/2/18
to ShareJS
Are you using Mongo as your database? I imagine the limitation of the sort comes from whether you have a case insensitive index in your database?
Here's what I found in the Derby docs on the query subscribe format - it does say "The sharedb-mongo adapter supports most MongoDB queries that you could pass to the Mongo find() method. See the Mongo DB query documentation and the query selectors reference." which implies to me that the sort result would depend on what Mongo supports.
https://derbyjs.com/docs/derby-0.10/models/queries#mongodb-query-format

And it looks like Mongo only added case insensitive indexes in 3.4 (https://docs.mongodb.com/manual/core/index-case-insensitive/) so you might also want to check your MongoDB version.

Anyhow, I hope this helps! Apologies I don't have a more definitive answer!

Best,
Rachael

Eric Hwang

unread,
Oct 2, 2018, 9:57:51 PM10/2/18
to sha...@googlegroups.com
Rachael's correct in that the query format is dependent on the database adapter, e.g. sharedb-mongo.

Looking into it a little bit, I can think of two options of doing a case-insensitive sort for queries via Share, one that's possible today with some application-layer work and one that requires a small bit of work in sharedb-mongo. There may be other approaches I haven't thought of.

You could also make a shallow copy and do the sort in the client after fetching the data, though that only works if you're not using a limit on the query, and you'd need to re-copy and re-sort on any changes to the query results.


Option 1 - Add a meta property for a normalized string

Summary - This is possible with sharedb/sharedb-mongo today. It lets you fully control the normalization in code, but it results in some extra data stored in the DB. It'd require a migration for any existing documents pre-normalization or when changing the normalization algorithm, which is extra developer overhead.

With this approach, you'd add server middleware that sets a normalized string form of the field onto the Share meta property, and then you'd sort by that normalized field.

The middleware would look something like this:

backend.use('commit', function(request, next) {
  var data = request.snapshot.data || {};
  if (request.collection === 'my_collection') {
    request.snapshot.m.myTitleLowerCase = (data.myTitle || '').toLowerCase();
    // Note: Consider using toLocaleLowerCase(locale) if you know the locale of the text.
  }
  next();
});

Informative note - It sets the normalized string underneath the meta property, m, because Share meta properties don't undergo OT and currently don't get sent to clients at all. You can't do the same thing with fields under snapshot.data because then the client issuing the commit will be unaware of the server-side change pre-commit.

Then, you'd sort like this - you sort on _m.myTitleLowerCase because sharedb-mongo puts the meta property on _m:

connection.createSubscribeQuery(collection, { $sort: { '_m.myTitleLowerCase': 1 } });


Option 2 - Use Mongo collation

Summary - This would require a small addition to sharedb-mongo before it would work, and it also requires Mongo 3.4+. You can do it in place without adding more data to the documents, though indexes do take some extra disk space. Also, Mongo only lets you specify one collation per query, so the collation would also affect filtering on string fields, for any query that used collation.

Starting from the page on case-insensitive indexes that Rachael linked (thanks Rachael!), looks like Mongo 3.4 added support for collation when comparing text strings.

With a plain Mongo client, you'd do so using cursor.collation:

db.my_collection.find(query)
  .sort({myTitle: 1})
  .limit(1)
  .collation({locale: 'en_US', strength: 1 /*or 2*/});

For performance, you'd probably want a case-insensitive index on that field, as linked above.

In sharedb-mongo, you generally specify query modifiers using dollar-sign prefixed query properties, like $sort or $hint. So it would look something like:

connection.createSubscribeQuery(collection, {
  $sort: { myTitle: 1 },
  $collation: {locale: 'en_US', strength: 1 /*or 2*/},
});

However, sharedb-mongo doesn't implement $collation. Yet!

It wouldn't be too hard to add it. It involves adding an entry to this map, with associated unit/integration tests.

The one slightly tricky part with the tests is that Share runs its tests against Mongo server 2.6, 3.6, and 4.0, and collation won't work when running against 2.6. I believe it should be possible to check the MONGODB_VERSION env variable set in Travis and conditionally run specific test cases that way.

I'll write up a GitHub issue tomorrow to track the feature request and link to it from this thread. If you'd like to try coding up the sharedb-mongo change in the meantime, go ahead! Should be small enough that I'd feel comfortable handling PR review/merge/publish, though for new functionality or major changes, us newer contributors/maintainers generally check with the main project owner Nate first, and he's out until next week.


Option 3 - Shallow copy and sort client-side

Don't have time to flesh this out right now, but if you're subscribing to all matching data - in other words, not using a limit - then you could do the sorting client-side on a shallow copy.

- Listen to query event(s) for changes to query results: https://github.com/share/sharedb#class-sharedbquery
- Shallow copy query results, sort using a case-insensitive comparison function, use the sorted copy.


Those are what I can think of at the moment. Anyone with other ideas?

-Eric

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Eric Hwang

unread,
Oct 4, 2018, 4:04:05 PM10/4/18
to sha...@googlegroups.com
Filed an issue in sharedb-mongo to track adding $collation:

Reply all
Reply to author
Forward
0 new messages