consolidated diff

18 views
Skip to first unread message

Mark Brown

unread,
Apr 14, 2016, 12:29:04 PM4/14/16
to Thai Linux/FOSS developers
Hi,

I'm a bit conscious of the number of diffs I'm sending, so attached is a consolidated diff of all the changes I'd like to propose. It addresses the issues of sending error messages to stderr, specifying the dictionary location, and ensuring the dictionary is only loaded once.

The first of these is addressed in the way that we discussed, namely to return a negative error code if the dictionary fails to load.

For allowing applications to set the dictionary location, and for avoiding potentially loading the dictionary multiple times when multi-threading, I've found a way to kill two birds with one stone: adding a function th_ensure_dict_loaded() which forces the dictionary to load if it is not already loaded. It takes an argument that specifies the path, but if this is NULL the existing strategy is used. Multi-threaded applications, when they first need to call word-breaking functions, can call this function first to load the dictionary, using a mutex to ensure only one thread does so. After that, calls to word breaking functions can be safely performed without the mutex. Since synchronisation is handled at a higher level, this doesn't need libthai to depend on pthreads or anything like that.

How does this look to you? Any changes that you'd like to see?

Regards,
Mark Brown

libthai-r596.diff

Theppitak Karoonboonyanan

unread,
May 6, 2016, 6:58:15 AM5/6/16
to thai-linux...@googlegroups.com
Hi,

Sorry again for a late reply. I've been planning to move the
repository from linux.thai.net to github lately, due to more
restricted access policy by the hosting data center.
I'll update the status to the list soon when everything is settled.

Meanwhile, let's update my thoughts on the issues.

On Thu, Apr 14, 2016 at 11:29 PM, Mark Brown <mark...@gmail.com> wrote:

> I'm a bit conscious of the number of diffs I'm sending, so attached is a
> consolidated diff of all the changes I'd like to propose. It addresses the
> issues of sending error messages to stderr, specifying the dictionary
> location, and ensuring the dictionary is only loaded once.

In fact, I prefer separated patches, as it's easier to review & commit.
I can take them from your previous mails, if there's no change
for each one.

> The first of these is addressed in the way that we discussed, namely to
> return a negative error code if the dictionary fails to load.

My thought is not quite finalized for this. Let me return to it later.

> For allowing applications to set the dictionary location, and for avoiding
> potentially loading the dictionary multiple times when multi-threading, I've
> found a way to kill two birds with one stone: adding a function
> th_ensure_dict_loaded() which forces the dictionary to load if it is not
> already loaded. It takes an argument that specifies the path, but if this is
> NULL the existing strategy is used. Multi-threaded applications, when they
> first need to call word-breaking functions, can call this function first to
> load the dictionary, using a mutex to ensure only one thread does so. After
> that, calls to word breaking functions can be safely performed without the
> mutex. Since synchronisation is handled at a higher level, this doesn't need
> libthai to depend on pthreads or anything like that.

This is a good idea. It has brought me to another step:
to get rid of static resources completely, just like how the
break pools were done previously.

The new thread-safe interface should be like this:

ThBrk *brk = th_brk_new (dictpath); // dictpath == NULL for defaults
int pos[N];
int res = th_brk_brk (brk, str, pos, N);
th_brk_delete (brk);

And the dictionary should be made per-instance data, not global static.

The drawback is that the dictionary sharing rate can be less than
when it was static. It's now up to the client to manage the sharing,
along with the mutex on loading/unloading resources.

The old th_brk() function should be deprecated but is still provided
as a wrapper to the new API for backward compatibility.

Regards,
-Thep.
--
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/

Mark Brown

unread,
May 13, 2016, 4:02:21 PM5/13/16
to Thai Linux/FOSS developers
Hi Thep,


On Friday, May 6, 2016 at 8:58:15 PM UTC+10, Theppitak Karoonboonyanan wrote:
On Thu, Apr 14, 2016 at 11:29 PM, Mark Brown <mark...@gmail.com> wrote:

> I'm a bit conscious of the number of diffs I'm sending, so attached is a
> consolidated diff of all the changes I'd like to propose. It addresses the
> issues of sending error messages to stderr, specifying the dictionary
> location, and ensuring the dictionary is only loaded once.

In fact, I prefer separated patches, as it's easier to review & commit.
I can take them from your previous mails, if there's no change
for each one.

Yes feel free, although they don't all merge cleanly because I was exploring a few different ideas. But I like your plan below, and I'll be happy to provide a pull request for that.
 
The new thread-safe interface should be like this:

  ThBrk *brk = th_brk_new (dictpath); // dictpath == NULL for defaults
  int pos[N];
  int res = th_brk_brk (brk, str, pos, N);
  th_brk_delete (brk);

And the dictionary should be made per-instance data, not global static.

Is failure to load indicated by th_brk_new returning NULL? That would work for me.

Regards,
Mark

Theppitak Karoonboonyanan

unread,
May 15, 2016, 2:09:28 AM5/15/16
to thai-linux...@googlegroups.com
Hi Mark,
Yes, I meant so.

Regards,
Reply all
Reply to author
Forward
0 new messages