unitrie - A unicode Trie in Go with threaded wildcard lookups and huffman coded child indexes

457 views
Skip to first unread message

Andreas Nilsson

unread,
Oct 3, 2013, 4:58:18 PM10/3/13
to golan...@googlegroups.com
Hi all,
I've put together a unicode Trie as part of an experiment with brute forcing cross words.
Nothing fancy but maybe someone will find use for it.
Any comments are most welcome as this is my first substantial piece of Go code and I need all the help I can get :-)

peace,
Andreas

https://github.com/andreas-gone-wild/unitrie

Chris Monson

unread,
Oct 5, 2013, 9:54:20 AM10/5/13
to golan...@googlegroups.com
That looks pretty cool! Welcome to the fun of substantial Go code. :-)

One of my first projects in Go was to build a Scrabble suggestion algorithm. It also needed a dictionary that allows for arbitrary wildcard searches (both with fixed positions and partially-constrained positions for blanks that come next to other words). Ultimately I rejected a Trie as a data structure for it and went with a Mealy Machine instead, which was a really fun bit of computer science to implement. It's really fast and the dictionary is very small (the 1.3MB "Words with Friends" dictionary becomes 530KB, and it loads almost instantly because its in-memory format is almost identical to its on-disk format).

I've been wanting to open-source the mealy machine implementation anyway, since it's pretty general. It just works on slices of bytes. In fact, I just uploaded it to Google Code Hosting, so you can take a gander and play around to your heart's content:


The "mealy" package has the super basic mealy machine in it. It supports serialization. The "compiledict" program uses a mealy machine to convert a (sorted) text file into a mealy machine and save it to a file. The "scrabble" project actually uses the mealy machine to do scrabble-type searches, and is a really good example of how you can set up your own arbitrarily crazy constraints to do searches using the mealy machine.

One thing that I regret, because I was a real Go novice at the time (and probably still am, truth be told), is that I did a lot of the work by passing channels out of functions to use as iterators. If you choose to not exhaust these iterators completely (e.g., by breaking out early from a "for ... range" loop), then a channel leaks. I haven't had a lot of motivation to change that because the program is intended to run really fast and just exit, but it *is* a deficiency in the way that the library is designed. I'll probably fix that later on.

Chris Monson

unread,
Oct 5, 2013, 9:57:35 AM10/5/13
to golang-nuts
Oops - those links aren't right. You can get to the projects by going to http://code.google.com/p/entrogo though. I'll have to figure out how to make the modules accessible on their own.


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/BJHTLYwFMb8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Chris Monson

unread,
Oct 5, 2013, 10:07:58 AM10/5/13
to golang-nuts
Fortunately, the paths I gave actually work fine with go get. Whew. First time for everything.

Tango

unread,
Oct 5, 2013, 10:37:26 AM10/5/13
to golan...@googlegroups.com
GPL license?

Jan Mercl

unread,
Oct 5, 2013, 11:40:17 AM10/5/13
to Tango, golang-nuts
On Sat, Oct 5, 2013 at 4:37 PM, Tango <teg...@gmail.com> wrote:
> GPL license?

Yep. Why do you ask?

-j

Andreas Nilsson

unread,
Oct 6, 2013, 6:40:30 PM10/6/13
to golan...@googlegroups.com
Thanks; hadn't run across the Mealy Machine before. I'm having quite a bit of success with the trie though. I've used it to implement a general solver that solves a 13x20 grid with 59 words and 120 unknown variables in 10s using a 5Mbyte/400k word dictionary: https://github.com/andreas-gone-wild/gocrosswise. And a small project to convert any text file into a sorted word list: https://github.com/andreas-gone-wild/dictwiz.

/Andreas
Reply all
Reply to author
Forward
0 new messages