Meeting summary notes for: Monday, November 10, 7pm, CubeSpace


Igal Koshevoy

Nov 13, 2008, 12:02:19 PM
to pdxfunc
We had a great meeting with 10 attendees, and despite not having a
pre-planned agenda, managed to cover two hours of content during the
official meeting, plus another two at the Side Door afterwards.

Also a quick reminder -- the "WestSide Polyglot Programmers" is a group
of really bright engineers that meet in Beaverton and often talk about
topics that overlap with those in pdxfunc, and I think Phil will talk a
bit about OCaml there. They have a meeting on Monday, November 24, 2008
from 7–9pm [see http://calagator.org/events/1250456089 ].

The pdxfunc meeting topics included:

1. Bart Massey presented "thimk", his Haskell implementation of a
command-line program for checking the spelling of a single word. It
provides high-quality guesses based on finding the best sound-alike
words. The project is at http://wiki.cs.pdx.edu/forge/thimk.html --
suggestions and contributions are welcomed!

During his talk, Bart quickly covered:
- cabal: Haskell-based build system [ http://www.haskell.org/cabal/ ]
- hackage: Haskell package repository [
http://hackage.haskell.org/packages/hackage.html ]
- parseargs: Bart's command-line option parser library that uses a
Haskell-like DSL
- edit-distance: Library for describing the difference between two words
- phonetic-code: Bart's implementations of popular algorithms like
Soundex and Phonix for identifying similar-sounding words [
http://wiki.cs.pdx.edu/forge/phonetic-code.html ]
- sqlite: Don Stewart's C language library wrapper using Haskell's
foreign-function interface (FFI)
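
To give a feel for what a phonetic code is, here is a minimal sketch of
classic American Soundex in Haskell. It is not taken from Bart's
phonetic-code library; the name soundexSketch and the helper names are
made up for illustration, and the rules are the textbook ones (keep the
first letter, code consonants as digits, collapse repeats, pad to four
characters).

```haskell
import Data.Char (isAlpha, toUpper)

-- Hypothetical illustration of textbook Soundex; not the phonetic-code API.
soundexSketch :: String -> String
soundexSketch word =
  case map toUpper (filter isAlpha word) of
    []       -> ""
    (c : cs) -> take 4 (c : go (code c) cs ++ repeat '0')
  where
    -- 'prev' is the digit of the last coded letter ('0' means "none").
    go _ [] = []
    go prev (x : xs)
      | x `elem` "HW" = go prev xs   -- H and W do not separate equal codes
      | d == '0'      = go '0' xs    -- vowels are dropped but reset 'prev'
      | d == prev     = go prev xs   -- collapse runs of the same code
      | otherwise     = d : go d xs
      where
        d = code x

    -- Consonant groups share a digit; everything else maps to '0'.
    code ch
      | ch `elem` "BFPV"     = '1'
      | ch `elem` "CGJKQSXZ" = '2'
      | ch `elem` "DT"       = '3'
      | ch == 'L'            = '4'
      | ch `elem` "MN"       = '5'
      | ch == 'R'            = '6'
      | otherwise            = '0'
```

With this sketch, "Robert" and "Rupert" both map to "R163", which is the
sense in which such codes group similar-sounding words.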

The thimk app is structured as a series of progressive filters, each of
which narrows down a list of candidates based on progressively more
accurate, and thus more expensive, algorithms. Using this approach made
it easy to incrementally add features and optimizations to the
application as they were needed. The program, if I recall correctly,
starts by trying simpler algorithms such as Soundex and Phonix, and then
uses a weighted edit-distance algorithm against the reduced set of
candidate words to find the best matches. The program also uses a
database to store the computed codes (e.g., soundex), which helps it
provide instantaneous results once the cache is populated.
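
The "series of progressive filters" idea can be sketched roughly as
follows. This is not thimk's actual code: the Stage type, the cheap
first-letter/length filter, and the stage names are assumptions made for
illustration, and the final stage uses plain unweighted Levenshtein
distance rather than the weighted variant Bart described.

```haskell
import Data.Char (toLower)
import Data.List (sortOn)

-- A stage narrows the candidate list for a given target word.
type Stage = String -> [String] -> [String]

-- Cheap stage (hypothetical): same first letter, length within 2.
cheapStage :: Stage
cheapStage target = filter ok
  where
    lower = map toLower
    ok w = take 1 (lower w) == take 1 (lower target)
           && abs (length w - length target) <= 2

-- Plain Levenshtein distance, computed one row at a time.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl row [0 .. length t] s)
  where
    row prev@(p : ps) c = scanl cell (p + 1) (zip3 t prev ps)
      where
        cell left (tc, diag, up) =
          minimum [left + 1, up + 1, diag + (if tc == c then 0 else 1)]

-- Expensive stage: keep the n candidates closest to the target.
closestStage :: Int -> Stage
closestStage n target = take n . sortOn (levenshtein target)

-- Run the stages in order, each one seeing only the survivors.
runStages :: [Stage] -> String -> [String] -> [String]
runStages stages target candidates =
  foldl (\cs stage -> stage target cs) candidates stages

suggest :: String -> [String] -> [String]
suggest = runStages [cheapStage, closestStage 3]
```

Because every stage has the same type, adding a new filter is just a
matter of inserting it into the list, which matches the incremental
style described above.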

However, implementing the soundex/phonix algorithm proved more
challenging in Haskell than with an imperative language like Perl
because it was impossible to simply alter strings in place to create the
reduced sound-alike representations with simple regexps -- e.g.
normalize the substring "ght" to "t". With a functional language, it was
necessary to generate new data through transformation using a
custom-written pattern-matching language similar to regexps. Yet despite
this, Haskell was still a very productive language because it made it
simple to compose the program incrementally, provided type safety,
eliminated the need to worry about memory allocation, etc.
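
As a rough illustration of building a new string through transformation
instead of editing in place, here is a small rule-based rewriter. It is
only a sketch: Bart's custom pattern-matching language is not shown in
these notes, so the Rule type, the rewrite function, and the "ph" rule
are invented for the example (only "ght" to "t" comes from the talk).

```haskell
import Data.List (stripPrefix)

-- A rule replaces a literal substring with another string.
-- Assumption: the left-hand side is never empty.
type Rule = (String, String)

-- Scan left to right; at each position apply the first rule whose
-- pattern is a prefix of the remaining input, else copy one character.
rewrite :: [Rule] -> String -> String
rewrite _ [] = []
rewrite rules s@(c : cs) =
  case [(to, rest) | (from, to) <- rules, Just rest <- [stripPrefix from s]] of
    ((to, rest) : _) -> to ++ rewrite rules rest
    []               -> c : rewrite rules cs

-- Example rule set; "ph" is an added, hypothetical rule.
exampleRules :: [Rule]
exampleRules = [("ght", "t"), ("ph", "f")]
```

For example, rewrite exampleRules "night" yields "nit", and the input
string itself is never modified.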

2. OCaml "Batteries Included" [ http://batteries.forge.ocamlcore.org/ ]
and Haskell "Platform" [
http://www.haskell.org/haskellwiki/Haskell_Platform ]: These are
recently initiated efforts to provide standard bundles of stable,
high-quality libraries that can be shipped with these languages. The
intent is to provide the right set of good libraries so that users can
build typical, practical applications without having to hunt down
libraries. The phrase "batteries included" originated from Python, whose
standard library covers everything from asynchronous processing to zip
files.

3. Joys and horrors of XSLT: Leif talked about his efforts to create a
mobile SMS interface to Calagator.org by passing its contents through
XSLT, which is a Turing-complete, mostly functional programming language
for transforming XML documents. While it does its intended task well
and is well supported on all relevant platforms, it can be incredibly
tedious to do even simple tasks with it. Unfortunately, Leif hit a snag
with Calagator not sanitizing character encodings (e.g., a mixture of
UTF-8 and CP1252 on the same page), which we'll try to resolve at this
weekend's Calagator code sprint -- event details at:
http://calagator.org/events/1250456115

4. Julian Kongslie may talk about his work on building a state-based
search next time.

5. Ed Borasky may talk about R, a programming language and
environment for statistical computing and graphing.

6. There was probably more interesting stuff said on the other end
of the table at Side Door that I didn't hear.

7. Doing a review of mldonkey (OCaml) or xmonad (Haskell) has been
relegated to a "rainy day" option if we have a meeting where we find
ourselves with unallocated time, unless someone wants to volunteer to
lead these.

Thanks again for attending, presenting and discussing.

See you next time on our usual second-Monday-of-the-month schedule on
Monday, December 8th, 7pm at CubeSpace, 622 SE Grand, Portland, Oregon

-igal
