FSharpx data structures

188 views
Skip to first unread message

Jack Fox

unread,
Nov 29, 2012, 5:49:30 PM11/29/12
to pragmatic-functional...@googlegroups.com
Jon, thanks for starting this interesting group and adding so much content.

On the pragmatic side, I'd like to call attention to the FSharpx OS project and its collection of nearly 30 functional data structures, many from Okasaki's book. I just published this fsharpx data structures discussion elicting feedback on how best to move forward growing and maintaining the library.

Jon Harrop

unread,
Nov 29, 2012, 11:19:57 PM11/29/12
to pragmatic-functional...@googlegroups.com
FSharpx is an interesting project and one that I have been meaning to look at for a long time. Great to see people collaborating on this kind of project!

If you'd like to push this for wider adoption, I have lots of advice. Firstly, I'd give it a clear objective and a more appropriate name (e.g. F# collections: a library of useful data structures). Secondly, I'd strip it right down. Okasaki's work was superb but also very academic. His code is often fragile (e.g. stack overflows) and, practically speaking, is 90% redundant. For example, he describes about a dozen heap implementations when, in practice, you're never likely to need more than just a pairing heap (although, ironically, that one is the least-well-understood theoretically!). Many people have now ported all of them to F# and uploaded them to Github which is nice but a long way from the polished user-friendly library the community would benefit the most from. I'd like to see a library with just one decent concrete implementation of each kind of collection. Also, the FSharpx docs talk about putting a Queue interface on all queues but I'd not bother. It introduces code/docs complexity, virtual dispatch which is slow and 99% of users will never need more than one queue anyway.

In an ideal world it would be great to have a library like FSharpx that contains absolutely everything but, in practice, I think it would be better to have more libraries with clearly-defined and attainable goals. In fact, perhaps we should enumerate the kinds of libraries we'd like to see? Also, I think there is a very strong argument for getting core stuff into F# itself. I think the F# standard library should augment Set and Map with a Queue, (pairing) Heap and a functional array. That would be *really* useful.

I had no idea the PowerPack had been abandoned. My company and some of our clients have lots of production code using the PowerPack. Not sure where to go with that. I'd like to see the stable core bits (e.g. rationals) put into the F# standard library that is bundled with Visual Studio.

FWIW, our commercial libraries (F# for Numerics and F# for Visualization) use a technique I developed to give F# structural typing using reflection:


The result is objectively not ideal but it is the best you're likely to see on .NET and is certainly very useful in practice. Specifically, I provide structural interfaces and use reflection to build implementations of the interface from real libraries that are found (at run-time, using reflection) to have adhered to the interface. So it will look for a rational type with Numerator and Denominator properties that are BigInts and call them for you, regardless of the library being used. This allows our commercial code to use the PowerPack when possible without actually being dependent upon it. For example, if you pull in the PowerPack yourself and ask F# for Visualization to visualize a value that is a matrix of arbitrary-precision rationals that you created using the PowerPack, our library will do so without it having to reference the PowerPack itself and it will typeset and display that matrix of fractions for you. Suffice to say, there are lots of practical applications of this on .NET to the extent that I'd even push for structural typing in the language itself.

Regarding purely functional data structures, my thoughts are that they are inherently very slow (I say ~10x slower as a rule of thumb) so the first thing people should do when optimizing production code is swap them out for imperative data structures where possible (which is why I think optimizing GCs for purely functional data structures at the cost of the performance of imperative data structures is counter-productive for most users in practice). Consequently, I think their main value is simplicity so I'd focus on ease of use as top priority for them.

There are several other important areas where F# has huge potential:

Metaprogramming. ML was bred for this and F# inherits that but we are nowhere near taking full advantage of this. We should be able to write Yacc-style parsers in our code and have them compiled at run-time using reflection instead of the archaic two-stage compilation fsyacc uses.

Graphics. WPF gives us a great foundation for GUIs and visualization. When I used Linux everything was command-line driven and writing a GUI was just a dream. For me, F# realized that dream and I now write GUIs in F# for every other program I do. I've just written two GUI applications to demo to a major client. One of them is basically a complete business rules engine with WPF-based UI in under 1,000 lines of F#. I think that's awesome but I'm still just scratching the surface. I think XAML is an easy target here. F# is not only more concise but also type safe. FSharpx has data structures but what support for visualizing them? If you implement ToString should you also implement ToWPFControl? We have structural pretty printers to ASCII so why not pretty print to WPF controls? Shouldn't a math library include editable typeset mathematics these days?

Are there any other subjects where F# has huge unrealized potential?

I have some comments on some of what is on that page you cited:

"Adopting a practice from Haskell documentation, I put the time complexity for members and let bindings at the beginning of their intellisense documentation". Fantastic! C++ STL got this right, IMO.

"if it is not O(1), it is not a property". What about O(lg n)? I don't think anyone will cry foul because the MinimumElement property of a red-black tree is logarithmic time.

"Hint: if the type implements TryUncons or TryUnsnoc". The name "snoc" is "cons" written backwards, which was the name used for a core operation in Lisp 50 years ago. That's a really obtuse and user-unfriendly academic name for "append" (or, maybe better, "push_back" from C++).

"If a structure’s method can throw an exception, include a TryMethodName method that returns option". Absolutely!

"Every data structure should have a length member (even if length is O(n))". Arrays use "Length" but everything else on .NET seems to use "Count".

"In theory data structures implementing the same paradigm, e.g. queue, should implement an interface
the exposed union does provide one more handle for the determined hacker to be inventive with". I don't think you want to do that. ML's higher-order modules are perfect for what you are describing. You can parameterize one data structure over another data structure. Everything is done at compile time with full type safety and no run-time overhead. Beautiful. With .NET interfaces, nothing is done at compile time, everything is deferred to run-time, nothing gets properly type checked so you compromise your design (you want MyQueue.Enqueue to return a MyQueue, not an IQueue) and start pulling in obscure forms of generic constraints in a desperate attempt to solve your problem using what is, objectively, the wrong tool for the job. Your underlying data structure is an array. Behind the interface your user just wants to know the length. That's just reading 1 word from memory. Except that interface is now in your way. .NET gets the v-table, looks up the dynamic branch location and jumps. Your expensive new CPU flushes its deep superscalar pipeline and starts reloading instructions from scratch. All because you decided to use a .NET interface. Don't get me wrong though. Interfaces in .NET do have practical applications. Provided your architect is a prophet. ;-)

Cheers,
Jon.

Jack Fox

unread,
Nov 29, 2012, 11:34:53 PM11/29/12
to pragmatic-functional...@googlegroups.com
Great information, Jon. Lots to work on here. I will see you Monday in S.F.

Jack

Jon Harrop

unread,
Nov 30, 2012, 1:09:37 AM11/30/12
to pragmatic-functional...@googlegroups.com

Incidentally, you might even aspire to replace some of the .NET collections because they are quite poorly implemented in parts. For example, the System.Collections.Generic.Queue does “dynamic” integer modulo so it is several times slower than necessary.

 

Cheers,

Jon.

 

--
 
 

Jack Fox

unread,
Nov 30, 2012, 1:09:02 PM11/30/12
to pragmatic-functional...@googlegroups.com, jonathand...@googlemail.com
I think we're gaining critical mass to move forward with reorganization of FSharpx.DataStructures. Please continue the discussion on this thread https://github.com/fsharp/fsharpx/issues/169

Don Syme

unread,
Nov 30, 2012, 3:57:00 PM11/30/12
to pragmatic-functional...@googlegroups.com
(I know this is a bit off topic since this is not an F# discussion list per se, my apologies. The F# open source google group is probably a better forum)
 
Hi all,
 
Jon and Jack mention the F# Power Pack. We're currently starting to discuss the best way take the functionality forward. The current collection of DLLs, components and code are stable and will stay in place indefinitely and we'll continue to use the CodePlex site to drop copies of the F# compiler - it's more a question of finding a next step to enable the community to move forward.
 
I think FSharpx is a reasonable way to move some components forward, though  I share some of your concerns below. I've advised in the fsharpx discussions (here) that they should not make the library a "kitchen sink" (and i nparticular not include any math functionality) but instead focus on independent packages that implement:
  • commmunity provided extensions to existing FSharp.Core functionality Async, Seq, Event, Observable, Quotations etc. (likely one independent package)
  • functional programming collections (in one independent package)
  • type providers (each in an independent package)
  • some additional core computer science functionality like STM (perhaps in an independent package)
One repository creating multiple libraries seems to give quite good efficiency as the test, continuous integration, documentation, issues and forums can be shared.  However of course each package needs to be carefully crafted for design coherence and maximum independence.
 
Regards
Don

Jack Fox

unread,
Dec 12, 2012, 12:39:05 PM12/12/12
to pragmatic-functional...@googlegroups.com
Last call for comments on re-org of FSharpx collections and data structures from the F# community call for comments
Reply all
Reply to author
Forward
0 new messages