[Haskell-cafe] How to implement a source-sink pattern

47 views
Skip to first unread message

Roland Lutz

unread,
Apr 1, 2015, 3:31:42 PM4/1/15
to haskel...@haskell.org
Hi!

I'm trying to implement a source-sink type of pattern where there are a
number of sinks, each connected to exactly one source, and a number of
sources, each connected to zero or more sinks. The program consists of
some modules, each defining their own sources and sinks. To illustrate
this, here's what this would look like in C:


/* global.h */

struct source {
char *state;
/* some more fields */
}

struct sink {
struct source *source;
char *state;
/* some more fields */
}

struct sink **get_sinks_for_source(struct source *source);

/* module_a.c */

struct source a_source, another_source;
struct sink foo, bar, baz;

...
foo.source = &a_source;
...


Since getting the list of sinks for a source is a common operation, I'd
probably define some kind of reverse map which is updated when a sink is
remapped to a new source.

I tried to rebuild this in Haskell, but the result is ridiculously
complicated. Since I can't use pointers, I had to define an ordinal type
class to enumerate the sources and sinks and use this to look up the
actual data from the world state. But then I couldn't define the Sink
type properly as I can't do:

data SinkInfo = Source a => SinkInfo { sinkSource :: a,
sinkState :: String }

I have to add the source type as a type attribute, leading to a world
state with as many parameters as there are sinks. Also, I couldn't figure
out how to implement a function like

sinksForSource :: Source a => WorldState p q r -> a -> [Sink b => b]

since the source types won't match the type of the lookup key, and the
sinks can be from different modules. Now here is the actual code
(comments indicate where I intend to split it into individual files):


{- Global.hs -}

data SourceInfo = SourceInfo { sourceState :: String }

class Eq a => Source a where
getSourceInfo :: WorldState p q r -> a -> SourceInfo

data SinkInfo a = SinkInfo { sinkSource :: a, sinkState :: String }

class Sink a where
getSinkInfo :: Source x => WorldState x x x -> a -> SinkInfo x
-- should allow different arguments to WorldState

{- ModuleA.hs -}

data ModuleASource = ASource | AnotherSource deriving (Eq)

instance Source ModuleASource where
getSourceInfo world ASource = aSource $ moduleAState world
getSourceInfo world AnotherSource = anotherSource $ moduleAState world

data ModuleASink = Foo | Bar | Baz

instance Sink ModuleASink where
getSinkInfo world Foo = foo $ moduleAState world
getSinkInfo world Bar = bar $ moduleAState world
getSinkInfo world Baz = baz $ moduleAState world

data ModuleAState p q r = ModuleAState { aSource :: SourceInfo,
anotherSource :: SourceInfo,
foo :: SinkInfo p,
bar :: SinkInfo q,
baz :: SinkInfo r }

sinksForSourceInModuleA :: Source x => ModuleAState x x x -> x -> [ModuleASink]
-- should allow different arguments to ModuleAState and return Sink b => [b]
sinksForSourceInModuleA (ModuleAState _ _ foo bar baz) source =
(if sinkSource foo == source then [Foo] else []) ++
(if sinkSource bar == source then [Bar] else []) ++
(if sinkSource baz == source then [Baz] else [])

{- Main.hs -}

data WorldState p q r = WorldState { moduleAState :: ModuleAState p q r }

initState :: WorldState ModuleASource ModuleASource ModuleASource
initState = WorldState $ ModuleAState (SourceInfo "a source init state")
(SourceInfo "another source init state")
(SinkInfo ASource "foo init state")
(SinkInfo ASource "bar init state")
(SinkInfo AnotherSource "baz init state")

remapBar :: WorldState p q r -> WorldState p ModuleASource r
remapBar (WorldState a) =
WorldState $ a { bar = SinkInfo AnotherSource (sinkState $ bar a) }

sinksForSource :: Source x =>
WorldState x x x -> x -> [ModuleASink]
-- should allow different arguments to ModuleAState and return Sink b => [b]
sinksForSource (WorldState a) source = sinksForSourceInModuleA a source

main :: IO ()
main = let before = initState
after = remapBar before in
do
putStrLn $ "Number of sinks for another source before: " ++
(show $ length $ sinksForSource before AnotherSource)
putStrLn $ "Number of sinks for another source after: " ++
(show $ length $ sinksForSource after AnotherSource)


There are some problems with this code:
* I couldn't figure out how to resolve the circular references which are created by splitting this into individual files.
* Source and sink ordinals can't be mixed between modules, so I wouldn't be able to add a module B anyway.
* Looking up the source/sink states by ordinal keys is kind of cumbersome but works.
* To get the list of sinks connected to a given source, the whole world state has to be polled. I expect this to be grossly inefficient (unless Haskell does some magic here) but I'm not sure how to add a cache in a consistent way.

I can't believe this should be so complicated in Haskell, so I guess I'm
trying to do this in an un-Haskell-ish way, or maybe there's something
obvious I haven't seen. I'd be happy about any suggestions.

Roland

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Frank Staals

unread,
Apr 1, 2015, 5:33:12 PM4/1/15
to Roland Lutz, haskel...@haskell.org
Roland Lutz <rl...@hedmen.org> writes:

> Hi!
>
> I'm trying to implement a source-sink type of pattern where there are a number
> of sinks, each connected to exactly one source, and a number of sources, each
> connected to zero or more sinks. The program consists of some modules, each
> defining their own sources and sinks. To illustrate this, here's what this
> would look like in C:


Hey Roland,

So essentially you want a data structure for some kind of bipartite
graph. The most haskelly way to do that would probably to define the
graph to be simply define the Bipartite graph to be a pair of Maps, and
define functions to add/delete nodes and edges to the graph that make
sure that the two maps keep in sync. This would give you something like:

import qualified Data.Map as M

data MySource = MySource { sourceState :: String , % and any other data specific to sources }

data MySink = MySink { sinkState :: String, % and whatever else sinks have}

data BiGraph src snk = BiGraph { sourceToSinkMap :: M.Map src [snk]
, sinkToSourceMap :: M.Map snk src
}

addEdge :: (src,snk) -> BiGraph src snk -> BiGraph src snk
addEdge (src,snk) (BiGraph m1 m2) = BiGraph (M.update src (snk :) m1)
(M.insert snk src m2)
% make sure to check that snk
% does not already occur in
% m2 etc.


you essentially get your 'sinksForSource' functions for free:

sinksForSource :: src -> BiGraph src snk -> [snk]
sinksForSource src = M.lookup src . sourceToSinkMap

In this model you cannot direclty mix the sources and sinks from
different modules. I.e. a 'BiGraph MySource MySink' cannot be used to
also store a (MySecondSource,MySecondSink) pairs. If you do want that,
you would need some wrapper type that distinguishes between the various
'MySink', 'MySecondSink', versions.

Note that instead of building such a graph type yourself you might also
just want to use some existing graph library out there (i.e. fgl or
so).

Hope this helps a bit.

--

- Frank

Roland Lutz

unread,
Apr 2, 2015, 1:45:14 PM4/2/15
to Frank Staals, haskel...@haskell.org
On Wed, 1 Apr 2015, Frank Staals wrote:
> So essentially you want a data structure for some kind of bipartite
> graph.

Yes, with the additional constraint that the vertices in one partite set
(the "sinks") each connect to exactly one edge.

> The most haskelly way to do that would probably to define the graph to
> be simply define the Bipartite graph to be a pair of Maps, and define
> functions to add/delete nodes and edges to the graph that make sure that
> the two maps keep in sync.

This was actually my first approach, but I couldn't find appropriate key
and value types to be stored in the map. Since the vertices are
well-known global objects, it doesn't make much sense to store more than a
handle here. But how do I connect the handle back to the data structure?

> In this model you cannot direclty mix the sources and sinks from
> different modules. I.e. a 'BiGraph MySource MySink' cannot be used to
> also store a (MySecondSource,MySecondSink) pairs. If you do want that,
> you would need some wrapper type that distinguishes between the various
> 'MySink', 'MySecondSink', versions.

That's one of the points that trouble me. How would such a wrapper look
like?

I experimented a bit with your code (see below). I noticed that I have to
specify "Ord src =>" and "Ord snk =>" in multiple places. Is there a way
to state that type arguments for BiGraph always have to be instances of
Ord?

Roland



import qualified Data.List as L
import qualified Data.Map as M

data BiGraph src snk = BiGraph {
sourceToSinkMap :: M.Map src [snk],
sinkToSourceMap :: M.Map snk src
} deriving Show

collectKeys :: Eq a => a -> M.Map k a -> [k]
collectKeys a = M.keys . M.filter (== a)

applyToPair :: (k -> a) -> k -> (k, a)
applyToPair f a = (a, f a)

initializeGraph :: Ord src => [src] -> M.Map snk src -> BiGraph src snk
initializeGraph srcs m2 =
BiGraph (M.fromList $ map (applyToPair $ (flip collectKeys) m2) srcs) m2

updateEdge :: Ord src => Ord snk =>
(src, snk) -> BiGraph src snk -> BiGraph src snk
updateEdge (src, snk) (BiGraph m1 m2) =
if M.notMember src m1 then error "updateEdge: invalid source" else
if M.notMember snk m2 then error "updateEdge: invalid sink" else
let oldsrc = m2 M.! snk in
BiGraph (M.adjust (snk :) src $ M.adjust (L.delete snk) oldsrc m1)
(M.insert snk src m2)

sinksForSource :: Ord src => src -> BiGraph src snk -> [snk]
sinksForSource src = (M.! src) . sourceToSinkMap

Frank Staals

unread,
Apr 6, 2015, 4:40:35 AM4/6/15
to Roland Lutz, haskel...@haskell.org
Roland Lutz <rl...@hedmen.org> writes:

> On Wed, 1 Apr 2015, Frank Staals wrote:
>> So essentially you want a data structure for some kind of bipartite graph.
>
> Yes, with the additional constraint that the vertices in one partite set (the
> "sinks") each connect to exactly one edge.
>
>> The most haskelly way to do that would probably to define the graph to be
>> simply define the Bipartite graph to be a pair of Maps, and define functions
>> to add/delete nodes and edges to the graph that make sure that the two maps
>> keep in sync.
>
> This was actually my first approach, but I couldn't find appropriate key and
> value types to be stored in the map. Since the vertices are well-known global
> objects, it doesn't make much sense to store more than a handle here. But how
> do I connect the handle back to the data structure?

A vertex (source/sink) is not uniquely coupled with a graph; it may be
in more than one graph. So, there is no easy way to define a function
with type 'Vertex -> Graph'. In other words, you should pass around
the graph data structure if you need connectivity information.

>> In this model you cannot direclty mix the sources and sinks from
>> different modules. I.e. a 'BiGraph MySource MySink' cannot be used to
>> also store a (MySecondSource,MySecondSink) pairs. If you do want that,
>> you would need some wrapper type that distinguishes between the various
>> 'MySink', 'MySecondSink', versions.
>
> That's one of the points that trouble me. How would such a wrapper
> look like?

Assuming that the set of different sink types is fixed and known at
compile time, simply:

data Sink = SinkA ModuleA.Sink
| SinkB ModuleB.Sink
| ...


If the sink types are not known then you need either Existential type or
something like an open-union. An existentialtype will look something
like:

data Sink where
Sink :: IsASink t => t -> Sink

however, if you have a value of type Sink, you cannot recover what exact
type of sink it was (i.e. if it was a ModuleA.Sink or a ModuleB.Sink):
you only know that it has the properties specified by the IsASink
typeclass.

With open-unions you should be able to recover the exact type. However
that is a bit more complicated. I don't have concrete experience with
them myself, so others might be more helpful on that front.


> I experimented a bit with your code (see below). I noticed that I have to
> specify "Ord src =>" and "Ord snk =>" in multiple places. Is there a way to
> state that type arguments for BiGraph always have to be instances of Ord?
>
> Roland

Depends a bit, if you wish to keep the functions polymorphic in the
source an sink types (src and snk) then you have to keep them. If you
fill in the concrete types at hand, and you give them Ord instances,
then (obviously) you don't have to keep the Ord constraints ;)

Last note: If your source and sink types don't have a proper Ordering,
you can switch to using (integer) explicit vertexIds and
IntMaps. Similar to the API of fgl. However, in that case you have to do
the bookkeeping of which vertex has which vertexId yourself.

> updateEdge :: Ord src => Ord snk =>
> (src, snk) -> BiGraph src snk -> BiGraph src snk
> updateEdge (src, snk) (BiGraph m1 m2) =

FWI: you would normally write multiple class constraints like (Ord src,
Ord snk) =>, instead of Ord src => Ord snk =>. I'm kind of surprised the
latter is still allowed.

Regards,

--

- Frank

Mario Blažević

unread,
Apr 6, 2015, 11:09:41 AM4/6/15
to haskel...@haskell.org
On 04/01/2015 03:31 PM, Roland Lutz wrote:
> Hi!
>
> I'm trying to implement a source-sink type of pattern where there are
> a number of sinks, each connected to exactly one source, and a number
> of sources, each connected to zero or more sinks. The program
> consists of some modules, each defining their own sources and sinks.
> To illustrate this, here's what this would look like in C:
>
>
> /* global.h */
>
> struct source {
> char *state;
> /* some more fields */
> }
>
> struct sink {
> struct source *source;
> char *state;
> /* some more fields */
> }
>
> struct sink **get_sinks_for_source(struct source *source);
>
> /* module_a.c */
>
> struct source a_source, another_source;
> struct sink foo, bar, baz;
>
> ...
> foo.source = &a_source;
> ...
>

One important thing you didn't state is: which parts of these data
structures are immutable? The process of moving any data structure from
C to Haskell depends on the answer, but otherwise can usually be done in
a relatively mechanical fashion:

1. everything that is immutable should be shifted as deeply as possible,
the mutable containers containing immutable ones;
2. map the data structures over: struct to a record, immutable array to
an immutable array (or list or map or whatever, depending on the access
pattern and performance requirements), mutable array to a mutable array;
3. map immutable non-null pointers to just the data structure they're
pointing to, other immutable pointers to Maybe, mutable non-null
pointers to STRef, other mutable pointers to STRef Maybe.
4. use runST to hide the whole mess behind a pure interface as much as
possible.

The result is unlikely to be optimal or elegant, but this process
can get you a working implementation in Haskell. Once there, start
refactoring the algorithms in and you'll likely be able to simplify both
the data structure and the whole program. Take care to start with strong
types and they will prevent you from doing anything stupid while
refactoring.

On a related note: I have no idea what Sink and Source are supposed
to be for, but it's possible that pipes and conduits already provide it.
Reply all
Reply to author
Forward
0 new messages