Subscription storage


Wintermoose

Oct 9, 2010, 2:21:03 PM
to aspComet
I would like to add a shared mapping from each channel to its list of
clients, so that messages sent to a channel don't need to search
through every client's subscription list. I know that keeping 2
parallel structures (this mapping, plus each client's existing
subscription list) in sync is a pain in multi-threaded code - unless we
want to take big fat locks everywhere - but it will be important once
many clients are connected at once.
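
To make the idea concrete, here is a minimal sketch of the two parallel
structures, written in Python for brevity rather than in AspComet's C#.
The class and method names are made up for illustration, and it ignores
threading completely:

```python
class SubscriptionIndex:
    """Keeps per-client subscriptions and a channel -> clients map in sync."""

    def __init__(self):
        self.channels_by_client = {}  # client id -> set of channel names
        self.clients_by_channel = {}  # channel name -> set of client ids

    def subscribe(self, client_id, channel):
        # Every write touches both structures so they never diverge.
        self.channels_by_client.setdefault(client_id, set()).add(channel)
        self.clients_by_channel.setdefault(channel, set()).add(client_id)

    def unsubscribe(self, client_id, channel):
        self.channels_by_client.get(client_id, set()).discard(channel)
        self._drop(channel, client_id)

    def remove_client(self, client_id):
        # A disconnecting client must be removed from every channel entry.
        for channel in self.channels_by_client.pop(client_id, set()):
            self._drop(channel, client_id)

    def subscribers_of(self, channel):
        # One dictionary lookup instead of a scan over all clients.
        return set(self.clients_by_channel.get(channel, ()))

    def _drop(self, channel, client_id):
        subscribers = self.clients_by_channel.get(channel)
        if subscribers is None:
            return
        subscribers.discard(client_id)
        if not subscribers:
            # Remove empty entries so dead channels do not pile up.
            del self.clients_by_channel[channel]
```

Publishing then costs one lookup per channel, at the price of every
subscribe, unsubscribe and disconnect having to update both maps.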

But if I start messing with this, it would perhaps be nice to
implement support for wildcard subscriptions, as specified by
Bayeux. I was thinking about possible data structures for it:

a) just a list that has to be searched (like my implementation of the
server-side channel events). I think this is suitable only for max. 10
patterns or so, thus ok for the server-side subscriptions, but not for
clients.

b) prefix tree. A tree where every node represents one segment of the
path, and holds 2 lists (asterisk, double asterisk). When travelling
the tree, we gather all double-asterisk members on the way (up to the
final level minus 1) and the asterisk members at the final level minus 1.

Variants: put non-wildcard subscriptions into the tree as well, or
have 2 separate trees (but that's travelling it twice)
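
A rough sketch of variant b) with the non-wildcard subscriptions stored
in the same tree could look like this. It is Python rather than C#, all
names are hypothetical, and unsubscribing and branch pruning are left
out:

```python
class Node:
    def __init__(self):
        self.children = {}        # path segment -> Node
        self.exact = set()        # subscribers to exactly this path
        self.star = set()         # subscribers to <this path>/*
        self.double_star = set()  # subscribers to <this path>/**


class SubscriptionTree:
    def __init__(self):
        self.root = Node()

    def subscribe(self, pattern, client_id):
        segments = [s for s in pattern.split("/") if s]
        wildcard = None
        if segments and segments[-1] in ("*", "**"):
            wildcard = segments.pop()
        node = self.root
        for segment in segments:
            node = node.children.setdefault(segment, Node())
        if wildcard == "*":
            node.star.add(client_id)
        elif wildcard == "**":
            node.double_star.add(client_id)
        else:
            node.exact.add(client_id)

    def subscribers_of(self, channel):
        segments = [s for s in channel.split("/") if s]
        found = set()
        node = self.root
        for depth, segment in enumerate(segments):
            # Every ancestor's ** list matches, since at least one
            # segment of the channel is still left below it.
            found |= node.double_star
            if depth == len(segments) - 1:
                # Only the direct parent's * list matches.
                found |= node.star
            node = node.children.get(segment)
            if node is None:
                return found
        found |= node.exact
        return found
```

A single walk from the root collects the exact, single-wildcard and
double-wildcard subscribers in one pass.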

c) single dictionary. When sending a message to something like
/chat/public/1/whisper, the code does these lookups:
/chat/public/1/whisper
/chat/public/1/*
/chat/public/1/**
/chat/public/**
/chat/**
/**
And merges the lists.

Variants: 1 dictionary for all, or 3 dictionaries (static, single *,
double *) or 1+1
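
The lookup sequence for the single-dictionary variant can be sketched as
follows. This is Python for illustration only; candidate_keys and
subscribers_for are hypothetical helper names, and the index is assumed
to be a plain dictionary from pattern string to a set of client ids:

```python
def candidate_keys(channel):
    """Return the channel itself plus every wildcard pattern matching it."""
    segments = [s for s in channel.split("/") if s]
    keys = [channel]
    # Single asterisk: stands in for the last segment only.
    keys.append("/".join([""] + segments[:-1] + ["*"]))
    # Double asterisk: one key per ancestor, deepest first, down to /**.
    for depth in range(len(segments) - 1, -1, -1):
        keys.append("/".join([""] + segments[:depth] + ["**"]))
    return keys


def subscribers_for(index, channel):
    """Merge the subscriber sets found under every candidate key."""
    clients = set()
    for key in candidate_keys(channel):
        clients |= index.get(key, set())
    return clients
```

For a channel with n segments this does n + 2 dictionary lookups,
regardless of how many patterns are stored.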

When deciding between b and c, I feel more inclined towards the latter.
I believe the performance of the basic lookup will be more or less
identical, and I think a single dictionary will be easier to maintain.
In the case of the trees, we would need some code that prunes branches
that are no longer needed now and then, which can get hairy with the 2
or 3 lists involved. On the other hand, with the tree variants it may
be possible to use more local locks for updates, helping the
performance a bit.

What do you think?

best regards
// Robert

Symon Rottem

Oct 9, 2010, 2:56:04 PM
to aspc...@googlegroups.com
Honestly, I don't know what the best solution is here, but I'm not in a situation where I need to worry about that level of scalability yet, so for me it's a premature optimization.  Of course, as soon as I do have that many connected clients I'll need to think about it... :)

I do think that wildcard support is needed, however.

Cheers,

Symon.

Neil Mosafi

Oct 10, 2010, 2:41:01 PM
to aspComet
Hmm, we definitely need to add support for wildcards. There's already
a ChannelName class that should work for this purpose, and I see you've
created a ChannelPattern one (which I've not had time to look at yet).
Is this doing a similar thing?

Now I'm not too sure that we want to try to optimise the storage of
subscriptions within the framework itself. I would like the scope of
AspComet to remain very focused on solving the messaging between
browser and server and implementing the Bayeux protocol (at which it's
still far from perfect).

There is a fairly generic interface
IClientRepository.WhereSubscribedTo("channel") which is used internally
for publishing messages to clients subscribed to channels, so an
application could provide its own ClientRepository with a more
optimised way of doing this, but I think each application will want
their own way of doing it, depending on the context, maintaining the
relationship between channel and client internally or in some database
or distributed cache.

As an aside I would really like to avoid the framework taking any
locks of any kind unless completely necessary, and leave it up to
applications to manage synchronisation.

What do you guys think?



Wintermoose

Oct 10, 2010, 3:23:35 PM
to aspComet

> Hmm, we definitely need to add support for wildcards.  There's already
> a ChannelName class that should work for this means and I see you've
> created a ChannelPattern one (which I've not had time to look at yet)
> is this doing a similar thing?

Yeah, sort of. The difference is that ChannelName stores a channel name
and matches it against various patterns, while ChannelPattern stores a
pattern and matches it against multiple names. The second is what you
need for the wildcard support, and it allows the class to be more
efficient at the matching.
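
To illustrate why storing the pattern helps, here is a sketch of the
idea. It is not the actual ChannelPattern code from the project, just a
Python approximation with assumed names: the pattern is analysed once in
the constructor, so each match is a cheap prefix check.

```python
class ChannelPattern:
    """Parses a pattern once, then matches many channel names against it."""

    def __init__(self, pattern):
        self.pattern = pattern
        last = pattern.split("/")[-1]
        if last == "**":
            self.kind = "multi"
        elif last == "*":
            self.kind = "single"
        else:
            self.kind = "exact"
        # For wildcards keep everything up to and including the final
        # slash, e.g. "/foo/" for "/foo/*".
        if self.kind == "exact":
            self.prefix = pattern
        else:
            self.prefix = pattern[: len(pattern) - len(last)]

    def matches(self, channel):
        if self.kind == "exact":
            return channel == self.pattern
        if not channel.startswith(self.prefix):
            return False
        rest = channel[len(self.prefix):]
        if not rest:
            return False
        # "*" covers exactly one more segment, "**" covers one or more.
        return self.kind == "multi" or "/" not in rest
```

With a stored channel name and varying patterns, the wildcard analysis
would have to be repeated for every pattern on every match.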


> Now I'm not too sure that we want to try to optimise the storage of
> subscriptions within the framework itself.  I would like the scope of
> AspComet to remain very focused on solving the messaging between
> browser and server and implementing the Bayeux protocol (at which it's
> still far from perfect).
>
> There is a fairly generic interface
> IClientRepository.WhereSubscribedTo("channel") which is used internally
> for publishing messages to clients subscribed to channels, so an
> application could provide its own ClientRepository with a more
> optimised way of doing this, but I think each application will want
> their own way of doing it, depending on the context, maintaining the
> relationship between channel and client internally or in some database
> or distributed cache.

I didn't want the special storage for some advanced app usage, but for
sending a message to all clients subscribed to a given channel, which I
think is quite a core functionality of the server? After all, even
ForwardingHandler needs it.

I know we have the WhereSubscribedTo method which handles it
currently, but it is not very efficient - if you imagine 1000
clients and say 20 channels, then every message going to one of the
channels has to be checked against all the 1000 clients (and
potentially multiple subscriptions per client). And that's going to be
even worse when IsSubscribedTo has to do the wildcard matching.

I know I can always implement it as customization of the
InMemoryClientRepository/Client classes, but I thought it's probably
useful for all the uses of the server.

> As an aside I would really like to avoid the framework taking any
> locks of any kind unless completely necessary, and leave it up to
> applications to manage synchronisation.

Yes I agree, but if the storage is implemented as part of the server,
we have to expect the possibility that 2 clients write to it at the
same time (subscribe/unsubscribe)
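
As a minimal sketch of what I mean, a shared channel map needs at least
something like this around its writes. It is Python, the names are
invented, and it uses one coarse lock (the "big fat lock" approach)
rather than anything clever:

```python
import threading


class LockedChannelIndex:
    """Channel -> clients map guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clients_by_channel = {}

    def subscribe(self, client_id, channel):
        with self._lock:
            self._clients_by_channel.setdefault(channel, set()).add(client_id)

    def unsubscribe(self, client_id, channel):
        with self._lock:
            subscribers = self._clients_by_channel.get(channel)
            if subscribers is None:
                return
            subscribers.discard(client_id)
            if not subscribers:
                del self._clients_by_channel[channel]

    def subscribers_of(self, channel):
        with self._lock:
            # Return a snapshot so callers can iterate without the lock.
            return frozenset(self._clients_by_channel.get(channel, ()))
```

Readers get an immutable copy, so publishing never iterates over a set
that another thread is modifying.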

Neil Mosafi

Oct 10, 2010, 8:33:13 PM
to aspComet

On Oct 10, 8:23 pm, Wintermoose <robert.gol...@gmail.com> wrote:
> > Hmm, we definitely need to add support for wildcards.  There's already
> > a ChannelName class that should work for this means and I see you've
> > created a ChannelPattern one (which I've not had time to look at yet)
> > is this doing a similar thing?
>
> Yeah, sort of. The difference is that ChannelName stores channel name
> and matches against various patterns, while ChannelPattern stores
> pattern and matches against multiple names. The second is what you
> need for the wildcard support, and it allows the class to be more
> efficient at the matching.

Still don't quite get the difference between ChannelName and
ChannelPattern. Why do we need both classes?

Say a client subscribes to "foo/*" and there is a publish to
"foo/bar". You either

A) create a ChannelName for "foo/bar" and see if that matches against
"foo/*", or
B) create a ChannelPattern for "foo/*" and see if that matches against
"foo/bar"

Same thing right?

Hmm you're probably right in the long run. I just don't want to have
to second guess the users of the framework; I'd rather leave it as
open as possible for applications to provide their own infrastructure.

For example, there's essentially going to be a many-to-many relationship
between "channel" and "client" and something needs to maintain that
relationship. One would imagine in a very large site there'll be some
kind of web farm going on, therefore the list of clients will need to
be stored in some database or distributed cache rather than in memory
on the web server. So having an internal storage mechanism would not
be right for those needs. The ClientRepository implementation may end
up having to go to a database to ask to select the clients which are
subscribed to a particular channel (you probably don't want the
matching logic to be happening in SQL as that's nasty but you get the
picture!)
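
Roughly what I have in mind, sketched in Python rather than C#. The
class names are stand-ins and where_subscribed_to only mirrors the
WhereSubscribedTo idea; it is not the real interface. The publishing
code depends on the abstraction only, so the storage strategy can be
swapped:

```python
from abc import ABC, abstractmethod


class ClientRepository(ABC):
    @abstractmethod
    def where_subscribed_to(self, channel):
        """Return the ids of the clients subscribed to the channel."""


class ScanningRepository(ClientRepository):
    """Walks every client's own subscription list on each query."""

    def __init__(self, subscriptions):
        self._subscriptions = subscriptions  # client id -> set of channels

    def where_subscribed_to(self, channel):
        return [client_id
                for client_id, channels in self._subscriptions.items()
                if channel in channels]


class IndexedRepository(ClientRepository):
    """Keeps a channel -> clients map; could equally wrap a database."""

    def __init__(self, subscriptions):
        self._clients_by_channel = {}
        for client_id, channels in subscriptions.items():
            for channel in channels:
                self._clients_by_channel.setdefault(channel, []).append(client_id)

    def where_subscribed_to(self, channel):
        return list(self._clients_by_channel.get(channel, []))


def publish(repository, channel, message, deliver):
    # The framework side only ever sees the abstract repository.
    for client_id in repository.where_subscribed_to(channel):
        deliver(client_id, message)
```

Either repository can be handed to publish without the framework
knowing how the channel-to-client relationship is stored.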

> > As an aside I would really like to avoid the framework taking any
> > locks of any kind unless completely necessary, and leave it up to
> > applications to manage synchronisation.
>
> Yes I agree, but if the storage is implemented as part of the server,
> we have to expect the possibility that 2 clients write to it at the
> same time (subscribe/unsubscribe)

Yes - but only if there's a global list of channels and clients (i.e.
not per client as we have now)

Wintermoose

Oct 11, 2010, 3:36:46 AM
to aspComet

> Still don't quite get the difference between ChannelName and
> ChannelPattern.  Why do we need both classes?
>
> Say a client subscribes to "foo/*" and there is a publish to
> "foo/bar".  You either
>
> A) create a ChannelName for "foo/bar" and see if that matches against
> "foo/*", or
> B) create a ChannelPattern for "foo/*" and see if that matches against
> "foo/bar"
>
> Same thing right?

Functionally yes, but if the pattern is stored in the class, then the
match can be more efficient (compare the Matches() methods in
ChannelPattern and ChannelName)

>
> Hmm you're probably right in the long run.  I just don't want to have
> to second guess the users of the framework and leave it as open as
> possible for applications to provide their own infrastructure.
>
> For example, there's essentially going to be a many-to-many relationship
> between "channel" and "client" and something needs to maintain that
> relationship.  One would imagine in a very large site there'll be some
> kind of web farm going on, therefore the list of clients will need to
> be stored in some database or distributed cache rather than in memory
> on the web server.  So having an internal storage mechanism would not
> be right for those needs.  The ClientRepository implementation may end
> up having to go to a database to ask to select the clients which are
> subscribed to a particular channel (you probably don't want the
> matching logic to be happening in SQL as that's nasty but you get the
> picture!)

Ok I see your point: if people replace the repository and/or
client classes with alternative implementations (instead of just
extending them), it's better to keep them relatively basic and
independent of each other.

I brought up the question about subscriptions mainly as a way to
discuss the algorithm, it's not yet top priority for me either - I
will need to create some stress tests first. But when (if) I get to
it, I will try to implement the storage as a plugin replacement of
client and client repository - when it's properly tested and working,
we can include them in the core or in the sample section as
alternatives. Then we'll also see whether the interface as defined
today is flexible enough, or whether we need to expose more.

// Rob