Cache data in session / Query whether it has been subscribed to a channel?

mst...@googlemail.com

Dec 19, 2011, 6:10:40 PM
to Faye users
I have two questions:

1.) Is it possible to keep/cache some simple data within the scope of
a subscribe/unsubscribe session?

2.) Does the API support a query asking whether at least one client
has subscribed to channel xyz?

Thanks!

I appreciate the excellent work!

James Coglan

Dec 19, 2011, 6:15:36 PM
to faye-...@googlegroups.com
On 19 December 2011 23:10, mst...@googlemail.com <mst...@googlemail.com> wrote:
> 1.) Is it possible to keep/cache some simple data within the scope of
> a subscribe/unsubscribe session?

I'm not sure what you mean by 'cache' here -- could you give more details on what your application needs to do?
 
> 2.) Does the API support a query asking whether at least one client
> has subscribed to channel xyz?

No, but you can easily track this yourself by listening to :subscribe/:unsubscribe events:

faye.bind :subscribe do |client_id, channel|
  # increment count for channel
end

faye.bind :unsubscribe do |client_id, channel|
  # decrement count for channel
end
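
A minimal sketch of what those two handlers could maintain, in case it helps. ChannelTracker is a hypothetical helper and not part of Faye; it keeps a set of client IDs per channel so that a repeated subscribe from the same client is not counted twice. The wiring in the trailing comment assumes `faye` is the same server object as in the snippet above.

```ruby
require 'set'

# Hypothetical helper (not a Faye API): remembers which client IDs are
# subscribed to each channel.
class ChannelTracker
  def initialize
    @subscribers = Hash.new { |hash, channel| hash[channel] = Set.new }
  end

  # Call from the :subscribe handler.
  def subscribed(client_id, channel)
    @subscribers[channel] << client_id
  end

  # Call from the :unsubscribe handler.
  def unsubscribed(client_id, channel)
    return unless @subscribers.key?(channel)
    @subscribers[channel].delete(client_id)
    # Drop empty channels so the hash does not grow without bound.
    @subscribers.delete(channel) if @subscribers[channel].empty?
  end

  def count(channel)
    @subscribers.key?(channel) ? @subscribers[channel].size : 0
  end

  def any?(channel)
    count(channel) > 0
  end
end

# Wiring, assuming `faye` is the server object from the snippet above:
#   tracker = ChannelTracker.new
#   faye.bind(:subscribe)   { |client_id, channel| tracker.subscribed(client_id, channel) }
#   faye.bind(:unsubscribe) { |client_id, channel| tracker.unsubscribed(client_id, channel) }
```

This lives in one process's memory, so it says nothing about clients connected to other server processes, and the count can already be stale by the time you act on it.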

mst...@googlemail.com

Dec 19, 2011, 6:43:42 PM
to Faye users

On Dec 20, 12:15 am, James Coglan <jcog...@gmail.com> wrote:


> On 19 December 2011 23:10, mstu...@googlemail.com <mstu...@googlemail.com> wrote:
>
> > 1.) Is it possible to keep/cache some simple data within the scope of
> > a subscribe/unsubscribe session?
>
> I'm not sure what you mean by 'cache' here -- could you give more details
> on what your application needs to do?

Socket.IO (http://socket.io/#how-to-use -> "Storing data associated to
a client", somewhere in the middle) allows you to store data associated
with a client in the session. Of course, storing/cleaning the data
could be managed on the application layer, but application logic could
be saved if Faye managed the data lifecycle, because it knows best
when a session starts and ends. Such a feature would make sense for
storing simple data.

> > 2.) Does the API support a query asking whether at least one client
> > has subscribed to channel xyz?
>
> No, but you can easily track this yourself by listening to
> :subscribe/:unsubscribe events:
>
> faye.bind :subscribe do |client_id, channel|
>   # increment count for channel
> end
>
> faye.bind :unsubscribe do |client_id, channel|
>   # decrement count for channel
> end

Sure, but it requires me to keep track of this data. Faye has to
keep track of this data anyway, so duplicating it at the application
layer instead of reusing it through some query interface doesn't
feel DRY.

James Coglan

Dec 19, 2011, 7:12:04 PM
to faye-...@googlegroups.com
On 19 December 2011 23:43, mst...@googlemail.com <mst...@googlemail.com> wrote:
> Socket.IO (http://socket.io/#how-to-use -> "Storing data associated to
> a client", somewhere in the middle) allows you to store data associated
> with a client in the session. Of course, storing/cleaning the data
> could be managed on the application layer, but application logic could
> be saved if Faye managed the data lifecycle, because it knows best
> when a session starts and ends. Such a feature would make sense for
> storing simple data.

Faye does expose lifecycle events that tell you when a session begins and ends:

server.bind :handshake do |client_id|
  # client ID was just created
end

server.bind :disconnect do |client_id|
  # client ID session ended
end
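
As a rough sketch of how those two events could drive per-session data in your own code: SessionStore below is a hypothetical helper, not something Faye provides, and the wiring in the trailing comment assumes `server` is the same object as in the snippet above. Data is created on handshake and thrown away on disconnect, so the application never has to guess when a session ends.

```ruby
# Hypothetical helper (not a Faye API): per-client scratch data whose
# lifetime follows the handshake/disconnect events.
class SessionStore
  def initialize
    @data = {}
  end

  # Call from the :handshake handler.
  def start(client_id)
    @data[client_id] = {}
  end

  # Call from the :disconnect handler; returns the discarded data, or nil.
  def finish(client_id)
    @data.delete(client_id)
  end

  # Returns false when the session is unknown or has already ended.
  def set(client_id, key, value)
    session = @data[client_id]
    return false unless session
    session[key] = value
    true
  end

  def get(client_id, key)
    session = @data[client_id]
    session && session[key]
  end
end

# Wiring, assuming `server` is the object from the snippet above:
#   sessions = SessionStore.new
#   server.bind(:handshake)  { |client_id| sessions.start(client_id) }
#   server.bind(:disconnect) { |client_id| sessions.finish(client_id) }
```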

I'll mention the usual caveat which is that I recommend not coupling your application too tightly to your messaging tools. Often a Faye client will not correspond directly to a domain entity, for example a user will visit many pages and therefore use many Faye sessions as they move around the site. You may find your use case is better modelled purely by sending messages, rather than listening to implementation details.

> > 2.) Does the API support a query asking whether at least one client
> > has subscribed to channel xyz?

> Sure, but it requires me to keep track of this data. Faye has to
> keep track of this data anyway, so duplicating it at the application
> layer instead of reusing it through some query interface doesn't
> feel DRY.

There are three reasons this is not in Faye's core:

1. It is not necessary to support Bayeux semantics, and wanting to know it is a sign of a design problem.
2. It requires that engines support it. Constraints that aren't a direct result of the required messaging semantics ought to be avoided.
3. If it were included, it would probably cause bugs.

The first is the most important: it's just not necessary in a pub/sub messaging protocol for any party to know how many clients will receive a message. If you find that it matters to the publisher how many subscribers there are or what they will do with the message, you shouldn't be using a pub/sub system, you should be using a protocol that allows you to directly address the subscribers.

Because of this, I don't want it to be a constraint on engine implementations. Because it needs to be very easy to swap out this part of the stack, fewer constraints is a desirable goal. It means, for example, that an engine can store subscribers without requiring that they be in an easily countable format.

Finally, if it were added, it would probably give you misleading results. It would have to be exposed through an async API, making it likely the count will change between it being calculated and that value being yielded to your code.

The only *accurate* way to implement this would be to have publish() somehow yield the number of clients the message was routed to. But again, this is not part of the Bayeux protocol, it is not necessary, and it imposes constraints on the engine that, for example, would make it harder to build a distributed backend.

By monitoring the count yourself in your application code, you get to decide what makes sense for your application. Do you care about race conditions? Persistence? Should you implement it by listening to the engine or using a heartbeat system on top of the messaging protocol? I've run into all these concerns before and there isn't one answer to them.

Sorry that was a bit of a lecture, but I think this is one of those interesting things that looks simple, but leaks implementation details in all sorts of problematic ways. If you give us more details of what your application does, I can probably give a more concrete set of design tips.

James Coglan

Dec 19, 2011, 7:15:37 PM
to faye-...@googlegroups.com
On 20 December 2011 00:12, James Coglan <jco...@gmail.com> wrote:
> server.bind :handshake do |client_id|
>   # client ID was just created
> end
>
> server.bind :disconnect do |client_id|
>   # client ID session ended
> end

It occurs to me, if you're comparing with Socket.IO, you probably wanted JavaScript:

server.bind('handshake', function(clientId) {
  // client ID was just created
});

server.bind('disconnect', function(clientId) {
  // client ID session ended
});

Toby Green

Dec 19, 2011, 7:17:05 PM
to faye-...@googlegroups.com
James - quick question on this.

When is the disconnect event fired? Just after page close, or does it have a built-in time-out feature?

Kind regards (and fantastic work!)

Toby
--
Toby Green

James Coglan

Dec 19, 2011, 7:20:45 PM
to faye-...@googlegroups.com
On 20 December 2011 00:17, Toby Green <tob...@gmail.com> wrote:
> When is the disconnect event fired? Just after page close, or does it have a built-in time-out feature?

It fires either when the client explicitly disconnects, or the page is closed, or the engine times it out due to inactivity, i.e. not receiving a /meta/connect for some time. 

mst...@googlemail.com

Dec 19, 2011, 8:27:43 PM
to Faye users
Hi James,

I just started with pub/sub and thus don't have any experience yet,
but what you are saying all sounds reasonable to me. So I'll take it
as advice.

I don't want to go into too much detail about what I am working on, but
I can give an example which is very close from the "problem I am trying
to solve" point of view.

So, assume we have a group calendar that 10 users have access to
and are viewing at the same time. Now user one creates a new event
and I want to broadcast this event to the other users.
When a user views the calendar, the client subscribes to the relevant
channel. I was thinking about using a resource-like style for channel
naming; for example, the channel for viewing the calendar could be
called "calendars/#calendar_id/show". Each user having access to the
calendar with id #calendar_id subscribes to channel
"calendars/#calendar_id". If one user adds a new event, it is published
to a different channel, called "event/#event_id/create", with the
JavaScript to add the event to the DOM as payload. No client has
subscribed to this channel, but I thought it would be cool to listen
for this event on the server side, identify it using some pattern
matching, and then publish the payload to
"calendars/#calendar_id/show". One might ask why not publish to channel
"calendars/#calendar_id/show" directly. To me it seems important to
decouple the creation of the calendar event from the rest, so I can
later publish to multiple channels, e.g. besides publishing to
"calendars/#calendar_id/show" also publishing to
"users/#user_id/activity_stream/show" and so on.

The reason why I asked (question 1) whether I can store data in the
session managed by Faye was to further improve the implementation by
only publishing data to clients who are actually affected by the
calendar change (the new event). For instance, if a user is viewing
week 35 and the new calendar event is in week 36, it doesn't make sense
to send any data related to the new event to that user, because it
won't be displayed anyway. So, if each client sent some metadata
referring to its current view (simply start date and end date) and I
could store this data in the session, then I could filter new events
and only forward them if it makes sense. But yes, this would require
each client to subscribe to a different channel, so one could call
them "user_id:calendars/#calendar_id/show", basically namespacing them
using #user_id. On the server side, I would then have to publish to
each channel by iterating over all users associated with a specific
calendar.

The reason why I asked whether I can query a certain channel (question
2) was that I wouldn't have to publish to a certain channel /
user-namespaced channel, e.g. "1234:calendars/#calendar_id/show",
if the user with id 1234 hasn't subscribed to this channel...

So, what do you think? Am I overcomplicating things here?

Feedback is very much appreciated...

James Coglan

Dec 20, 2011, 3:46:46 PM
to faye-...@googlegroups.com
On 20 December 2011 01:27, mst...@googlemail.com <mst...@googlemail.com> wrote:
> So, assume we have a group calendar that 10 users have access to
> and are viewing at the same time. Now user one creates a new event
> and I want to broadcast this event to the other users.
> When a user views the calendar, the client subscribes to the relevant
> channel. I was thinking about using a resource-like style for channel
> naming; for example, the channel for viewing the calendar could be
> called "calendars/#calendar_id/show". Each user having access to the
> calendar with id #calendar_id subscribes to channel
> "calendars/#calendar_id". If one user adds a new event, it is published
> to a different channel, called "event/#event_id/create", with the
> JavaScript to add the event to the DOM as payload. No client has
> subscribed to this channel, but I thought it would be cool to listen
> for this event on the server side, identify it using some pattern
> matching, and then publish the payload to
> "calendars/#calendar_id/show". One might ask why not publish to channel
> "calendars/#calendar_id/show" directly. To me it seems important to
> decouple the creation of the calendar event from the rest, so I can
> later publish to multiple channels, e.g. besides publishing to
> "calendars/#calendar_id/show" also publishing to
> "users/#user_id/activity_stream/show" and so on.

I think it's worth moving some responsibilities around here. You're basically using Faye as an RPC transport, sending JavaScript to execute various side effects when an event is created. This means you need to add new event dispatches on the sender side every time you add a new piece of UI.

A better approach is to just treat channels as topics that relate to changes in your data model. So, /calendars/:id is a good one -- any part of the UI can listen to this to find out when the calendar changes, and do its bit of work to update the UI. The parts of code you currently have that listen to /calendars/:id/show, /users/:id/activity_stream/show should all just be listening to data changes that affect them, not receiving pieces of JavaScript to execute.

So, rather than publishing JS to many channels, just publish data to one channel and have the UI components that care about these changes subscribe to them.

(By the way, unless your channels are authenticated using an extension, executing arbitrary JavaScript you received over a Faye channel is an XSS hole, which is another reason to avoid it.)
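
To illustrate the shape of this (a sketch only, not Faye code): TinyBus below is a made-up in-process stand-in for a pub/sub client so the example runs on its own, and the channel name, field names, and the two "components" are assumptions. The point is that one data message goes out on one channel, and each subscriber decides for itself what to do with it.

```ruby
require 'json'

# Made-up in-process stand-in for a pub/sub client, so the sketch runs
# without a server. With Faye you would go through its client instead.
class TinyBus
  def initialize
    @handlers = Hash.new { |hash, channel| hash[channel] = [] }
  end

  def subscribe(channel, &handler)
    @handlers[channel] << handler
  end

  def publish(channel, data)
    return unless @handlers.key?(channel)
    # Round-trip through JSON so handlers only ever receive plain data,
    # never anything executable.
    payload = JSON.parse(JSON.generate(data))
    @handlers[channel].each { |handler| handler.call(payload) }
  end
end

bus = TinyBus.new
calendar_view = []   # stands in for the calendar grid component
activity_feed = []   # stands in for the activity stream component

# Each component subscribes to the data topic and renders in its own way.
bus.subscribe('/calendars/7') do |event|
  calendar_view << "#{event['title']} (#{event['starts_on']})"
end
bus.subscribe('/calendars/7') do |event|
  activity_feed << "New event: #{event['title']}"
end

# The sender publishes one data message and knows nothing about the UI.
bus.publish('/calendars/7', { 'title' => 'Planning', 'starts_on' => '2011-12-21' })
```

Adding a third piece of UI later means adding one more subscriber; the publishing side does not change.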

> The reason why I asked (question 1) whether I can store data in the
> session managed by Faye was to further improve the implementation by
> only publishing data to clients who are actually affected by the
> calendar change (the new event). For instance, if a user is viewing
> week 35 and the new calendar event is in week 36, it doesn't make sense
> to send any data related to the new event to that user, because it
> won't be displayed anyway. So, if each client sent some metadata
> referring to its current view (simply start date and end date) and I
> could store this data in the session, then I could filter new events
> and only forward them if it makes sense.

This is an interesting use case, but you're kind of reinventing the wheel -- rather than implement your own routing, let Faye deal with it: you could have the client subscribe to /weeks/35, then unsubscribe when the view changes and subscribe to another channel. On the network, this will be about as much work as sending metadata to the server, and lets you delete a lot of routing code.
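
A rough sketch of that subscribe/unsubscribe dance, under some assumptions: RecordingClient is a made-up stand-in that only records calls so the example runs without a server, WeekView is a hypothetical view object, and in a real application this logic would more likely live in the browser-side client.

```ruby
# Made-up stand-in for a messaging client: it records calls instead of
# talking to a server.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def subscribe(channel)
    @calls << [:subscribe, channel]
  end

  def unsubscribe(channel)
    @calls << [:unsubscribe, channel]
  end
end

# Hypothetical view object: keeps exactly one week channel subscribed,
# matching whatever week is currently on screen.
class WeekView
  def initialize(client)
    @client = client
    @channel = nil
  end

  def show_week(number)
    channel = "/weeks/#{number}"
    return if channel == @channel          # already showing this week
    @client.unsubscribe(@channel) if @channel
    @client.subscribe(channel)
    @channel = channel
  end
end

client = RecordingClient.new
view = WeekView.new(client)
view.show_week(35)
view.show_week(35)   # no-op, same week
view.show_week(36)
```

The server then needs no per-client view metadata at all; a new event in week 36 is simply published to /weeks/36.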

> The reason why I asked whether I can query a certain channel (question
> 2) was that I wouldn't have to publish to a certain channel /
> user-namespaced channel, e.g. "1234:calendars/#calendar_id/show",
> if the user with id 1234 hasn't subscribed to this channel...

The effect of publishing to a channel with no subscribers is that you make a very fast network call (especially fast if you have a socket connection to the server). I really wouldn't worry about this micro-optimisation unless you measure it becoming a problem.

> So, what do you think? Am I overcomplicating things here?

A little. Like I say, try to treat channels as topics related to your data, rather than to UI side effects. If you find yourself naming a channel after an effect you want to happen on the receiver side, rather than a change on the sender side, you may have a design problem. And look for ways to segment the channels cleanly if you're worried about sending updates to clients that won't do anything with them.

Hope that helps. 

mst...@googlemail.com

Dec 20, 2011, 5:08:12 PM
to Faye users
> A better approach is to just treat channels as topics that relate to
> changes in your data model. So, /calendars/:id is a good one -- any part of
> the UI can listen to this to find out when the calendar changes, and do its
> bit of work to update the UI.

Ok, but this would require much more logic on the client side. Data
model changes would have to be published as raw data, e.g. JSON, and
sent to the client, which would have to decide whether and how to
process it.

I actually think this isn't really Rails-like. Take, for example, a
simple model update sent asynchronously to the server. In the response,
wouldn't Rails just return JS responsible for updating particular
parts of the UI, rather than sending just raw data and letting the UI
do the rest?

> The parts of code you currently have that
> listen to /calendars/:id/show, /users/:id/activity_stream/show should all
> just be listening to data changes that affect them, not receiving pieces of
> JavaScript to execute.

Yes, but I don't see a way to avoid event dispatching on the server
side. Let's take the example of some users viewing the calendar (1)
and some other users viewing their personal profile showing an
activity stream (2).

It makes sense to me that (1) subscribe to channel "calendar/:id" but
what about (2)? Users can have zero or more calendars, so when they
view their profile and thus activity stream, they would have to
subscribe to many channels, in this scenario, one per calendar...

In the other approach, they would just have to subscribe to one
channel and what's going to be published on that channel can be nicely
configured in a central place on the server side.

Based on your experience, is event dispatching on the server side a no
go?

What you said regarding my two questions makes total sense...

Thanks, appreciate your input...

James Coglan

Dec 20, 2011, 5:47:24 PM
to faye-...@googlegroups.com
On 20 December 2011 22:08, mst...@googlemail.com <mst...@googlemail.com> wrote:
> A better approach is to just treat channels as topics that relate to
> changes in your data model. So, /calendars/:id is a good one -- any part of
> the UI can listen to this to find out when the calendar changes, and do its
> bit of work to update the UI.

> Ok, but this would require much more logic on the client side. Data
> model changes would have to be published as raw data, e.g. JSON, and
> sent to the client, which would have to decide whether and how to
> process it.

This is actually how it's intended to be used. If you have many components that need to change because of changed data, it's more maintainable to have those components decide what to do than to have either the process changing the data or some other central controller do it. It helps decouple components from each other, which is especially important in user interfaces.

> I actually think this isn't really Rails-like. Take, for example, a
> simple model update sent asynchronously to the server. In the response,
> wouldn't Rails just return JS responsible for updating particular
> parts of the UI, rather than sending just raw data and letting the UI
> do the rest?

That is the 'Rails way', yes, and it's horrible. Having the server send executable code to another machine, which blindly accepts it and executes it is a security and maintenance nightmare. It probably sounds like I'm being a touch religious about this but it's honestly one of the very worst practises to come out of Rails and people need to stop using it.

An important part of keeping a complex modular UI maintainable is making sure components do not call each other but communicate indirectly by changing an abstract data model, with messaging mediating these changes. Sending executable code to another unrelated part of the system is a bad habit.

> The parts of code you currently have that
> listen to /calendars/:id/show, /users/:id/activity_stream/show should all
> just be listening to data changes that affect them, not receiving pieces of
> JavaScript to execute.

> Yes, but I don't see a way to avoid event dispatching on the server
> side. Let's take the example of some users viewing the calendar (1)
> and some other users viewing their personal profile showing an
> activity stream (2).
>
> It makes sense to me that (1) subscribe to channel "calendar/:id" but
> what about (2)? Users can have zero or more calendars, so when they
> view their profile and thus activity stream, they would have to
> subscribe to many channels, in this scenario, one per calendar...
>
> In the other approach, they would just have to subscribe to one
> channel and what's going to be published on that channel can be nicely
> configured in a central place on the server side.
>
> Based on your experience, is event dispatching on the server side a no
> go?

It's not a no-go and you definitely need to strike a balance here. In this case it does make sense for the server to be involved -- presumably the set of calendars relevant to a user can change over time, so making sure each client keeps subscribed to the right channels could be tough. Then it makes sense for a server-side agent to listen to calendar updates, look up which users are interested in that calendar, and forward the message on to a user-specific channel.

The distinction between this and your earlier plan is subtle, but worth emphasizing: before, you were talking about addressing messages to specific UI components. Mapping calendar IDs to user IDs is still purely in the domain of the data model, and therefore introduces less coupling. You're saying, "if this calendar changes, then the data for these users should be considered to have changed".
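
Here is a small sketch of such a server-side agent, with several assumptions: CalendarForwarder and RecordingPublisher are made-up names, the calendar-to-users lookup is a plain hash standing in for whatever your data model provides, and the /users/:id channel naming is just one possibility. The publisher only needs to respond to publish(channel, data).

```ruby
# Hypothetical agent: turns "calendar N changed" into "the data for these
# users changed" by forwarding to one channel per interested user.
class CalendarForwarder
  # memberships: hash of calendar id => array of user ids (an assumption;
  # a real application would query its data model instead).
  def initialize(memberships, publisher)
    @memberships = memberships
    @publisher = publisher
  end

  def calendar_changed(calendar_id, change)
    @memberships.fetch(calendar_id, []).each do |user_id|
      message = { 'calendar_id' => calendar_id, 'change' => change }
      @publisher.publish("/users/#{user_id}", message)
    end
  end
end

# Made-up stand-in for a messaging client; records what would be sent.
class RecordingPublisher
  attr_reader :sent

  def initialize
    @sent = []
  end

  def publish(channel, data)
    @sent << [channel, data]
  end
end

publisher = RecordingPublisher.new
forwarder = CalendarForwarder.new({ 7 => [1, 2] }, publisher)
forwarder.calendar_changed(7, { 'title' => 'Planning' })
forwarder.calendar_changed(9, { 'title' => 'Nobody follows this one' })
```

Note that the forwarded message is still plain data about the calendar; nothing in it names a UI component or tells the receiver what to render.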

I hope that makes some sense. Think in terms of expressing changes to data, and then hooking the UI onto those changes, and you'll end up with a more sensible design.

mst...@googlemail.com

Dec 20, 2011, 10:23:18 PM
to Faye users
Cool... thanks for having taken the time to sort things out...

> That is the 'Rails way', yes, and it's horrible. Having the server send
> executable code to another machine, which blindly accepts it and executes
> it is a security and maintenance nightmare. It probably sounds like I'm
> being a touch religious about this but it's honestly one of the very worst
> practises to come out of Rails and people need to stop using it.

Based on my experience, putting JS together on the server side and
sending it to the browser just for execution simplifies things and
improves the overall maintainability of the application. One does not
have to care about managing an additional application on the client
side. Keeping all the complex logic on the server side, using Rails
sugar to put everything together, and just doing DOM manipulation on
the client side speeds up development, at least for an initial
prototype with moderate UI complexity.

Of course, applications are different and the above does not apply to
all of them. But since I use Faye with Rails for my current
application, I will stick with the approach of sending JS over the
wire for now. The approach might change later if things get more
complex...


On Dec 20, 11:47 pm, James Coglan <jcog...@gmail.com> wrote: