Regarding the subscription behavior of the automatic channel deletion, the old subscribers won't get the new messages when a channel is deleted and later recreated because the new channel has a different subject prefix from the one used by the old subscribers.
In other words, the old subscribers have to be restarted to get the new prefix so that they subscribe to the newly created channel.
I think this behavior isn't wrong in principle. And one of the use cases for this automatic deletion is a scenario where some publisher and subscriber components are dynamically started and later shutdown. In this case, the subscribers are managed for the duration of the scenario's activity and there are no stale subscribers.
If you think this approach is worth considering, I am happy to share my code.
Hi Aki,Regarding the subscription behavior of the automatic channel deletion, the old subscribers won't get the new messages when a channel is deleted and later recreated because the new channel has a different subject prefix from the one used by the old subscribers.Either I missed something (very possible), or this is very specific to your use case. That cannot be generalized. Say message is published in channel "foo" which has a message age limit. A subscriber starts on "foo" and gets the message. The message finally expires and with your change the channel store is deleted. Some time later, another (or same) publisher publishes on "foo". Sever will create this store channel but has still a live subscriber on that channel. Not sure if this is good.You also have the edge scenario where say that message is sent on "foo" and there is a 1 minute age limit (just for demonstration). A subscriber starts at 59 second, get that message, but processing takes a bit of time and so will acknowledge in 2 seconds. The store expires the message and deletes the channel store. Subscriber acknowledges the message: oops.
In other words, the old subscribers have to be restarted to get the new prefix so that they subscribe to the newly created channel.Again, that may be very specific to your use case but is not the general use case.
I think this behavior isn't wrong in principle. And one of the use cases for this automatic deletion is a scenario where some publisher and subscriber components are dynamically started and later shutdown. In this case, the subscribers are managed for the duration of the scenario's activity and there are no stale subscribers.So would that automatic channel deletion on last expired message be for all channels or based on some configuration? If for all, again, not sure that is acceptable for all users.
If you think this approach is worth considering, I am happy to share my code.Sure, if you have a fork/branch that has this change I would be happy to look at it.
Hi Aki,Sorry for the delay. I had a look and here are some of my comments.
- It is good that you introduce an option to: disable, only if no subscriber or force. This gives more flexibility. Unfortunately, having the process of deleting the channel done at the store level, again, introduces risk in the server. There is for instance a risk of "panic" if at the store level a channel is deleted while the server was trying to flush the store. We can't take that risk.
- I don't think that this option should be defined in store channel limit. To me, it is an option, not a limit on a channel. I understand the convenience of it, but does not seem right.
- With the same idea, I don't think that the DeleteChannel's Store API should have the "force" boolean. Based on the option, the caller should know if it can or not call DeleteChannel.
- DeleteChannel is a store API and yet only called internally. That reinforces the idea that this API should be called by the server, not the store implementation.
Again, I think we need to find a way for the channel to be deleted safely by the server. We may still need the option to delete a channel regardless of the number of subscribers, but it should be done by the server in places where it knows that no concurrent access to a store can happen or have an inconsistent state.
I hope you understand that we want this feature to be introduced in the mainline, we cannot assume that producers/consumers will always be stopped before the last message expire, etc.. we can't afford to have the server panic due to such feature.
I am thinking that server could have a timer that periodically checks the store's MsgState() for a given channel and take the decision to delete the channel (say if there is no subscriber). Still this timer would then need to be scheduled to the main protocol processing to ensure that if the action is not happening at the same time that server is trying to flush a given store. I am making some changes to the server that may facilitate that (https://github.com/nats-io/nats-streaming-server/issues/195).
Thanks!Ivan.
OK. I understand that we need to make sure that there will be no panic resulting from some concurrent channel deletion and data writing. But this could be handled in the store level and not in the server level, as the server always accesses the channel over its store interface anyway, no?
- I don't think that this option should be defined in store channel limit. To me, it is an option, not a limit on a channel. I understand the convenience of it, but does not seem right.Yes. I saw this semantic mismatch but for this prototype I took took the advantage of using this convenient approach. We could move the property to the appropriate group.
The reason for adding this force option was that this DeleteChannel method may be in the future invoked by some administrative service to support both cases conveniently, I passed this option to the method. Are you saying you would prefer to make this method always delete the channel and the caller should determine whether there are subscribers and decide to invoke this delete method (i.e., equivalent to force being true)or not (equivalent to force being false)?
As mentioned above, I was hoping to get both CreateChannel and DeleteChannel etc to be accessible from some admin service. As symmetric to CreateChannel to the store API, I thought DeleteChannel would fit in there.
I have to look into this stuff. I thought we could use the lock within the store to guard against such cases but I am probably missing something.
I also think using a specific timer for deleting a channel is preferable than using the message expiration timer which cannot handle the case where there is no message ever stored. When using another timer, we need to ensure its action doesn't interfere with the server's store flush operation and also the store's message expiration action.I'll spend some time looking into these points.
Hi Aki.OK. I understand that we need to make sure that there will be no panic resulting from some concurrent channel deletion and data writing. But this could be handled in the store level and not in the server level, as the server always accesses the channel over its store interface anyway, no?It does, but it is a matter of ordering of actions. Say the store decides that it can delete a channel because all messages have expired. You start the go routine to delete the channel (at least the way you are doing it in your branch). In the meantime, messages arrive in that channel about to be deleted. The server invokes the store interface to store those messages. The go routine doing the channel delete then execute, and then, the server call Store.Msgs.Flush() => panic.
- I don't think that this option should be defined in store channel limit. To me, it is an option, not a limit on a channel. I understand the convenience of it, but does not seem right.Yes. I saw this semantic mismatch but for this prototype I took took the advantage of using this convenient approach. We could move the property to the appropriate group.Good.
The reason for adding this force option was that this DeleteChannel method may be in the future invoked by some administrative service to support both cases conveniently, I passed this option to the method. Are you saying you would prefer to make this method always delete the channel and the caller should determine whether there are subscribers and decide to invoke this delete method (i.e., equivalent to force being true)or not (equivalent to force being false)?Yes. I would think that DeleteChannel() should not have the force flag. The caller should decide if it is safe to call this or not.
As mentioned above, I was hoping to get both CreateChannel and DeleteChannel etc to be accessible from some admin service. As symmetric to CreateChannel to the store API, I thought DeleteChannel would fit in there.I am not arguing that DeleteChannel should be in the store interface. I think it should, but to me it makes a stronger case that this should be called by the server itself, not internally by the store.
I have to look into this stuff. I thought we could use the lock within the store to guard against such cases but I am probably missing something.Store has its own lock, but as I have described above, this is not a problem of concurrent access per se, but a problem of order between events. In your approach, by the time the store decides to delete the channel, messages and or subscribers may be added to that channel. I understand why you did start the delete through a go routine (locking and such), but this is leaving room for races.
I also think using a specific timer for deleting a channel is preferable than using the message expiration timer which cannot handle the case where there is no message ever stored. When using another timer, we need to ensure its action doesn't interfere with the server's store flush operation and also the store's message expiration action.I'll spend some time looking into these points.When https://github.com/nats-io/nats-streaming-server/pull/196 is merged, it will be possible to send to the ioChannel control messages that will ensure that things are processed in order. The server could have a timer that simply put a control message in the channel and that control message could then be processed in the ioLoop.You are welcome to experiment, and I may also get to this at one point. So many things to do ;-)
Thanks,Ivan.
I used a goroutine because because the expireMsgs already owns the MsgStore lock but there is no unlocked version of MsgStore.Close. If we create an unlocked version of Close, the delete operation can be synchronously executed from that spot.
When using this approach, we either block the processing of IOLoop during the entire delete processing or mark the channel as being deleted and dispatch the delete operation and reject any store calls going into the channel.
The former approach will likely cost some performance. If we go for the latter approach, we could have an equivalent behavior without introducing a control message by setting some special status to the channel to let the IOLoop know the channel is being deleted and hence reject any store requests to the channel.