Afaik, when a snapshot is used in an event sourcing context it is much too app specific to allow any sort of middleware or database to do it automatically. If you know of any such implementations I wouldn't mind taking a look. ☺️
--
You received this message because you are subscribed to the Google Groups "nats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to natsio+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Tim,

Thank you for your interest in NATS Streaming.

In terms of limitations, NATS Streaming - assuming you are going to use the file based store - needs file descriptors for each channel (subject): messages are stored in files (more than one actually, but only one active at a time), plus one for the channel's subscriptions file. The number of channels therefore drives the number of file descriptors needed. You can configure the Streaming server with a maximum number of channels/messages/subscriptions.

As of now (though I am working on a change that should go into master soon), even the file based store keeps messages in memory, which would probably not scale to billions of messages. Once those changes are merged, only a small record per message will be kept in memory, and messages will be read from disk when they need to be delivered. Still, billions of messages will probably mean quite a bit of memory.
To answer your questions, I first have to ask you a few ;-)

- How many channels (subjects) do you plan to use? Is it one per user? If so, how many users (and therefore channels)? See the limitations above.
- Could you describe what a snapshot would look like in the server?
I was a little confused about what a "channel" is. I couldn't see a definition in the docs, and the terms channel and subject seem to be used interchangeably. I'm assuming they're the same and roughly equivalent to a "topic", i.e. something that clients can publish to and subscribe from. If so, we wouldn't have a lot of channels (maybe a few hundred) but they might contain a lot of messages.
We will have a lot of messages, maybe 100s of billions, maybe trillions. I think we need a solution where there's nothing in memory on a per message basis.
We need the ability to publish a message (that's the snapshot, and it is application defined) and somehow get a "bookmark" to that snapshot. The "bookmark" will allow us to later resume a subscription from that point without having to scan through all preceding messages (impractical if there are many billions), i.e. some capability to index into a position in the queue of messages.
Now, if we can get a sequence number of the message that we just published, then later ask to resume from that sequence number then maybe that will work. Is it possible to get the sequence number of a just published message?
Tim,

> I was a little confused about what a "channel" is. I couldn't see a definition in the docs, and the terms channel and subject seem to be used interchangeably. I'm assuming they're the same and roughly equivalent to a "topic", i.e. something that clients can publish to and subscribe from. If so, we wouldn't have a lot of channels (maybe a few hundred) but they might contain a lot of messages.

Channels are subjects, but without wildcards (not supported in NATS Streaming). So yes, you can see them as topics clients can send to/receive from.

Just so we are clear: you can set the maximum number of channels, and of subscribers and messages per channel. So if you set the number of messages to a very high number, those messages are never going to be removed. NATS Streaming uses a message log per channel; messages are appended, and the oldest ones are discarded only when the message limit is reached.

> We will have a lot of messages, maybe 100s of billions, maybe trillions. I think we need a solution where there's nothing in memory on a per message basis.

I have to say that I have doubts about that number of messages in a single NATS Streaming server. I am not sure how an implementation could have 0 memory per message. How would the server look up a message (based on a sequence number) if it does not know where that message is located? Scanning the file to find it would obviously not scale ;-)
> We need the ability to publish a message (that's the snapshot and is application defined) and somehow get a "bookmark" to that snapshot. The "bookmark" will allow us to later resume subscription from that bookmark without having to scan through all preceding messages (impractical if there are many billions), i.e. some capability to index into a position in the queue of messages.

I am still confused about what that means.
> Now, if we can get a sequence number of the message that we just published, then later ask to resume from that sequence number, then maybe that will work. Is it possible to get the sequence number of a just published message?

No, the published message does not have the sequence number when it leaves the client; the sequence is assigned by the server. We could(?) have included the sequence in the Ack response from server to client.
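To make the server-side sequence assignment and per-channel message limit concrete, here is a toy in-memory sketch (not NATS Streaming code; all names are illustrative). The publisher never knows the sequence up front; the log assigns it on append and discards the oldest messages once the configured limit is reached:

```go
package main

import "fmt"

// msg is a stored message with its server-assigned sequence number.
type msg struct {
	Seq  uint64
	Data string
}

// channelLog is a per-channel append-only log with a message limit.
type channelLog struct {
	maxMsgs int
	nextSeq uint64
	msgs    []msg
}

func newChannelLog(maxMsgs int) *channelLog {
	return &channelLog{maxMsgs: maxMsgs, nextSeq: 1}
}

// Append assigns the sequence number (the publisher does not know it
// when the message leaves the client) and enforces the limit by
// dropping the oldest entries.
func (c *channelLog) Append(data string) uint64 {
	seq := c.nextSeq
	c.nextSeq++
	c.msgs = append(c.msgs, msg{Seq: seq, Data: data})
	if len(c.msgs) > c.maxMsgs {
		c.msgs = c.msgs[len(c.msgs)-c.maxMsgs:]
	}
	return seq
}

func main() {
	cl := newChannelLog(3)
	for _, d := range []string{"a", "b", "c", "d", "e"} {
		cl.Append(d)
	}
	// Only the last 3 messages remain; sequences keep increasing.
	fmt.Println(cl.msgs) // [{3 c} {4 d} {5 e}]
}
```

Note that sequences are never reused after old messages are discarded, which is what makes a sequence number usable as a stable position in the log.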
Can't you let each aggregate handle its own snapshotting? Every X events, or at certain time intervals, whichever comes first, the aggregate can just choose to save its current state and be done?
Maybe I am missing something?
But can't you use StartAtSequence?
That way you only have to track this number.
I am sure I am oversimplifying things but it seems like it could work?
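The "bookmark" idea behind StartAtSequence can be illustrated with a toy in-memory log (this is not the NATS Streaming client API; `replayFrom` is an invented stand-in for a subscription created with a StartAtSequence-style option):

```go
package main

import "fmt"

// event is a logged message with its sequence number.
type event struct {
	Seq  uint64
	Data string
}

// replayFrom returns all events with Seq >= start, mimicking a
// subscription that resumes at a recorded sequence instead of
// scanning the whole log from the beginning.
func replayFrom(log []event, start uint64) []event {
	var out []event
	for _, e := range log {
		if e.Seq >= start {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	log := []event{
		{1, "e1"}, {2, "e2"}, {3, "snapshot"}, {4, "e4"}, {5, "e5"},
	}
	// The application recorded this sequence when it published the
	// snapshot; replay starts there.
	bookmark := uint64(3)
	for _, e := range replayFrom(log, bookmark) {
		fmt.Println(e.Seq, e.Data)
	}
}
```

As noted elsewhere in the thread, the catch is obtaining that bookmark in the first place: the server assigns the sequence, and it is not returned to the publisher.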
Maybe not, if the client maintains its own state via consumption of its own events?
>> But can't you use StartAtSequence?

Well... this is what I suggested earlier in the thread, but you need to know the sequence number from when you published the snapshot, and apparently that's not available.
It looks like snapshotting would have to be a DIY effort, but did you ever get the answer to whether NATS streaming could handle (store) many billions of events?
I've seen that there is a 2^32 limit on the number of subjects.
Also that there is a need for one file descriptor (fd) per subject.
Am I correct that the fd requirement is the number of subjects that are currently "in use" rather than the total number of subjects?
Where are the indices stored in the file based system? If in memory, that would seem to place a significant constraint on the number of events to be stored.
How much memory is required for storage of event indices?
Kenneth,

> It looks like snapshotting would have to be a DIY effort, but did you ever get the answer to whether NATS streaming could handle (store) many billions of events?

There are no hard-coded limits, so it will depend on the resources you have at your disposal.

> I've seen that there is a 2^32 limit on the number of subjects.

Not sure where you got that limit. The subjects, aka channels, are stored in a map[string].
Does NATS impose any limits on the # of subjects?
The maximum number of subjects is currently 2^32 (i.e. the max value of Go’s uint32 type). This may change in the future. The current implementation (which predates some native Go data structures) is a custom Hashmap. We will eventually move to native Go data structures as we test and verify relative performance.
> Also that there is a need for one file descriptor (fd) per subject.

For the file based store, there are actually now a minimum of 2 (one for the data file, one for the index file). If you have subscribers on that channel, there will be 1 more for the subscribers file.

> Am I correct that the fd requirement is the number of subjects that are currently "in use" rather than the total number of subjects?

No. The file store always keeps the last file slice (where messages are appended) of each channel open, which again means 2 FDs (one for data, one for index). More may be needed if a lookup occurs for a message that is not in the current file slice; those, however, are closed after a short period of time.

> Where are the indices stored in the file based system? If in memory, that would seem to place a significant constraint on the number of events to be stored.

Correct. The file store maintains a map[uint64]*msgRecord. The key is the message sequence number; the value is a record containing the offset in the file, the timestamp, and the size of the message. This is "needed" because the server may need the timestamp when a subscriber starts with a StartAt() time/time delta. The message size is also recorded so that the file store can keep track of the overall file slice size to satisfy file slice limits (and overall channel limits).

> How much memory is required for storage of event indices?

The msgRecord is 20 bytes and the key 8 bytes, so a minimum of 28 bytes per message.

As we have always said, the current basic file store implementation has many limits (FDs, memory, etc.) that may not be suitable for some users. We welcome other store implementations (we have a store interface to facilitate new implementations). The current implementation, though, offers pretty good performance for users that do not need to store that many messages and don't have that many channels.

Hope this helps!
Ivan.
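Using the 28 bytes/message figure above, a back-of-the-envelope estimate of index memory (ignoring Go map overhead, which in practice adds considerably more) looks like this:

```go
package main

import "fmt"

// indexBytes estimates in-memory index size: 20-byte msgRecord plus
// an 8-byte sequence key per message, per the figures in the thread.
func indexBytes(numMsgs uint64) uint64 {
	const bytesPerMsg = 28
	return numMsgs * bytesPerMsg
}

func main() {
	const gib = 1 << 30
	for _, n := range []uint64{1e9, 100e9} {
		fmt.Printf("%d msgs -> ~%.1f GiB of index memory\n",
			n, float64(indexBytes(n))/gib)
	}
}
```

So one billion messages already needs roughly 26 GiB of index memory, and the "100s of billions" mentioned earlier would run into terabytes, which supports the doubt expressed above.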