Hibernating Rhinos Ltd
Oren Eini l CEO l Mobile: + 972-52-548-6969
Office: +972-4-622-7811 l Fax: +972-153-4-622-7811
--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
- Allowing int and guid as ids for the client.
- Remove attachments [already planned?].
- Remove identities or make them a "first class citizen" [documents?]. They are hard to manage over time.
- Invoking concurrency checks with session.Store is not really clear. [maybe just docs / intellisense issue]
- Invoking DatabaseCommands through session.Advanced should default to the db of the session. Fun bugs happened there.
- Drop sql replication and instead provide a gist demonstrating how to do it with Data Subscriptions.
- Consider dropping the original patch commands and stick with js patches.
- Consider some consolidation between map/reduce and dynamic aggregation. Perhaps Data Subscriptions would be better for some cases.
- Using a transformer within a transformer is awkward.
- Trying to session.Load/Query to "a different model" is awkward. Sometimes I just want to coerce the data into a different model and ignore the class in metadata. It works surprisingly well if the metadata model is not in the project.
- Where you can and can't assert generic parameters in session.Include/Query is confusing.
- The order of generic parameters is confusing between Queries and Transformers. I always expect Transformers parameters to be swapped.
I'd like some consolidation between Stream, Data Subscriptions, Changes, LoadStartingWith, increased Take limits + Take(int.MaxValue), and deep paging. They all kinda sorta cover the same ground, but with quirks that make each of them difficult to work with deterministically.
Right now, replication is fraught with "bad choices" for failure cases. Hopefully raft [3.5?] alleviates that issue.
1) What do you mean, hard to index nested objects?
2) Can you be more specific?
3) What is missing?
On Tue, Feb 9, 2016 at 6:01 PM, Kijana Woodard <kijana....@gmail.com> wrote:

> - Allowing int and guid as ids for the client.

Yes, that is something that we want to remove.

> - Remove attachments [already planned?].

Done.
> - Remove identities or make them a "first class citizen" [documents?]. They are hard to manage over time.

What do you mean? You want to store this as a document like hilo?
> - Invoking concurrency checks with session.Store is not really clear. [maybe just docs / intellisense issue]

What do you mean?
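For reference, a sketch of how the concurrency check is invoked with the 3.x client today (assuming an `Order` entity; exact overloads may differ between client versions):

```csharp
using (var session = store.OpenSession())
{
    // Option 1: opt the whole session into optimistic concurrency;
    // SaveChanges throws ConcurrencyException on a conflicting write.
    session.Advanced.UseOptimisticConcurrency = true;

    // Option 2: pass an expected etag to Store for a single document.
    var order = session.Load<Order>("orders/1");
    var etag = session.Advanced.GetEtagFor(order);
    session.Store(order, etag);
    session.SaveChanges();
}
```

The discoverability complaint above is that nothing in the `Store` signature hints that passing an etag changes the save semantics.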
> - Invoking DatabaseCommands through session.Advanced should default to the db of the session. Fun bugs happened there.

Yes, that needs to be fixed.

> - Drop sql replication and instead provide a gist demonstrating how to do it with Data Subscriptions.

Why? That means that every user will have to write their own replication system, which is decidedly non-trivial, even if subscriptions handle a lot of it.
> - Consider dropping the original patch commands and stick with js patches.

Planned.

> - Consider some consolidation between map/reduce and dynamic aggregation. Perhaps Data Subscriptions would be better for some cases.

Not following.
> - Using a transformer within a transformer is awkward.
> - Trying to session.Load/Query to "a different model" is awkward. Sometimes I just want to coerce the data into a different model and ignore the class in metadata. It works surprisingly well if the metadata model is not in the project.
> - Where you can and can't assert generic parameters in session.Include/Query is confusing.
> - The order of generic parameters is confusing between Queries and Transformers. I always expect Transformers parameters to be swapped.
> I'd like some consolidation between Stream, Data Subscriptions, Changes, LoadStartingWith, increased Take limits + Take(int.MaxValue), and deep paging. They all kinda sorta cover the same ground, but with quirks that make each of them difficult to work with deterministically.

Can you expand on that?
> Right now, replication is fraught with "bad choices" for failure cases. Hopefully raft [3.5?] alleviates that issue.
> What do you mean?

Minor nit - it'd be nice to be able to pass null to session.Load and get null back instead of throwing an exception. I usually end up having to "check null twice" in some way.
> What do you mean? You want to store this as a document like hilo?

Yeah. I've settled on creating documents manually that hold identities. Makes them easier to manage, export/import, exclude, patch, etc.
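The manual-identity-document approach described here could look something like this (a sketch; `IdentityDoc` and `Invoice` are made-up names, and the retry loop on conflict is elided):

```csharp
public class IdentityDoc
{
    public string Id { get; set; }   // e.g. "identities/invoices"
    public long Next { get; set; }
}

using (var session = store.OpenSession())
{
    // Guard concurrent increments with optimistic concurrency;
    // callers retry on ConcurrencyException.
    session.Advanced.UseOptimisticConcurrency = true;

    var identity = session.Load<IdentityDoc>("identities/invoices")
                   ?? new IdentityDoc { Id = "identities/invoices", Next = 1 };
    var invoiceId = "invoices/" + identity.Next++;

    session.Store(identity);
    session.Store(new Invoice(), invoiceId);
    session.SaveChanges();
}
```

Because the counter is an ordinary document, it rides along with export/import, replication, and patching like everything else, which is the point being made above.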
> - Drop sql replication and instead provide a gist demonstrating how to do it with Data Subscriptions.
>
> Why? That means that every user will have to write their own replication system, which is decidedly non-trivial, even if subscriptions handle a lot of it.

When there are issues, and there are many on the mailing list, you have to wait for a new release and for that to become stable, or patch it yourself. Given reliable data subscriptions, maintaining the sql statements in c# seems preferable to shipping js. Is there "enough left over" to justify a supported feature, or is there "so much variety" in what people want to do that it's better to simply demonstrate a few ways via gist / sample app and let people glue the data subscription to the external store of choice? For instance, write to elastic search or to files or whatever.

> - Consider some consolidation between map/reduce and dynamic aggregation. Perhaps Data Subscriptions would be better for some cases.
>
> Not following.

It's not clear today how to achieve multi-level reduce. Advice is to do one level of reduce and then dynamic aggregation. That's more art than science. If multi reduce is not [easily] achievable, then maybe data subscriptions provide a clearer path forward. Fwiw, having reduce results as documents might be interesting here. The documents are mapped and possibly reduced [re-reduced]. Almost sounds like SIR at that point [see new entry below].
> - Using a transformer within a transformer is awkward.
> - Trying to session.Load/Query to "a different model" is awkward. Sometimes I just want to coerce the data into a different model and ignore the class in metadata. It works surprisingly well if the metadata model is not in the project.
> - Where you can and can't assert generic parameters in session.Include/Query is confusing.
> - The order of generic parameters is confusing between Queries and Transformers. I always expect Transformers parameters to be swapped.

Do all of these make sense?
> I'd like some consolidation between Stream, Data Subscriptions, Changes, LoadStartingWith, increased Take limits + Take(int.MaxValue), and deep paging. They all kinda sorta cover the same ground, but with quirks that make each of them difficult to work with deterministically.
>
> Can you expand on that?

Ultimately, all of these options represent ways to bypass the "safe by default" 128 - 1024 Load/Query restriction. Fwiw, I'm happy with that restriction. But when you decide you need to "go through all the results", there are several options that can accomplish that. They all have trade-offs that appear to be dead ends. I'm coming to believe that Data Subscriptions is the best bet except for a few caveats:
- You can't use your own id, which means you have to save the subscription id to a document, which raises its own failure scenario issues.
- They don't work well in a fail over scenario between primary and secondary. I think raft will address this problem.
I use Stream fairly regularly. Invariably, I end up bringing the stream into memory [ToList, et al] so as not to get caught in the "reading too slow" trap. That's especially annoying when the bottleneck is writing back to the db. Of course, bringing the stream into memory defeats the purpose of "streaming" in the first place. Then you have to be careful about not loading "too much". Both the "reading too slow" timeout and "read all" workaround introduce non-deterministic errors into the system. User activity around document creation can create these issues long after the software is released.
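The two patterns described above, consuming the stream item by item versus draining it into memory, look roughly like this with the 3.x client (a sketch; `Order` and `Process` are stand-ins):

```csharp
using (var session = store.OpenSession())
{
    var query = session.Query<Order>();

    // Intended streaming usage: handle each result as it arrives,
    // keeping memory flat. A slow consumer risks the "reading too
    // slow" timeout described above.
    using (var enumerator = session.Advanced.Stream(query))
    {
        while (enumerator.MoveNext())
            Process(enumerator.Current.Document);
    }

    // The workaround described above: drain everything up front.
    // Avoids the timeout, but gives up streaming's memory benefits.
    var all = new List<Order>();
    using (var e = session.Advanced.Stream(session.Query<Order>()))
        while (e.MoveNext())
            all.Add(e.Current.Document);
}
```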
I should add here that Scripted Patch is sometimes an alternative. That's equally hard to reason about.

- If multiple patches are started, will the operations interleave?
- Can a patch timeout on the server?
- If so, is the work that's been done in a transaction?
- If not, how would one determine where to "restart"...assuming one even knew that failure occurred?
Right now it's kind of "fire and pray". Stream and Data Subscriptions at least give you reliable semantics around partial success.

By consolidation, I wonder if Stream could be replaced by a non-persistent Data Subscription. Do we even need LoadStartingWith considering the overloads for Stream/Data Subscriptions?
Changes is fine as Data Subscription support, but does it still have sufficient value as a front line api element? Deep paging [Skip(12300).Take(100)] doesn't work very well [slow, iirc] and is kind of a silly UI concept. Take(int.MaxValue) has the same problem [memory pressure] as Stream().ToList().
> Right now, replication is fraught with "bad choices" for failure cases. Hopefully raft [3.5?] alleviates that issue.

When the Primary goes down, how can your code reason about the value of the Secondary? You can allow writes, and then deal with conflict resolution. You can allow reads, but risk bewildered users who "swear I just changed that". Even reads during normal operations risk interleaved results if you happen to bounce between nodes [two web servers pinned to opposite dbs]. I think raft will address those issues because the write is either on the majority or not. The next piece would be validating that a read is from a server "at least as up to date" as the end client.
> What do you mean?
>
> Minor nit - it'd be nice to be able to pass null to session.Load and get null back instead of throwing an exception. I usually end up having to "check null twice" in some way.

Any thoughts on this one?
Another candidate for removal: is SIR pulling its weight as a feature? I like the idea, but again, it's possibly supplanted by data subscriptions and a tiny amount of custom c#.
> What do you mean? You want to store this as a document like hilo?
>
> Yeah. I've settled on creating documents manually that hold identities. Makes them easier to manage, export/import, exclude, patch, etc.

Is this still the case since Smuggler works with identities? And you can modify them in the studio?

There are good reasons why we want to keep them out of documents. To start with, they aren't, really.
> - Drop sql replication and instead provide a gist demonstrating how to do it with Data Subscriptions.
>
> Why? That means that every user will have to write their own replication system, which is decidedly non-trivial, even if subscriptions handle a lot of it.
>
> When there are issues, and there are many on the mailing list, you have to wait for a new release and for that to become stable, or patch it yourself. Given reliable data subscriptions, maintaining the sql statements in c# seems preferable to shipping js. Is there "enough left over" to justify a supported feature, or is there "so much variety" in what people want to do that it's better to simply demonstrate a few ways via gist / sample app and let people glue the data subscription to the external store of choice? For instance, write to elastic search or to files or whatever.

Absolutely disagreeing with you here, for several reasons.

One, SQL Replication is a major feature in the way people consider RavenDB.

Second, saying "here is how you can do that" because we have reliable subscriptions seems very much like this mindset:

Third, I don't think we are seeing very many SQL Replication issues in the past year or so. We had a lot with index replication, but SQL Replication has been pretty great & stable.
> - Consider some consolidation between map/reduce and dynamic aggregation. Perhaps Data Subscriptions would be better for some cases.
>
> Not following.
>
> It's not clear today how to achieve multi-level reduce. Advice is to do one level of reduce and then dynamic aggregation. That's more art than science. If multi reduce is not [easily] achievable, then maybe data subscriptions provide a clearer path forward. Fwiw, having reduce results as documents might be interesting here. The documents are mapped and possibly reduced [re-reduced]. Almost sounds like SIR at that point [see new entry below].

We already have that in SIR, no?
> - Using a transformer within a transformer is awkward.
> - Trying to session.Load/Query to "a different model" is awkward. Sometimes I just want to coerce the data into a different model and ignore the class in metadata. It works surprisingly well if the metadata model is not in the project.
> - Where you can and can't assert generic parameters in session.Include/Query is confusing.
> - The order of generic parameters is confusing between Queries and Transformers. I always expect Transformers parameters to be swapped.
>
> Do all of these make sense?

Yes. That is an API issue. I wish we had a better overall way to handle such complex things in the first place, to be honest.

> I'd like some consolidation between Stream, Data Subscriptions, Changes, LoadStartingWith, increased Take limits + Take(int.MaxValue), and deep paging. They all kinda sorta cover the same ground, but with quirks that make each of them difficult to work with deterministically.
>
> Can you expand on that?
>
> Ultimately, all of these options represent ways to bypass the "safe by default" 128 - 1024 Load/Query restriction. Fwiw, I'm happy with that restriction. But when you decide you need to "go through all the results", there are several options that can accomplish that. They all have trade-offs that appear to be dead ends. I'm coming to believe that Data Subscriptions is the best bet except for a few caveats:

Out of the items you listed, only subscriptions and streams will actually give you the full data set.
> - You can't use your own id, which means you have to save the subscription id to a document, which raises its own failure scenario issues.

This is by design; otherwise you run into a lot of edge cases with two clients reading from the same subscription, or stealing it off one another.
> - They don't work well in a failover scenario between primary and secondary. I think raft will address this problem.

Probably not, actually. As it currently stands, 4.0 is going to have separate etags for each server. The current plan is to use raft to coordinate the cluster, and to move the replication to a more gossip-like protocol to support a higher number of interconnected nodes in a cluster. I would like to discuss a good solution for this, but it is actually a really hard problem, and probably deserves a separate thread.

> I use Stream fairly regularly. Invariably, I end up bringing the stream into memory [ToList, et al] so as not to get caught in the "reading too slow" trap. That's especially annoying when the bottleneck is writing back to the db. Of course, bringing the stream into memory defeats the purpose of "streaming" in the first place. Then you have to be careful about not loading "too much". Both the "reading too slow" timeout and "read all" workaround introduce non-deterministic errors into the system. User activity around document creation can create these issues long after the software is released.

We fixed a bunch of issues around that in bulk insert + streaming. In particular, data subscriptions are no longer susceptible to "processing time takes too long" issues.
> I should add here that Scripted Patch is sometimes an alternative. That's equally hard to reason about.
>
> - If multiple patches are started, will the operations interleave?

Yes.

> - Can a patch timeout on the server?

No.

> - If so, is the work that's been done in a transaction?

Patches run in multiple separate transaction batches.

> - If not, how would one determine where to "restart"...assuming one even knew that failure occurred?

You can look at the operation stats you get back. And you can't restart, you have to run it again.
> Right now it's kind of "fire and pray". Stream and Data Subscriptions at least give you reliable semantics around partial success. By consolidation, I wonder if Stream could be replaced by a non-persistent Data Subscription. Do we even need LoadStartingWith considering the overloads for Stream/Data Subscriptions?

LoadStartingWith is actually quite common in scenarios such as "give me this month's transactions": prefix: "accounts/1234/txs/2016-01".

Subscriptions are for getting all the documents matching a particular (relatively simple) criteria. This is an ongoing effort, which under write load may never end.

Streams take the full result of a query. This can be a map/reduce output, all documents matching a complex query, etc. "Stream me all the credit card transactions made within a 7 mile radius of the bank robbery" is not something that you can do in a subscription.
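The prefix scenario described here maps to something like the following sketch (`AccountTx` is a made-up model; ids are assumed to embed the month):

```csharp
using (var session = store.OpenSession())
{
    // Ids look like "accounts/1234/txs/2016-01-17T...", so the
    // prefix selects this month's transactions for the account.
    // Results are loaded into the session and change-tracked.
    var txs = session.Advanced.LoadStartingWith<AccountTx>(
        "accounts/1234/txs/2016-01");
}
```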
> Changes is fine as Data Subscription support, but does it still have sufficient value as a front line api element? Deep paging [Skip(12300).Take(100)] doesn't work very well [slow, iirc] and is kind of a silly UI concept. Take(int.MaxValue) has the same problem [memory pressure] as Stream().ToList().

I don't understand the last statement.

Changes() is a way to get notifications from the database about what is happening right now.
> Right now, replication is fraught with "bad choices" for failure cases. Hopefully raft [3.5?] alleviates that issue.
>
> When the Primary goes down, how can your code reason about the value of the Secondary? You can allow writes, and then deal with conflict resolution. You can allow reads, but risk bewildered users who "swear I just changed that". Even reads during normal operations risk interleaved results if you happen to bounce between nodes [two web servers pinned to opposite dbs]. I think raft will address those issues because the write is either on the majority or not. The next piece would be validating that a read is from a server "at least as up to date" as the end client.

That isn't what we're doing. We are using Raft to select the leader, and that is the server that will accept writes. There is still a small chance that a leader being deposed will accept a write, but that would be replicated to its siblings. The leader can move between nodes on the fly.

> What do you mean?
>
> Minor nit - it'd be nice to be able to pass null to session.Load and get null back instead of throwing an exception. I usually end up having to "check null twice" in some way.
>
> Any thoughts on this one?

Yes.

> Another candidate for removal: is SIR pulling its weight as a feature? I like the idea, but again, it's possibly supplanted by data subscriptions and a tiny amount of custom c#.

I think that it is too complex to be really useful in many scenarios. We might just replace it with an option to say "persist this map/reduce index's results as documents".
It'd be nice to have an example of "doing sql replication with data subscriptions" to stave off feature requests for edge cases or if someone wanted to write to another kind of store.
> Out of the items you listed, only subscriptions and streams will actually give you the full data set.

IIRC, LoadStartingWith still gives everything. But you could also write a loop and page. That's probably not a great idea given stream and data subscriptions.
> - You can't use your own id, which means you have to save the subscription id to a document, which raises its own failure scenario issues.
>
> This is by design; otherwise you run into a lot of edge cases with two clients reading from the same subscription, or stealing it off one another.

That pushes the problem to the user. I need to experiment more here, but it seems like one would have to try to save a document with a given name, then create the subscription, then save the subscription id in the document...and address failures that can happen along the way. For instance, if I failed to save the subscription id, how do I clean up the stranded subscription? We can query the list of subscriptions, but how can an arbitrary piece of code know which one is "no good"?
> And you can't restart, you have to run it again.

Right. With patches, developers need to make them idempotent and resilient to changes by other code doing concurrent patches. I don't know if people think about it at that level. I think there's an [incorrect] perception that these are going to run serially.
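An idempotent set-based patch might be sketched like this (3.x-style API; the index name, query, and `Migrated` flag are made up). Guarding on a flag means re-running the patch after a partial failure does not double-apply the change:

```csharp
store.DatabaseCommands.UpdateByIndex(
    "Orders/ByStatus",
    new IndexQuery { Query = "Status:Pending" },
    new ScriptedPatchRequest
    {
        // Re-runnable: documents that were already patched in an
        // earlier (possibly failed) run are skipped.
        Script = @"
            if (!this.Migrated) {
                this.Total = this.Total * 1.1;
                this.Migrated = true;
            }"
    });
```

The flag does not protect against a *different* concurrent patch touching the same fields, which is the interleaving concern raised above.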
"Stream me all the credit card made within 7 miles radius of the bank robbery" is not somtehing that you can do in a subscription.
> Changes is fine as Data Subscription support, but does it still have sufficient value as a front line api element? Deep paging [Skip(12300).Take(100)] doesn't work very well [slow, iirc] and is kind of a silly UI concept. Take(int.MaxValue) has the same problem [memory pressure] as Stream().ToList().
>
> I don't understand the last statement. Changes() is a way to get notifications from the database about what is happening right now.

Right. I'm thinking of Changes as "non-persistent Data Subscriptions". Could "do not persist" be a subscription option?
> I think that it is too complex to be really useful in many scenarios. We might just replace it with an option to say "persist this map/reduce index's results as documents".

I think that would be nice. It seems that with "no extra work", one could write a map/reduce on those persisted result documents and now you have unlimited reduce [endless loops aside].
> It'd be nice to have an example of "doing sql replication with data subscriptions" to stave off feature requests for edge cases or if someone wanted to write to another kind of store.

What do you need more than getting the json and outputting the SQL?
> Out of the items you listed, only subscriptions and streams will actually give you the full data set.
>
> IIRC, LoadStartingWith still gives everything. But you could also write a loop and page. That's probably not a great idea given stream and data subscriptions.

That isn't correct. It is paged like everything else.
> - You can't use your own id, which means you have to save the subscription id to a document, which raises its own failure scenario issues.
>
> This is by design; otherwise you run into a lot of edge cases with two clients reading from the same subscription, or stealing it off one another.
>
> That pushes the problem to the user. I need to experiment more here, but it seems like one would have to try to save a document with a given name, then create the subscription, then save the subscription id in the document...and address failures that can happen along the way. For instance, if I failed to save the subscription id, how do I clean up the stranded subscription? We can query the list of subscriptions, but how can an arbitrary piece of code know which one is "no good"?

You can query the subscriptions, yes. And a subscription that isn't opened doesn't actually use any resources whatsoever. So "leaking" a subscription has no cost. Note that the whole idea is that you'll use subscriptions for very long running tasks, such as getting documents from a database as they change over months and years.

> And you can't restart, you have to run it again.
>
> Right. With patches, developers need to make them idempotent and resilient to changes by other code doing concurrent patches. I don't know if people think about it at that level. I think there's an [incorrect] perception that these are going to run serially.

They _are_ going to run serially, in the sense that each patch is applied independently. It is just that if you have multiple UpdateByIndex operations running, they will run concurrently and can interleave (on different documents).
"Stream me all the credit card made within 7 miles radius of the bank robbery" is not somtehing that you can do in a subscription.Stream has a StartsWith parameter. Why not drop LoadStartingWith? We just reported an issue with it that was fixed 30037. Not saying it's not useful [I happen to really like it], but does it pull it's weight when there's another way to do the same thing and more [api surface consolidation]?LoadStartingWith is not a streaming / unlimited API.It load the documents to the session identity map, and they get change tracking.Changes is fine as Data Subscription support, but does it still have sufficient value as a front line api element? Deep paging [Skip(12300).Take(100)] doesn't work very well [slow, iirc] and is kind of a silly UI concept. Take(int.MaxValue) has the same problem [memory pressure] as Stream().ToList().I don't understand the last statement.Changes() are a way to get notifications from the database about what is happening right now.Right. I'm thinking of Changes as "non-persistent Data Subscriptions". Could "do not persist" be a subscription option?What is the scenario that you are trying to enable?
> I think that it is too complex to be really useful in many scenarios. We might just replace it with an option to say "persist this map/reduce index's results as documents".
>
> I think that would be nice. It seems that with "no extra work", one could write a map/reduce on those persisted result documents and now you have unlimited reduce [endless loops aside]. Rollup by city/state/country/continent is straightforward to implement and you don't have to worry about "too much data" clogging up dynamic aggregation.
>
> One additional one: I think Sharding is poorly understood and/or under-utilized. I think it should be more popular than it is.

Sharding probably requires us to do it completely on the server side with dynamic scale up & down. That isn't a simple problem, and we aren't going to address it in 4.0 in any big way right now.
> What do you need more than getting the json and outputting the SQL?

That simplicity is why I suggested killing the feature in the first place. But as I said, I concede it's useful for marketing, convincing other stakeholders, etc. I think it'd be a nice article that would get people comfortable with Raven in general and Data Subscriptions in particular.
> Out of the items you listed, only subscriptions and streams will actually give you the full data set.
>
> IIRC, LoadStartingWith still gives everything. But you could also write a loop and page. That's probably not a great idea given stream and data subscriptions.
>
> That isn't correct. It is paged like everything else.

Ok. I haven't tried it in a while with something that would break a page barrier, but IIRC you can just say pageSize: 5000 [or whatever]. I don't think it's limited to 1024.
> What is the scenario that you are trying to enable?

None in particular. I'm trying to reduce the API surface area to make it more understandable.
On 10.02.2016 07:36, Oren Eini (Ayende Rahien) wrote:
> We are dropping the distinction between embedded and standard code.
> They'll both use the same exact mechanisms.
What exactly does that mean? Will embedded then even need to connect to
localhost?
> What do you mean, non session related?
Things like session.Advanced.Stream<>, session.Advanced.DocumentStore.*.
Both allow you to get entities not tracked by the session, so they are not
exactly related to the session.
> What do you need more than getting the json and outputting the SQL?
>
> That simplicity is why I suggested killing the feature in the first place. But as I said, I concede it's useful for marketing, convincing other stakeholders, etc. I think it'd be a nice article that would get people comfortable with Raven in general and Data Subscriptions in particular.

Consider what this means from an ops perspective. With SQL Replication, you just deploy RavenDB, and your ops team can manage replication, change it, modify it, track it, monitor it, the works. With Subscriptions, you have to do all of that yourself, deploy an additional endpoint, and any changes have to go through a dev cycle.
> Out of the items you listed, only subscriptions and streams will actually give you the full data set.
>
> IIRC, LoadStartingWith still gives everything. But you could also write a loop and page. That's probably not a great idea given stream and data subscriptions.
>
> That isn't correct. It is paged like everything else.
>
> Ok. I haven't tried it in a while with something that would break a page barrier, but IIRC you can just say pageSize: 5000 [or whatever]. I don't think it's limited to 1024.

Yes, it is limited to 1024.

> What is the scenario that you are trying to enable?
>
> None in particular. I'm trying to reduce the API surface area to make it more understandable.

That is why we are having this discussion, yes. I feel there is some cruft in the API and I want to take the time in a point release to clear it.
> It seems there are several api methods that differ only in subtle ways. That leads to confusion about what is the right choice for a given scenario. Another approach could be making those scenarios explicit options of one api.

For changes & subscriptions, I'm not really sure that those are subtle differences.
> Even the above about LoadStartingWith putting docs in the session identity map: it's straightforward to put Stream docs into the session if needed.

Not really, no. You _can_ do that, but you wouldn't do that in any of the common scenarios involving streaming and large objects. The streaming API is _intentionally_ awkward to consume in memory, remember. Very different usages.
> Another difference is that LoadStartingWith is ACID whereas the Stream startsWith parameter is not. Subtle. Confusing.

Stream is ACID. No change.
> Sharding probably requires us to do it completely on the server side with dynamic scale up & down. That isn't a simple problem, and we aren't going to address it in 4.0 in any big way right now.
>
> Makes sense. I think there are "immediately achievable use cases" that aren't as popular as they should be. One example from recent forum activity would be Orders by Month. Each month gets a separate db and data growth is contained.

But that is really trivial to do in RavenDB:

.ShardOn<Order>(x => x.OrderDate.Year + "-" + x.OrderDate.Month);
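For reference, a sketch of what that client-side setup might look like (shard names and URLs are made up; the shipped API uses `ShardStrategy.ShardingOn` from `Raven.Client.Shard` rather than the shorthand quoted above):

```csharp
// One IDocumentStore per shard; documents are routed by the
// sharding function below, so each month's orders land in its own db.
var shards = new Dictionary<string, IDocumentStore>
{
    { "2016-01", new DocumentStore { Url = "http://db1:8080" } },
    { "2016-02", new DocumentStore { Url = "http://db2:8080" } },
};

var strategy = new ShardStrategy(shards)
    .ShardingOn<Order>(o => o.OrderDate.Year + "-" + o.OrderDate.Month);

var store = new ShardedDocumentStore(strategy).Initialize();
```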
> Stream is ACID. No change.

Even more reason to remove LoadStartingWith. Stream(startWith: 'foo/').ToList();
I'm not sure why you consider this so different.
Assume I have an enumerable from Stream and I want to limit for safety: Stream(startWith: 'foo/').Take(1024).ToList().
Given Stream startsWith is ACID, I honestly can't think of a reason why I would continue using LoadStartingWith. In every case I've used it, I know there is a well-defined and bounded set of matches, otherwise I'd query or stream anyway.
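The stream-by-prefix usage under discussion would look roughly like this (a sketch; `Foo` and `Process` are stand-ins, and the exact parameter name may differ by client version):

```csharp
using (var session = store.OpenSession())
{
    // Stream by id prefix: results arrive page by page over the
    // wire, nothing is tracked by the session, and nothing forces
    // the full set into memory unless the caller collects it.
    using (var enumerator = session.Advanced.Stream<Foo>("foo/"))
    {
        while (enumerator.MoveNext())
            Process(enumerator.Current.Document);
    }
}
```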
> Stream is ACID. No change.
>
> Even more reason to remove LoadStartingWith. Stream(startWith: 'foo/').ToList();

* OutOfMemoryException
* Request takes 30 seconds to complete.

> I'm not sure why you consider this so different.

Because they have very different semantics by definition. The streaming is supposed to be just that: you are doing something with each item as it comes by; you don't hold on to the potentially very large set.

> Assume I have an enumerable from Stream and I want to limit for safety: Stream(startWith: 'foo/').Take(1024).ToList().

Safe by default, not unsafe by default.
> Given Stream startsWith is ACID, I honestly can't think of a reason why I would continue using LoadStartingWith. In every case I've used it, I know there is a well-defined and bounded set of matches, otherwise I'd query or stream anyway.

You aren't the only user of RavenDB, however. And we do see people who need the API to guide them toward the appropriate solution.
> 1. Load gets document(s) by id and is ACID.
> 2. LoadStartingWith gets documents by id prefix and is ACID.

Has limits.

> 3. Query, properly, has defaults and limits and is BASE.
> 4. Stream enhances Query to "yup, I'm going to go through all of them, I know what I'm doing" and is BASE.

Streams are identical to Load if you are asking by prefix or by etag. If you stream a query, it is the same as the query.
> 4b. Stream might also be ACID if you use startsWith.
> 5. Changes is "what's happening now".
> 6. Data Subscriptions is a persisted Changes/Stream for "what happened since I last checked and, once I catch up, what's happening now".

Correct.

> I'd propose the "public api" collapse to 3 or 4, with Load and Query being obvious "winners".
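To make the taxonomy concrete, here is the 3.x API surface being compared, side by side (session setup omitted; the ids and `Order` class are illustrative):

```csharp
// Sketch: the overlapping read APIs enumerated above, 3.x session syntax.
using System.Linq;
using Raven.Client;

public class Order { public string Id { get; set; } }

public static class ReadApis
{
    public static void Demo(IDocumentSession session)
    {
        // 1. Load: by id, ACID.
        var one = session.Load<Order>("orders/1");

        // 2. LoadStartingWith: by id prefix, ACID, server-enforced page size.
        var some = session.Advanced.LoadStartingWith<Order>("orders/");

        // 3. Query: BASE, safe by default (paged unless you ask for more).
        var page = session.Query<Order>().Take(128).ToList();

        // 4. Stream: "give me everything", delivered a page at a time.
        using (var it = session.Advanced.Stream<Order>(startsWith: "orders/"))
            while (it.MoveNext()) { var doc = it.Current.Document; }
    }
}
```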
Query is safe by default. Stream is a semantic shift to loosen those restrictions. Stream is *still* safe by default in that you're getting a page at a time in memory unless you write code to get around it [e.g. list.Add(enumerator.Current.Document)].
Here's the problem with "safe by default": it's only safe from one point of view, the health of the db and the server. Assume a developer thinks LoadStartingWith("foo/") should always and forever return <128 results. Data growth unexpectedly leads to 129+ results. From a user's perspective, the app is broken. From the programmer's perspective, it's awkward [at best] to detect this situation and deal with it using LoadStartingWith.
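The awkwardness being described: to learn whether LoadStartingWith truncated your result set, you end up paging until a short page comes back. A sketch of that dance (the page size and helper name are illustrative, and note that paging by `start` across requests is not atomic if writes happen in between):

```csharp
// Sketch: detecting/handling LoadStartingWith truncation by paging manually.
using System.Collections.Generic;
using Raven.Client;

public class Order { public string Id { get; set; } }

public static class PrefixLoader
{
    public static List<Order> LoadAll(IDocumentSession session, string prefix)
    {
        var all = new List<Order>();
        const int pageSize = 128; // illustrative; the client default is smaller
        var start = 0;
        while (true)
        {
            var page = session.Advanced.LoadStartingWith<Order>(
                prefix, start: start, pageSize: pageSize);
            all.AddRange(page);
            if (page.Length < pageSize)
                break; // a short page means we have seen every match
            start += pageSize;
        }
        return all;
    }
}
```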
Various people have fought against the safe by default paradigm for Query and I disagree with that. I think Stream is enough of a "speed bump" to alert you that you're in more dangerous territory. The fact that you pass a Query to a Stream is brilliant in this regard.
For Query, it's clear that you should set up paging or filtering to explore the data. If you're in a situation where you "want everything", Stream gives you that. If LoadStartingWith doesn't give you everything and has a sub-optimal paging mechanism, it's redundant.
Fwiw, it's awkward that you can't pass etag *and* prefix to Stream. I realize that adding that overload converges with Data Subscriptions... hence these suggestions. Further, looking at SubscriptionOptions and the options for Changes, I can see someone wanting to Stream over those as well - a non-persistent Subscription.
> Query is safe by default. Stream is a semantic shift to loosen those restrictions. Stream is *still* safe by default in that you're getting a page at a time in memory unless you write code to get around it [e.g. list.Add(enumerator.Current.Document)].

Stream gets you all the data. And it certainly does that a page at a time.

> Here's the problem with "safe by default": it's only safe from one point of view, the health of the db and the server. Assume a developer thinks LoadStartingWith("foo/") should always and forever return <128 results. Data growth unexpectedly leads to 129+ results. From a user's perspective, the app is broken. From the programmer's perspective, it's awkward [at best] to detect this situation and deal with it using LoadStartingWith.

Better to have an explicit thing like that than to bring the entire system down.
> Various people have fought against the safe-by-default paradigm for Query and I disagree with that. I think Stream is enough of a "speed bump" to alert you that you're in more dangerous territory. The fact that you pass a Query to a Stream is brilliant in this regard.

Yes, that is entirely the point. You can do that to bypass that limitation explicitly, noting that when you do, you take on the onus of protecting yourself from those details. That is why the API is the way it is.
> For Query, it's clear that you should set up paging or filtering to explore the data. If you're in a situation where you "want everything", Stream gives you that. If LoadStartingWith doesn't give you everything and has a sub-optimal paging mechanism, it's redundant.

Except that it isn't. We use it quite often for the purpose it is meant for, and it works great for that.
> Fwiw, it's awkward that you can't pass etag *and* prefix to Stream. I realize that adding that overload converges with Data Subscriptions... hence these suggestions. Further, looking at SubscriptionOptions and the options for Changes, I can see someone wanting to Stream over those as well - a non-persistent Subscription.

The reason you can't is that this would force us to scan the entire dataset from that etag and filter everything. This can lead to a very long pause in some cases.
Hibernating Rhinos Ltd
Oren Eini l CEO l Mobile: + 972-52-548-6969
Office: +972-4-622-7811 l Fax: +972-153-4-622-7811
Tobias
--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
Do not increase the Etag of documents during indexing. This greatly increases the chance of concurrency exceptions, even though the document hasn't actually changed in the sense that matters for a concurrency exception.
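For context, this is the concurrency check that would start failing spuriously if indexing bumped document etags. A sketch with the 3.x session API (store setup omitted; the `Order` class and id are illustrative):

```csharp
// Sketch: optimistic concurrency in a 3.x session. SaveChanges throws a
// ConcurrencyException if the document's etag changed since it was loaded,
// so an etag bump from indexing alone would make this fail for no real reason.
using System;
using Raven.Client;

public class Order
{
    public string Id { get; set; }
    public DateTime OrderDate { get; set; }
}

public static class ConcurrencyDemo
{
    public static void Touch(IDocumentStore store)
    {
        using (var session = store.OpenSession())
        {
            session.Advanced.UseOptimisticConcurrency = true;
            var order = session.Load<Order>("orders/1");
            order.OrderDate = DateTime.UtcNow;
            session.SaveChanges(); // throws if orders/1 changed meanwhile
        }
    }
}
```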
--
Guys,

We are doing a lot of work on 4.0, and one of the things we are looking at is not just adding new stuff, but removing bad old stuff.
What are the things that you regret having in RavenDB? Pitfalls, issues, confusions, etc.?
--
#2: System administration needs to be made first class; the tools that exist now are more akin to dev tools than actual production server administration tools.
2) Can you be more specific?
- Smart, optimized load-balancing. I wish there was an option to create a smart load-balanced environment. Correct me if I'm wrong, but at the moment with master-slave replication, if I write to the master, it will replicate the change and BOTH databases will reindex at once. This means incoming read-only queries (to both instances) are slowed by the massive IO. I admit that part of the problem is my initial design causing a save to trigger 15 index rebuilds at once, but I wish Raven would smartly load-balance read-only queries to optimize performance. Maybe there's an option I'm missing to limit the number of simultaneous indexing operations. I understand this is hard, but man, would it be nice: Raven is billed as being easy for developers, but when it's so easy to make a bad design mistake early on, it costs you immensely downstream when it's hard to make a change or when you realize you made a bad decision. If Raven could offset my ignorance and help optimize my index load, that'd be amazing.
- Patterns and Practices. Most of my time learning Raven was spent in the documentation, but a lot of the practices I know now weren't really mentioned or recommended there; instead they were gleaned from StackOverflow questions or this group. For example, even though int IDs are supported, they aren't supported well in certain scenarios, causing grief. I wish I had just been advised to ALWAYS use string IDs and not bother with numeric IDs. Or how to realistically deal with concurrency issues and ETags, index staleness vs. UX, caching recommendations, etc.
- Performance recommendations. It would be nice if Raven documented well-known or recommended performance practices, or even provided sample applications with performance baselines. In other words, I wish when I was learning Raven, I could have seen a "real" production application(s), run it locally, and understand what the performance baselines were to compare against.
- Clearer performance stats in Studio. I couldn't tell you how bad or good my index performance is. I understand some of that information is available in the dashboard but it seems like I need a PhD to understand it. I want a Google Analytics for my stats, I want some clear values to see and know "Oh, okay, my indexing sucks--how can I fix that?" I need better insights into my data and indexes. I want to know how my indexes are performing over time, what factors are making them perform worse or better, etc. I like the "merge suggestions" feature, I want more of that and more high-level insights.

- Stable updates with bug fixes. I can't tell you how frustrating it has been running into bugs and then being forced to wait until RavenHQ updates. I couldn't use MoreLikeThis from May 2015 until September/October, when 3xxxxx builds came out on HQ. It's also impossible to recommend Raven internally at work until there's a way to patch an instance without risking issues from new features. Nobody is okay with saying: well, we have 50 dev, QA, and production RavenDB databases; we have to update them all to the latest unstable build to fix this one bug; and by the way, that project is done, so no one has capacity to fix the app if there are incompatibility issues. No way that's going to fly; it's not realistic. Our platform team and DBAs support hundreds of applications. If we introduce Raven and, say, 20 apps are built with it over a year, there will certainly be bugs and updates that need to be applied; it's not realistic to expect an upgrade to an unstable version that could cause production outages, let alone doing this over the course of years. I understand this should be addressed with 3.5 and onwards.
That's what I can think of.
On Tuesday, February 9, 2016 at 5:40:02 AM UTC-6, Oren Eini wrote:
> Guys,
> We are doing a lot of work on 4.0, and one of the things we are looking at is not just adding new stuff, but removing bad old stuff.
> What are the things that you regret having in RavenDB? Pitfalls, issues, confusions, etc.?

Gareth Thackeray
CTO
www: vidados.com
M: +44 (0) 7748 300359
skype: gareththackeray
This e-mail message is confidential and may contain privileged information. If you are not the above named addressee, it may be unlawful for you to read, copy, distribute, disclose or otherwise use the information in this e-mail message. If you are not the intended recipient of this e-mail message, please delete this message.
...