> There is a new build out there, and it is awesome.
Great!
I'm just missing the build tag in the Git and one test fails on my machine:
Raven.Bundles.Tests.Authentication.SmugglerOAuth.Export_WithoutCredentials_WillReturnWithStatus401 [FAIL]
Assert.Equal() Failure
Position: First difference is at position 0
Expected: The remote server returned an error: (401) Unauthorized.
Actual: Der Remoteserver hat einen Fehler zurückgegeben: (401)
Localization-issue!
Fixed in:
https://github.com/ravendb/ravendb/pull/463
Tobias
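The failure above comes from comparing a localized exception message against an English string. What the pull request actually changed is not shown in this thread; one culture-independent way to write such a check, sketched here under that caveat, is to look at the HTTP status code on the `WebException` instead of its message. The class and method names below are hypothetical and not part of the RavenDB test suite.

```csharp
using System;
using System.Net;

// Hypothetical helper, not taken from the RavenDB code base.
public static class WebExceptionAssertions
{
    // Returns the HTTP status code carried by the exception, or null when
    // no HTTP response is attached (for example, a connection failure).
    public static HttpStatusCode? GetStatusCode(WebException exception)
    {
        var response = exception.Response as HttpWebResponse;
        return response == null ? (HttpStatusCode?)null : response.StatusCode;
    }

    // True when the failure is a 401, regardless of the language the
    // operating system used for the exception message.
    public static bool IsUnauthorized(WebException exception)
    {
        return GetStatusCode(exception) == HttpStatusCode.Unauthorized;
    }
}
```

A test could then assert `WebExceptionAssertions.IsUnauthorized(exception)` rather than `Assert.Equal` on the message text, so it behaves the same on English and German machines.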
On 01.03.2012 13:55, Oren Eini (Ayende Rahien) wrote:
> There is a new build out there, and it is awesome.
I wish I could get speeds like that.
I’m using build 700, but my inserts are taking a lot longer: roughly 1.5 seconds per 512 inserts. Granted, this does include a query to NHibernate, which takes around 750 ms.
When you are importing 220,000 docs, where are you importing from? Is all your data loaded into memory first?
Paul
I installed a new disk today, not quite SSD but it’s the Seagate Hybrid SSD 750GB (7200rpm) 32MB Cache.
The only way I can think of to make it faster is outlined in this article (the comments there discuss the 2 approaches):
http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/
This was the approach I was taking with my original (complicated) version of the faceted queries (before you simplified it ;-). However, the issue is how and when to create the initial BitArray; in theory, it can be done at the end of any index update. If this approach were implemented, you would only need to issue one query for a faceted search, regardless of the number of terms.
> One big thing that might help is actually enabling 304 on the endpoint. It currently does not support this.
What do you mean by this?
> On another subject, did you have a chance to look at INTERSECT?
Yeah, I've been busy at work and sick, so I haven't had much spare time! Can you hang on another couple of weeks? I'd really like to implement it.
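For context on the 304 remark: HTTP 304 Not Modified lets a client that already holds a response revalidate it. The client sends the ETag it received earlier in an `If-None-Match` header, and the server answers 304 with no body when the resource has not changed. The sketch below shows only that decision. How the facet endpoint would derive its ETag is not stated in this thread, and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper illustrating the conditional GET decision.
public static class ConditionalGet
{
    // Returns 304 when one of the ETags the client sent in If-None-Match
    // equals the current ETag of the resource, otherwise 200.
    public static int StatusFor(string ifNoneMatch, string currentETag)
    {
        if (string.IsNullOrEmpty(ifNoneMatch))
        {
            return 200; // no validator sent, so send the full response
        }

        // If-None-Match may carry a comma-separated list of ETags.
        foreach (var candidate in ifNoneMatch.Split(','))
        {
            if (candidate.Trim() == currentETag)
            {
                return 304; // client copy is still valid, skip the body
            }
        }

        return 200;
    }
}
```

For a faceted query, a 304 would let a client reuse its previous facet results without the server recomputing or resending them, assuming the ETag changes whenever the underlying index does.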
inline
On Fri, Mar 9, 2012 at 6:51 PM, Matt Warren <matt...@gmail.com> wrote:
> That's exactly how I did it in my complex version: I calculated all the necessary BitArrays and then stored them in a dictionary keyed on the FacetName/Range. I then serialized this as a doc and stored it in Raven.
That isn't good; it will force us to reindex (because you create a new doc). We can keep this in memory instead.
> At facet query time I pulled out the doc and did an AND of the query bit array with each facet term bit array.
> The issue I found was that it took a relatively long time to create all the facet bit arrays after the index had been updated. I seem to remember it took at least a minute (when there was 100,000's of docs in the store). But it could all be done in the background and I guess that the facet for that index could be marked as "STALE" until it was completed.
Facets are a relatively costly feature; we can probably do this on demand, rather than all the time. And if we detect an index difference, we can recalculate this, or maybe return stale facets?
How important is it to get up-to-the-millisecond facet info, really?
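The on-demand idea above (compute facets only when asked, recalculate when the index has changed, or hand back stale results) can be sketched as a small in-memory cache. This is only an illustration under assumptions: the index is assumed to expose some version number that changes on every update, and `FacetCache`, `currentIndexVersion`, and `computeFacets` are hypothetical names, not RavenDB APIs. The sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory facet cache; not thread-safe.
public sealed class FacetCache
{
    private readonly Func<long> currentIndexVersion;
    private readonly Func<Dictionary<string, int>> computeFacets;
    private Dictionary<string, int> cached;
    private long cachedVersion;

    public FacetCache(Func<long> currentIndexVersion,
                      Func<Dictionary<string, int>> computeFacets)
    {
        this.currentIndexVersion = currentIndexVersion;
        this.computeFacets = computeFacets;
    }

    // Returns facet counts and reports whether they are stale.
    public Dictionary<string, int> Get(bool allowStale, out bool isStale)
    {
        long version = currentIndexVersion();

        if (cached != null && cachedVersion == version)
        {
            isStale = false;   // index unchanged, cached counts are current
            return cached;
        }

        if (cached != null && allowStale)
        {
            isStale = true;    // index changed, caller accepts old counts
            return cached;
        }

        cached = computeFacets();   // the expensive step, done on demand
        cachedVersion = version;
        isStale = false;
        return cached;
    }
}
```

A caller that needs exact counts passes `allowStale: false` and pays for the recalculation; a caller that only needs an indication passes `true` and gets the previous counts together with the stale flag.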
That's exactly how I did it in my complex version, I calculated all the necessary BitArrays, and then stored them in a dictionary keyed on the FacetName/Range. I then serialized this as a doc and stored it in Raven.
At facet query time I pulled out the doc and did an AND of the query bit array with each facet term bit array.
The issue I found was that it took a relatively long time to create all the facet bit arrays after the index had been updated. I seem to remember it took at least a minute (when there was 100,000's of docs in the store). But it could all be done in the background and I guess that the facet for that index could be marked as "STALE" until it was completed.
The other issue I never addressed was when the update was done. It would be best if it was done after all the batches of work for an index were done, because you have to completely re-create the BitArrays each time, I don't think you can incrementally update it.
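The BitArray approach described above can be sketched as follows: one precomputed `BitArray` per facet term (bit i set when document i has that term), ANDed at query time with a `BitArray` of the documents matching the query, then counting the set bits. This is a minimal sketch of the technique, not the code from the "complex version" mentioned in this thread; `BitArrayFacets` and its parameter names are hypothetical, and all arrays are assumed to have the same length.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper; assumes every BitArray covers the same documents.
public static class BitArrayFacets
{
    // For each facet term, counts the documents that match both the
    // query and the term.
    public static Dictionary<string, int> Count(
        BitArray queryMatches,
        IDictionary<string, BitArray> termBits)
    {
        var counts = new Dictionary<string, int>();
        foreach (var pair in termBits)
        {
            // BitArray.And modifies the instance it is called on,
            // so AND into a copy to keep the precomputed bits intact.
            var both = new BitArray(pair.Value).And(queryMatches);
            counts[pair.Key] = CountSetBits(both);
        }
        return counts;
    }

    private static int CountSetBits(BitArray bits)
    {
        int count = 0;
        foreach (bool bit in bits)
        {
            if (bit) count++;
        }
        return count;
    }
}
```

With the term arrays held in memory, a faceted search needs one query to produce `queryMatches`, however many terms there are. The cost moves to rebuilding the term arrays after the index changes.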
The timings are below. There are the same number of rows in the SQL table as docs in the RavenDB store (now 55 million), and they represent the same data.
SQL takes 50 secs:
select Date, Count(*) from dbo.Table group by Date
Faceted search on the same field takes 4.5 secs!!
session.Store(new FacetSetup
{
    Id = facetSetupDoc,
    Facets = new List<Facet>
    {
        new Facet { Name = "Date" },
    }
});
session.SaveChanges();

var facetResults = session.Query<ForecastData>("ForecastIndex")
    .ToFacets(facetSetupDoc);
That is why I also said we need to consider stale facets as well, because we can avoid regenerating this all the time.
For the scenarios I used to work with (online travel agencies), the indexes and facets were considered stale together (so the counts were always correct). It was OK for them to be stale for up to an hour.
When a document caused a trigger (added/changed/deleted), the indexes and facets were rebuilt in the background. When ready, they replaced the current set of indexes and facets.
Optionally, an amount of time x was waited after a trigger before rebuilding, so multiple triggers could be handled in one rebuild.
I think having only the facets stale is not a big issue, because they are usually just an indication to the user, but I would prefer to have correct counts.
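The "wait x after a trigger so several triggers are rebuilt in one go" step described above is a debounce. A minimal sketch of just that decision follows, with the clock passed in explicitly so the logic is easy to follow; the background rebuild and the swap of the finished indexes and facets are left out. `RebuildDebouncer` is a hypothetical name, not something from the system described in this thread.

```csharp
using System;

// Hypothetical debouncer; the caller supplies the current time.
public sealed class RebuildDebouncer
{
    private readonly TimeSpan quietPeriod;
    private DateTime? lastTrigger;

    public RebuildDebouncer(TimeSpan quietPeriod)
    {
        this.quietPeriod = quietPeriod;
    }

    // Record that a document was added, changed, or deleted.
    public void Trigger(DateTime now)
    {
        lastTrigger = now;
    }

    // True once the quiet period has passed since the last trigger.
    // Clears the pending state so one burst causes one rebuild.
    public bool ShouldRebuild(DateTime now)
    {
        if (lastTrigger == null || now - lastTrigger.Value < quietPeriod)
        {
            return false;
        }

        lastTrigger = null;
        return true;
    }
}
```

A background loop would call `ShouldRebuild` periodically, build the new indexes and facets off to the side when it returns true, and replace the current set only once the new one is complete.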
Are you using build 700? Can you try build 888?
We added caching there that should greatly help performance in your scenario.
On Wed, Apr 18, 2012 at 5:12 AM, Stephen Panetta
Hi Oren,