Shutdown and Startup problem


James G

Oct 5, 2016, 1:14:11 PM
to RavenDB - 2nd generation document database
We are running RavenDB 2.5 build 2956 as a Windows service on a Windows Server 2012 R2 VM in Windows Azure (a DS12 with premium SSD storage). We have about 500 tenant databases on this server, only about 150 of which are actively used. Each has about 50 indexes, none of which has a Reduce step. This server has master-master replication to another identical server with an identical setup. I know we are on an old Raven build at this point and we are researching upgrade paths, but until then it would be nice to understand what is going on and whether there is something we could do differently. We'd also like to know, of course, whether this issue is resolved in the newer versions.

When we attempt to stop the Raven Windows service (either via the Windows Services UI or from the command line), Raven never stops. It just logs the following warning over and over again forever.

2016-10-05 04:19:11.1516,Raven.Storage.Esent.StorageActions.DocumentStorageActions,Warn,,Error when trying to open a new DocumentStorageActions,"Microsoft.Isam.Esent.Interop.EsentInvalidInstanceException: Invalid instance handle
   at Microsoft.Isam.Esent.Interop.Api.Check(Int32 err) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\Api.cs:line 2739
   at Microsoft.Isam.Esent.Interop.Api.JetBeginSession(JET_INSTANCE instance, JET_SESID& sesid, String username, String password) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\Api.cs:line 823
   at Raven.Storage.Esent.StorageActions.DocumentStorageActions..ctor(JET_INSTANCE instance, String database, TableColumnsCache tableColumnsCache, OrderedPartCollection`1 documentCodecs, IUuidGenerator uuidGenerator, IDocumentCacher cacher, EsentTransactionContext transactionContext, TransactionalStorage transactionalStorage) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\StorageActions\General.cs:line 76"

Well, we've never waited forever of course, but we've waited for over an hour. Raven seems to be stuck in an infinite loop that it never gets out of. So, eventually, we reboot the server and when Raven comes back online, as the databases start replicating again, the server seems to get very busy (both CPU and RAM). This happens for several hours - churning CPU and RAM - until things finally stabilize. It appears that all of the indexes are marked as stale and it needs to rebuild all indexes in all of the tenant databases - hence the churn while it is doing that. When it gets done, things seem to return to normal (meaning the CPU and RAM seem to stabilize and stop churning). 

I've found some other posts on this board regarding the above error, but nothing in this particular context. This only seems to affect us when shutting down a Raven instance, and it happens to both our primary and secondary instances (the secondary usually has nothing going on except replication, so no queries and no streaming results).

Now, one more wrinkle. We did this process last night and this time we had 5 databases that became inaccessible. When the primary tries to replicate to these 5 databases, we now get these errors:

2016-10-05 16:37:21.9394,Raven.Database.Server.HttpServer,Warn,,Could not open database named: pms_bac1d117-93fc-4b0a-a944-b3df7cc70a7e,"System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: Could not open transactional storage: f:\RavenDb\Databases\db_xxxx\Data ---> Microsoft.Isam.Esent.Interop.EsentTempPathInUseException: Temp path already used by another database instance
   at Microsoft.Isam.Esent.Interop.Api.Check(Int32 err) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\Api.cs:line 2739
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 431
   --- End of inner exception stack trace ---
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 445
   at Raven.Database.DocumentDatabase..ctor(InMemoryRavenConfiguration configuration, TransportState transportState) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\DocumentDatabase.cs:line 215
   at Raven.Database.Server.HttpServer.<>c__DisplayClass43.<TryGetOrCreateResourceStore>b__41() in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 1112
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait(TimeSpan timeout)
   at Raven.Database.Server.HttpServer.SetupRequestToProperDatabase(IHttpContext ctx) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 977
---> (Inner Exception #0) System.InvalidOperationException: Could not open transactional storage: f:\RavenDb\Databases\db_xxxx\Data ---> Microsoft.Isam.Esent.Interop.EsentTempPathInUseException: Temp path already used by another database instance
   at Microsoft.Isam.Esent.Interop.Api.Check(Int32 err) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\Api.cs:line 2739
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 431
   --- End of inner exception stack trace ---
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 445
   at Raven.Database.DocumentDatabase..ctor(InMemoryRavenConfiguration configuration, TransportState transportState) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\DocumentDatabase.cs:line 215
   at Raven.Database.Server.HttpServer.<>c__DisplayClass43.<TryGetOrCreateResourceStore>b__41() in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 1112
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()<---
"
2016-10-05 16:37:22.9863,Raven.Database.Server.HttpServer,Warn,,Error on request,"System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: Could not open transactional storage: f:\RavenDb\Databases\db_xxxx\Data ---> Microsoft.Isam.Esent.Interop.EsentTempPathInUseException: Temp path already used by another database instance
   at Microsoft.Isam.Esent.Interop.Api.Check(Int32 err) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\Api.cs:line 2739
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 431
   --- End of inner exception stack trace ---
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 445
   at Raven.Database.DocumentDatabase..ctor(InMemoryRavenConfiguration configuration, TransportState transportState) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\DocumentDatabase.cs:line 215
   at Raven.Database.Server.HttpServer.<>c__DisplayClass43.<TryGetOrCreateResourceStore>b__41() in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 1112
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Raven.Database.Server.HttpServer.SetupRequestToProperDatabase(IHttpContext ctx) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 1014
   at Raven.Database.Server.HttpServer.DispatchRequest(IHttpContext ctx) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 783
   at Raven.Database.Server.HttpServer.HandleActualRequest(IHttpContext ctx) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 678
---> (Inner Exception #0) System.InvalidOperationException: Could not open transactional storage: f:\RavenDb\Databases\db_xxxx\Data ---> Microsoft.Isam.Esent.Interop.EsentTempPathInUseException: Temp path already used by another database instance
   at Microsoft.Isam.Esent.Interop.Api.Check(Int32 err) in C:\Work\ravendb\SharedLibs\Sources\managedesent-61618\EsentInterop\Api.cs:line 2739
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 431
   --- End of inner exception stack trace ---
   at Raven.Storage.Esent.TransactionalStorage.Initialize(IUuidGenerator uuidGenerator, OrderedPartCollection`1 documentCodecs) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Storage\Esent\TransactionalStorage.cs:line 445
   at Raven.Database.DocumentDatabase..ctor(InMemoryRavenConfiguration configuration, TransportState transportState) in c:\Builds\RavenDB-Stable-2.5\Raven.Database\DocumentDatabase.cs:line 215
   at Raven.Database.Server.HttpServer.<>c__DisplayClass43.<TryGetOrCreateResourceStore>b__41() in c:\Builds\RavenDB-Stable-2.5\Raven.Database\Server\HttpServer.cs:line 1112
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()<---

Maxim Buryak

Oct 6, 2016, 7:13:21 AM
to rav...@googlegroups.com
Hi,
Please upgrade to the latest 2.5 stable build, which includes various fixes for this situation. Also, consider using the minimum thread count settings we've introduced in the new version:
  • Raven/MinThreadPoolWorkerThreads
    Indicates the minimum number of worker threads for the .NET thread pool. Might be useful when one wants to help the system deal with violent bursts of work. Default: ThreadPool current value. Minimum: 2

  • Raven/MinThreadPoolCompletionThreads
    Indicates the minimum number of completion threads for the .NET thread pool. Might be useful when one wants to help the system deal with violent bursts of work. Default: ThreadPool current value. Minimum: 2

In the case of such a big instance (which we recommend splitting into 200-tenant instances), we recommend using 400 for each.
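
As a rough sketch of what this could look like: the fragment below assumes these keys are set like other Raven/ settings, in the appSettings section of Raven.Server.exe.config when running as a Windows service, and it reads "400 for each" as applying to both settings. Neither assumption is confirmed in this thread.

```xml
<appSettings>
  <!-- Assumed values: 400 for both settings, per the recommendation above -->
  <add key="Raven/MinThreadPoolWorkerThreads" value="400" />
  <add key="Raven/MinThreadPoolCompletionThreads" value="400" />
</appSettings>
```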





Best Regards,

Hibernating Rhinos Ltd

Maxim Buryak l Core Team Developer Mobile:+972-54-217-7751

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simple" http://ravendb.net/



--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James G

Oct 6, 2016, 4:21:16 PM
to RavenDB - 2nd generation document database
Hi Maxim,

Thanks for your response. We are testing the changes you proposed now and I will follow up here with any results.

Can you clarify one of your statements for me:

"In the case of such a big instance (which we recommend splitting into 200-tenant instances), we recommend using 400 for each."

If I read that correctly, you are suggesting I set the value of both of those settings to 400, correct?

Also, are you suggesting we should only have 200 tenants per server? If so, can you please explain a bit further what leads you to that recommendation?

Thanks,

James


Oren Eini (Ayende Rahien)

Oct 7, 2016, 3:11:30 AM
to ravendb
The problem with running many tenants on the same hardware is that they compete for a limited amount of resources. 
If your HD can provide 50 MB/sec and 1000 IOPS, and you have 400 databases all trying to give you their best performance, you are going to see a rate of 125 KB/sec and 2.5 IOPS per db.

Now, typically not all dbs are active at the same time, but in most systems, when you get to hundreds of dbs, they will compete, and that can cause perf issues.
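
The per-db figures above are an even split of the disk's aggregate capacity. A minimal sketch of that arithmetic, assuming every database is equally active and using decimal units (1 MB = 1000 KB); the function name is only illustrative:

```python
def per_db_share(total_mb_per_sec, total_iops, active_dbs):
    """Split aggregate disk throughput and IOPS evenly across active databases."""
    kb_per_sec = total_mb_per_sec * 1000 / active_dbs  # MB/sec -> KB/sec per db
    iops = total_iops / active_dbs
    return kb_per_sec, iops


# The numbers from the post: 50 MB/sec and 1000 IOPS shared by 400 databases.
kb, iops = per_db_share(50, 1000, 400)
print(f"{kb:.0f} KB/sec and {iops:.1f} IOPS per db")
```

In practice the split is not even, since idle databases leave more headroom for busy ones, so this is best read as a worst-case floor.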

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969


James G

Oct 7, 2016, 9:59:39 AM
to RavenDB - 2nd generation document database
Thanks for that insight Oren. 

So, in reality, our current environment provides 200 MB/s and 5000 IOPS, which then translates to 500 KB/s and 12.5 IOPS per db (assuming 400 dbs).

In the Azure environment we can combine drives and each time we add a drive we double those baselines. So, given that the easiest thing to do is "throw hardware" at the problem, what is your recommended level of throughput and iops per DB? I'm thinking both minimal and optimal here (obviously more is always better, but looking for a reasonable "optimal"). 
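
To make the "throw hardware at it" question concrete, here is a small sketch that inverts the per-db calculation: given a target IOPS per db, how many identical drives would be needed. It assumes capacity grows linearly with the number of combined drives, and the 50 IOPS per db target is purely hypothetical, not a recommendation from this thread.

```python
import math


def drives_needed(target_iops_per_db, active_dbs, iops_per_drive):
    """Smallest number of identical drives whose combined IOPS meets the target."""
    required_iops = target_iops_per_db * active_dbs
    return math.ceil(required_iops / iops_per_drive)


# Hypothetical target of 50 IOPS per db, 400 dbs, 5000 IOPS per drive.
print(drives_needed(50, 400, 5000))
```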

In addition to disk performance, how does RAM come into play here? Again, I know the more RAM the better, but does slower disk lead to higher RAM consumption? Are they related in any way?

Since we are on 2.5, we are using ESENT. Does Voron help with any of this in any significant way?

Finally, given that we have a multi-tenant solution where each tenant has their own DB, based on your recommendations of 200 dbs per server (or whatever # of dbs per server based upon the hardware config), I assume you would suggest sharding as the way forward for us? Is that accurate? And if so, how would we move from our existing non-sharded environment to a sharded environment - any recommendations/best practices/insight into the best ways to approach that? 

(I can ask this in a separate thread if that makes more sense and provides better visibility to any of these questions.)

Thanks,
James

Oren Eini (Ayende Rahien)

Oct 7, 2016, 11:05:48 AM
to ravendb
Hi,
The issue generally comes to a head when you have a lot of actively used databases.
Esent uses a page cache to avoid hitting the disk, but given that you have so many dbs, the chance that most page requests will have to go to disk is high.
What is the total disk size of all the active dbs you have right now?

RavenDB will try to use as much memory as it can to speed things up. Slow I/O won't change that need, since even fast I/O is much slower than memory.
Voron uses memory-mapped files, which rely on the OS to keep them in the cache. That is better, since the active parts are handled better, but the behavior is roughly the same.

Sharding isn't quite the way to go; just split your dbs among multiple servers.
I don't know if you are doing replication, but that would also have to be taken into account.

James G

Oct 7, 2016, 11:57:44 AM
to RavenDB - 2nd generation document database
Hey Oren,

Well we have about 500 DBs on the server and in total they are using about 150GB disk space. Now, as I noted in the original post, only about 150 of those are truly active DBs (paying customers and a few shared/operational DBs) the remaining 350ish DBs are no longer active DBs (created for the purpose of potential customers evaluating our software). So, I don't know the actual disk space used by the 150ish truly "active" DBs. 

We are replicating to an identical server using master-master replication. Do these non-active DBs (that are NOT being accessed by our apps at all) stay active in any way simply because of the fact that they are being replicated? We do plan to build a process to delete the non-active DBs and have that process run regularly (to keep the non-active DBs to a minimum going forward) and I'm wondering if that would have an impact on server performance at all (because those DBs would no longer take up any resources in terms of replication)?

Thanks,
James

Oren Eini (Ayende Rahien)

Oct 7, 2016, 12:50:58 PM
to ravendb
Yes, if they are replicated, then they are likely held active (we need this to be able to failover rapidly).
The /admin/stats endpoint will list all the loaded dbs.

And it will also record the size of those.
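
A minimal sketch of summarizing that output once the JSON has been fetched. The field names used here (LoadedDatabases, Name, TotalDatabaseSize) are assumptions about the response shape, not confirmed in this thread, so they are passed as parameters and should be checked against what your build actually returns.

```python
import json


def summarize_loaded(stats_json, list_key="LoadedDatabases",
                     name_key="Name", size_key="TotalDatabaseSize"):
    """Return (count, total size, sorted names) of the loaded databases.

    The key names are assumed, not taken from the thread; adjust as needed.
    """
    stats = json.loads(stats_json)
    dbs = stats.get(list_key, [])
    total_size = sum(db.get(size_key, 0) for db in dbs)
    names = sorted(db.get(name_key, "?") for db in dbs)
    return len(dbs), total_size, names


# Made-up sample payload, only to show the call.
sample = ('{"LoadedDatabases": ['
          '{"Name": "tenant-a", "TotalDatabaseSize": 1024}, '
          '{"Name": "tenant-b", "TotalDatabaseSize": 2048}]}')
print(summarize_loaded(sample))
```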


James G

Oct 7, 2016, 5:05:51 PM
to RavenDB - 2nd generation document database
So, I hit /admin/stats and indeed all of the databases are loaded. I guess we will be working on deleting inactive DBs right now!

Thanks for your help guys,
James

Jahmai Lay

Oct 17, 2016, 2:27:41 AM
to RavenDB - 2nd generation document database

Does clustering help with having lots of databases or does it just duplicate the issue across the whole cluster?

Oren Eini (Ayende Rahien)

Oct 18, 2016, 3:12:34 AM
to ravendb
Depends on the mode you choose.
Typically in this scenario you won't have all the databases across all the nodes.

Jahmai Lay

Oct 18, 2016, 3:23:20 AM
to RavenDB - 2nd generation document database
Can you elaborate on "mode"? The clustering documentation at https://ravendb.net/docs/article-page/3.5/csharp/server/scaling-out/clustering/clustering-overview talks a lot about how voting is done, but not about things like load balancing and which nodes get which databases.

Oren Eini (Ayende Rahien)

Oct 18, 2016, 3:27:16 AM
to ravendb
It is about the cluster topology, and how you run it.

In general, a cluster is a shared set of nodes that all have the same databases and behavior, and they will select a leader among themselves to be the primary write node.
You can then specify load balancing behavior (round robin, SLA, etc).

If you have a large number of databases, you'll typically have multiple clusters, each having some portion of the databases.


Jahmai Lay

Oct 18, 2016, 3:42:20 AM
to RavenDB - 2nd generation document database

Right, so there is some degree of manual client-side configuration when deciding which db goes to / lives in which cluster. Maybe a future cluster version will have auto-balancing?

Oren Eini (Ayende Rahien)

Oct 18, 2016, 3:54:18 AM
to ravendb
Yes