RavenDB 4 - Subscription cannot be processed by current node

69 views
Skip to first unread message

Andrej Krivulčík

unread,
Oct 16, 2017, 5:29:42 AM10/16/17
to RavenDB - 2nd generation document database
I'm not sure how I got to the following situation. There were no configuration changes since last work with subscriptions when the subscriptions worked correctly. The issue first occurred with Build 40, Version 4.0, SemVer 4.0.0-nightly-20170929-0401, Commit 4fd682b and persists with the latest nightly (Build 40, Version 4.0, SemVer 4.0.0-nightly-20171016-0400, Commit 4a4e831).

When trying to open a subscription, it fails with the following exception (it's trying to redirect to null or empty string):

Subscription With Id '32' cannot be processed by current node, it will be redirected to 
   at Raven.Client.Documents.Subscriptions.Subscription`1.AssertConnectionState(SubscriptionConnectionServerMessage connectionStatus)
   at Raven.Client.Documents.Subscriptions.Subscription`1.<ProcessSubscriptionAsync>d__30.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Client.Documents.Subscriptions.Subscription`1.<RunSubscriptionAsync>d__36.MoveNext()

When I try to open Settings - Manage Ongoing Tasks on the database, the call to /databases/data-subset/tasks fails and the following error is shown:

System.ArgumentNullException: Value cannot be null.
Parameter name: key
   at System.ThrowHelper.ThrowArgumentNullException(ExceptionArgument argument)
   at System.Collections.Generic.Dictionary`2.FindEntry(TKey key)
   at System.Collections.Generic.Dictionary`2.TryGetValue(TKey key, TValue& value)
   at Raven.Client.Http.ClusterTopology.GetUrlFromTag(String tag) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Client\Http\ClusterTopology.cs:line 82
   at Raven.Server.Web.System.OngoingTasksHandler.<CollectSubscriptionTasks>d__2.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Web\System\OngoingTasksHandler.cs:line 92
   at System.Collections.Generic.List`1.AddEnumerable(IEnumerable`1 enumerable)
   at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
   at Raven.Server.Web.System.OngoingTasksHandler.GetOngoingTasksFor(String dbName, ServerStore store) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Web\System\OngoingTasksHandler.cs:line 61
   at Raven.Server.Web.System.OngoingTasksHandler.GetOngoingTasks() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Web\System\OngoingTasksHandler.cs:line 31
   at Raven.Server.Routing.RequestRouter.<HandlePath>d__5.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Routing\RequestRouter.cs:line 107
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at Raven.Server.RavenServerStartup.<RequestHandler>d__11.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\RavenServerStartup.cs:line 157

It seems that the error in /databases/.../tasks didn't occur before trying to create the subscription. What I did:
  1. Tried to process some documents in the data-subset database. Failed with the error above.
  2. Tried to access Settings - Manage Ongoing Tasks for data-subset. Failed with the error above.
  3. Tried to access Settings - Manage Ongoing Tasks for database data-prod. The screen worked successfully.
  4. Tried to process the documents in the data-prod database. Failed with the error above.
  5. Subsequent opening of Settings - Manage Ongoing Tasks for data-prod fails with the same error.
Any suggestions about how could I find out what causes this error? The server runs standalone, without other servers.

kar...@ayende.com

unread,
Oct 16, 2017, 6:11:50 AM10/16/17
to RavenDB - 2nd generation document database
Hi,

could you please share the output of /databases?name=data-set endpoint?

Andrej Krivulčík

unread,
Oct 16, 2017, 6:41:08 AM10/16/17
to RavenDB - 2nd generation document database
Here it is, formatting mine:

{
  "Name": "data-subset",
  "Disabled": false,
  "TotalSize": {
    "HumaneSize": "3.2 GBytes",
    "SizeInBytes": 3439329280
  },
  "IsAdmin": true,
  "IsEncrypted": false,
  "UpTime": "02:04:55.7114760",
  "BackupInfo": null,
  "Alerts": 0,
  "RejectClients": false,
  "IndexingErrors": 0,
  "DocumentsCount": 198172,
  "HasRevisionsConfiguration": false,
  "HasExpirationConfiguration": false,
  "IndexesCount": 6,
  "IndexingStatus": "Running",
  "NodesTopology": {
    "Members": [
      
    ],
    "Promotables": [
      
    ],
    "Rehabs": [
      {
        "NodeTag": "A",
        "NodeUrl": "http://REDACTED:80",
        "ResponsibleNode": null
      }
    ],
    "Status": {
      "A": {
        "LastStatus": "NotResponding",
        "LastError": "In rehabilitation because the last report status was \"WaitingForResponse\""
      }
    }
  },
  "ReplicationFactor": 1,
  "DynamicNodesDistribution": true,
  "DeletionInProgress": {

Oren Eini (Ayende Rahien)

unread,
Oct 16, 2017, 6:41:58 AM10/16/17
to ravendb
In addition, you have just a single node in the cluster?

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrej Krivulčík

unread,
Oct 16, 2017, 6:42:43 AM10/16/17
to RavenDB - 2nd generation document database
Yes, single node only.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.

Grisha Kotler

unread,
Oct 16, 2017, 6:45:55 AM10/16/17
to rav...@googlegroups.com
Do you see any alerts?

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Grisha Kotler l RavenDB Core Team Developer Mobile: +972-54-586-8647

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/


To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Andrej Krivulčík

unread,
Oct 16, 2017, 6:49:20 AM10/16/17
to RavenDB - 2nd generation document database
This message intermittently appears:

No living nodes in the database topology
October 16, 2017 12:48 PM
It appears that all nodes of the data-subset database are not responding to the supervisor, the database is not reachable


The database operated normally until now though, this is the first issue.

kar...@ayende.com

unread,
Oct 16, 2017, 6:50:59 AM10/16/17
to RavenDB - 2nd generation document database
Ok, interesting... could you also share the /admin/cluster/log endpoint? 

Andrej Krivulčík

unread,
Oct 16, 2017, 6:51:51 AM10/16/17
to RavenDB - 2nd generation document database
{"CommitIndex":33,"LastTruncatedIndex":33,"LastTruncatedTerm":5,"FirstEntryIndex":0,"LastLogEntryIndex":0,"Entries":[]}

kar...@ayende.com

unread,
Oct 16, 2017, 8:12:58 AM10/16/17
to RavenDB - 2nd generation document database
sorry for the trouble, could you share
/admin/cluster/observer/decisions

also, does the 'Iteration' property in this endpoint incriminating?   

Andrej Krivulčík

unread,
Oct 16, 2017, 8:15:32 AM10/16/17
to RavenDB - 2nd generation document database
{"LeaderNode":"A","Term":5,"Suspended":false,"Iteration":26351,"ObserverLog":[]}

I guess you meant incrementing - yes, it is.

kar...@ayende.com

unread,
Oct 16, 2017, 9:42:08 AM10/16/17
to RavenDB - 2nd generation document database
If you go to the admin logs in the studio ( /studio/index.html#admin/settings/adminLogs) do you see any exceptions there?

Andrej Krivulčík

unread,
Oct 16, 2017, 9:55:20 AM10/16/17
to RavenDB - 2nd generation document database
Yes, there were exceptions about not being able to contact itself by the public IP address. I can't provide the full exception because I had to navigate away from the admin logs screen. However, based on the exception, I realized that the server has IP-based firewall with only few IP addresses whitelisted. After adding its public IP address to the firewall whitelist, the issue is resolved.

This might be something to include in the documentation and perhaps it would be useful to do an explicit check about the server being able to contact its PublicServerUrl, as the symptoms don't easily lead to the cause.

Guys, thank you very much for leading me with this one.

Andrej Krivulčík

unread,
Oct 16, 2017, 9:59:31 AM10/16/17
to RavenDB - 2nd generation document database
After a while (after removing the IP address from the whitelist), I could reproduce again. The admin logs show the following:

2017-10-16T13:55:56.2880218Z, 155, Information, A, Raven.Server.ServerWide.Maintenance.ClusterMaintenanceSupervisor+ClusterNode, Exception was thrown while collecting info from A, Raven.Client.Exceptions.AllTopologyNodesDownException: Tried to send 'GetTcpInfoCommand' request via `GET /info/tcp?tag=Supervisor` to all configured nodes in the topology, all of them seem to be down or not responding. I've tried to access the following nodes: http://REDACTED ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.WinHttpException: The operation timed out
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.Tasks.RendezvousAwaitable`1.GetResult()
   at System.Net.Http.WinHttpHandler.<StartRequest>d__105.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncUnbuffered>d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Client.Http.RequestExecutor.<ExecuteAsync>d__68`1.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Client\Http\RequestExecutor.cs:line 633
   --- End of inner exception stack trace ---
   at Raven.Client.Http.RequestExecutor.ThrowFailedToContactAllNodes[TResult](RavenCommand`1 command, HttpRequestMessage request, Exception e, Exception timeoutException) in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Client\Http\RequestExecutor.cs:line 738
   at Raven.Client.Http.RequestExecutor.<ExecuteAsync>d__68`1.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Client\Http\RequestExecutor.cs:line 649
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Server.Utils.ReplicationUtils.<GetTcpInfoAsync>d__1.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\Utils\ReplicationUtils.cs:line 38
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Raven.Server.ServerWide.Maintenance.ClusterMaintenanceSupervisor.ClusterNode.<ListenToMaintenanceWorker>d__17.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Server\ServerWide\Maintenance\ClusterMaintenanceSupervisor.cs:line 169
2017-10-16T13:55:56.3956455Z, 143, Information, Client, Raven.Client.Http.RequestExecutor, Failed to check node topology, will ignore this node until next topology update, System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Sparrow.Json.JsonContextPoolBase`1.AllocateOperationContext(T& context) in C:\Builds\RavenDB-4.0-Nightly\src\Sparrow\Json\JsonContextPoolBase.cs:line 92
   at Sparrow.Json.JsonContextPoolBase`1.AllocateOperationContext(JsonOperationContext& context) in C:\Builds\RavenDB-4.0-Nightly\src\Sparrow\Json\JsonContextPoolBase.cs:line 68
   at Raven.Client.Http.RequestExecutor.<CheckNodeStatusCallback>d__88.MoveNext() in C:\Builds\RavenDB-4.0-Nightly\src\Raven.Client\Http\RequestExecutor.cs:line 977

Oren Eini (Ayende Rahien)

unread,
Oct 17, 2017, 4:16:08 AM10/17/17
to ravendb
So basically, this is a case where we are trying to connect to ourselves via a public IP, but are being blocked by firewall, right?

Can anyone think of a reason why we shouldn't make it required for a database to be able to connect to itself via the PublicServerUrl ?

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.

Andrej Krivulčík

unread,
Oct 17, 2017, 8:28:29 AM10/17/17
to RavenDB - 2nd generation document database
Yes, exactly this case.

Requiring the connection to its PublicServerUrl is reasonable. However, I think that if it's possible to detect that the database can use the loopback connection, it should use that one, potentially improving performance - avoiding part of the network stack, at least on Linux, I think.

Oren Eini (Ayende Rahien)

unread,
Oct 17, 2017, 10:00:25 AM10/17/17
to ravendb

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


--
Reply all
Reply to author
Forward
0 new messages