Event store projections failing on a regular basis

660 views
Skip to first unread message

omar....@3dsemantix.com

unread,
May 29, 2017, 10:04:20 AM5/29/17
to Event Store
Hello everybody,

We have started using Event Store for one of our projects at the beginning of this year.

We rely extensively on the projections in order to limit the scope of work to be done by downstream activities.

Unfortunately, we have noticed that the projections engine fails regularly, and requires that we reboot the Event store.

We are currently using Event Store 4.0.0.

Did anybody face the same issue? After performing searches on Google nothing really came up in the results.

I am joining a print screen of our dashboard.

Best regards

Omar

Auto Generated Inline Image 1

Steven Blair

unread,
May 29, 2017, 10:15:05 AM5/29/17
to Event Store
We had a similar problem, but it turned out one of our custom projections was doing something pretty bad, causing a knock on effect, rather than the ES itself being the source of the problem.
Could you have a similar issue?

omar....@3dsemantix.com

unread,
May 29, 2017, 10:28:18 AM5/29/17
to Event Store
Hello Steven,

Our projections try to create a link to existing projections in order to avoid a bloated Event store.

I am joining a sample of our projections hoping you see something that we were not able to see ourselves.

The store behaves properly until all of a sudden the projections seems to fail..

Best regards

Omar

Sample of Queries

fromStreams('$ce-{Component}',
            '$ce-{Geometry}')
            .when(
                {
                    ComponentCreatedEvent:function(state, event){            
                            linkTo(ComponentModified, event);
                            return state;
                    },
                    GeometryCreatedEvent:function(state, event){            
                        if(event.body.SignatureData !== null && event.body.SignatureData.Length !==0) {
                            linkTo(ComponentModified, event);
                            state.count += 1;}
                            return state;
                    },             
                    GeometrySignatureExtractedEvent:function(state, event){
                            linkTo(ComponentModified, event);
                            return state;
                    },
                    GeometryExchangeFileExtractedEvent:function(state, event){
                            linkTo(ComponentModified, event);
                            return state;
                    },
                    GeometryGroupedEvent:function(state, event){
                            linkTo(ComponentModified, event);
                            return state;
                    },
                }
            );
fromCategory('GeometricalGroup')
            .foreachStream()
            .when(
                {
                    GeometricalGroupCreatedEvent:function(state, event){            
                            linkTo(GroupsModified, event);
                            return state;
                    },
                    GeometricalGroupVisuAddedEvent:function(state, event){  
                            linkTo(GroupsModified, event);
                            return state;
                    },    
                    GeometryGroupedEvent:function(state, event){  
                            linkTo(GroupsModified, event);
                            return state;
                    },                
                }
            );

fromStreams('$ce-{Supplier}',
             '$ce-{Family}',
             '$ce-{Component}',
             '$ce-{Geometry}')
            .when(
                {
                    SupplierCreatedEvent:function(state, event){            
                            linkTo(SupAndFamModified}, event);
                            return state;
                    },
                    FamilyCreatedEvent:function(state, event){            
                            linkTo(SupAndFamModified, event);
                            return state;
                    },
                    ComponentCreatedEvent:function(state, event){
                            linkTo(SupAndFamModified, event);
                            return state;
                    },
                    GeometryCreatedEvent:function(state, event){
                        if(event.body.BoundingBox !== null) {
                            linkTo(SupAndFamModified, event); }
                            return state;
                    },
                    GeometrySignatureExtractedEvent:function(state, event){            
                            linkTo(SupAndFamModified, event);
                            return state;
                    },
                }
            );

Steven Blair

unread,
May 29, 2017, 11:17:06 AM5/29/17
to Event Store
Could your projection be writing to another Stream which was created *outside* this projection?

omar....@3dsemantix.com

unread,
May 29, 2017, 11:27:40 AM5/29/17
to Event Store
I am not sure I understand your ** properly.

Some of our projections create new streams. By they don't write to streams they have not created themselves.

Omar
P.S.

When we create the projections, we use the following arguments
deleteStateStream= true
deleteCheckpointStream= true
deleteEmittedStreams= true

Steven Blair

unread,
May 29, 2017, 11:32:28 AM5/29/17
to Event Store
OK, that's what I meant.
I was worried something else had created the stream, and this projection was trying to write to it.

It's hard for me to guess what the real problem is, but perhaps you can disable some of the projections, and slowly start enabling them until the problem happens again?

omar....@3dsemantix.com

unread,
May 29, 2017, 11:50:43 AM5/29/17
to Event Store
Right now I have only one single machine (That is backed up every night)

It is used for our production environment, so stopping some projections and starting them one by one until the server fails does not guarantee that it will be reproducible, as we continue adding more data to the event store.

I am open to other suggestions though :o)

Omar

Steven Blair

unread,
May 29, 2017, 11:54:53 AM5/29/17
to Event Store
Sometimes when I have had a faulted projection, I can click the Projection and it informs me of an error that occurred.
Have you seen anything like that?
If the Projection has Faulted, and caused the By-Category to also fails, it strongly suggests it's a Write to a Stream that ultimately causes the problem.
Sorry I can't be much more help.

Pieter Germishuys

unread,
May 29, 2017, 12:00:37 PM5/29/17
to Event Store
What is the reason for the failure? clicking on the projection in the UI will provide you with more details as to what the cause of the failure is.

omar....@3dsemantix.com

unread,
May 29, 2017, 12:02:36 PM5/29/17
to Event Store
Hey Steven

Thanks for your help.

So far. after looking at the error messages, we did not see any helpful message. I will be on the lookout.

I noticed in the logs some messages such as the message below.

The machine is a Windows machine, with an SSD hard drive. It has 8 Gigs of memory, Event Store occupies 1.8G.

I have a hunch that it is a timeout setting (for the projection computation) but don't know how to change it, nor why it is happening.


[PID:03884:028 2017.05.28 22:50:21.380 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 7966ms. Q: 0/21.
[PID:03884:020 2017.05.28 22:51:18.514 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #1]: ReadStreamEventsForward - 12010ms. Q: 0/13.
[PID:03884:017 2017.05.28 22:52:00.970 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #3]: ReadStreamEventsForward - 12160ms. Q: 0/15.
[PID:03884:007 2017.05.28 22:52:52.474 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #3]: ReadStreamEventsForward - 12658ms. Q: 0/19.
[PID:03884:013 2017.05.28 22:53:36.567 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 11805ms. Q: 0/16.
[PID:03884:032 2017.05.28 22:54:20.106 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 9863ms. Q: 0/19.
[PID:03884:020 2017.05.28 22:54:59.468 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #3]: ReadStreamEventsForward - 9085ms. Q: 0/9.
[PID:03884:020 2017.05.28 22:55:41.536 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 8697ms. Q: 0/16.
[PID:03884:007 2017.05.28 22:56:19.605 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 8578ms. Q: 0/19.
[PID:03884:021 2017.05.28 22:57:00.713 ERROR QueuedHandlerThreadP] ---!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 8620ms. Q: 0/12.
Auto Generated Inline Image 1
Auto Generated Inline Image 2
Auto Generated Inline Image 3

omar....@3dsemantix.com

unread,
May 29, 2017, 12:03:30 PM5/29/17
to Event Store
Hello Pieter,

Is it possible to retrieve such an error from the log files.

I have already restarted the event store service

Omar

Pieter Germishuys

unread,
May 29, 2017, 12:10:37 PM5/29/17
to Event Store
You can just search for the text "faulted" in the logs.

omar....@3dsemantix.com

unread,
May 29, 2017, 12:15:33 PM5/29/17
to Event Store
Hello Pieter

Here are all of the faulted found through NotePad++ in the logs folder

Hope this helps

Omar

Search "faulted" (8 hits in 3 files)
  C:\ProgramData\chocolatey\lib\eventstore-oss\tools\logs\2017-05-26\0.0.0.0-2113-cluster-node-err.log (1 hit)
    Line 4524: [PID:03884:019 2017.05.26 21:20:25.649 ERROR ProjectionManager   ] The '$by_category' projection faulted due to 'After retrying 5 times, we failed to write the checkpoint for $by_category to $projections-$by_category-checkpoint due to a CommitTimeout'
  C:\ProgramData\chocolatey\lib\eventstore-oss\tools\logs\2017-05-26\0.0.0.0-2113-cluster-node.log (6 hits)
    Line 536309: [PID:03884:027 2017.05.26 21:12:34.679 DEBUG ResponseWriter      ] PROJECTIONS: Scheduling the writing of $faulted to $projections-$master. Current status of Writer: Busy: True
    Line 540377: [PID:03884:027 2017.05.26 21:12:35.991 DEBUG ResponseWriter      ] PROJECTIONS: Finished writing events to $projections-$master: $faulted
    Line 697794: [PID:03884:019 2017.05.26 21:20:25.649 DEBUG ProjectionManagerRes] PROJECTIONS: Response received: 6770389@$faulted
    Line 697795: [PID:03884:019 2017.05.26 21:20:25.649 ERROR ProjectionManager   ] The '$by_category' projection faulted due to 'After retrying 5 times, we failed to write the checkpoint for $by_category to $projections-$by_category-checkpoint due to a CommitTimeout'
    Line 968992: [PID:03884:025 2017.05.26 21:27:43.024 DEBUG ResponseWriter      ] PROJECTIONS: Finished writing events to $projections-$master: $faulted
    Line 968998: [PID:03884:018 2017.05.26 21:27:44.227 DEBUG ResponseWriter      ] PROJECTIONS: Finished writing events to $projections-$master: $faulted
  C:\ProgramData\chocolatey\lib\eventstore-oss\tools\logs\2017-05-29\0.0.0.0-2113-cluster-node-err.log (1 hit)
    Line 7714: [PID:01692:020 2017.05.29 14:55:35.422 ERROR ProjectionManager   ] The 'proj-ComponentGeometriesProjection' projection faulted due to 'Failed to write an events to $$$projections-proj-ComponentGeometriesProjection-Geometry-136e0441-75c4-4bc6-aa8d-6127ae55c486-checkpoint. Retry limit of 5 reached. Reason: CommitTimeout'



Steven Blair

unread,
May 30, 2017, 3:56:08 AM5/30/17
to Event Store
Omar,

This was the issue we saw as well with the timeout.
It ended up being one of our projections,  which was using "fromAll" so as our Event Store was growing, so was the amount of times this projection was being executed.
Do you happen to have "fromAll" being used?
 

omar....@3dsemantix.com

unread,
May 30, 2017, 8:11:07 AM5/30/17
to Event Store
Hello Steven

Here is a projection that fails even though it does not use fromAll. We do not use fromAll, we rely heavily on fromCatgory as illustrated below
fromCategory('Geometry')
            .foreachStream()
            .when(
                {
                    GeometryCreatedEvent:function(state, event){           
                            linkTo("Component-"+event.body.ComponentId+"-ChildrenGeometries", event);
                            return state;
                    },
                      
                }
            );

Here is a print screen of a failed projection.

Should I just restart the service whenever this happens. Is it possible to raise the timeout used for computing the projections?

Omar



Auto Generated Inline Image 1

Pieter Germishuys

unread,
May 30, 2017, 8:33:45 AM5/30/17
to Event Store
From the provided logs, it would seem that your projections are faulting for a valid reason which is basically it failing to write an event. There is one for the $by_category checkpoint failing to be written after 5 retries.

Pieter Germishuys

unread,
May 30, 2017, 8:37:21 AM5/30/17
to Event Store
As for your question to Steven whether you can up the compilation timeout. From the logs you provided (feel free to drop all of your logs here in an archive for us to go through) increasing the compilation timeout won't help in this case and we need to further diagnose why your projections are struggling to write events to Event Store.

On Tuesday, 30 May 2017 13:11:07 UTC+1, omar....@3dsemantix.com wrote:

omar....@3dsemantix.com

unread,
May 30, 2017, 4:35:06 PM5/30/17
to Event Store
Hello Pieter

Please find enclosed the log of the day where the projection faulted
Let me know if you need more
Omar

https://drive.google.com/open?id=0B5X5ITZq4xz3U1YyOXJyNmRWWEE

omar....@3dsemantix.com

unread,
May 31, 2017, 3:05:04 PM5/31/17
to Event Store
Hello All
The event store seems to have failed creating its projections after we restarted the service.


Here is the exception from the Event Store :
System.NotSupportedException: Specified method is not supported.
   at EventStore.Projections.Core.Services.Management.ManagedProjection.StopUnlessPreparedOrLoaded() in c:\projects\eventstore\src\EventStore.Projections.Core\Services\Management\ManagedProjection.cs:line 882
   at EventStore.Core.Bus.MessageHandler`1.TryHandle(Message message) in c:\projects\eventstore\src\EventStore.Core\Bus\MessageHandler.cs:line 33
   at EventStore.Core.Bus.InMemoryBus.Publish(Message message) in c:\projects\eventstore\src\EventStore.Core\Bus\InMemoryBus.cs:line 324
   at EventStore.Core.Bus.QueuedHandlerMresWithMpsc.ReadFromQueue(Object o) in c:\projects\eventstore\src\EventStore.Core\Bus\QueuedHandlerMRESWithMPSC.cs:line 134

More exceptions
[PID:03968:013 2017.05.31 15:15:52.331 ERROR ProjectionManager   ] The 'proj-GeometricalGroupGeometriesProjection' projection faulted due to 'Failed to write an events to $$$projections-proj-GeometricalGroupGeometriesProjection-Geometry-3634e7f1-4165-46af-bd34-377bb1ceb588-checkpoint. Retry limit of 5 reached. Reason: CommitTimeout'
[PID:03968:013 2017.05.31 15:15:52.331 ERROR ProjectionManager   ] The 'proj-SupplierComponentsProjection' projection faulted due to 'Failed to write an events to $$$projections-proj-SupplierComponentsProjection-Component-5ce7be66-aa6f-416f-9499-9ae81eadd2cd-checkpoint. Retry limit of 5 reached. Reason: CommitTimeout'
[PID:03968:013 2017.05.31 15:15:52.331 ERROR ProjectionManager   ] The 'proj-FamilyComponentsProjection' projection faulted due to 'Failed to write an events to Family-e6a5013a-2f95-4190-b1f1-414b93855a6c-ChildrenComponents. Retry limit of 5 reached. Reason: CommitTimeout'
[PID:03968:013 2017.05.31 15:15:53.455 ERROR ProjectionManager   ] The 'proj-ProjectionForExtractionSignature' projection faulted due to 'Failed to write an events to $$$projections-proj-ProjectionForExtractionSignature-Geometry-c04d500a-75ca-4a61-80db-767dca04bc7d-checkpoint. Retry limit of 5 reached. Reason: CommitTimeout'
[PID:03968:013 2017.05.31 15:15:53.455 ERROR ProjectionManager   ] The 'proj-ProjectionForFamiliesByGeometry' projection faulted due to 'Failed to write an events to $projections-proj-ProjectionForFamiliesByGeometry-GeometricalGroup-389f0913-7cd1-432f-8a4d-758be4f7960e-checkpoint. Retry limit of 5 reached. Reason: CommitTimeout'
[PID:03968:010 2017.05.31 18:01:33.046 FATAL GLOBAL-LOGGER       ] Global Unhandled Exception occurred.
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.ValueCollection.Enumerator.MoveNext()
   at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
   at EventStore.Projections.Core.Services.Processing.MultiStreamEventReader.Handle(ReadTimeout message) in c:\projects\eventstore\src\EventStore.Projections.Core\Services\Processing\MultiStreamEventReader.cs:line 173
   at EventStore.Core.Messaging.SendToThisEnvelope.ReplyWith[T](T message) in c:\projects\eventstore\src\EventStore.Core\Messaging\SendToThisEnvelope.cs:line 20
   at EventStore.Core.Services.TimerService.ThreadBasedScheduler.DoTiming() in c:\projects\eventstore\src\EventStore.Core\Services\TimerService\ThreadBasedScheduler.cs:line 68
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

It seems that the category projection has issues being computed again.

Would you guys offer any alternative to using the projections ? A general idea as to the source of the problem would be fine too

Best regards

Omar Msaaf
0.0.0.0-2113-cluster-node-err.log
0.0.0.0-2113-cluster-node-err.log
Auto Generated Inline Image 1

Pieter Germishuys

unread,
May 31, 2017, 3:08:26 PM5/31/17
to Event Store
Hi Omar,
Given the previous set of logs, it would appear your environment is not a good place... The previous logs are filled with commit timeouts and the projection eventually faulting due to the number of attempted retries to write those events are exhausted.

Pieter Germishuys

unread,
May 31, 2017, 3:11:28 PM5/31/17
to Event Store
Apologies, I meant to say "good state" instead of "good place".

omar....@3dsemantix.com

unread,
May 31, 2017, 3:12:38 PM5/31/17
to Event Store
After checking the administrative events, I saw multiple errors :
  • Application Error
  • .Net Runtime

I am joining some of those errors below
Application: EventStore.ClusterNode.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
   at EventStore.Projections.Core.v8.Js1.CompilePrelude(System.String, System.String, LoadModuleDelegate, EnterCancellableRegionDelegate, ExitCancellableRegionDelegate, LogDelegate)
   at EventStore.Projections.Core.v8.PreludeScript.CompileScript(System.String, System.String)
   at EventStore.Projections.Core.v8.PreludeScript..ctor(System.String, System.String, System.Func`2<System.String,System.Tuple`2<System.String,System.String>>, System.Action`2<Int32,System.Action>, System.Action`2<System.String,System.Object[]>)
   at EventStore.Projections.Core.Services.v8.V8ProjectionStateHandler..ctor(System.String, System.String, System.Func`2<System.String,System.Tuple`2<System.String,System.String>>, System.Action`2<System.String,System.Object[]>, System.Action`2<Int32,System.Action>)
   at EventStore.Projections.Core.Services.Management.ProjectionStateHandlerFactory.Create(System.String, System.String, System.Action`2<Int32,System.Action>, System.Action`2<System.String,System.Object[]>)
   at EventStore.Projections.Core.Services.Processing.ProjectionCoreService.CreateStateHandler(EventStore.Projections.Core.Services.ISingletonTimeoutScheduler, EventStore.Common.Log.ILogger, System.String, System.String)
   at EventStore.Projections.Core.Services.Processing.ProjectionCoreService.Handle(CreateAndPrepare)
   at EventStore.Core.Bus.MessageHandler`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].TryHandle(EventStore.Core.Messaging.Message)
   at EventStore.Core.Bus.InMemoryBus.Publish(EventStore.Core.Messaging.Message)
   at EventStore.Core.Bus.QueuedHandlerMresWithMpsc.ReadFromQueue(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart(System.Object)
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Faulting application name: EventStore.ClusterNode.exe, version: 4.0.0.7, time stamp: 0x58d3a166
Faulting module name: js1.DLL, version: 0.0.0.0, time stamp: 0x57ebcee5
Exception code: 0xc0000005
Fault offset: 0x0000000000017d6a
Faulting process id: 0x17c
Faulting application start time: 0x01d2d65c4e4ae482
Faulting application path: C:\ProgramData\chocolatey\lib\eventstore-oss\tools\EventStore.ClusterNode.exe
Faulting module path: C:\ProgramData\chocolatey\lib\eventstore-oss\tools\js1.DLL
Report Id: 13fa4d34-b2eb-4c4c-a1de-6ee63d818853
Faulting package full name:
Faulting package-relative application ID:

Print screens
Auto Generated Inline Image 1

omar....@3dsemantix.com

unread,
May 31, 2017, 3:14:10 PM5/31/17
to Event Store
Hello Pieter

What would you suggest I do ?

Resubmit all of the data again ? Delete projections, then recompute them ?

How could I avoid this in the future ? Avoid projections ?

Best regards

Omar Msaaf
Reply all
Reply to author
Forward
0 new messages