The static checks on cyclic dependencies between indexes are meant to prevent infinite loops at runtime. But the restriction on static cyclic dependencies between indexes is too strict: it forces every new state to be recalculated from scratch and does not allow reuse of the previously calculated state value.
Meanwhile, this static restriction could be complemented with an option for dynamic infinite-loop prevention. This would let us significantly extend the powerful Map-Reduce feature with the ability to implement really complex logic by setting dependencies on previously calculated and saved states, in elegant code that is easy to develop, support, and improve. I think it would be great to give developers the ability to combine the two approaches.
For now, the restriction on static cyclic dependencies between indexes can be worked around with the Subscriptions/Changes API, but at the cost of extra code (outside RavenDB), significant performance losses, and additional traffic and resource usage. Another way is to switch off the restriction in a local build, as I have done for this example.
Adding this feature is rather easy and would significantly extend RavenDB's capabilities and the development experience. Please look at the attached 'Shares1' experimental database dump. It contains an example task implemented in two ways:
- Without static cyclic dependencies between indexes.
- With static cyclic dependencies between indexes (I have turned off the static cyclic dependency checks in my local build of Raven.Server). At runtime there are actually no infinite loops; the indexes work nicely. The real dependencies are between the documents produced by the indexes, and those dependencies are not cyclic but spiral-shaped. This approach has some advantages, and there are ways to prevent infinite loops dynamically; I will describe these below.
The example task is rather simple: periodic calculation of shareholders' shares. We have opening investment amounts (the 'Opening Balances' collection) on participant accounts and their corresponding shares (an account's investment amount / the sum of investment amounts over all accounts ('totalAmount')). Then we have investment transactions made by the accounts (the 'Transactions' collection). For each period (month) and each account we should produce a report with opening and closing amounts and shares (share = amount / totalAmount). We should also produce periodic aggregated reports over all accounts (the share equals 1.0 in all of them).
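The task's math can be sketched outside RavenDB like this (illustrative Python with made-up account IDs and amounts; the actual data lives in the 'Opening Balances' and 'Transactions' collections):

```python
# Per period: closing = opening + the period's transactions, and
# share = closing amount / total closing amount over all accounts.
opening = {"acc/1": 100.0, "acc/2": 300.0}          # 'Opening Balances'
transactions = {                                     # 'Transactions' by period
    "2023-01": {"acc/1": 50.0},
    "2023-02": {"acc/2": 150.0},
}

reports = {}
for period in sorted(transactions):
    closing = {acc: opening[acc] + transactions[period].get(acc, 0.0)
               for acc in opening}
    total = sum(closing.values())
    reports[period] = {acc: {"opening": opening[acc],
                             "closing": closing[acc],
                             "share": closing[acc] / total}
                       for acc in closing}
    opening = closing   # this period's closings open the next period

# The aggregated report's share always sums to 1.0 per period.
print(sum(r["share"] for r in reports["2023-02"].values()))  # 1.0
```

Note how the per-period shares depend on the total over all accounts, while each period's openings depend on the previous period's closings; those two dependencies are exactly what the index implementations below have to express.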
The first implementation is based on the running-totals calculation approach demonstrated by Oren in the webinar on the Amberwood case, so all the related indexes in the example are marked with the 'Oren/' prefix. It recalculates the closings from scratch for each (period, account). It required me to implement six indexes (each depending on one or more of the previous ones); the first four are intermediate auxiliary indexes needed to avoid static cyclic dependencies (maybe you could do it with fewer — it would be interesting to see):
- Oren/AmountsBy/Account/Period: Aggregates amounts by accounts and periods
- Oren/ClosingsBy/Account/Period: Calculates running totals by accounts and periods (closing amounts)
- Oren/PeriodCount: Counts periods per period kind (used by the next index)
- Oren/ClosingsBy/Period: Aggregates closing amounts by periods (closing total amounts)
- Oren/ReportsBy/Account/Period: Calculates the final reports with shares, amountChange and openings
- Oren/ReportsBy/Period: Aggregates final account reports by periods
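The core idea of this first way — deriving each (account, period) closing from scratch as a running total over all earlier periods — can be sketched as follows (illustrative Python only; in the example this work is done by the Oren/* Map-Reduce indexes, and the field names here are made up):

```python
# The first approach never reuses a previously saved closing: the closing
# for each (account, period) is a sum over ALL periods up to that period,
# which is why the Periods dimension enters the Map-Reduce tree.
amounts_by_period = {              # cf. the Oren/AmountsBy/Account/Period output
    "acc/1": {"2023-01": 50.0, "2023-02": 0.0},
    "acc/2": {"2023-01": 0.0, "2023-02": 150.0},
}
openings = {"acc/1": 100.0, "acc/2": 300.0}
periods = ["2023-01", "2023-02"]

def closing(acc: str, period: str) -> float:
    # Running total recomputed from scratch for every (account, period).
    upto = periods[: periods.index(period) + 1]
    return openings[acc] + sum(amounts_by_period[acc][p] for p in upto)

print(closing("acc/1", "2023-02"))  # 150.0
```

The price of this approach is visible in the inner sum: every recalculation touches the whole Accounts x Periods Cartesian product instead of one saved state per account.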
The second implementation is my own, reflecting how the original task was implemented in Excel. It calculates the documents for each period based on the documents calculated for the previous period. It required only two indexes that depend on each other, one of them even on itself:
- ReportsBy/Period: Aggregates the reports from the next index by periods (totals, except for totalAmounts (!); similar to Oren/ClosingsBy/Period).
- ReportsBy/Account/Period: Calculates the final report documents. It depends on its own output collection: for every changed report document it produces the document for the next period, setting that document's opening equal to the closing of the current report, until the next start date becomes invalid (i.e. beyond the latest border of the overall period), which stops the index calculation loop. It also uses total amounts from the previous index, but this does not cause an infinite loop: amount (here) -> total amount (prev index) -> share (here) -> total share (prev index).
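The spiral shape of this second way can be sketched as a simple loop over documents (illustrative Python, not the actual index code; the date boundary and amounts are made up):

```python
from datetime import date

LAST_PERIOD_START = date(2023, 3, 1)   # latest border of the overall period

def next_period(d: date) -> date:
    """First day of the following month."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def produce_reports(opening: float, amounts: dict, start: date) -> list:
    """Each report's closing becomes the next report's opening: a spiral,
    not a cycle, because the start date strictly grows on every step and
    the loop terminates once it leaves the overall period."""
    reports = []
    while start <= LAST_PERIOD_START:
        closing = opening + amounts.get(start, 0.0)
        reports.append({"period": start, "opening": opening, "closing": closing})
        opening = closing                # carried forward to the next period
        start = next_period(start)
    return reports

reports = produce_reports(100.0, {date(2023, 1, 1): 50.0}, date(2023, 1, 1))
print(len(reports), reports[-1]["closing"])  # 3 150.0
```

This is exactly why the runtime sees no infinite loop despite the static cycle: the documents produced on each pass belong to ever-later periods, so the chain is finite by construction.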
Dynamic prevention of infinite loops
RavenDB can already detect static cyclic dependencies, so it could offer the option to prevent infinite loops dynamically for every detected cyclic dependency by configuring and checking the maximum number of index calculations in the loop.
Both static dependencies between indexes described above have a predictable number of iterations, which can be calculated for the modified document (using its fields) after the first iteration (when RavenDB sees that the next index, possibly the same one, in the static cyclic dependency path is to be updated):
- For the first dependency: for every period.kind of the output document, the limit is the number of periods of that kind:
from Periods
group by kind
select count(dateStart) as maxIndexCalculationsInTheLoop
- For the second dependency: it is just a static limit of 4 index calculations in the loop.
If such an exact criterion is absent, static limits with default values could be configured and used for the dynamic checks.
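The proposed dynamic check could look roughly like the following guard inside the indexing loop (a hypothetical sketch in Python; RavenDB has no such API today, and every name here is made up):

```python
class CyclicDependencyGuard:
    """Hypothetical sketch of the proposed dynamic check: count how many
    times each document re-enters a detected cyclic dependency path and
    stop once a configured or calculated limit is exceeded."""

    def __init__(self, max_calculations: int):
        # e.g. count of Periods for the first dependency, or a static 4
        # for the second; a configurable default when no exact criterion exists
        self.max_calculations = max_calculations
        self.counts = {}

    def on_index_run(self, doc_id: str) -> bool:
        """Return True if the loop may continue for this document."""
        self.counts[doc_id] = self.counts.get(doc_id, 0) + 1
        return self.counts[doc_id] <= self.max_calculations

guard = CyclicDependencyGuard(max_calculations=4)
runs = sum(1 for _ in range(10) if guard.on_index_run("reports/acc-1"))
print(runs)  # 4
```

A well-behaved spiral (like the second implementation above) would terminate on its own before hitting the limit; the guard only fires for a genuinely runaway loop.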
Let's compare the two implementations
- Code
- The first way uses Map-Reduce over both the Accounts and Periods dimensions in the Oren/ClosingsBy/Account/Period index (i.e. their Cartesian product; the Periods dimension is needed to recalculate closings from scratch). So here we have a pure Map-Reduce approach.
- The second way uses Map-Reduce for the Accounts dimension only (as it relies on the state saved for the previous period). So here we have a combination of the Map-Reduce and dependency-on-the-saved-state approaches.
- The second way has only two final indexes instead of the first way's six; the first way requires 4 extra auxiliary indexes to avoid static cyclic dependencies.
- The second way is straightforward, can be reproduced directly from the Excel prototype, requires much less code, and is much easier and faster to develop, support, and improve.
- E.g. I refactored the second implementation to improve document structuring, but I have not done the same for the first one, as it is much more time-consuming.
- This example is rather simple; more complex business logic would make the second way even more preferable.
- Performance
- Time/CPU
- Modifying a transaction in the latest period
- The second way is about 3 times faster (comparing the total length of the colored blocks) and, I guess, requires less CPU, as it calculates fewer indexes and far fewer Map-Reduce entries (no extra Periods dimension).

- Adding a new latest period with new transactions
- The second way is about 2 times faster (comparing the total length of the colored blocks) and, I guess, requires less CPU, as it calculates fewer indexes and far fewer Map-Reduce entries (no extra Periods dimension).


- Details are in the attached 'indexPerf - Adding new latest period with transactions.json' file
- Modifying a transaction in the earliest period
- The first way is about 1.5 times faster (comparing the total length of the colored blocks) because, I guess, it runs in parallel across the periods, while the second way is sequential and requires index saves for every period.

- Details are in the attached 'indexPerf - Updating transaction in the first period.json' file
- Memory / Disk space
- I guess the first way takes much more memory and disk space for the auxiliary indexes and, especially, to keep the Map-Reduce tree for the Oren/ClosingsBy/Account/Period index, as it requires the additional Periods dimension.
Thus, the second way, which combines Map-Reduce with dependencies on earlier calculated and saved states, looks preferable to the first from the following points of view:
- development simplicity and speed
- memory and disk space consumed
- runtime performance in the most common scenario of changes in the latest period (changes in past periods are rare in many cases)
So I think it would be great to enrich the powerful Map-Reduce feature with an option to prevent infinite loops dynamically, in addition to the current static checks. This would allow developers to combine the Map-Reduce and dependency-on-the-saved-state approaches, significantly extending RavenDB's capabilities and improving the development experience for really complex business logic.
As a first step, it would cost nearly nothing to allow switching off the static checks for selected indexes as an experimental feature. Afterwards, the dynamic checks could be introduced (as described above). Optionally, I suppose, some optimisations could be made later to improve performance in looped index calculations (e.g. fewer disk writes of an index's output collection while it is being modified in the loop).
What do you think about this option?
Thanks,
Shlomo