Hi Richard,
This is a very common question, and it generalises to the set validation problem in CQRS. Greg Young (the "CQRS guy") has a great post here that addresses it:
You've obviously thought about the problem a bit already, and a lot of the post may not be new to you, but I think the key takeaway is the last paragraph:
To sum up I just want to reiterate that this is a *good* thing. Eventual consistency is forcing us to learn more about our domain. It is forcing us to ask questions that are otherwise often not asked.
The question to ask in your scenario is: what is the business cost of having two persistent entities for the same business key? The answer is the product of the likelihood of such an event happening and the cost of compensating for it (whether that compensation is manual, or is the cost of developing an automated compensating action). It's a good question to ask, because in my experience most business processes involve enough humans that, no matter how consistent your system is, duplicates will occur all the time. Two different operators might create something for the same logical transaction at the same time, for example, and the conflict then needs to be resolved somehow, by merging the two or deleting one. So even with a perfectly consistent store, you could still end up with duplicate entities for the same business entity (in this case, with two different business keys).
It is therefore important for the business to understand the likelihood and costs associated with dealing with these consistency problems. Whatever solutions you come up with (whether automated processes or manual intervention) can then be applied both when human error or inconsistency causes the duplicate and when your system's eventual consistency causes it.
At the end of the day, we're not saying that ES/CQRS is the right solution for every problem. We just think it's a very good default go-to, much better than a relational database. That is to say, we think people should first consider ES/CQRS to solve their problems, and use relational databases as a kind of niche solution for cases that don't suit ES/CQRS, rather than the other way around.
In your specific case, if after evaluating your business requirements you find that you absolutely must ensure that there are never duplicate entities for the same business key, you might consider putting a consistent store in front of your persistent entities. That store maps your business key to the surrogate key: it checks whether a mapping exists, and creates one if not. Such a store could be implemented in Cassandra using a conditional insert (INSERT ... IF NOT EXISTS). This way, you isolate your strong consistency requirements from the rest of your functions, and so only pay the cost of strong consistency where it really matters, while the remainder of your service's functionality can be eventually consistent, with all the advantages of ES/CQRS.
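To make the idea of a key-mapping store a little more concrete, here is a rough sketch in Python. It is only an in-memory stand-in, not real Cassandra code: the class name BusinessKeyRegistry, the method get_or_create, and the table and column names in the comment are all made up for illustration. A lock plays the role that the conditional insert would play in Cassandra.

```python
import threading
import uuid


class BusinessKeyRegistry:
    """Hypothetical in-memory stand-in for a strongly consistent store
    that maps a business key to a surrogate entity id.

    With Cassandra, the check-and-create step below would roughly be a
    single conditional insert, e.g. (table and column names are assumed):

        INSERT INTO entity_ids (business_key, entity_id)
        VALUES (?, ?) IF NOT EXISTS

    and the caller would inspect the [applied] column of the result.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}

    def get_or_create(self, business_key):
        """Return (entity_id, created) for the given business key."""
        # The lock makes check-then-insert atomic, which is the guarantee
        # the conditional insert is meant to provide.
        with self._lock:
            existing = self._ids.get(business_key)
            if existing is not None:
                return existing, False
            entity_id = str(uuid.uuid4())
            self._ids[business_key] = entity_id
            return entity_id, True


registry = BusinessKeyRegistry()
first_id, first_created = registry.get_or_create("order-123")
second_id, second_created = registry.get_or_create("order-123")
```

In this sketch both calls return the same surrogate id, and only the first reports that it created the mapping. Commands for the entity would then be routed by that id, so the rest of the service never needs to reason about uniqueness of the business key.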
Regards,
James