Narayana is becoming more cloud native to keep up with modern technologies (e.g. K8s/OpenShift, Azure, AWS, etc.). As Narayana is a transaction manager for distributed systems, most of its functionality is ready to be hosted in cloud environments. Nevertheless, some parts of Narayana need to be improved to optimise the integration of our code base in cloud environments. One of them is the node identifier (ID). In fact, in scenarios where the runtime controls the ID (e.g. the integration of Narayana in WildFly), there is the need to:
1. Generate a unique ID
2. Persist the ID
As for the persistence of the ID (point 2 above), we are still discussing the best approach to store it. Initially, we thought of storing the node identifier in the object store, but the ID must be generated before the object store is chosen; moreover, if the object store is shared among different Narayana instances, there might also be a problem identifying the owner of a particular ID.
On the other hand, to address point 1, the team has put together a couple of options. Some of the available algorithms to generate a globally unique identifier (GUI) are listed below:
In my opinion, KSUID would be the best choice to generate the GUI, and the reason is well explained on its GitHub page: “[...] A KSUID includes 128 bits of pseudorandom data ("entropy"). This number space is 64 times larger than the 122 bits used by the well-accepted RFC 4122 UUIDv4 standard. The additional timestamp component can be considered "bonus entropy" which further decreases the probability of collisions, to the point of physical infeasibility in any practical implementation. [...]”. Moreover, this algorithm has been “Battle Tested” for several years in a production environment.
This conversation is very important for the future development of Narayana in the cloud environment so your contribution is essential :-) Feel free to propose any solution/suggestion for either point above. Thanks!
--
You received this message because you are subscribed to the Google Groups "narayana-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to narayana-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/narayana-users/2d821d08-7015-4182-b1f1-2d7ee3b77fdbn%40googlegroups.com.
UUID v4
This is a widespread algorithm/standard to generate 128-bit identifiers, 122 bits of which are random
Pros
Generation of UUIDs is a decentralised process that can be executed locally without any interaction with other participants
It has been widely used in distributed systems since its origin (1980s). It is currently employed as a primary key in databases as well as in systems that are locally confined (e.g. Linux filesystems)
Even though it is not guaranteed to be collision-free, UUID v4 has a very low probability of producing duplicates, which suggests that it can be reliably used in scenarios where UUIDs are generated only once in a while (as would be the case in Narayana, to generate the node’s identifier)
At 128 bits (i.e. 16 bytes), UUIDs can comfortably be used as a node’s identifier, as they can be squeezed into the transaction identifier (XID) without any space problem
UUID is part of Java already
Cons
The source of randomness plays an important role when UUIDs are generated. In fact, the source of randomness (i.e. entropy) should be reliable (for example, atmospheric noise, laser, white noise). A lack of entropy is particularly evident in embedded systems, where sources of randomness can be limited. On the other hand, the source of randomness in modern OSs is quite reliable, as many random events happen while a machine is operating (e.g. exchange of data on network interfaces, interaction with users, interrupts, etc.)
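As a minimal sketch of the "UUID is part of Java already" point above, the snippet below generates a v4 UUID with java.util.UUID and flattens it into 16 bytes. The class name NodeIdUuidSketch and the toBytes helper are hypothetical and only illustrate the size argument; they are not Narayana code.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class NodeIdUuidSketch {

    // Flattens the two 64-bit halves of a UUID into a 16-byte array (big-endian).
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        // randomUUID() produces a version 4 (random) UUID.
        UUID id = UUID.randomUUID();
        byte[] raw = toBytes(id);
        System.out.println(id + " (version " + id.version() + ", " + raw.length + " bytes)");
    }
}
```

UUID.randomUUID() draws from a cryptographically strong pseudo-random generator, so the quality of the result still depends on the entropy available to the JVM, as discussed in the cons above.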
KSUID (only pros :-) I am a bit biased here)
This algorithm can be considered an evolution of UUID v4, as it reduces the reliance on a real source of randomness by introducing more entropy into the equation. In fact, the overall 160 bits are divided into:
128 bits for a random number (which works out to be a number space that is 64 times bigger than UUID v4’s)
32 bits for a timestamp (extra entropy)
As a consequence, the probability of producing duplicates becomes even lower than with UUID v4, “to the point of physical infeasibility in any practical implementation” (if the equation considered here is modified to use 160 bits, the result will be 1.42*10^24... if an immortal human being (with quite some spare time) wanted to measure the diameter of the (observable) universe in meters, they would report a number of 8.8*10^26).
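The link to the equation mentioned above did not survive in this thread, so the following is only a guess at what it was: the usual birthday approximation, n ≈ sqrt(2 * ln 2 * 2^bits), for the number of random IDs needed before the chance of at least one collision reaches 50%. Under that assumption, 160 bits does give roughly 1.42*10^24. The class and method names are hypothetical.

```java
public class BirthdayBoundSketch {

    // Birthday approximation (assumed to be the equation referred to above):
    // number of IDs drawn uniformly from 2^bits values before a collision
    // becomes as likely as not, n ~= sqrt(2 * ln 2) * 2^(bits / 2).
    static double idsForEvenCollisionOdds(int bits) {
        return Math.sqrt(2.0 * Math.log(2.0)) * Math.pow(2.0, bits / 2.0);
    }

    public static void main(String[] args) {
        System.out.printf("UUID v4, 122 random bits: %.3e IDs%n", idsForEvenCollisionOdds(122));
        System.out.printf("KSUID, 160 bits:          %.3e IDs%n", idsForEvenCollisionOdds(160));
    }
}
```

Note that this treats all 160 bits as uniformly random, as the paragraph above does; the 32 timestamp bits are not random, so the figure is an upper bound rather than an exact estimate.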
As for the size of the number generated by KSUID, 160 bits (i.e. 20 bytes) is still a reasonable size for a node's identifier that has to be squeezed into the XID. It is still a decentralised algorithm, so numbers produced locally are globally unique. Additionally, KSUID:
Produces k-sortable IDs
Is currently supported
Supplies a Java implementation
I do not see any evident cons using this algorithm but if you spot some, please, let me know.
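To make the 160-bit layout described above concrete, here is a minimal sketch that builds the raw 20 bytes: a 32-bit timestamp followed by 128 random bits. It is not the KSUID library's API; the class name, the method names and the use of SecureRandom are my own choices, and the base62 text encoding is left out. The custom epoch of 1400000000 seconds is taken from my reading of the KSUID README and should be treated as an assumption.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.time.Instant;

public class KsuidLayoutSketch {

    // Assumption: KSUID timestamps count seconds from this custom epoch.
    static final long KSUID_EPOCH_SECONDS = 1_400_000_000L;
    static final SecureRandom RANDOM = new SecureRandom();

    // Builds the raw 20 bytes: 4-byte big-endian timestamp + 16 random bytes.
    static byte[] generate(Instant now) {
        byte[] payload = new byte[16];
        RANDOM.nextBytes(payload);
        return ByteBuffer.allocate(20)
                .putInt((int) (now.getEpochSecond() - KSUID_EPOCH_SECONDS))
                .put(payload)
                .array();
    }

    // Recovers the Unix timestamp (seconds) stored in the first 4 bytes.
    static long timestampSeconds(byte[] ksuid) {
        long offset = ByteBuffer.wrap(ksuid).getInt() & 0xFFFFFFFFL; // read as unsigned
        return offset + KSUID_EPOCH_SECONDS;
    }

    public static void main(String[] args) {
        byte[] id = generate(Instant.now());
        System.out.println(id.length + " bytes, created at " + Instant.ofEpochSecond(timestampSeconds(id)));
    }
}
```

Because the timestamp occupies the most significant bytes, comparing the raw bytes lexicographically orders IDs roughly by creation time, which is where the k-sortable property comes from.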
SnowFlake (and, as a consequence, SonyFlake)
This algorithm was introduced by Twitter in this announcement. Since then, new versions of SnowFlake have been developed and the original source code deprecated. This algorithm is currently used by Twitter to generate an identifier for whatever entity needs to be identified (e.g. tweets, users, etc.).
Pros:
64 bit (8 byte)
Identifiers are guaranteed to be collision-free
Cons:
SnowFlake needs to initially coordinate nodes (basically, each node needs a unique identifier to start the algorithm). Twitter used to employ ZooKeeper for this. This is the main disadvantage that excludes SnowFlake from the possible solutions to the ID generation problem we are discussing
No maintained implementations available
No Java implementation
The reason SnowFlake was listed among the solutions is that this algorithm is the originator of other (more evolved) decentralised algorithms that might be a good solution for our problem.
SonyFlake is an implementation of SnowFlake but it does not resolve the problem of assigning a unique ID to each machine. As a consequence, this algorithm will not be discussed.
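The coordination problem described above is easier to see in code. The sketch below assumes the commonly described SnowFlake layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); it is not Twitter's implementation, the epoch constant is an arbitrary fixed instant, and all names are hypothetical. The point is the constructor: the worker ID has to be handed in from outside, and uniqueness only holds if no two nodes receive the same value.

```java
public class SnowflakeSketch {

    // Arbitrary custom epoch (assumption); any fixed instant in the past works.
    static final long EPOCH_MILLIS = 1_288_834_974_657L;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    // The worker ID must be unique per node: this is the step that needs coordination.
    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: busy-wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    // spin
                }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    // Extracts the worker ID embedded in an identifier.
    static long workerOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_WORKER;
    }

    public static void main(String[] args) {
        SnowflakeSketch node = new SnowflakeSketch(7);
        long id = node.nextId();
        System.out.println(id + " was generated by worker " + workerOf(id));
    }
}
```

Two nodes started with the same worker ID can produce identical identifiers in the same millisecond, which is why some external service (ZooKeeper in Twitter's case) has to assign the IDs.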
Boundary’s Flake
Flake is an evolution of SnowFlake that tries to introduce two important improvements to the original algorithm:
Identifier generation at a node should not require coordination with other nodes
IDs should be roughly time-ordered when sorted lexicographically. In other words they should be k-ordered
This sounds very exciting, but the shortfall of this algorithm is that SnowFlake’s coordination phase has been replaced with a “unique” value that is locally available: the MAC address. This solution has already been employed in UUID v1, which has well-known security and privacy problems.
Pros:
Solves SnowFlake’s coordination problem
Improved entropy
IDs can be ordered
Improved implementation of UUID v1
Cons:
Privacy problem due to MAC address
MAC addresses can be shared in modern technologies (e.g. cloud)
It is no longer supported (last commit is 5 years old)
There is no implementation written in Java
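The MAC address concern listed in the cons can be illustrated with a small sketch. It assumes a 128-bit Flake layout of 64-bit millisecond timestamp, 48-bit MAC address and 16-bit sequence, which is how I understand the project's description; the class and method names are hypothetical and this is not Boundary's code. The MAC is passed in as a parameter so the example does not depend on the local network configuration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FlakeLayoutSketch {

    // Composes a 16-byte ID: 8-byte timestamp + 6-byte MAC + 2-byte sequence (assumed layout).
    static byte[] compose(long timestampMillis, byte[] mac, int sequence) {
        if (mac == null || mac.length != 6) {
            throw new IllegalArgumentException("MAC address must be 6 bytes");
        }
        return ByteBuffer.allocate(16)
                .putLong(timestampMillis)
                .put(mac)
                .putShort((short) sequence)
                .array();
    }

    public static void main(String[] args) {
        // Two "machines" that happen to share a MAC, e.g. cloned virtual interfaces.
        byte[] sharedMac = {0x02, 0x42, (byte) 0xAC, 0x11, 0x00, 0x02};
        byte[] fromNodeA = compose(1_000L, sharedMac, 0);
        byte[] fromNodeB = compose(1_000L, sharedMac.clone(), 0);
        System.out.println("collision: " + Arrays.equals(fromNodeA, fromNodeB));
    }
}
```

In a real deployment the MAC would typically come from java.net.NetworkInterface.getHardwareAddress(), which can return null when the address is not accessible, so a fallback would be needed as well.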
TimeFlake
This algorithm is another improvement of SnowFlake, but one based entirely on time. Its pros are very well summarised on the algorithm’s GitHub page:
Pros:
Fast. Roughly ordered (K-sortable), incremental timestamp in most significant bits enables faster indexing
Unique enough. With 1.2e+24 unique timeflakes per millisecond, even if you're creating 50 million of them per millisecond the chance of a collision is still 1 in a billion. You're likely to see a collision when creating 1.3e+12 (one trillion three hundred billion) timeflakes per millisecond
Efficient. 128 bits are used to encode a timestamp in milliseconds (48 bits) and a cryptographically generated random number (80 bits)
Flexible. Out of the box encodings in 128-bit unsigned int, hex, URL-safe base62 and raw bytes. Fully compatible with uuid.UUID
Implementation in Java available
Currently maintained
Cons:
No evident cons
Just as a side note, if compared with KSUID, TimeFlake has less entropy (and therefore a higher probability of producing a duplicate; bear in mind that we are still talking about a very low probability - see pros)
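The 48-bit timestamp plus 80-bit random layout quoted in the pros can be sketched as follows. This is not the TimeFlake library's API: the class and method names are hypothetical, and java.util.UUID is used purely as a convenient 128-bit container (the result is not a valid RFC 4122 version 4 UUID, since the version bits are random).

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.UUID;

public class TimeflakeLayoutSketch {

    static final SecureRandom RANDOM = new SecureRandom();

    // Builds a 128-bit value: 48-bit millisecond timestamp followed by 80 random bits.
    static UUID generate(long epochMillis) {
        byte[] random = new byte[10];
        RANDOM.nextBytes(random);
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putShort((short) (epochMillis >>> 32)); // upper 16 bits of the 48-bit timestamp
        buf.putInt((int) epochMillis);              // lower 32 bits of the timestamp
        buf.put(random);                            // 80 random bits
        buf.flip();
        return new UUID(buf.getLong(), buf.getLong());
    }

    // Recovers the millisecond timestamp from the top 48 bits.
    static long timestampMillis(UUID flake) {
        return flake.getMostSignificantBits() >>> 16;
    }

    public static void main(String[] args) {
        UUID flake = generate(System.currentTimeMillis());
        System.out.println(flake + " created at " + timestampMillis(flake) + " ms");
    }
}
```

Compared with the KSUID layout, 48 of the 128 bits are spent on the timestamp, which leaves 80 random bits against KSUID's 128; this is the entropy difference mentioned in the side note above.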
On 30 Sep 2021, at 16:35, Manuel Finelli <jfin...@redhat.com> wrote:
> If that is going to happen, you'll need to check their licences and determine whether or not they're compatible.
+1 thanks Mark
On 25 Nov 2021, at 09:37, Michael Musgrove <michael....@gmail.com> wrote:
So are you suggesting that our Uid class, which uses (2 * 64) + (3 * 32) bits, is unsafe?
And are you suggesting that we deprecate Uid altogether in favour of something else?
On 25 Nov 2021, at 10:05, Michael Musgrove <michael....@gmail.com> wrote:
On Thu, Nov 25, 2021 at 9:57 AM Mark Little <mli...@redhat.com> wrote:
Inline ...
On 25 Nov 2021, at 09:37, Michael Musgrove <michael....@gmail.com> wrote:
> So are you suggesting that our Uid class, which uses (2 * 64) + (3 * 32) bits, is unsafe?
Yes, I’d like to see the data on this compared to one of the other options.
> And are you suggesting that we deprecate Uid altogether in favour of something else?
I’d be -1 to this for a number of reasons, not least of which is the invasiveness of the change and impact on so many other products, like Fuse, Integration and PAM.
Me too. So we can either:
1. introduce a new data type for storing the nodeIdentifier, or
2. we re-purpose the hostname + pid fields and stuff one of these 160-bit KSUIDs into them, or
3. we get some hard data on the safety of our current implementation of Uid and then make a decision
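The arithmetic behind option 2 can be sketched under one assumption that I have not checked against the Uid source: that, in the "(2 * 64) + (3 * 32)" breakdown quoted earlier, the hostname takes the two 64-bit values and the pid one of the 32-bit values. If so, hostname + pid add up to 160 bits, exactly the size of a KSUID. The Fields type below is a hypothetical stand-in and does not use or modify Narayana's Uid class.

```java
import java.nio.ByteBuffer;

public class UidPackingSketch {

    // Hypothetical stand-in for the hostname (2 x 64 bits) and pid (32 bits) fields.
    static final class Fields {
        final long hostHigh;
        final long hostLow;
        final int pid;

        Fields(long hostHigh, long hostLow, int pid) {
            this.hostHigh = hostHigh;
            this.hostLow = hostLow;
            this.pid = pid;
        }
    }

    // Splits a 160-bit (20-byte) identifier across the three fields.
    static Fields pack(byte[] id160) {
        if (id160 == null || id160.length != 20) {
            throw new IllegalArgumentException("expected a 20-byte identifier");
        }
        ByteBuffer buf = ByteBuffer.wrap(id160);
        return new Fields(buf.getLong(), buf.getLong(), buf.getInt());
    }

    // Reassembles the original 20 bytes from the three fields.
    static byte[] unpack(Fields fields) {
        return ByteBuffer.allocate(20)
                .putLong(fields.hostHigh)
                .putLong(fields.hostLow)
                .putInt(fields.pid)
                .array();
    }

    public static void main(String[] args) {
        byte[] id = new byte[20];
        for (int i = 0; i < id.length; i++) {
            id[i] = (byte) i;
        }
        System.out.println("round trip ok: " + java.util.Arrays.equals(id, unpack(pack(id))));
    }
}
```

The round trip is lossless, but anything that interprets those fields as a real host address or process ID would of course no longer get meaningful values.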