A Follow-Up: "3 Reasons Why Encryption is Overrated" -- http://bit.ly/e8CyM

LJMecca

Jun 23, 2009, 11:46:25 AM
to Cloud Computing
There has been a lot of discussion (both negative and positive) around
a recent Cleversafe blog post titled "3 Reasons Why Encryption is
Overrated" (http://bit.ly/EokrT).

Today, Cleversafe has posted a detailed clarification (part 1 of 3).
It focuses on the threat to encrypted data from increases in
processing power over time (as described by Moore's Law).

Full post available here: http://bit.ly/e8CyM

Sassa

Jun 24, 2009, 8:36:04 AM
to Cloud Computing
Well, in this form it is not dispersal as such. They must be using
some form of threshold cryptography to encrypt the same data into N
shares (so that only by having K of those shares can someone recover
the plaintext).

* They don't discuss the strength of their threshold cryptography
algorithm - they only contrast it with how RSA could be compromised in 40 years.
* They don't discuss the strength of their "authentication" system and
why compromising all K systems should be harder than compromising one.
E.g. if K=10, the complexity doesn't grow anywhere near impossible; if
the K systems are homogeneous, hacking the second one can be as easy as
hacking the first; and if K is large, throughput needs to increase
proportionately, because you'd need K copies.
* They don't discuss how they make sure that no one in transit has
access to K copies.


Sassa

Steve Mansfield

Jun 24, 2009, 9:20:30 AM
to cloud-c...@googlegroups.com
I don't get it.

If it is true that "While processing power today may keep encrypted files (that are stored in the cloud, for example) safe, as processing power improves, archived encrypted files will require systematic re-encryption to remain safe from potential hackers. Systematic re-encryption, though, is difficult, laborious and expensive."

Then if your data becomes fairly trivial to brute-force decrypt, shouldn't it also be easier to re-encrypt?

Steve

Greg Pfister

Jun 25, 2009, 11:00:25 PM
to Cloud Computing
I don't get it either. Check out Wydarr's response on the follow-up
page; he points out that Moore's Law doesn't continue forever, and in
fact stops well short of the time frames required for 128-bit AES to
have problems.

The response to Wydarr is basically "Well, something else will come
along." Uncharitably, this is no more than wishful thinking. Lots of
things have been proposed, but none have proven practical. Yeah,
"yet." It's impossible to argue against faith.

My net: This is a solution in search of a problem.

Greg Pfister
http://perilsofparallel.blogspot.com/

Jason

Jun 26, 2009, 2:45:03 PM
to Cloud Computing
Greg,

Hello, my name is Jason; I wrote the response post and most of the
replies to commenters. I would like to point out that I have just posted a
second post which goes into much more detail regarding how our system
works. It is available here: http://dev.cleversafe.org/weblog/index.php?p=111


On Jun 25, 10:00 pm, Greg Pfister <greg.pfis...@gmail.com> wrote:
> I don't get it either. Check out Wydarr's response on the follow-up
> page; he points out that Moore's Law doesn't continue forever, and in
> fact stops well short of the time frames required for 128-bit AES to
> have problems.
>

In that post I explain that we believe symmetric ciphers with the key
lengths used today will be secure long into the future. In fact, the
All-Or-Nothing Transform that we use is based on symmetric ciphers.
The concerns I have revolve around the security of asymmetric
algorithms, which I explain in greater detail in this latest post.

> The response to Wydarr is basically "Well, something else will come
> along." Uncharitably, this is no more than wishful thinking. Lots of
> things have been proposed, but none have proven practical. Yeah,
> "yet." Impossible argument against faith.
>

Regarding Moore's law and its life expectancy, I highly recommend
reading this essay written by Ray Kurzweil:

http://www.kurzweilai.net/articles/art0134.html?printable=1 In
particular, see the section starting with the title "Wherefrom Moore's
Law".

In it he explains that Moore's law is not the first exponential rate of
increase that computing technology has seen, but rather the fifth.
There have been four previous technologies which similarly met
limitations and were soon replaced by another paradigm, and we are
nowhere near reaching the physical limits of computing power (though we
are approaching the limits of what is possible with 2-D transistor-based
circuits).

> My net: This is a solution in search of a problem.
>

The security benefit of dispersal is only one of the many benefits
dispersal provides. The most impressive feature is the reliability
and scalability that it enables. RAID 6 can only recover from two
simultaneous failures, and with unrecoverable read errors becoming
more likely as disk sizes increase, data loss will often occur during
the rebuild from a simultaneous two-disk failure. Dispersal can
provide any desired level of fault tolerance; for example, a 24/16
configuration could tolerate 8 simultaneous failures, making the system
extremely reliable even when storing many petabytes of data.

Another benefit is the efficiency of this system. In total (for the
24/16 configuration) its overhead is only 50% beyond that of the original
data, which is less than would be stored by keeping even a single extra
copy. Using copies to achieve a fault tolerance of 8, one would have to
create 9 copies, with an overhead of 800%. This is what allows dispersal
to be so reliable, but these benefits would be lost if the key storage
system were not similarly reliable. By using the AONT on the data, we
eliminate the need for a separate key storage system, since the data
itself is stored with the same level of security as a key would be under
a traditional secret sharing scheme.
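
To make the AONT idea concrete, here is a minimal sketch of a Rivest-style all-or-nothing package transform in Python. This is purely illustrative (it uses a hash-derived keystream as a stand-in for running a symmetric cipher in counter mode, and it is not our actual implementation): the random key is never stored separately, yet without essentially the entire output it cannot be unmasked, so no part of the data can be recovered.

import hashlib
import os

def _keystream(key: bytes, length: int) -> bytes:
    # Hash-derived keystream; a stand-in for a block cipher in CTR mode.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def aont_package(data: bytes) -> bytes:
    # All-or-nothing transform: no key to manage, but the whole output is
    # needed to invert it.
    key = os.urandom(32)                                   # fresh random key each time
    body = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    digest = hashlib.sha256(body).digest()                 # depends on every byte of the body
    masked_key = bytes(a ^ b for a, b in zip(key, digest))
    return body + masked_key                               # masked key rides along with the data

def aont_unpackage(package: bytes) -> bytes:
    body, masked_key = package[:-32], package[-32:]
    digest = hashlib.sha256(body).digest()
    key = bytes(a ^ b for a, b in zip(masked_key, digest))
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))

packaged = aont_package(b"some archived record")
assert aont_unpackage(packaged) == b"some archived record"

The packaged output is what then gets sliced and dispersed; an attacker holding fewer than a threshold of slices is missing part of the body and therefore cannot unmask the key.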

Please see my latest post as it should answer some of your concerns.

Best Regards,

Jason Resch

> Greg Pfister
> http://perilsofparallel.blogspot.com/

Jason

Jun 26, 2009, 2:57:23 PM
to Cloud Computing
Sassa,

I think most of your questions are addressed with the new post. In
short:

The strength of our cryptography is dependent on the length of the
symmetric key used in the All or Nothing Transform.

The authentication system is pluggable, and in theory any
authentication system is possible. For example, to avoid situations
where the compromise of a single machine could lead to someone getting
your data, one could use a different trusted Certificate Authority for
each storage node, with a cert for each one. That way, multiple CA
private keys would need to be compromised at once. To avoid public key
cryptography altogether, one could use a different pre-shared key for
each storage node.

Throughput for us does not increase linearly with the number of
storage nodes, because unlike in a secret sharing scheme, our shares,
which we call "slices," are only a fraction of the size of the original
data. So a 24/16 configuration, where 24 slices are sent to 24 different
systems, has an overhead of only 1.5x, not the 24x one would expect with
a secret sharing scheme. For details as to why this is so, please see
the most recent post, titled "Response Part 2: Complexities of Key
Management" at http://dev.cleversafe.org/weblog/

This approach is geared toward providing security for data at rest;
traditional techniques must be used to secure the data in transit.
Data-in-transit security has the advantage that insecure keys can be
expired and new ones issued. With data-at-rest encryption, one could of
course re-encrypt the data, but encryption doesn't stop someone from
making a copy and holding on to it unbeknownst to you, which would
render any re-encryption effort useless.

Thanks for your questions.

Jason

Jim Starkey

Jun 26, 2009, 5:13:19 PM
to cloud-c...@googlegroups.com
Perhaps I've missed a subtlety, but wouldn't an easier, faster, and more
flexible approach be to encrypt the data with a single symmetric cypher
key and use your scheme to store key slices rather than data slices?
It does require that you trust your symmetric cypher, but frankly, your
arguments against AES aren't convincing.
--
Jim Starkey
President, NimbusDB, Inc.
978 526-1376

Jason

Jun 26, 2009, 5:47:55 PM
to Cloud Computing
On Jun 26, 4:13 pm, Jim Starkey <jstar...@nimbusdb.com> wrote:
> Perhaps I've missed a subtlety, but wouldn't an easier, faster, and more
> flexible approach be to encrypt the data with a single symmetric cypher
> key and use your scheme to store key slices rather than data slices?  
> It does require that you trust your symmetric cypher, but frankly, your
> arguments against AES aren't convincing.
>
>

Jim,

Thanks for your response. The reason we disperse the data and not
just the key is that it provides both high availability and
confidentiality. If we only dispersed the key, the data would not be
stored reliably. If we compensated by making copies, there would be more
possible attack vectors and opportunities to accidentally lose or
disclose that data. Please see the second post for more details
regarding the usual trade-off between confidentiality and availability
and how we avoid it.

Which arguments against AES are you referring to? I have stated that
AES will offer strong security well into the future; what I believe to
be at risk are asymmetric ciphers, which rest on unproven mathematical
assumptions and are also vulnerable to quantum computers. Therefore,
even if we used a 15360-bit RSA key, one breakthrough in math could
render RSA useless overnight.

Jason

Jim Starkey

Jun 27, 2009, 7:33:11 AM
to cloud-c...@googlegroups.com
Jason wrote:
> On Jun 26, 4:13 pm, Jim Starkey <jstar...@nimbusdb.com> wrote:
>
>> Perhaps I've missed a subtlety, but wouldn't an easier, faster, and more
>> flexible approach be to encrypt the data with a single symmetric cypher
>> key and use your scheme to store key slices rather than data slices?
>> It does require that you trust your symmetric cypher, but frankly, your
>> arguments against AES aren't convincing.
>>
>>
>>
>
> Jim,
>
> Thanks for your response. The reason we disperse the data and not
> just the key is that it provides both high availability and
> confidentiality. If we only dispersed the key the data would not be
> stored reliably. If we compensate by making copies there are more
> possible attack vectors and opportunities to accidentally lose or
> disclose that data. Please see the second post for more details
> regarding the usual trade off between confidentiality and availability
> and how we avoid it.
>
If the data is encrypted with a single key (managed as above), you could
have as many secure copies sprinkled around the world as you do now.
The data availability, then, is separated from the dispersed key
management. It also eliminates the bandwidth and most storage
requirements from the key management sites, letting you put the bulk of
the data where storage is cheap and bandwidth is plentiful, even if those
sites didn't meet rigorous security requirements.

Your only real argument for dispersed data, as opposed to dispersed
keys, is that some unspecified magic is going to render AES insecure.
Given a) the diminishingly low probability of that happening, b) the
additional data transmission costs of dispersing the data, and c) your
requirement that a storage site be secure, offer high bandwidth, and
have acceptable storage costs, I don't think your scheme is going to
appear attractive to many.


> Which arguments against AES are you referring to? I have stated that
> AES will offer strong security well into the future, what I believe to
> be at risk are asymmetric ciphers which rest on unproven mathematical
> assumptions and are both vulnerable to quantum computers. Therefore
> even if we used a 15360-bit RSA key, one breakthrough in math could
> render RSA useless overnight.
>
>

Unlike the symmetric cyphers used to encrypt persistent data, public key
cyphers are used for secure key distribution. If someone invented an
algorithm or device that could quickly factor very large numbers, public
key systems could be changed in the blink of an eye. RSA is hardly the
only public key cryptosystem. There are others, and more would emerge if
RSA were demonstrated vulnerable.

So the issue isn't whether a cryptosystem can be attacked, but the
consequences of a successful attack. Terabytes of data stored redundantly
at many potentially insecure sites would be at risk if AES were
compromised, but the compromise of a key (or partial key) distribution
cypher would be of almost no consequence.

Jason

Jun 27, 2009, 8:44:02 PM
to Cloud Computing


On Jun 27, 6:33 am, Jim Starkey <jstar...@NimbusDB.com> wrote:

> If the data is encrypted with a single key (managed as above), you could
> have as many secure copies sprinkled around the world as you do now.  
> The data availaibility, then, is separated from the dispersed key
> management.  It also eliminates the bandwidth and most storage
> requirements from the key management sites, letting you put the bulk of
> the data where storage is cheap and bandwidth is plentiful, even if they
> didn't meet rigorous security requirements.
>

But this is ignoring the efficiency aspect of dispersed storage.
Let's say you have to secure 10 TB of data. If you sprinkled 5
encrypted copies around the world you would end up needing 50 TB of
storage. With dispersal, each slice is only a fraction of the size of
the original data. Let's say we used a 10 of 14 dispersal: the total
storage requirement would be 14 TB, compared with the 50 TB for making
copies, yet it is highly available because any 4 slices could be
completely destroyed and you could still access your data (just as in
the case of making 5 copies).
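
The arithmetic behind that comparison, as a quick sketch:

data_tb = 10
copies = 5
replication_storage = copies * data_tb         # 50 TB; tolerates the loss of 4 copies

n, k = 14, 10                                  # a 10-of-14 dispersal
dispersal_storage = n * (data_tb / k)          # 14 TB; tolerates the loss of any 4 slices

print(replication_storage, dispersal_storage)  # 50 14.0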

> Your only real argument for dispersed data, as opposed to dispersed
> keys, is that some unspecified magic is going to render AES insecure.

I never said anything about AES being made insecure; in fact, the AONT
can use AES, since it has to use some existing symmetric cipher.
 
>
> If someone invented an
> algorithm or device that could quickly factor very large numbers, public
> key systems could be changed in a blink of the eye.  RSA is hardly the
> only public key cryptosystem.  There are other and more would emerge is
> RSA were demonstrated vulnerable.

These are the asymmetric encryption systems I am aware of:

RSA
Rabin
ElGamal
Elliptic Curve
GGH (Lattice Encryption)

The first two depend on the hardness of factoring integers, the
next two depend on the hardness of the discrete logarithm problem,
and the first four are all vulnerable to quantum computers. GGH I know
little about, but there are published papers claiming it is insecure
and leaks information about the plaintext. There are very few
well-supported asymmetric algorithms, and if a device like a quantum
computer of sufficient size were built we would not have any good,
well-analyzed alternatives.

>
> So the issue isn't whether a cryptosystem can be attacked, but the
> consequences of a success attack.  Terabytes of data redundantly at many
> potentially insecure sites would be at risk if AES were compromised, but
> the compromise of a key (or partial key) distribute cypher would be of
> almost no consequence.

If one did choose to use an existing key management system to encrypt
data stored reliably in many copies at different sites, the storage
system as a whole (key management system + copies) can never be more
reliable than the key management system. If the building housing the
key management system flooded or burned down, all those copies you made
at other sites would be useless. To avoid this problem you can either
make multiple copies of the key (insecure) or use a threshold-based
secret sharing system to distribute shares to different locations. If
you go this far, though, why not simply store the data itself using a
secret sharing scheme? This is what dispersal offers: an ability to
store your data in the same way the most secure systems store keys.

Jason

Greg Pfister

Jun 28, 2009, 3:38:41 PM
to Cloud Computing
On Jun 26, 1:45 pm, Jason <jasonre...@gmail.com> wrote:
> Greg,
>
> Hello, my name is Jason, I wrote the response post and most of the
> replies to commenters.  I would like to point out that I just posted a
> second post which goes into much more detail regarding how our system
> works.  It is available here: http://dev.cleversafe.org/weblog/index.php?p=111

Hi, Jason.

> On Jun 25, 10:00 pm, Greg Pfister <greg.pfis...@gmail.com> wrote:
>
> > I don't get it either. Check out Wydarr's response on the follow-up
> > page; he points out that Moore's Law doesn't continue forever, and in
> > fact stops well short of the time frames required for 128-bit AES to
> > have problems.
>
> In that post I explain that the we believe symmetric ciphers with key
> lengths used today will be secure long into the future.  In fact the
> All-Or-Nothing Transform that we use is based on symmetric ciphers.
> The concerns I have revolve around the security of asymmetric
> algorithms which I explain in greater detail with this latest post.

OK, I'll check that out.

> > The response to Wydarr is basically "Well, something else will come
> > along." Uncharitably, this is no more than wishful thinking. Lots of
> > things have been proposed, but none have proven practical. Yeah,
> > "yet." Impossible argument against faith.
>
> Regarding Moore's law, and its life expectency, I highly recommend
> reading this essay written by Ray Kurzweil:
>
> http://www.kurzweilai.net/articles/art0134.html?printable=1 In
> particular, the section starting with the title "Wherefrom Moore's
> Law"
>
> In it he explains Moore's law is not the first exponential rate of
> increase that computing technology has seen, but is rather the 5th.
> There have been 4 previous technologies which similarly met
> limitations and were soon replaced by another paradigm and we are no
> where near reaching the physical limits of computing power (though we
> are approaching limits of what is possible with 2-d transistor based
> circuits.

Uh-huh. Or
http://www.spectrum.ieee.org/static/singularity
for a more balanced view, including a contribution by Kurzweil.

I side with the "this is nonsense" crowd, but they never get the press
because they're insufficiently flamboyant.

Greg Pfister
http://perilsofparallel.blogspot.com/

Jim Starkey

Jun 28, 2009, 5:25:54 PM
to cloud-c...@googlegroups.com
Jason wrote:
>
> On Jun 27, 6:33 am, Jim Starkey <jstar...@NimbusDB.com> wrote:
>
>
>> If the data is encrypted with a single key (managed as above), you could
>> have as many secure copies sprinkled around the world as you do now.
>> The data availaibility, then, is separated from the dispersed key
>> management. It also eliminates the bandwidth and most storage
>> requirements from the key management sites, letting you put the bulk of
>> the data where storage is cheap and bandwidth is plentiful, even if they
>> didn't meet rigorous security requirements.
>>
>>
>
> But this is ignoring the efficiency aspect for dispersed storage.
> Let's say you have to secure 10 TBs of data. If you sprinkled 5
> encrypted copies around the world you would end up needing 50 TB of
> storage. With dispersal each slice is only a fraction of the size of
> the original data. Let's say we used a 10 of 14 dispersal, the total
> storage requirements would be 14 TB compared with the 50 TB of making
> copies, yet it is highly available because 4 such slices could be
> completely destroyed and you can still access your data (just like in
> the case of making 5 copies).
>
Your Reed-Solomon codes increase the size of each slice, so the total
bandwidth required to fetch data goes up. And, while dispersal requires
that all but one slice be remote from a usage site, simple encryption
with dispersed keys would be a much higher-performance solution, with all
of the benefits that you claim, while reducing bandwidth and storage costs.

The data availability of your scheme can also be significantly reduced
by the number of copies needed for the quorum to reconstruct a full key -- there is
no point in maintaining encrypted data with no hope of key retrieval. This
reduces the amount of storage even more.

I'm not going to quibble with your dispersed key. Every organization
has its own institutional level of paranoia. But if you trust AES to
encrypt each slice, then you should trust AES to encrypt the whole data set.


>> If someone invented an
>> algorithm or device that could quickly factor very large numbers, public
>> key systems could be changed in a blink of the eye. RSA is hardly the
>> only public key cryptosystem. There are other and more would emerge is
>> RSA were demonstrated vulnerable.
>>
>
> These are the following asymmetric encryption systems I am aware of:
>
> RSA
> Rabin
> ElGamal
> Elliptic Curve
> GGH (Lattice Encryption)
>
> The first two both depend on the hardness of factoring integers, the
> next two both depend on the hardness of the discrete logarithm problem
> and the first 4 are all vulnerable to quantum computers. GGH I know
> little about but there are published papers claiming it is insecure
> and leaks information about the plain text. There are very few well
> supported asymmetric algorithms and if a device like a quantum
> computer of sufficient size were built we would not have any good,
> well analyized alternatives.
>

And why would you expect more? RSA is generally accepted as adequate.
There is neither financial return nor academic glory to be had in
developing another. If RSA were compromised, however, the world would
beat a path to a better solution.

If you are willing to postulate that RSA could be weakened by an
unanticipated breakthrough in mathematics, isn't it reasonable that it
would be met by a much less spectacular advance in encryption engineering?


>
>> So the issue isn't whether a cryptosystem can be attacked, but the
>> consequences of a success attack. Terabytes of data redundantly at many
>> potentially insecure sites would be at risk if AES were compromised, but
>> the compromise of a key (or partial key) distribute cypher would be of
>> almost no consequence.
>>
>
> If one did choose to use an existing key management system to encrypt
> data stored reliably in many copies at different sites, the storage
> system as a whole (key management system + copies) can never be more
> reliable than the key management system. If the building housing the
> key management system flooded or burned down all those copies you made
> at other sites would be useless. To avoid this problem you can either
> make multiple copies of the key (insecure) or use a threshold based
> secret sharing system to distribute shares to different locations. If
> you go this far though, why not simply store the data itself using a
> secret sharing scheme? This is what dispersal offers, an ability to
> store your data in the s ame way the most secure systems store keys.
>
>

Not at all. You've convinced us all that dispersed storage is a balance
between security and availability. I'm just arguing that your system is
sufficiently robust that using it for keys obviates the need to use it
for data.

Jason

Jun 29, 2009, 4:44:50 AM
to Cloud Computing


On Jun 28, 4:25 pm, Jim Starkey <jstar...@nimbusdb.com> wrote:
> Jason wrote:
>
> > On Jun 27, 6:33 am, Jim Starkey <jstar...@NimbusDB.com> wrote:
>
> >> If the data is encrypted with a single key (managed as above), you could
> >> have as many secure copies sprinkled around the world as you do now.  
> >> The data availaibility, then, is separated from the dispersed key
> >> management.  It also eliminates the bandwidth and most storage
> >> requirements from the key management sites, letting you put the bulk of
> >> the data where storage is cheap and bandwidth is plentiful, even if they
> >> didn't meet rigorous security requirements.
>
> > But this is ignoring the efficiency aspect for dispersed storage.
> > Let's say you have to secure 10 TBs of data.  If you sprinkled 5
> > encrypted copies around the world you would end up needing 50 TB of
> > storage.  With dispersal each slice is only a fraction of the size of
> > the original data.  Let's say we used a 10 of 14 dispersal, the total
> > storage requirements would be 14 TB compared with the 50 TB of making
> > copies, yet it is highly available because 4 such slices could be
> > completely destroyed and you can still access your data (just like in
> > the case of making 5 copies).
>
> Your Reed-Solomon codes increase the size of each slice, so the total
> bandwidth required to fetch data goes up.  And, while dispersal requires
> that all but one slice be remote for a usage site, simple encryption
> with dispersed keys would be a much higher performance solution with all
> of the benefits that you claim while reducing bandwidth and storage costs.
>

The total amount of data needed to reconstruct the data is the same as
the size of the data itself. For example, with a 10 of 14
configuration, only 10 slices are needed, and the aggregate size of those
10 slices equals the size of the original data. There is additional
overhead when storing the data for the first time, but in this case it
is less than that of making a single backup.

Could you explain how you propose the data be stored using simple
encryption? Is it a copy-based scheme? Copy-based schemes do not
have the high reliability and availability of dispersal. The storage
costs of dispersal may be higher than keeping a single encrypted copy,
but they will be less than the storage costs of keeping even a single backup.

> The data availability for your scheme can also be significantly reduced
> by the number of copies to the quorum to reconstruct a full key -- no
> point in maintain encrypted data with no hope for key retrieval.  This
> reduces the amount of storage even more.
>

I am not certain I understand this point; could you please explain it
further?

> I'm not going to quibble with your dispersed key.  Every organization
> has its own institution level of paranoia.  Bus if you trust AES to
> encrypt each slide, they you shoud trust AES to encrypt the whole data set.
>

Beyond the security of an algorithm is the security of the
organization and the people within that organization. No matter how
strong AES may be, if any one system containing the key gets
compromised then your data is at risk of exposure. With dispersal the
compromise of any one system does not cause data to be exposed. If an
organization set itself up such that each administrator only had
control over machines at one site, then neither incompetence nor
maliciousness on the part of any one administrator could cause data
loss or data exposure.

> >> If someone invented an
> >> algorithm or device that could quickly factor very large numbers, public
> >> key systems could be changed in a blink of the eye.  RSA is hardly the
> >> only public key cryptosystem.  There are other and more would emerge is
> >> RSA were demonstrated vulnerable.
>
> > These are the following asymmetric encryption systems I am aware of:
>
> > RSA
> > Rabin
> > ElGamal
> > Elliptic Curve
> > GGH (Lattice Encryption)
>
> > The first two both depend on the hardness of factoring integers, the
> > next two both depend on the hardness of the discrete logarithm problem
> > and the first 4 are all vulnerable to quantum computers.  GGH I know
> > little about but there are published papers claiming it is insecure
> > and leaks information about the plain text.  There are very few well
> > supported asymmetric algorithms and if a device like a quantum
> > computer of sufficient size were built we would not have any good,
> > well analyized alternatives.
>
> Any why would you expect more?   RSA is generally accepted as adequate.  
> There is neither financial return nor academic glory to be had by
> developing another.  If RSA were comprised, however, the world would
> beat a path to a better solution.
>

On the contrary, I think there would be much academic glory in
inventing an alternative cryptosystem. Asymmetric ciphers are an
entirely different animal from symmetric ciphers, of which hundreds
exist and which anyone could whip up in no time should the need arise.
Asymmetric ciphers require both a one-way function and a trap door, a
rather rare combination in math which must be discovered. RSA
relies on so many different mathematical properties coming together in
just the right way that it is amazing such a thing exists at all. To
name a few: efficient detection of probable primes, the extended
Euclidean algorithm, exponentiation by squaring, and Euler's totient
function. Other such perfect combinations may exist, but they seem to
be very rare and hard to find.
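
To show where each of those ingredients appears, here is a toy RSA key generation sketch in Python (illustrative only, nowhere near production quality; the modular inverse via pow(e, -1, phi) needs Python 3.8+): probabilistic primality testing, exponentiation by squaring, Euler's totient, and a modular inverse computed by the extended Euclidean algorithm.

import math
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    # Miller-Rabin probabilistic primality test.
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23):
        if n % small == 0:
            return n == small
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)              # modular exponentiation by squaring
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits: int) -> int:
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

e = 65537
while True:
    p, q = random_prime(512), random_prime(512)
    n, phi = p * q, (p - 1) * (q - 1)   # phi is Euler's totient of n = p*q
    if p != q and math.gcd(e, phi) == 1:
        break
d = pow(e, -1, phi)                     # modular inverse via the extended Euclidean algorithm

message = 42
assert pow(pow(message, e, n), d, n) == message   # encrypt, then decrypt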

> If you are willing to postulate that RSA could be weakened by an
> unanticipated break through in mathematics, isn't it reasonable that it
> would be met by a much less spectacular advance in encryption engineering?
>

It is possible, but I cannot say definitively. Consider the question
of P = NP, the most famous unsolved problem in computer science
( http://en.wikipedia.org/wiki/P%3DNP ). It is the question of whether
there exist algorithms for efficiently solving problems whose solutions
are easy to verify. If such an algorithm exists for one of the
NP-complete problems, it means that all such problems have efficient
solutions. A similar thing might be true for asymmetric encryption,
which relies on trap doors (things which are difficult to invert without
special knowledge). It may be the case that there are efficient attacks
on both RSA and elliptic curves, and a breakthrough in math that defeats
one could lead to proofs showing there really is no such thing as a trap
door function ( http://en.wikipedia.org/wiki/Trap_door_function ) in
math, and that the existence of asymmetric cryptography has only been a
temporary gift based on our own ignorance of a solution.

>
>
> >> So the issue isn't whether a cryptosystem can be attacked, but the
> >> consequences of a success attack.  Terabytes of data redundantly at many
> >> potentially insecure sites would be at risk if AES were compromised, but
> >> the compromise of a key (or partial key) distribute cypher would be of
> >> almost no consequence.
>
> > If one did choose to use an existing key management system to encrypt
> > data stored reliably in many copies at different sites, the storage
> > system as a whole (key management system + copies) can never be more
> > reliable than the key management system.  If the building housing the
> > key management system flooded or burned down all those copies you made
> > at other sites would be useless.  To avoid this problem you can either
> > make multiple copies of the key (insecure) or use a threshold based
> > secret sharing system to distribute shares to different locations.  If
> > you go this far though, why not simply store the data itself using a
> > secret sharing scheme?  This is what dispersal offers, an ability to
> > store your data in the s ame way the most secure systems store keys.
>
> Not at all.  You've convinced us all that dispersed storage is a balance
> between security and availability.  I'm just arguing that your system is
> sufficiently robust that using it for keys obviates the need to use it
> for data.
>

Using it for data was the original intention, as it provides a level of
reliability that RAID 6 cannot even come close to. Based on
reliability calculations I have done, which factor in disk
mean-time-to-failure rates and unrecoverable read error rates, building
large-scale (1 PB) storage systems based on RAID 6 can carry between a
1% and 10% chance of data loss over a five-year period. For every extra
tolerable failure, the MTTF of the system increases by a factor of about
1,000, so a dispersed storage system capable of tolerating 4 failures
would be about a million times more reliable than a RAID 6 system, which
can only tolerate 2 failures. This latest technique, of marrying
dispersed storage with an all-or-nothing transform, is just the icing
on the cake for a new paradigm of data storage. It creates a storage
system which is not only extremely reliable and efficient, but also
extremely secure. I'd invite you and others on this list to point to
a method that provides a greater level of confidentiality than secret
sharing schemes.
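
As a back-of-the-envelope check on that scaling claim (this assumes the rule of thumb stated above, that each additional tolerable failure multiplies system MTTF by roughly 1,000; it is not a substitute for a full reliability model):

FACTOR_PER_EXTRA_FAILURE = 1000
raid6_tolerance = 2        # RAID 6 survives two simultaneous disk failures
dispersal_tolerance = 4    # e.g. a 10-of-14 dispersal configuration

improvement = FACTOR_PER_EXTRA_FAILURE ** (dispersal_tolerance - raid6_tolerance)
print(improvement)         # 1000000 -> roughly "a million times more reliable"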

Jason

Jason

Jun 29, 2009, 4:54:53 AM
to Cloud Computing


On Jun 28, 2:38 pm, Greg Pfister <greg.pfis...@gmail.com> wrote:
>
> Uh-huh. Or http://www.spectrum.ieee.org/static/singularity
> for a more balanced view, including part by Kurzweil.
>
> I side with the "this is nonsense" crowd, but they never get the press
> because they're insufficiently flamboyant.
>

Well, you do have a blog whose focus is on the end of Moore's law, but
putting aside the rest of what Kurzweil says regarding the singularity,
I find what he says regarding exponential growth in computing power to
be very convincing. While we can plot the size of transistors and see
that they will soon be the size of atoms, what makes you believe
transistors are the last advance in computing that man will invent?
Look at the processing power of your brain, which rivals that of the
most powerful supercomputers ever built yet runs on the equivalent of
10 watts of electricity. If such efficient and powerful computers
exist in nature, then we know such things are physically possible, and
it is only a matter of time before we replicate and improve upon those
designs.

Ray Kurzweil agrees with your blog that Moore's law is reaching its
end, but only if one narrowly defines it as the number of transistors
that can be fit on a chip and not the amount of computing power that
can be purchased with a fixed amount of money. To quote his page:

"It's obvious what the sixth paradigm will be after Moore's Law runs
out of steam during the second decade of this century. Chips today are
flat (although it does require up to 20 layers of material to produce
one layer of circuitry). Our brain, in contrast, is organized in three
dimensions. We live in a three dimensional world, why not use the
third dimension? The human brain actually uses a very inefficient
electrochemical digital controlled analog computational process. The
bulk of the calculations are done in the interneuronal connections at
a speed of only about 200 calculations per second (in each
connection), which is about ten million times slower than contemporary
electronic circuits. But the brain gains its prodigious powers from
its extremely parallel organization in three dimensions. There are
many technologies in the wings that build circuitry in three
dimensions. Nanotubes, for example, which are already working in
laboratories, build circuits from pentagonal arrays of carbon atoms.
One cubic inch of nanotube circuitry would be a million times more
powerful than the human brain. There are more than enough new
computing technologies now being researched, including three-
dimensional silicon chips, optical computing, crystalline computing,
DNA computing, and quantum computing, to keep the law of accelerating
returns as applied to computation going for a long time."

Regards,

Jason

Jason

Jun 29, 2009, 4:58:41 AM
to Cloud Computing


On Jun 28, 4:25 pm, Jim Starkey <jstar...@nimbusdb.com> wrote:
>
> Not at all.  You've convinced us all that dispersed storage is a balance
> between security and availability.  I'm just arguing that your system is
> sufficiently robust that using it for keys obviates the need to use it
> for data.
>

I hope to eventually convince you that dispersal obviates the need for
storing keys.

Jason

Sassa

Jun 29, 2009, 7:13:54 AM
to Cloud Computing
On Jun 28, 1:44 am, Jason <jasonre...@gmail.com> wrote:
> On Jun 27, 6:33 am, Jim Starkey <jstar...@NimbusDB.com> wrote:
>
...> > If someone invented an
> > algorithm or device that could quickly factor very large numbers, public
> > key systems could be changed in a blink of the eye.  RSA is hardly the
> > only public key cryptosystem.  There are other and more would emerge is
> > RSA were demonstrated vulnerable.
>
> These are the following asymmetric encryption systems I am aware of:
>
> RSA
> Rabin
> ElGamal
> Elliptic Curve
> GGH (Lattice Encryption)
>
> The first two both depend on the hardness of factoring integers, the
> next two both depend on the hardness of the discrete logarithm problem
> and the first 4 are all vulnerable to quantum computers.  GGH I know

Can you cite research suggesting that quantum computers are a real
threat now? I think they managed to factor a small integer, but they
can't work with large integers, because stable quantum computers
of the necessary size aren't available :-)


> little about but there are published papers claiming it is insecure
> and leaks information about the plain text.  There are very few well
> supported asymmetric algorithms and if a device like a quantum
> computer of sufficient size were built we would not have any good,
> well analyized alternatives.

Does Moore's law apply to quantum computers? :-)


Sassa

Sassa

Jun 29, 2009, 7:27:09 AM
to Cloud Computing
On Jun 26, 7:57 pm, Jason <jasonre...@gmail.com> wrote:
> Sassa,
>
> I think most of your questions are addressed with the new post.  In
> short:
>
> The strength of our cryptography is dependent on the length of the
> symmetric key used in the All or Nothing Transform.
>
> The authentication system is pluggable, and in theory any
> authentication system is possible.  For example to avoid situations
> where the compromise of a single machine could lead to someone getting
> your data one could use different trusted Certificate Authorities for
> each storage node, and have a cert for each one.  In that way,
> multiple CA private keys would need to be compromised at once.  To

24 CAs isn't a big overhead for a government that is able to crack one
private key.


> avoid public key cryptography altogether, one could use a different
> pre-shared key for each storage node.

That's another story, but what's the point in doing tricky data
splitting if you have a different pre-shared key for each storage node?
The security of any data split will depend on these pre-shared keys,
and since recovery of the data is trivial if you can get K pre-shared
keys, the strength of the data split is bounded from above by the
strength of the pre-shared keys / authentication system.


> Throughput for us does not increase linearly with the number of
> storage nodes, because unlike secret sharing scheme our shares, we
> call "slices", are only a fraction of the size of the original data.
> So a 24/16 configuration, where 24 slices are sent to 24 different
> systems has an overhead of only 1.5x, not 24x as one would expect with
> a secret sharing scheme.  For details as to why this is so please see
> the most recent post, titled "Response Part 2: Complexities of Key
> Management" athttp://dev.cleversafe.org/weblog/
>
> This approach is geared for providing security for data at rest.
> Traditional techniques must be used to secure the data in transit.
> However data in transit security has the advantage that insecure keys
> can be expired and new ones issued, with data encryption, one could of
> course re-encrypt data, but the encryption doesn't stop someone from
> making a copy and holding on to it unbeknownst to you, which would
> render any re-encryption effort useless.

Yes. Since you don't propose to re-encrypt the data in your data
splitting scheme, this means I have a window of opportunity the size
of the whole data lifetime in which to break K of the systems.


Sassa

Jason

Jun 29, 2009, 10:50:48 AM
to Cloud Computing


On Jun 29, 6:27 am, Sassa <sassa...@gmail.com> wrote:
> On Jun 26, 7:57 pm, Jason <jasonre...@gmail.com> wrote:
>
> > Sassa,
>
> > I think most of your questions are addressed with the new post.  In
> > short:
>
> > The strength of our cryptography is dependent on the length of the
> > symmetric key used in the All or Nothing Transform.
>
> > The authentication system is pluggable, and in theory any
> > authentication system is possible.  For example to avoid situations
> > where the compromise of a single machine could lead to someone getting
> > your data one could use different trusted Certificate Authorities for
> > each storage node, and have a cert for each one.  In that way,
> > multiple CA private keys would need to be compromised at once.  To
>
> 24 CAs isn't a big overhead for a government that is able to crack one
> private key.
>

Yes, I agree. However, an authentication system may be updated more
easily than re-encrypting what is potentially a vast amount of data.
Authentication using keys almost always comes with digital certificates
that have expiration times. Of course, depending on how the government
might be able to crack a key, it could be the case that no key length
would be sufficient.

> > avoid public key cryptography altogether, one could use a different
> > pre-shared key for each storage node.
>
> That's another story, but what's the point in doing tricky data
> splitting, if you have different pre-shared key for each storage node.

In general I think it is more difficult to compromise multiple systems
than a single one.

> The security of any data split will depend on these pre-shared keys,
> and since recovery of data is trivial if you can get K pre-shared
> keys, the strength of the data split is bounded from above by the
> strength of the pre-shared keys / authentication system.
>

True, but I don't see how the system could be any more secure than
that. Cracking a pre-shared key used for authentication is different
from cracking an encryption key, because it is not possible to do an
offline attack, meaning someone will notice the millions of failed
login attempts at one of the machines sooner or later. Dispersal is
not a solution for authentication; it is a solution for data-at-rest
security.

>
>
> > Throughput for us does not increase linearly with the number of
> > storage nodes, because unlike secret sharing scheme our shares, we
> > call "slices", are only a fraction of the size of the original data.
> > So a 24/16 configuration, where 24 slices are sent to 24 different
> > systems has an overhead of only 1.5x, not 24x as one would expect with
> > a secret sharing scheme.  For details as to why this is so please see
> > the most recent post, titled "Response Part 2: Complexities of Key
> > Management" athttp://dev.cleversafe.org/weblog/
>
> > This approach is geared for providing security for data at rest.
> > Traditional techniques must be used to secure the data in transit.
> > However data in transit security has the advantage that insecure keys
> > can be expired and new ones issued, with data encryption, one could of
> > course re-encrypt data, but the encryption doesn't stop someone from
> > making a copy and holding on to it unbeknownst to you, which would
> > render any re-encryption effort useless.
>
> Yes. Since you don't propose to re-encrypt the data in your data
> splitting scheme, this means I have the window of opportunity the size
> of the whole data lifetime to break K of the systems.
>

Correct. However, if any one of these compromises is noticed, you
could rewrite the existing dispersed data (the AONT is randomized and
will generate a new key each time it is used, even for the same data)
and thereby make all the existing slices an attacker may have collected
useless. So the user has a window of opportunity too, to defend
the data before an attacker reaches the threshold.

Thanks for your comments.

Jason

Jason

Jun 29, 2009, 12:04:18 PM
to Cloud Computing

On Jun 29, 6:13 am, Sassa <sassa...@gmail.com> wrote:
> Can you quote research suggesting that quantum computers are a real
> threat now? I think they managed to factor a small integer, but they
> can't work with large integers, because the stable quantum computers
> of the necessary size aren't available :-)
>

I don't think they are a threat now, but they almost certainly will be
in the future. The difficulty in scaling them has been keeping the
qubits isolated from the environment for long enough to do the
computation, but if this problem is solved generally, then I don't
think it would be much more difficult to make a 2048-qubit quantum
computer once the ability exists to create a 1024-qubit one.

> > little about but there are published papers claiming it is insecure
> > and leaks information about the plain text.  There are very few well
> > supported asymmetric algorithms and if a device like a quantum
> > computer of sufficient size were built we would not have any good,
> > well analyized alternatives.
>
> Does Moore's law apply to quantum computers? :-)
>

That's a good question, I don't know.

Jason

Jeanne Morain

Jun 29, 2009, 2:13:56 PM
to cloud-c...@googlegroups.com


http://www.healthleadersmedia.com/content/235106/page/2/topic/WS_HLM2_FIN/HIPAA-5010-Requires-IT-to-Do-More-with-Fewer-Resources.html

I found this article interesting. Specifically, it talks about a few key things for IT implementers to think about with the new standards for HIPAA; notably, the date for compliance has been moved up a year to 2012, although technically they are supposed to have until 2013.

What would be interesting is if anyone in this group could provide insight into the impact of these new standards on their cloud implementation plans for healthcare.

There are some specifics around increasing field lengths, etc., and having a "single registry".

Thoughts?



Sassa

Jun 29, 2009, 2:38:45 PM
to Cloud Computing
If the systems are non-uniform, then yes, it is more difficult.


Say, your company manages 100K machines. Each at a different patch
level and completely different OS?


> > The security of any data split will depend on these pre-shared keys,
> > and since recovery of data is trivial if you can get K pre-shared
> > keys, the strength of the data split is bounded from above by the
> > strength of the pre-shared keys / authentication system.
>
> True, but I don't see how the system could be any more secure than
> that.  Cracking a pre-shared key used for authentication is different
> from cracking an encryption key because it is not possible to do an
> offline attack, meaning someone will notice these millions of failed
> login attempts at one of the machines sooner or later.  Dispersal is
> not a solution for authentication, it is a solution for data-at-rest
> security.

Honestly, I think I understand the point you are making, and the split
does make a lot of sense for data availability, but I cannot see how
it makes stuff more secure.

However, I am still uncomfortable trading an "integer factorization"
hardness for "K times one system hack" hardness.



Sassa

Jason

Jun 29, 2009, 2:41:18 PM
to Cloud Computing
Interestingly, this news article was just posted today: researchers
have created a solid-state quantum computing chip which is able to
maintain a stable state for a microsecond (previous attempts only
lasted a nanosecond).

http://www.sciencedaily.com/releases/2009/06/090628171949.htm

Jason

Jason

Jun 29, 2009, 4:26:43 PM
to Cloud Computing
On Jun 29, 1:38 pm, Sassa <sassa...@gmail.com> wrote:
>
> Honestly, I think I understand the point you are making, and the split
> does make a lot of sense for data availability, but I cannot see how
> it makes stuff more secure.
>

What protects your private key? Unless you have it memorized, it too
depends on the security of the media on which it resides, and perhaps
also on a password. Since that key is critical to your ability to
recover your data, you must take precautions to store it reliably.
The best-known technique for reliable and secure storage of keys is a
k-of-n threshold scheme. If such a k-of-n secret sharing scheme is good
enough to protect your keys, and people rely on such schemes today,
then why do you think it provides inadequate security for your data?

> However, I am still uncomfortable trading an "integer factorization"
> hardness for "K times one system hack" hardness.
>

I can understand that, but nothing forces you to trade one for the
other. You may encrypt your data using traditional encryption
techniques and then disperse it, gaining security advantages from both
approaches. Dispersal doesn't displace encryption, but is a new
alternative with additional security, reliability and efficiency
benefits. However, if you think about how your keys are protected in
many existing encryption systems you may see the redundancy of having
keys when your data is dispersed.

If you don't mind my asking, what encryption tools or systems do you
use to protect your data? I personally use or have used EFS, dm-crypt/
LUKS and TrueCrypt, and all of them rely on passwords to protect the keys.
Passwords do not provide good availability, as they are easily
forgotten, and they do not provide good security, as they are easily
brute-forced (short of using very long random passwords). The longer
they are, the easier they are to forget, so again we have the
reliability/confidentiality trade-off. If you worry about forgetting
a password and write it down, you improve reliability but again harm
confidentiality.

Now imagine instead you do this: use the Shamir scheme to write down
shares of your password on post-it notes, put one in a safe deposit
box, keep one in your computer desk, and give another to your friend
who lives in another state. Any two of these three pieces may be used
to recompute your password should you forget it. The security
advantages of this approach should be clear. An attacker would not only
have to get the note in your house, but also either break into a
bank or make a long-distance journey to track down your friend and get
him to give up his share. That is far less practical than
breaking into your house and stealing a laptop or CD-ROM containing
your private key.
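
For concreteness, here is a minimal 2-of-3 Shamir sketch in Python (illustrative only; a real deployment would use a vetted library and be careful about encoding and randomness): the secret is the constant term of a random degree-1 polynomial over a prime field, and any two shares recover it by Lagrange interpolation at x = 0.

import random

PRIME = 2**127 - 1  # a Mersenne prime comfortably larger than a short password

def make_shares(secret: int, k: int = 2, n: int = 3):
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation evaluated at x = 0.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

password = int.from_bytes(b"correct horse", "big")
shares = make_shares(password)              # the three post-it notes
assert recover(shares[:2]) == password      # any two of the three suffice
assert recover([shares[0], shares[2]]) == password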

Dispersal won't help you encrypt your computer's hard drive or thumb
drive, but it is a nice fit for companies which rely on multiple off-
site backups to protect their important data.

Thanks,

Jason

Sassa

Jun 29, 2009, 6:08:00 PM
to Cloud Computing
On Jun 29, 9:26 pm, Jason <jasonre...@gmail.com> wrote:
> On Jun 29, 1:38 pm, Sassa <sassa...@gmail.com> wrote:
>
> > Honestly, I think I understand the point you are making, and the split
> > does make a lot of sense for data availability, but I cannot see how
> > it makes stuff more secure.
>
> What protects your private key?  Unless you have it memorized it too
> depends on the security of the media on which it resides, and perhaps
> also a password.  Since that key is critical for your ability to
> recover your data you must take precautions for storing it reliably.
> The best known technique for reliable and secure storage of keys is a
> k of n threshold scheme.  If this k of n secret sharing scheme is good
> enough to protect the security of your keys, and people rely on such
> systems today, then why do you think it provides inadequate security
> for your data?

I don't think splitting data your way is inadequate. I think the claim
that this makes data MORE secure needs proving.


Besides,

a) encrypting really important stuff with k of n key shares wouldn't
necessarily mean keeping the key shares on the disk permanently
mounted to the PC;

b) splitting keys into shares is done for purposes other than high
availability of the decryption function - in fact, quite the opposite
is the intention.

This is true, but not the entire picture. You don't travel to your
friend on the other shore to get a share of your password daily. Also,
you mention that the locations storing a key share are physically
different. That isn't the same as storing a post-it note in 14
armoured vehicles of the same make and similar trim, with known
numberplates.

So, to complete the key-sharing picture, you must say that it is the CFO
and the CEO who use their shares of the private key to counter-sign a deal
with another company.

Or that a board of directors can decide on and digitally sign a directive
any time they have a quorum.



> Dispersal won't help you encrypt your computer's hard drive or thumb
> drive, but it is a nice fit for companies which rely on multiple off-
> site backups to protect their important data.

Yes, it is another way to protect your data. Not sure about "more
secure".


Sassa


> Thanks,
>
> Jason

Daniel Drozdzewski

Jun 30, 2009, 2:26:59 AM
to cloud-c...@googlegroups.com
Jason and the rest,

Very interesting thread! Keep fighting them back, Jason - security
that is publicly scrutinized (and defended) is of greater value than
security taken at face value. You are doing a good job so far! Well done.

I have to say I want to know more about it now - will have to go and
research all the blogs and papers, but this summer is really busy for
me :/

Anyway, from what I have seen here, I agree with Jason on many points
including security and reliability.

I also think it's worth mentioning in the whole argument that
the law of accelerating returns also applies to storage, admittedly
with not as great an acceleration as in processing power. Both in
turn, however, drive the amount of data produced and the amount of data
in need of storing.

Now, it is obvious that scaling of such systems can be maintained only
if the systems scale at slower rates than the size of the problems
they face. In other words, the fact that Jason's system scales
sub-linearly (the rate depends on the required reliability, which again
is a good selling point, as it can be adjusted) is a great selling point to me.

In 5 years' time, mobile phones will have PB-scale storage attached.

And here I would just like to ask some questions (forgive me if they sound ignorant):
Jason, with your solution, would it be possible to have some nodes in
the scheme constantly re-encrypting the slices they store, without
compromising too much of the whole system's responsiveness? If so, why
not have a few nodes per scheme churning in the background,
constantly re-encrypting the stored data, or doing so at intervals that
could be adjusted?

Would such a solution increase the security of the whole system?
Also, would it be true that the shorter the interval, the more secure the data?

I know that I could probably find answers to these by doing some
homework, but since you are following this thread and have
implemented this, it will be a piece of cake for you.


Thanks a lot!
--
Daniel Drozdzewski

Jason

Jun 30, 2009, 3:58:36 PM
to Cloud Computing


On Jun 29, 5:08 pm, Sassa <sassa...@gmail.com> wrote:
> On Jun 29, 9:26 pm, Jason <jasonre...@gmail.com> wrote:
>
> > On Jun 29, 1:38 pm, Sassa <sassa...@gmail.com> wrote:
>
> > > Honestly, I think I understand the point you are making, and the split
> > > does make a lot of sense for data availability, but I cannot see how
> > > it makes stuff more secure.
>
> > What protects your private key?  Unless you have it memorized it too
> > depends on the security of the media on which it resides, and perhaps
> > also a password.  Since that key is critical for your ability to
> > recover your data you must take precautions for storing it reliably.
> > The best known technique for reliable and secure storage of keys is a
> > k of n threshold scheme.  If this k of n secret sharing scheme is good
> > enough to protect the security of your keys, and people rely on such
> > systems today, then why do you think it provides inadequate security
> > for your data?
>
> I don't think splitting data your way is inadequate. I think the claim
> that this makes data MORE secure needs proving.
>

You agree that dispersal provides security benefits in terms of both
availability and confidentiality. Your question now is whether
dispersal could be used to create a more secure data storage system than
any existing system. I believe the answer is yes. Here is why:

The term "security," when applied to information, often refers to the
CIA principles: confidentiality, integrity and availability (thus
far we've only covered the first and last in these discussions, but
there are integrity advantages to dispersal as well, which I can
go into in more detail in the future). The reason that dispersal can
create more secure systems than anything that exists today is that it
is compatible with existing encryption techniques. Therefore one may
encrypt one's data using traditional encryption techniques and then
store that encrypted data on an information dispersal network. Thus
one gains the confidentiality benefits of both existing encryption
systems and dispersal.

In the sense of confidentiality, this combination would be more secure
than dispersal alone; however, in the sense of availability, it would
be less secure than dispersal alone, since the encryption keys still
have to be managed and can be lost. Again we have a compromise between
availability and confidentiality, and the question of which way is more
"secure" depends on which aspect is hurt more by the combination and
which aspect is more important to you. I think dispersal, because it
can provide the best of both worlds of high availability and
confidentiality, is a good balance for most people. If you are the
type who leans more toward confidentiality and cares less about
availability, then perhaps the combination of both would be best.


> Besides,
>
> a) encrypting really important stuff with k of n key shares wouldn't
> necessarily mean keeping the key shares on the disk permanently
> mounted to the PC;
>

You are correct: putting the slices on computers and taking them
off-line would be more confidential but also less available. However,
if you need ready access to your data at any time, it isn't possible
to keep your key shares, or your dispersed data, on off-line servers.
Therefore, while the option exists to put servers offline after
storing data to them, I don't think it is something most people would
do.

> b) splitting keys into shares is done for purposes other than high
> availability of the decryption function - in fact, quite the opposite
> is the intention
>

Often secret sharing schemes are employed to create secure backups of
keys, and those shares are not kept online, for the reasons you cite.
However, in these backup scenarios the complete key will often exist
in a single physical location and is often stored durably (on a hard
drive, in a key card, etc.) so that it can be used conveniently (more
convenient in the sense that one doesn't have to go and gather
off-line copies to use the key). With dispersal, no single copy of a
key is stored durably in any location; it only exists ephemerally
while the data is reassembled from the slices. In this sense it
provides a greater degree of confidentiality than a system where there
is a key in one online location with off-line shares maintained as a
backup. Certainly keeping the storage servers off-line will make the
data they store more confidential, but it also makes it less available
(and gives it less integrity, since data lost to bit-rot cannot be
rebuilt), so the question of which is more secure again depends on how
you weight the aspects of CIA.
A better analogy to the use-case of dispersal is being able to call up
your friend on the phone, where he authenticates you by your voice and
can give you the share. For the authenticated user this is easy to
do, but hard for the attacker, as he has to either defeat the
authentication system or physically compromise a threshold number of
your friends' houses. You have yet to provide your proposal for how
you would securely store your encryption keys. I believe in thinking
about the issues around secure key storage you will see that the
protection offered by encryption can be no better than how well
protected the keys are. Thus if one could store their data in a way
that is just as secure as the way they store their keys, it would
provide the same level of security even if the data itself were not
encrypted. Do you agree with this point?

Jason

unread,
Jun 30, 2009, 4:18:24 PM6/30/09
to Cloud Computing


On Jun 30, 1:26 am, Daniel Drozdzewski <daniel.drozdzew...@gmail.com>
wrote:
> Jason and the rest,
>
> Very interesting thread! Keep fighting them back Jason - security
> publicly scrutinized (and defended) is of a greater value than one
> taken at its face value. You are doing good job so far! Well done.
>

Thank you Daniel, I will keep trying my best.

> I have to say I want to know more about it now - will have to go and
> research all the blogs and papers, but this summer is really busy for
> me :/
>

We will be publishing some white papers in the near future, as well as
some papers to submit to the File and Storage Technologies 2010
conference. They should summarize many of the benefits concisely. If
you would like, I can let you know when they are made available.

> Anyway, from what I have seen here, I agree with Jason on many points
> including security and reliability.
>
> I also think that it's worth mentioning in the whole argument, that
> the law af accellerated-returns applies also to storage, addmittedly
> with not as great acceleration as it does in processing power. Both in
> turns however drive the amount of data produced and data in the need
> of storing.
>

This is something we are intimately aware of. Consider that a single
RAID array will only have a fixed total amount of usable storage, and
a fixed reliability. Let us say we use RAID 6 with 10 one-terabyte
disks in the array. Since 2 are used for parity we have 8 TB usable,
but what if we are tasked with storing 4 PB of data? That works out
to 4096/8 = 512 separate storage arrays. If the mean time to data
loss for a single RAID 6 array were 1,000 years, the mean time to data
loss across all 512 arrays would be 1,000 / 512 ~= 2 years. Dispersal
allows the reliability of the storage system to be adjusted
arbitrarily high without a decrease in storage efficiency, only at the
cost of additional processing power. Therefore, so long as processing
power can scale to keep up with growing storage requirements,
dispersal provides a viable solution. For example, the overhead of a
10 of 14 storage set will be the same as that of a 20 of 28, but the
20 of 28 can handle 8 simultaneous failures while the 10 of 14 can
only handle 4. Every additional tolerable failure allows the total
storage system to become a few hundred to a few thousand times larger
while maintaining the same reliability.
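
To make that arithmetic easy to check, here is a quick
back-of-the-envelope sketch in Python; the 1 TB disk size, the
1,000-year per-array figure and the example widths are just the
numbers used above, not product specifications:

    # Back-of-the-envelope check of the figures above.
    TOTAL_DATA_TB = 4096              # 4 PB expressed in TB
    USABLE_PER_ARRAY_TB = 8           # RAID 6: 10 x 1 TB disks, 2 for parity
    arrays = TOTAL_DATA_TB / USABLE_PER_ARRAY_TB
    print(arrays)                     # 512.0 separate RAID 6 arrays

    MTTDL_ONE_ARRAY_YEARS = 1000
    print(MTTDL_ONE_ARRAY_YEARS / arrays)   # ~2 years until some array loses data

    # Storage overhead of the two dispersal examples is the same (width/threshold),
    # but the wider configuration tolerates twice as many simultaneous failures.
    print(14 / 10, 14 - 10)           # 1.4x overhead, 4 tolerable failures
    print(28 / 20, 28 - 20)           # 1.4x overhead, 8 tolerable failures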

> Now, it is obvious that scaling of the systems can be maintained only,
> if the systems scale at slower rates than the size of the problems
> they face. In other words the fact that Jason's system scales
> sub-linearly (rate depends on the required reliability, again good
> selling point, as it can be adjusted) is a great selling point to me.
>

I am glad you find it useful. It seems you have some experience with
the problems of scaling large storage systems.
As for your re-encryption question: that is a very interesting idea.
You could force any adversary to work under some very tight time
constraints. The degree to which system responsiveness would be
affected would depend on how much total data one has and how
frequently one wants it re-encoded, but it sounds like a viable
feature for those who are very concerned about confidentiality. Let's
say we re-encoded all the data every 12 hours; an attacker would then
have to gain access to a threshold number of slices within that time
period, otherwise the re-written data would be composed of slices
which are useless for decoding the ones the attacker previously
acquired.

> Would such solution increase the security of the whole system?
> Also would it be true that the shorter the interval, the more secure data?
>

Yes, it seems that way. If someone were slowly compromising slices in
an unnoticed way, this would prevent them from reaching a threshold;
however, if the attacker had compromised the machines remotely and
maintained the ability to access them, then this re-encoding may be
futile in stopping them.

> I know that I could possibly find answers to those doing some
> homework, but since you are following this thread and you have
> implemented this, it will be a piece of cake for you.
>
> Thanks a lot!
> --

You're welcome.

Jason

Sassa

unread,
Jun 30, 2009, 4:34:41 PM6/30/09
to Cloud Computing
On Jun 30, 7:26 am, Daniel Drozdzewski <daniel.drozdzew...@gmail.com>
wrote:
...
> Jason, with your solution, would it be possible to have some nodes in
> the scheme constantly re-encrypting the slices they store, without
> compromising too much of the whole system responsiveness? If so, why
> not have few nodes per whole scheme churning in the background,
> constantly re-encrypting the data stored, or in the intervals that
> could be adjusted.
>
> Would such solution increase the security of the whole system?
> Also would it be true that the shorter the interval, the more secure data?
>
> I know that I could possibly find answers to those doing some
> homework, but since you are following this thread and you have
> implemented this, it will be a piece of cake for you.

Re-encryption would mean re-encryption of the whole lot - individual
pieces cannot be re-encrypted, because the decryption key is obtained
by combining all the pieces - you can't use one piece encrypted with
the old key, and the other piece encrypted with the new key.


Sassa

Jason

unread,
Jun 30, 2009, 8:55:51 PM6/30/09
to Cloud Computing
Sassa is right that the nodes storing the slices cannot re-encrypt
their slices; it would need to be done by a node/user that has access
to all the slices. When I first read your question I interpreted
"some nodes in the scheme constantly re-encrypting the slices they
store" as referring to the nodes that store data to the dispersed
storage network, as opposed to the nodes that store slices. For the
nodes that slice the data it is possible, but it is not possible for
the nodes that store the slices.

Nice catch Sassa, you seem to have quickly formed a clear
understanding of the system.

Jason

Daniel Drozdzewski

unread,
Jun 30, 2009, 9:51:35 PM6/30/09
to cloud-c...@googlegroups.com
Jason,

Thanks a lot for the answers!
Please let the group know as soon as you have the white papers or the
'proper' academic papers out.


Daniel
--
Daniel Drozdzewski

Jim Starkey

unread,
Jun 30, 2009, 10:17:53 PM6/30/09
to cloud-c...@googlegroups.com
Jason wrote:
>
>
> You agree that dispersal provides security benefits both in terms of
> availability and confidentiality. Your question now is could
> dispersal be used to create a more secure data storage system than any
> existing system. I believe the answer is yes. Here is why:
>
>

Jason, critique the following:

    0. Presume Jason's scheme is everything he claims it is.
    1. Take a dataset and encrypt it with a random key
    2. Disperse the key at n sites as per #0
    3. Store the encrypted datasets anywhere convenient, including
    Times Square

This has the same level of security as Jason's scheme, reduced storage,
great availability, lower bandwidth, and better locality than Jason's
dispersed data scheme.

If you believe in 256 bit AES, key management is the issue. If you
don't believe in AES, dispersed storage doesn't work.
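
Concretely, a toy sketch of steps 1-3 (AES-GCM from the Python
'cryptography' package for the bulk encryption, and a trivial 3-of-3
XOR split standing in for whatever k-of-n threshold scheme #0
provides; nothing here is Cleversafe's code):

    # Toy sketch: encrypt once with a random key, then disperse only the key.
    # The XOR split below is a 3-of-3 stand-in for a real k-of-n threshold scheme.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def split_3_of_3(secret: bytes):
        # Two random pads plus the XOR of all three; every share is needed.
        s1, s2 = os.urandom(len(secret)), os.urandom(len(secret))
        s3 = bytes(a ^ b ^ c for a, b, c in zip(secret, s1, s2))
        return s1, s2, s3

    def join_3_of_3(s1, s2, s3):
        return bytes(a ^ b ^ c for a, b, c in zip(s1, s2, s3))

    data = b"dataset that can sit anywhere, even Times Square, once encrypted"
    key = AESGCM.generate_key(bit_length=256)        # step 1: random key
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, data, None)

    shares = split_3_of_3(key)                       # step 2: disperse the key
    recovered = AESGCM(join_3_of_3(*shares)).decrypt(nonce, ciphertext, None)
    assert recovered == data                         # step 3: ciphertext lives anywhere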

Jason

unread,
Jul 1, 2009, 1:12:50 AM7/1/09
to Cloud Computing


On Jun 30, 9:17 pm, Jim Starkey <jstar...@NimbusDB.com> wrote:
> Jason wrote:
>
> > You agree that dispersal provides security benefits both in terms of
> > availability and confidentiality.  Your question now is could
> > dispersal be used to create a more secure data storage system than any
> > existing system.  I believe the answer is yes.  Here is why:
>
> Jason, critique the following:
>
>     0.  Presume Jason's scheme is everything he claims it is.
>     1.  Take a dataset and encypt it with a random key
>     2.  Disperse the key at n sites as per #0
>     3.  Store the encrypted datasets anywhere convenient, including
>     Time's Square
>

If I understand your proposal correctly, you are suggesting only the
key be dispersed and not the encrypted data?

> This has the same level of security as Jason's scheme, reduced storage,
> great availability, lower bandwidth, and better locality than Jason's
> dispersed data scheme.

How many copies of the encrypted data will you be storing? Let's see
what happens given a range of numbers:

One instance of the data: If we only store a single instance of the
encrypted data, it will not be very available. The corruption of one
bit, or the failure of one hard drive, will cause data loss. The only
benefit of this is that it has less storage cost and bandwidth than
dispersal. While having good locality, we lose the ability to recover
from a natural disaster, power outage, fire, or any other catastrophe
affecting the site where that copy resides. Again this is bad for
availability and reliability, and hence not secure, as security
encompasses both availability and confidentiality.

One instance of the data + 1 copy: If we maintain two copies of the
encrypted data we have better availability than storing a single copy,
but still much less availability than a typical dispersal
configuration. If there are two failures within the time it takes to
recover from the first, you will lose data. Dispersal, on the other
hand, can tolerate the simultaneous loss of X number of sites/
machines/drives and is not vulnerable to a disaster affecting only
one site. Let us also look at the storage and bandwidth costs: they
will be roughly twice those of storing a single copy. Dispersal, on
the other hand, if we take a 10 of 14 example, only has 40% more
storage requirements beyond that of a single instance of the data.
With just two copies, dispersal already wins out in terms of storage
efficiency and bandwidth.

More than 2 copies: Exponentially greater availability at linearly
increasing storage and bandwidth costs. With dispersal we can have
exponentially greater availability with constant storage and bandwidth
costs and linearly increasing CPU requirements.

From the above examples we notice two important things: 1. storing a
single copy will always result in bad availability and reliability,
and 2. storing multiple copies will almost always be less efficient
than using dispersal (of course one could configure a dispersed
storage network to be inefficient, but there is no need to). If one
doesn't wish to pay for bandwidth, one could keep all the machines at
the same site (in either the copy-based or dispersed systems), but
then you lose the primary benefits of dispersal, which are immunity
from losing data due to a site failure and immunity from data exposure
due to the compromise or theft of resources at one site. Therefore
keeping all the data in one physical location, whether encrypted or
dispersed, is bad for both confidentiality and availability reasons.
That is why we disperse not just the key, but the data itself.
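
To put rough numbers on the comparison above, here is a minimal
sketch; the replica model is deliberately crude (a copy set survives
as long as at least one replica does) and the parameters are just the
examples from this thread:

    # Raw storage expansion and tolerable simultaneous failures:
    # n full replicas versus a k-of-n dispersal such as the 10-of-14 example.
    def replication(copies):
        return {"expansion": copies, "tolerable_failures": copies - 1}

    def dispersal(k, n):
        return {"expansion": n / k, "tolerable_failures": n - k}

    print(replication(2))     # {'expansion': 2, 'tolerable_failures': 1}
    print(replication(3))     # {'expansion': 3, 'tolerable_failures': 2}
    print(dispersal(10, 14))  # {'expansion': 1.4, 'tolerable_failures': 4}
    print(dispersal(20, 28))  # {'expansion': 1.4, 'tolerable_failures': 8}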

For a primer on the benefits of dispersal (not focused on the security
aspects) please see these sites:

http://www.cleversafe.org/dispersed-storage
http://www.cleversafe.org/dispersed-storage/idas
http://www.cleversafe.org/dispersed-storage/benefits

Please let me know if you have any more questions related to dispersal
and why we use it for storing the data.

>
> If you believe in 256 bit AES, key management is the issue.  If you
> don't believe in AES, dispersed storage doesn't work.

Right, if AES-256 were not secure, then the AONT would not be secure
either. I have stated numerous times in this thread that my issue is
not with the security of AES, but rather with how one secures the keys
AES uses. Existing systems use either passwords, asymmetric
encryption keys, or secret sharing schemes. Of these I believe the
best option is a secret sharing scheme, as it is the only type of
method which does not have to trade confidentiality for availability.

For secret sharing schemes to be secure they must store shares in
different locations and in a protected way, but this is not why we
disperse data; we originally set out to disperse data for its extreme
reliability, availability and efficiency. Only later did we invent an
efficient secret sharing scheme based on dispersal which allows data
to be stored in the same way shares are stored in a secret sharing
scheme. This is accomplished simply by applying an AONT before
dispersing the data; it was an easy change to make which did not
detrimentally affect the previous properties of dispersal, but from it
we gained much.

We realized how much it could simplify the issues of existing key
management systems, since all a key-based system does is create two
physical locations, both of which one must have access to. Dispersal
goes beyond this and makes it such that multiple slices (think of them
as keys) are needed, and typically the number is greater than 2. I
can sympathize with your hesitance to believe all these benefits are
simultaneously possible in a single system, but I invite you to look
into those links and see if they clarify why we do what we do. I will
also gladly answer any additional questions you may have.

Best Regards,

Jason

Sassa

unread,
Jul 1, 2009, 6:50:46 AM7/1/09
to Cloud Computing
On Jun 30, 8:58 pm, Jason <jasonre...@gmail.com> wrote:
> On Jun 29, 5:08 pm, Sassa <sassa...@gmail.com> wrote:
...
> > I don't think splitting data your way is inadequate. I think the claim
> > that this makes data MORE secure needs proving.
>
> You agree that dispersal provides security benefits both in terms of
> availability and confidentiality.  Your question now is could

Benefits in availability at lower storage cost - yes; "benefits in
confidentiality" means "MORE secure" - so, no, you need to formally
prove it.

* Security of m<K-1 of N pieces is guaranteed by AES - the dispersal
scheme doesn't ADD any benefit here.

* Your scheme gets rid of key management in its traditional form at
the cost of making m=K of N pieces of data insecure. This has its
applications, but not for everyone who fears their 4096-bit RSA key
will be compromised within 40 years.


> dispersal be used to create a more secure data storage system than any
> existing system.  I believe the answer is yes.  Here is why:
>
> The term security when applied to information often refers to the
> principles of CIA confidentiality, integrity and availability (thus
> far we've only covered the first and last in these discussions but
> there are also integrity advantages of dispersal as well which I can
> go into in more detail in the future).  The reason that dispersal can
> create more secure systems than anything that exists today is that it
> is compatible with existing encryption techniques.  Therefore one may

That's a prerequisite but not a proof that it is more secure.


> encrypt their data using traditional encryption techniques and then
> store that encrypted data on an information dispersal network.  Thus
> one gains confidentiality benefits of both existing encryption systems
> and dispersal.
>
> In the sense of confidentiality, this combination would be more secure
> than dispersal alone, however in the sense of availability, it would
> be less secure than dispersal.  Again we have a compromise between
> availability and confidentiality, the question of which way is more
> "secure" depends on which aspect is hurt more by the combination and
> which aspect is more important to you.  I think dispersal, due to the
> fact that it can provide the best of both worlds of high availability
> and confidentiality is a good balance for most people.  If you are a
> type who leans more toward confidentiality and care less so about
> availability, then perhaps a combination of both would be best.

This isn't a proof, and it is not a fact that dispersal provides the
BEST of both worlds.


> > Besides,
>
> > a) encrypting really important stuff with k of n key shares wouldn't
> > necessarily mean keeping the key shares on the disk permanently
> > mounted to the PC;
>
> You are correct, putting the slices on computers and putting them off-
> line would be more confidential but also less available.  However if
> you need ready access to your data at any time it isn't possible to
> keep your key-shares, or your dispersed data on off-line servers.
> Therefore, while the option exists to put servers offline after
> storing data to them, I don't think it is something most people would
> do.

Exactly my point. The analogy that you introduced between threshold
cryptography and the dispersal scheme is misleading.



> > b) splitting keys into shares is done for purposes other than high
> > availability of the decryption function - in fact, quite the opposite
> > is the intention
>
> Often secret sharing schemes are employed to create secure backups of
> keys and those shares are not kept online, for the reasons you cite.

Secret sharing - yes. Each share is kept as securely as the key should
be - easy to do and guarantee for the trifling amount of data that key
material is. In your scheme the requirement applies to the whole of
the data, since the key is part of the data now.


> However in these backup scenarios the complete key will often exist in
> a single physical location and often stored durably (on a hard drive,
> in a key card, etc.) so that it can be used conveniently (more
> convenient in the sense that one doesn't have to go and gather off-
> line copies to use the key).  With dispersal no single copy of a key
> is stored durably in any location, it only exists ephemerally while
> the data is reassembled from the slices.

But you can't guarantee the same level of confidentiality for all data
in all locations. Just as you say, I can go ahead and store key shares
in vaults with 10-inch-thick walls. I have no problem with the hassle
of reassembling the key if my copy of the complete key gets corrupted
- that's the value of key sharing. This is very different from key
shares stored alongside the data on hard drives around the world,
which may become accessible via TCP/IP because of a blunder in Wi-Fi
configuration at a data center, or some other such mishap for which
there is no mathematical measure of security.


> In this sense it provides a
> greater degree of confidentiality than a system where there is a key
> in one online location of which off-line shares are maintained as a
> backup.

The one online location being... my data-processing CPUs! They have
the data in plain text, or the credentials to access the dispersed
storage, anyway.
This isn't a better analogy, because I trust my friend and I don't
trust the third-party data center (hence the encryption in the first
place).


> about the issues around secure key storage you will see that the
> protection offered by encryption can be no better than how well
> protected the keys are.  Thus if one could store there data in a way
> that is just as secure as the way they store their keys, it would
> provide the same level of security even if the data itself were not
> encrypted.  Do you agree with this point?

* If the data center is more secure than AES, then I don't need to
encrypt anything - yes.

* Make me want to rely on 10x the security of a single third-party
data center for confidentiality of the decryption key


Sassa

Sassa

unread,
Jul 1, 2009, 7:26:24 AM7/1/09
to Cloud Computing
I want to see the maths of that. I don't think you need to divide 1000
by 512, because it is probability and is not additive. If mean time to
failure of one array is 1000 years, then the probability of at least
one node failure after 2 years is 64%. I think mean time might be 1.4
years, but I might want to check that.

Anyhow, if you are using 28 x RAID 6 arrays with 512/28 disks, you
still get 512 arrays and the same failure rate.


Sassa

Jason

unread,
Jul 1, 2009, 1:44:21 PM7/1/09
to Cloud Computing


On Jul 1, 6:26 am, Sassa <sassa...@gmail.com> wrote:
>
> > This is something we are intimately aware of.  Consider that a single
> > RAID array will only have a fixed total amount of usable storage, and
> > a fixed reliability.  Let us say we use RAID 6 using 10 disks in the
> > array.  Since 2 are used for parity we have 8 TB usable, but what if
> > we are tasked with store 4 PB of data?  That works out to 4096/8 = 512
> > separate storage arrays.  If the mean time to data loss for a single
> > RAID 6 array was 1,000 years the mean time to data loss for all 512
> > arrays will be 1,000 / 512 ~= 2 years.  Dispersal allows the
> > reliability of the storage system to be adjusted arbitrarily high
> > without a decrease in storage efficiency, only at the cost of
> > additional processing power.  Therefore so long as processing power
> > can scale to keep up with growing storage requirements dispersal
> > provides a viable solution.  For example the overhead of a 10 of 14
> > storage set will be the same as that of a 20 of 28, but the 20 of 28
> > can handle 8 simultaneous failures while the 10 of 14 can only handle
> > 4.  Every additional tolerable failure allows the total storage system
> > become a few hundred to a few thousand times larger while maintaining
> > the same reliability.
>
> I want to see the maths of that. I don't think you need to divide 1000
> by 512, because it is probability and is not additive. If mean time to
> failure of one array is 1000 years, then the probability of at least
> one node failure after 2 years is 64%. I think mean time might be 1.4
> years, but I might want to check that.
>

The mean time to failure isn't a probability of failure; it is a
statistic given as a quantity of time, often used to estimate the
number of failures given a population of a certain size. For a primer
on its meaning I recommend http://www.faqs.org/faqs/arch-storage/part2/section-151.html

As to the math behind the simple division, consider that whenever one
is running multiple simultaneous arrays in a large system, data loss
on any single array counts as data loss for the system. This is
identical to RAID 0: the failure of any single disk in the array
causes failure of the system. Notice that the calculation for MTBF is
calculated for RAID 0: http://en.wikipedia.org/wiki/Raid_0#RAID_0_failure_rate

Consider what MTBF is: it is the mean number of device-hours between
observed failures. Thus if a particular hard drive has an MTBF of 20
years, and I maintain 1,000 of them over 1 year, I would expect to
encounter 50 failures, as there are 1,000 device-years. These 50
failures would be more or less evenly distributed over the 1 year
period, thus the mean time to the first failure will be 1/50th of a
year.

> Anyhow, if you are using 28 x RAID 6 arrays with 512/28 disks, you
> still get 512 arrays and the same failure rate.
>

We won't use RAID arrays; dispersal is a replacement for RAID offering
arbitrarily high levels of fault tolerance. Instead of using RAID 6
we would use a 10 of 16 configuration, which would have a mean time to
data loss in the billions or trillions of years, such that a system of
many hundreds of concurrent "dispersed arrays" has an MTTDL in the
many millions of years.

Jason

Jason

unread,
Jul 1, 2009, 4:17:33 PM7/1/09
to Cloud Computing


On Jul 1, 5:50 am, Sassa <sassa...@gmail.com> wrote:
> On Jun 30, 8:58 pm, Jason <jasonre...@gmail.com> wrote:
>
> > On Jun 29, 5:08 pm, Sassa <sassa...@gmail.com> wrote:
> ...
> > > I don't think splitting data your way is inadequate. I think the claim
> > > that this makes data MORE secure needs proving.
>
> > You agree that dispersal provides security benefits both in terms of
> > availability and confidentiality. Your question now is could
>
> Benefits in availability at lower storage cost - yes; "benefits in
> confidentiality" means "MORE secure" - so, no, you need to formally
> prove it.
>

Let us agree to stop using the word secure as it is a loaded term
comprising different dimensions, including confidentiality,
availability, and integrity.

Is your question why the combination of dispersal with encryption is
more confidential than encryption alone?

Is your question why dispersal alone is more available than dispersal
with encryption?

Both should be easy to demonstrate. If you apply an existing
encryption system prior to dispersal, dispersal can only improve
confidentiality; it won't hurt it. Therefore it must be more secure
than encryption alone. Even the AONT without dispersal provides
additional confidentiality benefits; see Rivest's original paper for
his list: http://theory.lcs.mit.edu/~cis/pubs/rivest/fusion.ps
So the combination of existing encryption with the AONT by itself has
already provided a higher level of confidentiality.


> * Security of m<K-1 of N pieces is guaranteed by AES - the dispersal
> scheme doesn't ADD any benefit here.
>

How is the AES key stored? I have asked this several times in
previous posts and I have yet to see your solution for key
management. The confidentiality of the AES cipher cannot be analyzed
in a vacuum as the cipher is just one small component of a much
greater system. You will never see the confidentiality benefits of
dispersal without looking at the full picture, the complete system.

> * Your scheme gets rid of key management in its traditional form at
> the cost of making m=K of N pieces of data insecure. This has its
> applications, but not for everyone who fears their 4096-bit RSA key
> will be compromised within 40 years.
>

Who said the slices are made insecure? If each slice is stored with
the same level of confidentiality as an existing system stores its
keys, then the dispersal system will have a higher level of
confidentiality, by virtue of the greater difficulty of compromising
multiple locations.

> > dispersal be used to create a more secure data storage system than any
> > existing system. I believe the answer is yes. Here is why:
>
> > The term security when applied to information often refers to the
> > principles of CIA confidentiality, integrity and availability (thus
> > far we've only covered the first and last in these discussions but
> > there are also integrity advantages of dispersal as well which I can
> > go into in more detail in the future). The reason that dispersal can
> > create more secure systems than anything that exists today is that it
> > is compatible with existing encryption techniques. Therefore one may
>
> That's a prerequisite but not a proof that it is more secure.
>

If it is an equation you are looking for, consider this:

Probability of encryption system being compromised: P(EC)
Probability of dispersal system being compromised: P(DC)

Therefore if we combine both systems as one, the probability of the
combined system's compromise would be: P(EC) * P(DC)

P(DC) cannot be greater than 1, and so long as it is less than 1 then
the combined system is more secure. Does this make sense?

Are you arguing that the probability of compromising a dispersal
system is exactly 1?
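
As a minimal sketch with made-up numbers (and with the independence of
the two compromises taken as an assumption, not a proven fact):

    # If the two compromises were independent, layering dispersal on top of
    # encryption can only lower the probability that the plaintext is exposed.
    p_ec = 0.01   # made-up probability the encryption layer is compromised
    p_dc = 0.10   # made-up probability the dispersal layer is compromised
    print(p_ec * p_dc)            # combined compromise probability
    print(p_ec * p_dc <= p_ec)    # True for any 0 <= p_dc <= 1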

Jason

> > encrypt their data using traditional encryption techniques and then
> > store that encrypted data on an information dispersal network. Thus
> > one gains confidentiality benefits of both existing encryption systems
> > and dispersal.
>
> > In the sense of confidentiality, this combination would be more secure
> > than dispersal alone, however in the sense of availability, it would
> > be less secure than dispersal. Again we have a compromise between
> > availability and confidentiality, the question of which way is more
> > "secure" depends on which aspect is hurt more by the combination and
> > which aspect is more important to you. I think dispersal, due to the
> > fact that it can provide the best of both worlds of high availability
> > and confidentiality is a good balance for most people. If you are a
> > type who leans more toward confidentiality and care less so about
> > availability, then perhaps a combination of both would be best.
>
> This isn't proof and not a fact that dispersal provides the BEST of
> both worlds.
>

It may not be a proof, but what parts do you find unconvincing? If
you name the parts you doubt I can try to prove them to you.

> > > Besides,
>
> > > a) encrypting really important stuff with k of n key shares wouldn't
> > > necessarily mean keeping the key shares on the disk permanently
> > > mounted to the PC;
>
> > You are correct, putting the slices on computers and putting them off-
> > line would be more confidential but also less available. However if
> > you need ready access to your data at any time it isn't possible to
> > keep your key-shares, or your dispersed data on off-line servers.
> > Therefore, while the option exists to put servers offline after
> > storing data to them, I don't think it is something most people would
> > do.
>
> Exactly my point. The analogy that you introduced between threshold
> cryptography and dispersal scheme is misleading.
>

Think about threshold cryptography when used to store keys. If the
shares are kept offline, the availability is no greater than keeping
the storage servers offline in a dispersed scheme. If you want ready
access to your data, then in both cases the shares / slices must be
kept online. I don't see how the analogy breaks in either case.

The only time when one can keep the shares offline and have ready
access to their data is when they keep a full copy of the key on some
other machine. This key will not be protected nearly as well as it
would by a threshold scheme (online or offline) because it is a single
point of compromise and by definition is online.

> > > b) splitting keys into shares is done for purposes other than high
> > > availability of the decryption function - in fact, quite the opposite
> > > is the intention
>
> > Often secret sharing schemes are employed to create secure backups of
> > keys and those shares are not kept online, for the reasons you cite.
>
> Secret sharing - yes. Each share is kept as securely as the key should
> - easy to do and guarantee for a trifling of data that key material
> is. In your scheme the requirement applies to the whole data, since
> the key is part of the data now.
>

So is your objection based on the idea that it is easier to secure a
small piece of data than a large one? In a sense the opposite can be
true: it is much easier and faster to copy a small key than it is to
copy what might be a very large slice.

> > However in these backup scenarios the complete key will often exist in
> > a single physical location and often stored durably (on a hard drive,
> > in a key card, etc.) so that it can be used conveniently (more
> > convenient in the sense that one doesn't have to go and gather off-
> > line copies to use the key). With dispersal no single copy of a key
> > is stored durably in any location, it only exists ephemerally while
> > the data is reassembled from the slices.
>
> But you can't guarantee the same level of confidentiality to all data
> in all locations.

The confidentiality doesn't have to be the same level in all
locations, so long as the most confidential site among any threshold
number of sites meets your minimum confidentiality requirements, you
should be good. For there to be a problem, there would need to be a
threshold number of blunders across your sites all at the same time.

> Just as you say, I can go ahead and store key shares
> in vaults with 10 inch thick walls. I have no problem having the
> hassle of reassembling the key, if my copy of the complete key gets
> corrupt

Where do you keep this copy of the complete key? How do you keep it
secure against compromise, or the machine it lives on from being
stolen or remotely compromised? Dispersal has the advantage that the
complete key doesn't live anywhere when at rest, and when in use
exists only in RAM of the machine reading the data. It also uses a
unique key for every piece of data stored, so if one source being read
has its key intercepted in RAM, the rest of the data is still secure.

> - that's the value of key sharing. This is very different to
> key shares stored alongside the data on hard drives around the world,
> which may become accessible via TCP/IP because of a blunder in WiFi
> configuration at a data center, or some other such mishap for which
> there is no mathematical measure of security.
>

Let's assign a general probability of accidental or malicious
disclosure for a key stored at a single location. Let's call it P(D).
For dispersal, the probability of disclosure will be P(D)^T, where T
is the threshold. No matter what number you choose for P(D), except
for 0 or 1, the probability of disclosure for dispersal will always be
lower. This equation doesn't hold if the disclosures are
non-independent, but even if they are highly correlated, P(D) will be
greater than ( P(D) * P(sufficient number of additional correlated
compromises at other sites to reach the threshold) ) so long as the
threshold is more than 1, which for dispersal it always is. Thus,
even though the exact rates of compromise are hard to model,
regardless of what the true probability is, dispersal is more
confidential than keeping a copy of the key at one location, provided
the confidentiality of the sites keeping slices is as good as the
confidentiality of the site keeping your single copy of the key.
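
As a sketch of that inequality with an arbitrary per-site number (the
same independence caveat noted above applies):

    # Disclosure probability: one key location versus a threshold of T sites,
    # assuming an (optimistic) independent per-site disclosure probability p_d.
    p_d = 0.05                    # arbitrary per-location disclosure probability
    for threshold in (2, 3, 5):
        print(threshold, p_d ** threshold)   # always below p_d when 0 < p_d < 1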

> > In this sense it provides a
> > greater degree of confidentiality than a system where there is a key
> > in one online location of which off-line shares are maintained as a
> > backup.
>
> The One online location being.... my data processing CPUs! They have
> the data in plain text, or the credentials to access the dispersed
> storage anyway.
>

The advantage of credentials is that they can be disabled, changed, or
reset whenever needed, while the keys for the data cannot. Thus users
could use a very long password and not have to worry about forgetting
it, as it can be reset; or, if an employee storing dispersed data left
the company, those who manage the authentication system could still
recover the data.
To gain the full secret sharing properties of dispersal, one should
have a high degree of trust in the sites storing slices. However,
even if you do not fully trust any one site, do you trust the
different sites to not form a conspiracy to get your data? So long as
fewer than a threshold number of site operators agree to betray you,
your data will remain confidential. In typical deployments the
machines storing dispersed data are owned by the person or
organization storing the data, who can therefore trust them to be
properly hardened and not tampered with.

> > about the issues around secure key storage you will see that the
> > protection offered by encryption can be no better than how well
> > protected the keys are. Thus if one could store there data in a way
> > that is just as secure as the way they store their keys, it would
> > provide the same level of security even if the data itself were not
> > encrypted. Do you agree with this point?
>
> * If the data center is more secure than AES, then I don't need to
> encrypt anything - yes.
>

Again, this is not an issue of the security of the cipher, but an
issue of how securely the keys that cipher uses are stored. The sites
need only be as trusted as the location where your keys are stored in
an existing encryption system. You seem to be evaluating the security
of dispersal by a different standard than you are evaluating the
security of existing encryption systems. One cannot say "encrypt it
with AES and your data will be unbreakable because AES is unbreakable";
this is only true if you destroy the key. If you need to get the data
back sometime in the future, you need to store the key somewhere,
somehow.

> * Make me want to rely on 10x the security of a single third-party
> data center for confidentiality of the decryption key
>

Explain your approach for keeping your decryption key confidential and
I will attempt to show you where it is lacking.

Jason

Jason

unread,
Jul 1, 2009, 5:14:34 PM7/1/09
to Cloud Computing


On Jul 1, 6:26 am, Sassa <sassa...@gmail.com> wrote:
>
> I want to see the maths of that. I don't think you need to divide 1000
> by 512, because it is probability and is not additive. If mean time to
> failure of one array is 1000 years, then the probability of at least
> one node failure after 2 years is 64%. I think mean time might be 1.4
> years, but I might want to check that.

Another way to think about it which might make it clearer:

MTBF is 1 failure per T time. If we run N such devices we can expect
N failures over T time, or put another way 1 failure over T/N time.

Jason

Sassa

unread,
Jul 1, 2009, 7:00:29 PM7/1/09
to Cloud Computing
This is more misleading than a Poissonian distribution for the
"constant failure rate" phase. It is a plain arithmetic average of
time between failures (i.e. sum of times between failures divided by
the number of failures).


> As to the math behind the simple division, consider that whenever one
> is running multiple simultaneous arrays in a large system, data loss
> on any single array counts as data loss for the system. This is
> identical to RAID 0: the failure of any single disk in the array
> causes failure of the system. Notice that the calculation for MTBF is
> calculated for RAID 0: http://en.wikipedia.org/wiki/Raid_0#RAID_0_failure_rate

This is exactly what I considered:

Each of the 512 arrays independently has no failure with probability
poisson(0, years/1000). So the probability that all 512 arrays survive
X years together is poisson(0, X/1000)^512, and the probability of a
failure among the 512 arrays during X years is
1 - poisson(0, X/1000)^512; hence for X=2 the probability of a
512-array failure is ~64%.


1.4 years is the point by which half the failures are expected to have
happened. But now I recall that the mean *time* to failure for a
Poissonian process actually sits at the ~63% point (the bias from 50%
is caused by later failures moving the average time up). So 2 years is
indeed the MTBF for 512 arrays.


> Consider what MTBF is, it is the mean number of device-hours between
> observed failures. Thus if a particular hard drive has a MTBF of 20
> years, and I maintain 1,000 of them over 1 year I would expect to
> encounter 50 failures, as there are 1,000 device years. These 50
> errors would be more or less evenly distributed over the 1 year
> period, thus the mean time to the first failure will be 1/50th a year.
>
> > Anyhow, if you are using 28 x RAID 6 arrays with 512/28 disks, you
> > still get 512 arrays and the same failure rate.
>
> We won't use RAID arrays; dispersal is a replacement for RAID offering
> arbitrarily high levels of fault tolerance. Instead of using RAID 6
> we would use a 10 of 16 configuration which would have a mean time to
> data loss in the billions or trillions of years such that a system of
> many hundreds of concurrent "dispersed arrays" has a MTTDL in the many
> millions of years.

Show these fantastic billions, too.


Sassa


> Jason

Sassa

unread,
Jul 1, 2009, 7:47:04 PM7/1/09
to Cloud Computing
On Jul 1, 9:17 pm, Jason <jasonre...@gmail.com> wrote:
> On Jul 1, 5:50 am, Sassa <sassa...@gmail.com> wrote:
>
> > On Jun 30, 8:58 pm, Jason <jasonre...@gmail.com> wrote:
>
> > > On Jun 29, 5:08 pm, Sassa <sassa...@gmail.com> wrote:
> > ...
> > > > I don't think splitting data your way is inadequate. I think the claim
> > > > that this makes data MORE secure needs proving.
>
> > > You agree that dispersal provides security benefits both in terms of
> > > availability and confidentiality.  Your question now is could
>
> > Benefits in availability at lower storage cost - yes; "benefits in
> > confidentiality" means "MORE secure" - so, no, you need to formally
> > prove it.
>
> Let us agree to stop using the word secure as it is a loaded term
> comprising different dimensions, including confidentiality,
> availability, and integrity.
>
> Is your question why the combination of dispersal with encryption is
> more confidential than encryption alone?

Yes.


> Is your question why dispersal alone is more available than dispersal
> with encryption?

No


> Both should be easy to demonstrate.  If you apply an existing
> encryption system prior to dispersal, dispersal can only improve
> confidentiality, it won't hurt it.  Therefore it must be more secure

That remains to be shown. See below.


> than encryption alone.  Even the AONT without dispersal provides
> additional confidentiality benefits, see Rivest's original paper for
> his list: http://theory.lcs.mit.edu/~cis/pubs/rivest/fusion.ps so the
> combination of existing encryption, with AONT by itself has already
> provided a higher level of confidentiality.
>
> > * Security of m<K-1 of N pieces is guaranteed by AES - the dispersal
> > scheme doesn't ADD any benefit here.
>
> How is the AES key stored?  I have asked this several times in
> previous posts and I have yet to see your solution for key
> management.  The confidentiality of the AES cipher cannot be analyzed
> in a vacuum as the cipher is just one small component of a much
> greater system.  You will never see the confidentiality benefits of
> dispersal without looking at the full picture, the complete system.

The AES key can be stored in a variety of ways. You mentioned several,
including thumb drives, smart cards, USB key, password-protected
PKCS#12. Note that this will be stored on the system destined to
process the data, where the data will be available in plaintext, hence
small difference to confidentiality of the data.

In the dispersal case the plaintext data can be obtained where it
doesn't exist ever.


> > * Your scheme gets rid of key management in its traditional form at
> > the cost of making m=K of N pieces of data insecure. This has its
> > applications, but not for everyone who fears their 4096-bit RSA key
> > will be compromised within 40 years.
>
> Who said the slices are made insecure?  If each slice is stored with
> the same level of confidentiality as an existing system stores its
> keys then the dispersal system will have a higher level of
> confidentiality.  By virtue of the greater difficulty of compromising
> multiple locations.
>
> > > dispersal be used to create a more secure data storage system than any
> > > existing system.  I believe the answer is yes.  Here is why:
>
> > > The term security when applied to information often refers to the
> > > principles of CIA confidentiality, integrity and availability (thus
> > > far we've only covered the first and last in these discussions but
> > > there are also integrity advantages of dispersal as well which I can
> > > go into in more detail in the future).  The reason that dispersal can
> > > create more secure systems than anything that exists today is that it
> > > is compatible with existing encryption techniques.  Therefore one may
>
> > That's a prerequisite but not a proof that it is more secure.
>
> If it is an equation you are looking for consider this:
>
> Probability of encryption system being compromised: P(EC)
> Probability of dispersal system being compromised: P(DC)
>
> Therefore if we combine both systems as one, the probability of the
> combined system's compromise would be: P(EC) * P(DC)

Excuse me, but that is true only if EC and DC are independent. The
true picture will be P(EC/DC) * P(DC) + P(EC/!DC) + ..., where EC/DC
is the event of EC compromise upon condition that the DC has been
compromised.


> P(DC) cannot be greater than 1, and so long as it is less than 1 then
> the combined system is more secure.  Does this make sense?
>
> Are you arguing that the probability of compromising a dispersal
> system is exactly 1?

I am arguing that P(EC/DC^K) = 1, so the probability of compromise is P
(DC)^K + P(EC/DC) * P(DC) + P(EC/!DC) + .... I don't like the presence
of P(DC)^K at all, especially because P(DC2/DC1) <= P(DC), i.e. the
probability of hacking DC2 having hacked DC1 may be lower (not proven
to be independent!), and P(DC) is significantly larger than P(EC/!DC).
The key shares or keys are kept on machines that I trust to handle
plaintext data.


> The only time when one can keep the shares offline and have ready
> access to their data is when they keep a full copy of the key on some
> other machine.  This key will not be protected nearly as well as it
> would by a threshold scheme (online or offline) because it is a single
> point of compromise and by definition is online.
>
> > > > b) splitting keys into shares is done for purposes other than high
> > > > availability of the decryption function - in fact, quite the opposite
> > > > is the intention
>
> > > Often secret sharing schemes are employed to create secure backups of
> > > keys and those shares are not kept online, for the reasons you cite.
>
> > Secret sharing - yes. Each share is kept as securely as the key should
> > - easy to do and guarantee for a trifling of data that key material
> > is. In your scheme the requirement applies to the whole data, since
> > the key is part of the data now.
>
> So is your objection based on the idea that it is easier to secure a
> small piece of data than a large one?  In a sense the opposite can be
> true, it is much easier and faster to copy a small key than it is to
> copy what might be a very large slice.

Can be, but not so. I can afford to keep the key in my pocket, where
it will be really hard to steal, unlike a vault with my money in the
middle of the Mojave desert.

This point is hardly worth talking over any more.


> > > However in these backup scenarios the complete key will often exist in
> > > a single physical location and often stored durably (on a hard drive,
> > > in a key card, etc.) so that it can be used conveniently (more
> > > convenient in the sense that one doesn't have to go and gather off-
> > > line copies to use the key).  With dispersal no single copy of a key
> > > is stored durably in any location, it only exists ephemerally while
> > > the data is reassembled from the slices.
>
> > But you can't guarantee the same level of confidentiality to all data
> > in all locations.
>
> The confidentiality doesn't have to be the same level in all
> locations, so long as the most confidential site among any threshold
> number of sites meets your minimum confidentiality requirements, you
> should be good.  For there to be a problem, there would need to be a
> threshold number of blunders across your sites all at the same time.

Yes. But remember that the mean time to break into a data center is
not measured in the trillions of years that the mean time to break an
AES-encrypted piece is.


> > Just as you say, I can go ahead and store key shares
> > in vaults with 10 inch thick walls. I have no problem having the
> > hassle of reassembling the key, if my copy of the complete key gets
> > corrupt
>
> Where do you keep this copy of the complete key?  How do you keep it
> secure against compromise, or the machine it lives on from being
> stolen or remotely compromised?  Dispersal has the advantage that the
> complete key doesn't live anywhere when at rest, and when in use
> exists only in RAM of the machine reading the data.  It also uses a
> unique key for every piece of data stored, so if one source being read
> has its key intercepted in RAM, the rest of the data is still secure.

You keep ignoring the mean time to compromise K data stores,
especially since you yourself have hinted that the strength of RSA is
scarily small.


> > - that's the value of key sharing. This is very different to
> > key shares stored alongside the data on hard drives around the world,
> > which may become accessible via TCP/IP because of a blunder in WiFi
> > configuration at a data center, or some other such mishap for which
> > there is no mathematical measure of security.
>
> Let's generally assign a probability for accidental or malicious
> disclosure for a key stored at a single location.  Let's call it P
> (D).  For dispersal, the probability of disclosure will be P(D)^T,
> where T is the threshold.  No matter what number you chose for P(D),
> except for 0 or 1, the probability of disclosure for dispersal will
> always be lower.  This equation doesn't hold if the disclosures are
> non-independent, but even if they are highly correlated P(D) will be
> greater than ( P(D) * P(Sufficient number of additional correlated
> compromises at other sites to reach threshold) ) so long as the
> threshold is more than 1, which for dispersal it always is.  Thus even
> if the exact math for modeling rates of compromise is hard to model,
> regardless of what the true probability is, dispersal is more
> confidential than keeping a copy of the key at one location when the
> confidentiality of the sites keeping slices is as good as the
> confidentiality of the site keeping your single copy of the key.

How about dispersing the encrypted data, but not the key? This can be
trivially shown to have higher confidentiality than the dispersal
scheme you propose, because then P(EC/DC^K)=P(EC/DC)=P(EC/!DC).

Then to compromise the data, one would need to hack K data centers
with the encrypted data, AND get the key from my pocket/laptop/cloud
app.


> > > In this sense it provides a
> > > greater degree of confidentiality than a system where there is a key
> > > in one online location of which off-line shares are maintained as a
> > > backup.
>
> > The One online location being.... my data processing CPUs! They have
> > the data in plain text, or the credentials to access the dispersed
> > storage anyway.
>
> The advantage of credentials is that they can be disabled, changed, or
> reset whenever needed while the keys for the data cannot.  Thus users
> could use a very long password and not have to worry about forgetting
> it as it can be reset, or if an employee storing dispersed data left
> the company those who manage the authentication system may still
> recover the data.

Yes. I can even store the AES key encrypted with an asymmetric public
key. Then it is back to the argument about whether RSA will break soon
or not.
Well, I have mentioned this a few times. Here it comes again:

It doesn't matter where the key is stored so long as the plaintext key
is available only in those places where plaintext data is needed.


> > * Make me want to rely on 10x the security of a single third-party
> > data center for confidentiality of the decryption key
>
> Explain your approach for keeping your decryption key confidential and
> I will attempt to show you where it is lacking.

I will go to the extent of having a PKCS#12 file on my machine with a
really long password, and a key sharing scheme allowing me to recover
my key. The shares are sunk in the concrete block under the
cornerstone of my building. The location is 12 feet away from the
corpse in the direction of its stretched hand.

The decryption key is encrypted under my public key and is distributed
in encrypted form along with the encrypted data. A service that needs
to process the data sends a request to decrypt the key. We mutually
authenticate each other, I decrypt the decryption key retrieved with
the data, and I send it back to the service.

You will agree the hardness of the problem is bounded by a few things,
all of which require either cracking the cornerstone, compromising the
service that requests the decryption key from me, or cracking the
PKCS#12 file I've got.

As to key management, people have been doing this for many years now.
It is a pain, but that's the cost of not relying on just the P(DC)^K
of the data centers that store my data.


Sassa

>
> Jason

Jason

unread,
Jul 1, 2009, 8:29:44 PM7/1/09
to Cloud Computing
I think you are still confusing the intended use of MTBF; as I think
you understand, it is not by itself a predictor of when the first
failure will occur, although with the proper equation it can be used
for that, as I will show.

If the exponential failure law applies (it does when there is a
constant failure rate) then the probability of failure over a time
interval can be determined as follows:

1 - e^(-T/MTBF)

where e is Euler's number, T is the interval over which the device is
running, and MTBF is the mean time between failures of the system.

As you can see, plugging in T for MTBF:

1 - e^(-T/T) = 1 - e^-1 = 0.63

So if a device had an unlimited usable life, the odds of it lasting as
long as its MTBF are only 37%. I think you arrive at the same
conclusion below.
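
The same figures fall out of a couple of lines of Python, which also
reproduce the ~64% number you computed for 512 arrays over 2 years:

    import math

    def p_fail(t, mtbf):
        # Exponential failure law: probability of at least one failure within t.
        return 1 - math.exp(-t / mtbf)

    print(p_fail(1, 1))             # ~0.632: chance of failing within one MTBF
    print(p_fail(2, 1000 / 512))    # ~0.641: 512 arrays at 1,000 years each, over 2 years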


> > As to the math behind the simple division, consider that whenever one
> > is running multiple simultaneous arrays in a large system, data loss
> > on any single array counts as data loss for the system.  This is
> > identical to RAID 0: the failure of any single disk in the array
> > causes failure of the system.  Notice that the calculation for MTBF is
> > calculated for RAID 0: http://en.wikipedia.org/wiki/Raid_0#RAID_0_failure_rate
>
> This is exactly what I considered:
>
> 512 independent failures DON'T occur with probability poisson(0,years/
> 1000) each. So the probability of 512 arrays to survive X years
> together is poisson(0,X/1000)^512, and a probability of 512 array
> failure during X years is 1-poisson(0,X/1000)^512, hence for X=2 the
> probability of 512-array failure is ~64%.
>

I see the result is very close to the 63% I gave; perhaps the
difference is just due to rounding or a lack of precision.

> 1.4 years is the point when half the failures are expected to happen.
> Well, but now I recall the mean *time* to failure for Poissonian
> actually is at the ~63% point (the bias from 50% is caused by later
> failures moving the average time up). So 2 years is the MTBF for 512
> arrays indeed.
>
>

Cool, you will have to show me how you used the Poissonian to derive
this.

>
>
>
> > Consider what MTBF is, it is the mean number of device-hours between
> > observed failures.  Thus if a particular hard drive has a MTBF of 20
> > years, and I maintain 1,000 of them over 1 year I would expect to
> > encounter 50 failures, as there are 1,000 device years.  These 50
> > errors would be more or less evenly distributed over the 1 year
> > period, thus the mean time to the first failure will be 1/50th a year.
>
> > > Anyhow, if you are using 28 x RAID 6 arrays with 512/28 disks, you
> > > still get 512 arrays and the same failure rate.
>
> > We won't use RAID arrays; dispersal is a replacement for RAID offering
> > arbitrarily high levels of fault tolerance.  Instead of using RAID 6
> > we would use a 10 of 16 configuration which would have a mean time to
> > data loss in the billions or trillions of years such that a system of
> > many hundreds of concurrent "dispersed arrays" has a MTTDL in the many
> > millions of years.
>
> Show these fantastic billions, too.
>

I knew this would eventually come up. It is one of the topics of our
paper submission, so it might do well to have a mini review of it
here.

Observe the following trend in RAID array reliability calculations,
where P is the number of parity drives, D is the total number of
drives, and F is the fault tolerance, i.e. the number of simultaneous
drive failures the array can survive (equal to P):

RAID 0:

MTTF(array) = MTTF(disk) / D

As soon as the first disk fails the system fails, so we simply divide
the MTTF(disk) by the number of disks. Since the MTTF is the inverse
of the failure rate, another way to think of it is:

FAILURE_RATE(array) = FAILURE_RATE(disk) * D

Which should intuitively make sense.

RAID 5:

MTTF(array) = (MTTF(disk) / D) * (MTTF(disk) / (D-1)) / (1 * MTTR(disk))

The first part of the equation looks just like the one for RAID 0;
that part represents the mean time to the first disk loss. The factor
that was added can be thought of as the rarity of a second drive
failure within the rebuild time of the first disk. Note that here we
use D-1, because 1 disk has already failed and we can only consider
operational disks as candidates for failure. We divide by MTTR (mean
time to repair) because it is within this short interval that the
second failure must occur.

RAID 6:

MTTF(array) = (MTTF(disk) / D) * (MTTF(disk) / (D-1)) / (1 * MTTR(disk))
* (MTTF(disk) / (D-2)) / ((1/2) * MTTR(disk))

Again the first part of the equation is just like that of the previous
RAID 5 example, but we have now added a new section which relates to
the third disk failure. Note that it follows the same form as what we
added for the second disk failure in RAID 5, only it uses (D-2) since
two disks have failed, and it divides by (1/2) * MTTR rather than MTTR,
because two disks are in a repair state: when the second disk failed,
chances are the first disk was mid-way through its rebuild, so the time
interval within which the third drive must fail is not MTTR, but 1/2
MTTR on average.

Triple parity:

MTTF(array) = (MTTF(disk) / D) * (MTTF(disk) / (D-1)) / (1 * MTTR(disk))
* (MTTF(disk) / (D-2)) / ((1/2) * MTTR(disk)) * (MTTF(disk) / (D-3))
/ ((1/3) * MTTR(disk))

F parity (general form, approximately):

MTTF(array) ~ MTTF(disk)^(F+1) / ( MTTR(disk)^F * (D choose F) )

Ignoring the (D choose F) term, since it has a comparatively
insignificant effect, you can see that for each additional tolerable
disk failure we multiply the MTTF by roughly:

MTTF(disk)/MTTR(disk)

Typically the MTTF is a time in years, and the MTTR is a time given in
hours. Let's say we use a disk with an MTTF of 20 years and an MTTR of
24 hours. This works out to a factor of about 7300. To estimate more
conservatively, divide the factor by the number of disks, so if
considering an X of 16 dispersal, divide 7300 by 16 to get roughly 450.
Therefore each additional fault we can tolerate in an X of 16 dispersal
will increase the MTTF by about 450 times. Thus a 10 of 16, being able
to tolerate 4 more faults than RAID 6 (6 versus 2), will have an MTTF
which is 450^4 = 41,006,250,000 times longer.
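
To make the arithmetic reproducible, here is a rough Python sketch of
the chain-of-failures approximation above (my own back-of-the-envelope
helper, not our production reliability model; the constants and
function name are made up for this example):

MTTF_DISK_YEARS = 20.0
MTTR_HOURS = 24.0
MTTR_YEARS = MTTR_HOURS / (24 * 365)
D = 16                            # total drives (or slices)

def mttf_chain(faults_tolerated: int) -> float:
    # Time to the first failure, then each further failure must land
    # inside the shrinking rebuild window (MTTR, 1/2 MTTR, 1/3 MTTR, ...).
    mttf = MTTF_DISK_YEARS / D
    for k in range(1, faults_tolerated + 1):
        window = MTTR_YEARS / k
        mttf *= MTTF_DISK_YEARS / ((D - k) * window)
    return mttf

# Conservative shortcut used above: each extra fault buys roughly
# (MTTF/MTTR)/D times more MTTF.
factor = (MTTF_DISK_YEARS / MTTR_YEARS) / D
print(round(factor))                         # ~456, i.e. the "450" figure
print(f"{factor ** 4:.3g}")                  # ~4.3e10, i.e. roughly 450^4
print(f"RAID 6-like (F=2): {mttf_chain(2):.3g} years")
print(f"10 of 16   (F=6): {mttf_chain(6):.3g} years")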

For a source, see http://storageadvisors.adaptec.com/2005/11/01/raid-reliability-calculations/
but note that they leave out the multiplier next to MTTR(disk). For
RAID 5 it doesn't make a difference, and RAID 6 it only changes things
by a factor of 2 so it is not surprising it went unnoticed by them,
but for dispersal when many simultaneous failures are tolerable it
becomes more significant.

Hope this helps.

Jason

Jason

unread,
Jul 1, 2009, 11:40:01 PM7/1/09
to Cloud Computing
I think we have been going back and forth on the same things
needlessly. Let me make sure that you understand what I believe:

This is roughly how I would rate the confidentiality level provided
by different components of encryption systems, when considered
entirely in isolation:

Shamir Secret Sharing Scheme > One Time Pad > AES > Dispersal+AONT >
RSA > Cracking a long password > breaking into a machine > stealing
media.

SSSS I put at the top because it has the same unbreakable information-
theoretic security as OTP encryption, but it requires an attacker to
gain access to multiple shares rather than just one key. OTP
encryption is above AES in difficulty because it is not crackable.
AES is extremely secure; no known threat to its strength is on the
horizon. Dispersal+AONT, depending on a symmetric cipher such as AES
to work, cannot be harder to defeat than AES. RSA is also very secure,
but one should be cautious as it rests on unproven assumptions and
could be invalidated by a quantum computer. I would say it is still
more secure to use RSA than a long password. Long passwords,
especially when salt and key strengthening are used, can be very
difficult to crack, but they must be very random and long, longer than
the type most people use. I would say, however, that a sufficiently
long password (15-20 characters) is likely harder to crack than it is
to break into a machine. Machines can have a huge range of security,
from being locked down, firewalled and running no remote services, to
being an unpatched early-service-pack version of Windows that runs
half a dozen services by default; but based on the rate at which
security patches are published, it is probably still easier to break
into a machine than to brute force a good password. Lastly, what I
consider least secure is relying on physical protection to prevent
media from being stolen or lost, but it is very close. Looking at this
list of publicly acknowledged exposures, about half seem to be due to
hacking while the other half are due to loss or
insiders: http://www.privacyrights.org/ar/ChronDataBreaches.htm#CP

One thing to understand from my ordering is that no one component
alone builds a complete encryption system. Ultimately data needs to
be stored on some media, and data must be encrypted/decrypted on some
machine. Now, one thing you point out is that dispersal relies on the
security of the machines which store slices; this I entirely agree
with. However, the same is true for the system you proposed. With
dispersal, physical theft of a sufficient number of slices becomes
much more difficult, and breaking into a threshold number of machines
will be harder than compromising a single machine. Dispersal also does
not rely on a single all-important password or RSA key which, if
intercepted, unveils all the data.

You proposed keeping a PKCS#12-encrypted private key on your computer;
the corresponding public key is used to encrypt the AES keys which
ultimately protect your data. You retrieve the encrypted data and the
RSA-encrypted intermediate AES keys after authenticating to the system
storing them.

The confidentiality of this system relies on the following things: the
security of RSA (since your public key is presumably not protected in
the PKCS#12 file, if RSA is broken the public key will be enough to
derive your private key); the strength of your password, which to
achieve security equivalent to that of a 128-bit key will need to be
about 20 characters long; and the security of your machine, which if
compromised would let someone intercept your private key in its
decrypted state, or key-log your password to obtain the private key
which protects all your data. If your computer is stolen as opposed to
compromised, one would have to either crack your password or wait
until such time as RSA might be broken, so I see compromise of the
machine as the most likely path to the compromise of this system.
Overall the security provided by this system is quite good, and about
the best that is possible using today's technology, but I think, due
to its reliance on the inviolability of one machine, it falls short of
achieving the security that is possible with dispersal.

If the computer which accessed dispersed data were compromised, there
would be no single private key to be stolen; the best that could be
taken would be whatever data is read while the system is compromised.
The other approach would be to nearly simultaneously steal a threshold
number of servers from different locations. This is beyond the
sophistication of most thieves, so it might be easier in this case to
remotely compromise a threshold number of servers. This is going to
be harder than breaking into one machine, especially given the fact
that at least a threshold number of machines must all be susceptible
to remote exploit, and especially when highly secure OSes like OpenBSD
are used, which have only had 2-3 remote exploits in their default
install in the past 12 or so years. If one keeps the servers patched
regularly, the odds of all servers being simultaneously vulnerable can
be made vanishingly small.

> Excuse me, but that is true only if EC and DC are independent. The
> true picture will be P(EC/DC) * P(DC) + P(EC/!DC) + ..., where EC/DC
> is the event of EC compromise upon condition that the DC has been
> compromised.
>

I think your objection here is due to a misunderstanding of what was
meant by the combined use of an existing system with dispersal. See
below.

>
> The key shares or keys are kept on machines that I trust to handle
> plaintext data.
>

Certainly, but why can't the machine storing slices be trusted by
you? A little background here: we do not expect individual users to
create their own dispersed storage systems; we expect primarily that
only organizations will have the resources to do this using their own
trusted sites. Unless you are like McCain and have 7 houses to keep
secure servers in, few people will be able to field their own
dispersed storage network. In this context, I can see your
objection to trusting arbitrary hosting companies with your slices,
but remember it is not the same as trusting them with an unprotected
copy of a key; much like with a secret sharing scheme, you may give a
friend a post-it note with a share of your password even if you don't
trust him with your password.

> > The confidentiality doesn't have to be the same level in all
> > locations, so long as the most confidential site among any threshold
> > number of sites meets your minimum confidentiality requirements, you
> > should be good. For there to be a problem, there would need to be a
> > threshold number of blunders across your sites all at the same time.
>
> Yes. But remember that mean time to break in to a data center is not
> measured in trillions of years that mean time to break an AES
> encrypted piece.
>

I agree. Based on my ordering above, we see that brute forcing an AES
key is many times harder than breaking into a machine; the question is
where are those AES keys kept? On single machines that can be
compromised if they are late on a patch, or on flash drives which can
be stolen or lost? It's always how keys are stored that is the weakest
link, not the ciphers themselves, and you won't get any argument from
me about that. The benefit of dispersal is that it improves upon what
is the weakest link in existing encryption systems, by forcing a
threshold number of servers to be stolen or compromised.

> > Where do you keep this copy of the complete key? How do you keep it
> > secure against compromise, or the machine it lives on from being
> > stolen or remotely compromised? Dispersal has the advantage that the
> > complete key doesn't live anywhere when at rest, and when in use
> > exists only in RAM of the machine reading the data. It also uses a
> > unique key for every piece of data stored, so if one source being read
> > has its key intercepted in RAM, the rest of the data is still secure.
>
> You keep ignoring the mean time to compromise K data stores,
> especially because you alluded that the strength of RSA is scarily
> small.
>

I don't know what the mean time to system compromise (MTTSC) is; I
suspect it varies immensely based on a large number of factors. All I
know is that in general it will be a much greater time to compromise K
data stores than a single system storing a key. To increase the MTTSC
one can do a number of things: deploy a VPN between all machines, use
firewalls, install intrusion detection systems, log and audit access
to remote machines, run a minimum number of services, use OSes which
support the NX bit, patch regularly, follow the NSA's guides on
machine hardening, etc. If the MTTSC can be made much higher than the
average life of the system (which may be replaced every 3 years) then
it becomes highly unlikely that a threshold number of machines would
ever be compromised at once. Combined with some periodic re-writing
process which perhaps runs once every year, it might be impossible for
an attacker to ever collect a threshold number of slices for any piece
of data. I think such a thing would be very powerful, if it can be
attained.

> > Let's generally assign a probability for accidental or malicious
> > disclosure for a key stored at a single location. Let's call it P
> > (D). For dispersal, the probability of disclosure will be P(D)^T,
> > where T is the threshold. No matter what number you chose for P(D),
> > except for 0 or 1, the probability of disclosure for dispersal will
> > always be lower. This equation doesn't hold if the disclosures are
> > non-independent, but even if they are highly correlated P(D) will be
> > greater than ( P(D) * P(Sufficient number of additional correlated
> > compromises at other sites to reach threshold) ) so long as the
> > threshold is more than 1, which for dispersal it always is. Thus even
> > if the exact math for modeling rates of compromise is hard to model,
> > regardless of what the true probability is, dispersal is more
> > confidential than keeping a copy of the key at one location when the
> > confidentiality of the sites keeping slices is as good as the
> > confidentiality of the site keeping your single copy of the key.
>
> How about dispersing the encrypted data, but not the key? This can be
> trivially shown to have higher confidentiality than the dispersal
> scheme you propose, because then P(EC/DC^K)=P(EC/DC)=P(EC/!DC).
>

I thought that is what was meant by combining the two systems, as I
described above: pre-encrypting the data using whatever system you
have for encryption before handing it to our software to have the AONT
applied and slices dispersed to different machines. Did you imagine
something different? By encrypting the data first and then dispersing
it, the data gains the confidentiality benefits of both techniques,
which I thought was obvious, but you asked me to prove it was more
secure.

It sounds like you may have thought I proposed that the key also be
stored using dispersal as opposed to using the existing key management
system; is that correct?

> Then to compromise the data, one would need to hack K data centers
> with the encrypted data, AND get the key from my pocket/laptop/cloud
> app.
>

Exactly.

> > The advantage of credentials is that they can be disabled, changed, or
> > reset whenever needed while the keys for the data cannot. Thus users
> > could use a very long password and not have to worry about forgetting
> > it as it can be reset, or if an employee storing dispersed data left
> > the company those who manage the authentication system may still
> > recover the data.
>
> Yes. I can even store the AES key encrypted with an asymmetric public
> key. Then it is back to the argument about will the RSA break soon or
> not.
>

Well, any time RSA is thrown in, it becomes another link in the chain
and one more thing which could potentially fail. My point in even
bringing it up is that RSA and passwords can be avoided altogether
when dispersal is used; that is two fewer things to worry about. Many
key management systems rely on RSA due to its ability to secure many
keys at once; each user then need only worry about the confidentiality
of one key, their RSA private key, since all other keys can be
encrypted with the corresponding RSA public key. This simplifies many
key management systems, but it puts a dependency on the assumption
that RSA will keep standing up to mathematical analysis and that
researchers or governments will be unable to build large enough
quantum computers. Maybe the odds of a quantum computer being built in
the next 10 years are just 1%, but that is a risk not factored in with
dispersal.

>
> Well, I mentioned a few times. Here it comes again:
>
> It doesn't matter where the key is stored so long as the plaintext key
> is available only in those places where plaintext data is needed.
>

For existing systems this is true; the best one could hope for is only
keeping the key where the plaintext is accessed.

However, with dispersal the key need not live anywhere, especially in
archival situations where data is written once and rarely accessed.
For existing systems the key always must live somewhere, but with
dispersal, unless you are currently reading that data, the key is
essentially non-existent. It only pops into existence once a quorum
of slices is assembled, and disappears just as quickly. Unlike most
systems which rely on some master key, password, or RSA key, no such
key exists in a dispersed system. Every segment of a file or disk
block has its own unique AES key associated with it.
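
To make the idea concrete, here is a minimal toy sketch of an
all-or-nothing "package" transform in the spirit of Rivest's
construction (my own illustration in Python; it stands in a hash-based
keystream for the AES-based transform we actually use, and none of the
function names come from our product):

import hashlib, os

def _keystream(key: bytes, length: int) -> bytes:
    # Hash-counter keystream used as a stand-in for a real block cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def aont_package(data: bytes) -> bytes:
    # Encrypt under a fresh random key, then append the key XORed with a
    # digest of the whole ciphertext.  No key is stored anywhere; it can
    # only be reconstituted from the complete package.
    key = os.urandom(32)
    ciphertext = _xor(data, _keystream(key, len(data)))
    masked_key = _xor(key, hashlib.sha256(ciphertext).digest())
    return ciphertext + masked_key

def aont_unpackage(package: bytes) -> bytes:
    ciphertext, masked_key = package[:-32], package[-32:]
    key = _xor(masked_key, hashlib.sha256(ciphertext).digest())
    return _xor(ciphertext, _keystream(key, len(ciphertext)))

blob = b"segment of a file or disk block"
assert aont_unpackage(aont_package(blob)) == blob

Slice the package with an information dispersal algorithm and no set
of slices below the threshold contains either the key or the data.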

>
> As to the key management, people have been doing this for many years
> now. It is pain, but that's the cost of not relying on just P(DC)^K
> that store my data.
>

Dispersal alleviates the pains of key management, while at the same
time allowing greater machine-level and physical security. If you
think relying on the security of one additional machine (the one that
stores the key) adds enough additional confidentiality to merit its
negative effect on availability, you can combine both and achieve a
system more confidential than any other that I am aware of. Dispersal
doesn't have to replace encryption systems; it can work in tandem with
them to increase the difficulty of machine-based or physical attacks.

Jason

Sassa

unread,
Jul 2, 2009, 8:10:29 AM7/2/09
to Cloud Computing
You see, I am questioning your shortcut reliability formulas, because
I haven't worked with them and I don't buy them without conviction.

The failure isn't exponential. If events are distributed uniformly in
time, their distribution is described by the Poissonian, which is
L^k * e^(-L) / k!, with k being the # of occurrences of the event and
L the average occurrence rate for the same period. Hence it is not
apparent that K independent devices will fail at K times the original
rate L.

Then, K independent devices fail exactly the same # of times with
probability (L^k * e^(-L) / k!)^K. This boils down to e^(-L*K) only
for k=0, i.e. only the non-failure probability of all the devices
together behaves as if the rate were proportional to the # of devices.


> 1-e^(-T/MTBF)
>
> Where e is Euler's number, T is the interval over which the device is
> running, and MTBF is the Mean time between failures of the system.
>
> As you can see, plugging in T for MTBF:
>
> 1-e^(-T/T) = 1 - e^-1 = 0.63
>
> So if a device had an unlimited usable life, the odds of it lasting as
> long as its MTBF is only 37%. I think you arrive at the same
> conclusion below.
>
> > > As to the math behind the simple division, consider that whenever one
> > > is running multiple simultaneous arrays in a large system, data loss
> > > on any single array counts as data loss for the system. This is
> > > identical to RAID 0: the failure of any single disk in the array
> > > causes failure of the system. Notice that the calculation for MTBF is
> > > calculated for RAID 0:http://en.wikipedia.org/wiki/Raid_0#RAID_0_failure_rate
>
> > This is exactly what I considered:
>
> > 512 independent failures DON'T occur with probability poisson(0,years/
> > 1000) each. So the probability of 512 arrays to survive X years
> > together is poisson(0,X/1000)^512, and a probability of 512 array
> > failure during X years is 1-poisson(0,X/1000)^512, hence for X=2 the
> > probability of 512-array failure is ~64%.
>
> I see the result is very close to the 63% I gave, perhaps this is just
> due to a rounding error, or a lack of precision..

64% is for 2 years exactly.


> > 1.4 years is the point when half the failures are expected to happen.
> > Well, but now I recall the mean *time* to failure for Poissonian
> > actually is at the ~63% point (the bias from 50% is caused by later
> > failures moving the average time up). So 2 years is the MTBF for 512
> > arrays indeed.
>
> Cool you will have to show me how you used the Poissonian to derive
> this.

Taking into account the above, simultaneous non-failure of 512 arrays
in 1 year is described using a new MTBF, denoted X, as
poisson(0,1/X) = poisson(0,1/1000)^512 = e^(-512/1000) = e^(-1/X), hence
X = 1000/512 = 1.953, which is what reliability engineers knew all along :-)

Also, the probability of failure at the mean time to failure is
1 - poisson(0,X/X) = 1 - poisson(0,1) = 1 - 0.3678 = 0.6321
(and the same can be shown through poisson(0,X)^D).
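
A quick numeric check of those figures (a throwaway Python sketch,
nothing rigorous):

import math

def poisson(k: int, rate: float) -> float:
    # P(exactly k events) when the expected number of events is `rate`
    return rate ** k * math.exp(-rate) / math.factorial(k)

# 512 arrays, each with MTBF 1000 years, observed over 2 years:
print(1 - poisson(0, 2 / 1000) ** 512)   # ~0.64

# equivalent MTBF of the combined system:
print(1000 / 512)                        # ~1.953 years

# probability of failure by the MTBF itself:
print(1 - poisson(0, 1))                 # ~0.632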


> RAID 5:
>
> MTTF(array) = (MTTF(disk) / D) * (MTTF(disk) / (D-1)) / (1 * MTTR
> (disk))
>
> The first part of the equations looks just like the one for RAID 0.
> That part represents the Mean time to first disk loss. The latter
> part which was added an be thought of as the rarity of a second drive
> failure within the rebuild time of the first disk. Note that here we
> use D-1, because 1 disk has already failed and we can only consider
> operational disks as candidates for failure. We divide by MTTR (mean
> time to repair) because it is within this short interval the second
> failure must occur.
>
> RAID 6:
>
> MTTF(array) = (MTTF(disk) / D) * (MTTF(disk) / (D-1)) / (1 * MTTR
> (disk)) * (MTTF(disk) / (D-2)) / ((1/2) * MTTR(disk))
>
> Again the first part of the equation is just like that of the previous
> RAID 5 example, but we have now added a new section which relates to
> the third disk failure. Note that it follows the same form as what we
> added for the second disk failure in RAID 5, only it uses (D-2) since
> two disks have failed, and it divides by (1/2) * MTTR rather than MTTR,
> because two disks are in a repair state: when the second disk failed,
> chances are the first disk was mid-way through its rebuild, so the time
> interval within which the third drive must fail is not MTTR, but 1/2
> MTTR on average.

Why 1/2 MTTR? Recall that mean time will be biased towards the longer
end.


> Triple parity:
>
> MTTF(array) = (MTTF(disk) / D) * (MTTF(disk) / (D-1)) / (1 * MTTR(disk))
> * (MTTF(disk) / (D-2)) / ((1/2) * MTTR(disk)) * (MTTF(disk) / (D-3))
> / ((1/3) * MTTR(disk))
>
> F parity:
>
> MTTF(array) = MTTF(disk)^(F+1) / ( MTTR(disk)^F * (D choose F) )
>
> Ignoring the D choose F, since it has a comparatively insignificant
> effect, you can see that for each tolerable disk failure we multiply
> the MTTF by:
>
> MTTF(disk)/MTTR(disk)
>
> Typically the MTTF is a time in years, and the MTTR is a time given in
> hours. Let's say we use a disk with MTTF of 20 years, and a MTTR of
> 24 hours. This works out to a factor of 7300. To estimate more
> conservatively, divide the factor by the number of disks, so if
> considering an X of 16 dispersal, divide 7300 by 16 = 450. Therefore
> each additional fault we can tolerate in an X of 16 dispersal will
> increase the MTTF by about 450 times. Thus a 10 of 16, being able to
> tolerate 4 more faults than RAID 6 (6 versus 2), will have an MTTF
> which is 450^4 = 41,006,250,000 times longer.
>
> For a source, seehttp://storageadvisors.adaptec.com/2005/11/01/raid-reliability-calcul...
> but note that they leave out the multiplier next to MTTR(disk). For
> RAID 5 it doesn't make a difference, and RAID 6 it only changes things
> by a factor of 2 so it is not surprising it went unnoticed by them,
> but for dispersal when many simultaneous failures are tolerable it
> becomes more significant.

I think what you quote is not a complete picture, because it doesn't
mention more than F failures, but they can happen.

what you say is modelled using Poissonian roughly like this:

p_irrecoverable_failure_within_time_X >= (1-poisson(0,X/MTTF(disk))^D)*
(1-poisson(F-1,X/MTTR(disk))^(D-1))

i.e. probability of at least one failure in array of D disks,
multiplied by the probability of at least F failures during MTTR of
the first failed disk. This may need more thought.

Then you should find mean X.



Sassa

> Hope this helps.
>
> Jason

Peglar, Robert

unread,
Jul 2, 2009, 8:12:42 AM7/2/09
to cloud-c...@googlegroups.com
Not to jump into the middle of a two-person dialogue, but hard disk
failure is not a Poisson distribution.

The best fit for observed disk failure in the field is a Weibull
distribution, a.k.a. the 'bathtub' curve.

Rob




---

Robert Peglar
Vice President, Technology, Storage Systems Group

Email: Robert...@xiotech.com
Office: 952 983 2287
Mobile: 314 308 6983
Fax: 636 532 0828

Xiotech Corporation
1606 Highland Valley Circle
Wildwood, MO 63005

www.xiotech.com : www.xiotech.com/demo : Toll-Free 866 472 6764

Sassa

unread,
Jul 2, 2009, 9:42:35 AM7/2/09
to Cloud Computing
On Jul 2, 4:40 am, Jason <jasonre...@gmail.com> wrote:
> I think we have been going back and forth on the same things
> needlessly.  Let me make sure that you understand what I believe:
>
> This is roughly how I would rate the confidentiality level of provided
> by different components of encryption systems, when considered
> entirely in isolation:
>
> Shamir Secret Sharing Scheme > One Time Pad > AES > Dispersal+AONT >
> RSA > Cracking a long password > breaking into a machine > stealing
> media.

Thank you for laying it out this way. RSA > breaking into a machine -
no problem.

Somehow you state that Dispersal+AONT=k*(breaking into a machine) >
RSA: prove it, because this is not apparent. Even if RSA is based on
unproven security, so is breaking into a machine. The difference is
that we see break-ins on a daily basis, unlike RSA getting broken.

I agree that AES > Dispersal+AONT > RSA (subject to recommended key
lengths and a bunch of other factors).
That's a very big if, though.


> enough to derive your private key.  The strength of your password,
> which to achieve security equivalent to that of a 128-bit key will
> need to be about 20 characters long.  The security of your machine,

Easy. Several of mine are 13, which I regard as adequate for my
purposes.

I won't comment on the rest at this time.


Sassa
> ...
>

Sassa

unread,
Jul 2, 2009, 9:50:21 AM7/2/09
to Cloud Computing
On Jul 2, 4:40 am, Jason <jasonre...@gmail.com> wrote:
...
> It sounds like you may have thought I proposed that the key also be
> stored using dispersal as opposed to using the existing key management
> system, is that correct?

Doesn't your diagram show exactly that? That the key is split and
dispersed alongside the data?

Don't you keep banging on about no need to store the key anywhere,
because it is next to the data?


Sassa

Jason

unread,
Jul 2, 2009, 11:37:34 AM7/2/09
to Cloud Computing


On Jul 2, 7:12 am, "Peglar, Robert" <Robert_Peg...@xiotech.com> wrote:
> Not to jump into the middle of a two-person dialogue, but hard disk
> failure is not a Poisson distribution.
>
> The best fit for observed disk failure in the field is a Weibull
> distribution, a.k.a. the 'bathtub' curve.
>
> Rob

The bathtub curve might be a better fit; it makes intuitive sense why
complex mechanical devices would follow one. But empirical studies on
hard drives have actually failed to find a bathtub curve;
manufacturers do some light burn-in tests to find drives which
immediately fail, which may be good enough to remove the initial high
failure rate otherwise observed by users of the drive. Instead of the
bathtub curve, Google found the rate staying roughly flat for the
first year, jumping up by the second year and then leveling off:
http://arstechnica.com/old/content/2007/02/8917.ars

Our model assumes the curve is close to flat over the drive's expected
life, which allows us to simplify the equations by treating the
failure rate as constant. To make sure we do not over-estimate the
reliability we may use the highest rate along the curve.

Jason

Jason

unread,
Jul 2, 2009, 11:57:58 AM7/2/09
to Cloud Computing
That is understandable.

> The failure isn't exponential. If events are distributed uniformly in
> time, their distribution is described by Poissonian, which is L^k*e^-L/
> k! with k being the # of occurrences of the event with the average
> occurrence rate for the same period being L. Hence it is not apparent
> that K independent devices will fail at K times the original rate L.

Failures occur independently at a constant average rate over time, so
the exponential failure law applies to the time between them. See
http://en.wikipedia.org/wiki/Exponential_distribution
which says at the top:

"In probability theory and statistics, the exponential distributions
are a class of continuous probability distributions. They describe the
times between events in a Poisson process, i.e. a process in which
events occur continuously and independently at a constant average
rate."

So I think we are both talking about the same thing, but in different
ways.

>
> Then, K independent devices fail exactly the same # of times with
> probability (L^k*e^-L/k!)^K. This boils down to e^(-L*K) only for k=0,
> i.e. only non-failure rate of all devices is proportionate to the # of
> devices.
>

Does your model include replacement of failed drives? That is a
crucial part of keeping the failure rate constant over a set period of
time.

> > 1-e^(-T/MTBF)
>
> > Where e is Euler's number, T is the interval over which the device is
> > running, and MTBF is the Mean time between failures of the system.
>
> > As you can see, plugging in T for MTBF:
>
> > 1-e^(-T/T) = 1 - e^-1 = 0.63
>
> > So if a device had an unlimited usable life, the odds of it lasting as
> > long as its MTBF is only 37%.  I think you arrive at the same
> > conclusion below.
>
> > > > As to the math behind the simple division, consider that whenever one
> > > > is running multiple simultaneous arrays in a large system, data loss
> > > > on any single array counts as data loss for the system.  This is
> > > > identical to RAID 0: the failure of any single disk in the array
> > > > causes failure of the system.  Notice that the calculation for MTBF is
> > > > calculated for RAID 0:http://en.wikipedia.org/wiki/Raid_0#RAID_0_failure_rate
>
> > > This is exactly what I considered:
>
> > > 512 independent failures DON'T occur with probability poisson(0,years/
> > > 1000) each. So the probability of 512 arrays to survive X years
> > > together is poisson(0,X/1000)^512, and a probability of 512 array
> > > failure during X years is 1-poisson(0,X/1000)^512, hence for X=2 the
> > > probability of 512-array failure is ~64%.
>
> > I see the result is very close to the 63% I gave, perhaps this is just
> > due to a rounding error, or a lack of precision..
>
> 64% is for 2 years exactly.
>

Okay, and you used the 1.953 that results from 1000/512.

The 1/2 MTTR multiplier is the part that is new and which I added to
the equation, so it requires the most explanation on my part. Assume
this is RAID 6 and one drive has already failed. Now we learn that
sometime later (but it must be within MTTR amount of time of the first
failure) another drive has failed. The question is, what is the
expected amount of rebuild progress the first drive will have made? It
is analogous to choosing a random point along the interval from 0 to
MTTR; being uniformly distributed, we should expect on average for it
to be 1/2 MTTR. Thus the third disk must fail within 1/2 MTTR on
average, otherwise we rebuild the first failed disk.

For triple parity there will have been two disk failures, so if you
pick any two random points along the interval 0 to MTTR, on average
the one which will have made the most progress will fall at 2/3 MTTR,
therefore when the third disk fails we expect the fourth failure must
occur within 1/3 MTTR.

For a uniform distribution, you expect that picking N random points
along a range will, on average, spread them evenly across that range,
so the N points should each be interval/(N+1) apart. This puts the
expected maximum at N/(N+1) of the way along, and thus we can predict
what the expected progress will be on the disk which has been in
repair the longest time. Please let me know if this part makes sense.

At the point where more than F failures have happened, the data is
irrecoverably lost, as you can no longer meet a threshold and rebuild
the data. Hence that final failure is the last one that need be
considered.
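
If it helps, the N/(N+1) claim is easy to check empirically (a quick
Monte Carlo sketch in Python, purely illustrative):

import random

def mean_max_of_uniforms(n_points: int, trials: int = 200_000) -> float:
    # Average of the maximum of n_points uniform draws on [0, 1];
    # the order-statistics result says this tends to n/(n+1).
    return sum(max(random.random() for _ in range(n_points))
               for _ in range(trials)) / trials

print(mean_max_of_uniforms(1))   # ~0.50 -> remaining window ~1/2 MTTR
print(mean_max_of_uniforms(2))   # ~0.67 -> remaining window ~1/3 MTTR
print(mean_max_of_uniforms(3))   # ~0.75 -> remaining window ~1/4 MTTR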

> what you say is modelled using Poissonian roughly like this:
>
> p_irrecoverable_failure_within_time_X >= (1-poisson(0,X/MTTF(disk))^D)*
> (1-poisson(F-1,X/MTTR(disk))^(D-1))
>
> i.e. probability of at least one failure in array of D disks,
> multiplied by the probability of at least F failures during MTTR of
> the first failed disk. This may need more thought.
>

Let me know if you come up with an alternative model and in particular
if your calculations come out close to the same results.

Thanks,

Jason

Peglar, Robert

unread,
Jul 2, 2009, 12:12:19 PM7/2/09
to cloud-c...@googlegroups.com
Jason said:

>On Jul 2, 7:12 am, "Peglar, Robert" <Robert_Peg...@xiotech.com> wrote:
>> Not to jump into the middle of a two-person dialogue, but hard disk
>> failure is not a Poisson distribution.
>>
>> The best fit for observed disk failure in the field is a Weibull
>> distribution, a.k.a. the 'bathtub' curve.
>>
>> Rob

>The bathtub curve might be a better fit, it makes intuitive sense why
>complex mechanical devices would follow one, but empirical studies on
>hard drives have actually failed to find a bathtub curve, manufactures
>do some light burn-in tests to find drives which immediately fail
>which may be good enough to remove the initial high failure rate
>observed by users of the drive. Instead of the bathtub curve, Google
>found the rate staying roughly flat for the first year, jumping up by
>the second and leveling off: http://arstechnica.com/old/content/2007/02/8917.ars

The Google study (Pinheiro et al) studied consumer-level ATA drives in very poor environmental conditions - velcro and cookie sheets - and also did not attempt to fit a distribution to their observed failures. Their study was much more about correlating system parameters (especially 'SMART' indicators) to observed failures.

The definitive study of enterprise disk is the Schroeder & Gibson study, which I have attached. It finds that the Weibull (or gamma) distribution is by far the best fit for observed field failures. See especially sections 5 and 7 of the paper, and its conclusions.


>Our model assumes the curve is close to being flat over the drive's
>expected life, and it allows us to simplify the equation used by
>assuming it is constant. To make sure we do not over-estimate the
>reliability we may use the highest rate along the curve.

That's good, but not good enough. Without sounding preachy, you'd be far better off modeling a Weibull with shape parameter 0.7-0.8, just as the Schroeder/Gibson study indicates.
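
If you want to see what that shape parameter does, here is a tiny
Python sketch (my own illustration, not taken from the paper) comparing
an exponential model against a Weibull with the same mean time to
failure:

import math

def weibull_survival(t: float, shape: float, mean: float) -> float:
    # P(lifetime > t) for a Weibull, with the scale chosen so the mean
    # equals `mean`.
    scale = mean / math.gamma(1 + 1 / shape)
    return math.exp(-(t / scale) ** shape)

def exponential_survival(t: float, mean: float) -> float:
    return math.exp(-t / mean)

MEAN_YEARS = 20.0
for years in (1, 5, 20):
    print(years,
          round(exponential_survival(years, MEAN_YEARS), 3),
          round(weibull_survival(years, 0.75, MEAN_YEARS), 3))
# A shape below 1 front-loads failures: early-life survival is lower
# than the exponential model predicts, even with the same mean.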

Happy modeling,

Rob







Peglar, Robert

unread,
Jul 2, 2009, 12:12:52 PM7/2/09
to cloud-c...@googlegroups.com
This time with attachment...sorry about that




---

Robert Peglar
Vice President, Technology, Storage Systems Group

Email: Robert...@xiotech.com
Office: 952 983 2287
Mobile: 314 308 6983
Fax: 636 532 0828

Xiotech Corporation
1606 Highland Valley Circle
Wildwood, MO 63005

www.xiotech.com : www.xiotech.com/demo : Toll-Free 866 472 6764

-----Original Message-----
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Jason
Sent: Thursday, July 02, 2009 10:38 AM
To: Cloud Computing
Subject: [ Cloud Computing ] Re: A Follow-Up: "3 Reasons Why Encryption is Overrated" -- http://bit.ly/e8CyM




failure-fast07.pdf

Sassa

unread,
Jul 2, 2009, 1:04:30 PM7/2/09
to Cloud Computing
Yep. The middle bit with constant rate can be modelled with Poisson.

Sassa

On Jul 2, 1:12 pm, "Peglar, Robert" <Robert_Peg...@xiotech.com> wrote:
> Not to jump into the middle of a two-person dialogue, but hard disk
> failure is not a Poisson distribution.
>
> The best fit for observed disk failure in the field is a Weibull
> distribution, a.k.a. the 'bathtub' curve.
>
> Rob
>
> ---
>
> Robert Peglar
> Vice President, Technology, Storage Systems Group
>
> Email: Robert_Peg...@xiotech.com
> Office: 952 983 2287
> Mobile: 314 308 6983
> Fax: 636 532 0828
>
> Xiotech Corporation
> 1606 Highland Valley Circle
> Wildwood, MO 63005
>
> www.xiotech.com : www.xiotech.com/demo : Toll-Free 866 472 6764
>
> -----Original Message-----
> From: cloud-c...@googlegroups.com
>
> [mailto:cloud-c...@googlegroups.com] On Behalf Of Sassa
> Sent: Wednesday, July 01, 2009 6:00 PM
> To: Cloud Computing
> Subject: [ Cloud Computing ] Re: A Follow-Up: "3 Reasons Why Encryption
> is Overrated" -- http://bit.ly/e8CyM

Sassa

unread,
Jul 2, 2009, 1:47:01 PM7/2/09
to Cloud Computing
oh, ok. Gamma or Weibull then.


Alex

On Jul 2, 5:12 pm, "Peglar, Robert" <Robert_Peg...@xiotech.com> wrote:
> This time with attachment...sorry about that
>
> ---
>
> Robert Peglar
> Vice President, Technology, Storage Systems Group
>
> Email: Robert_Peg...@xiotech.com
>  failure-fast07.pdf

Jason

unread,
Jul 2, 2009, 3:31:33 PM7/2/09
to Cloud Computing


On Jul 2, 8:42 am, Sassa <sassa...@gmail.com> wrote:
> On Jul 2, 4:40 am, Jason <jasonre...@gmail.com> wrote:
>
> > I think we have been going back and forth on the same things
> > needlessly.  Let me make sure that you understand what I believe:
>
> > This is roughly how I would rate the confidentiality level of provided
> > by different components of encryption systems, when considered
> > entirely in isolation:
>
> > Shamir Secret Sharing Scheme > One Time Pad > AES > Dispersal+AONT >
> > RSA > Cracking a long password > breaking into a machine > stealing
> > media.
>
> Thank you for laying it out this way. RSA > breaking into a machine -
> no problem.
>

Okay.

> Somehow you state that Dispersal+AONT=k*(breaking into a machine) >
> RSA: prove it, because this is not apparent. Even if RSA is based on
> unproven security, so is breaking into a machine. The difference is
> that we see break-ins on a daily basis, unlike RSA getting broken.
>

I said that ordering only applies when each component is considered in
isolation. With AONT+Dispersal, determining the data with less than a
threshold of slices is of equivalent difficulty to breaking the AES
key used in the transformation. That is why I placed it above RSA: I
believe breaking an AES key to be more difficult than breaking an RSA
key. Below that list I explain that every crypto system is built from
multiple components and relies on things such as machine and media
security, so the full picture must be considered when analyzing a
system.
Assuming you use the full alphabet of upper and lower case letters,
numbers and symbols, and fully random combinations of them, you can
get up to about 7 bits of security per character, so a random
13-character password gives roughly 91 bits, which should be
sufficient.
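
The arithmetic, for anyone who wants to plug in their own numbers (a
trivial Python sketch; the 94-character set size is just an assumption
about which symbols are allowed):

import math

def password_bits(length: int, alphabet_size: int = 94) -> float:
    # Entropy of a *fully random* password: length * log2(alphabet size).
    # 94 = printable ASCII characters excluding the space.
    return length * math.log2(alphabet_size)

print(round(password_bits(13)))   # ~85 bits at the exact 6.55 bits/char
print(13 * 7)                     # 91 bits using the rounded 7 bits/char
print(round(password_bits(20)))   # ~131 bits, comfortably past 128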

> I won't comment on the rest at this time.
>

These exchanges are getting quite long; it took a few hours last night
for me to write the last posts.

Regards,

Jason

Jason

unread,
Jul 2, 2009, 3:45:15 PM7/2/09
to Cloud Computing


On Jul 2, 8:50 am, Sassa <sassa...@gmail.com> wrote:
> On Jul 2, 4:40 am, Jason <jasonre...@gmail.com> wrote:
> ...
>
> > It sounds like you may have thought I proposed that the key also be
> > stored using dispersal as opposed to using the existing key management
> > system, is that correct?
>
> Doesn't your diagram show exactly that? That the key is split and
> dispersed alongside the data?
>
> Don't you keep banging on about no need to store the key anywhere,
> because it is next to the data?
>

The original diagram shows that, but we were momentarily talking about
the combined use of dispersal with existing encryption systems. This
would be more confidential but less available.

Jason