bloom_test failure on big endian archs

Yehuda Sadeh

Jul 2, 2012, 5:29:06 PM
to lev...@googlegroups.com
On big-endian machines bloom_test fails because the false positive rate exceeds the test's limit. I looked into it, and the cause seems to be that the test feeds the filter different input than it does on little-endian machines: Key() copies the raw bytes of an int, so the key bytes depend on the host byte order. When I convert the input to little-endian, the test behaves as expected.
This issue is holding up the inclusion of Ceph in Debian, because Ceph uses leveldb. One fix would be to raise the acceptable false positive rate, e.g.:

diff --git a/util/bloom_test.cc b/util/bloom_test.cc
index 4a6ea1b..61323af 100644
--- a/util/bloom_test.cc
+++ b/util/bloom_test.cc
@@ -139,7 +139,7 @@ TEST(BloomTest, VaryingLengths) {
       fprintf(stderr, "False positives: %5.2f%% @ length = %6d ; bytes = %6d\n",
               rate*100.0, length, static_cast<int>(FilterSize()));
     }
-    ASSERT_LE(rate, 0.02);   // Must not be over 2%
+    ASSERT_LE(rate, 0.03);   // Must not be over 3%
     if (rate > 0.0125) mediocre_filters++;  // Allowed, but not too often
     else good_filters++;
   }


Or the test could convert its input to little-endian:

diff --git a/util/bloom_test.cc b/util/bloom_test.cc
index 4a6ea1b..6de7acc 100644
--- a/util/bloom_test.cc
+++ b/util/bloom_test.cc
@@ -5,6 +5,7 @@
 #include "leveldb/filter_policy.h"
 
 #include "util/logging.h"
+#include "util/coding.h"
 #include "util/testharness.h"
 #include "util/testutil.h"
 
@@ -12,8 +13,22 @@ namespace leveldb {
 
 static const int kVerbose = 1;
 
+static inline void EncodeFixed(char *buf, int32_t val)
+{
+    EncodeFixed32(buf, val);
+}
+
+static inline void EncodeFixed(char *buf, int64_t val)
+{
+    EncodeFixed64(buf, val);
+}
+
 static Slice Key(int i, char* buffer) {
-  memcpy(buffer, &i, sizeof(i));
+  if (port::kLittleEndian) {
+    memcpy(buffer, &i, sizeof(i));
+  } else {
+    EncodeFixed(buffer, i);
+  }
   return Slice(buffer, sizeof(i));
 }
 


Yehuda

Sanjay Ghemawat

Jul 2, 2012, 6:07:30 PM
to lev...@googlegroups.com
On Mon, Jul 2, 2012 at 2:29 PM, Yehuda Sadeh <yeh...@inktank.com> wrote:
> When running bloom_test on big endian machines it fails due to unacceptable
> false positive rate. I've looked into the issue and it seems that the reason
> for that is that it passes a different input than when it runs on little
> endian. When transforming the input to be little endian it behaves as
> expected.

Thanks for doing this analysis.

> This issue holds up inclusion of ceph to debian due to ceph's use of
> leveldb. The fix can be to bump up the acceptable false positives, e.g.,:
>
> diff --git a/util/bloom_test.cc b/util/bloom_test.cc
> index 4a6ea1b..61323af 100644
> --- a/util/bloom_test.cc
> +++ b/util/bloom_test.cc
> @@ -139,7 +139,7 @@ TEST(BloomTest, VaryingLengths) {
>        fprintf(stderr, "False positives: %5.2f%% @ length = %6d ; bytes = %6d\n",
>                rate*100.0, length, static_cast<int>(FilterSize()));
>      }
> -    ASSERT_LE(rate, 0.02);   // Must not be over 2%
> +    ASSERT_LE(rate, 0.03);   // Must not be over 3%
>      if (rate > 0.0125) mediocre_filters++;  // Allowed, but not too often
>      else good_filters++;

The preceding fix seems fine to me as a quick solution for your
problem. I will do something like what you propose below as a longer
term solution to make sure we are measuring the same thing regardless
of endian-ness.
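
For anyone following along, the byte-order dependence discussed above can be sketched roughly as follows. This is only an illustration, not the leveldb code: PutFixed32LE, HostIsLittleEndian and RawCopyMatchesFixed are hypothetical helper names, with PutFixed32LE standing in for what EncodeFixed32 is assumed to do (store the value least-significant byte first).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for leveldb's EncodeFixed32 (not the library code):
// stores value least-significant byte first, whatever the host byte order.
inline void PutFixed32LE(char* buf, uint32_t value) {
  unsigned char* p = reinterpret_cast<unsigned char*>(buf);
  p[0] = static_cast<unsigned char>(value & 0xff);
  p[1] = static_cast<unsigned char>((value >> 8) & 0xff);
  p[2] = static_cast<unsigned char>((value >> 16) & 0xff);
  p[3] = static_cast<unsigned char>((value >> 24) & 0xff);
}

// True when the host stores the least-significant byte first.
inline bool HostIsLittleEndian() {
  const uint32_t one = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &one, 1);
  return first_byte == 1;
}

// True when a raw memcpy of i yields the same four bytes as the explicit
// little-endian encoding, i.e. when both ways of building a key agree for i.
inline bool RawCopyMatchesFixed(int32_t i) {
  char raw[sizeof(i)];
  char fixed[sizeof(i)];
  std::memcpy(raw, &i, sizeof(i));
  PutFixed32LE(fixed, static_cast<uint32_t>(i));
  return std::memcmp(raw, fixed, sizeof(i)) == 0;
}
```

On a little-endian host RawCopyMatchesFixed(0x01020304) returns true. On a big-endian host it returns false, so a test that builds keys with memcpy hashes a different key set there and can measure a different false positive rate.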