Is it normal to have the Voldemort read-only store size much larger than the original data size on HDFS?


Xiao Zhou

Jun 10, 2016, 7:38:32 PM
to project-voldemort
Hi, I have a store with the following config:
<stores>
  <store>
    <name>readonlyusers</name>
    <persistence>read-only</persistence>
    <description>Readonly store</description>
    <routing>client</routing>
    <replication-factor>2</replication-factor>
    <required-reads>1</required-reads>
    <required-writes>1</required-writes>
    <key-serializer>
      <type>string</type>
    </key-serializer>
    <value-serializer>
      <type>protobuf</type>
      <schema-info>java=com.audiencescience.data.protobuf.UserProfile$Profile</schema-info>
    </value-serializer>
  </store>
</stores>
The original data size is 8.7T and the RO store size is more than 40T. Even considering the replication factor of 2, that is more than 2 times the original size. Is this normal?
I have tried value compression in the value-serializer, but it did not reduce the final size.
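
For reference, per-record compression in Voldemort is enabled inside the serializer definition, roughly like the sketch below (gzip is shown as an example of a supported compression type; the rest mirrors the store config above):

    <value-serializer>
      <type>protobuf</type>
      <schema-info>java=com.audiencescience.data.protobuf.UserProfile$Profile</schema-info>
      <compression>
        <type>gzip</type>
      </compression>
    </value-serializer>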
Thanks,


Arunachalam

Jun 10, 2016, 8:16:39 PM
to project-...@googlegroups.com
Hadoop uses block compression, while Voldemort can compress only individual key/value pairs. So it is quite normal for the store to be twice or three times the size. On average, for well-compressed data, we have observed 6 times the original size with a replication factor of 2.
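
As a rough back-of-the-envelope with the numbers in this thread (assuming the 8.7T figure is the block-compressed size on HDFS): 40T across 2 replicas is about 20T per copy, i.e. roughly 2.3x the compressed input per copy, which falls inside the 2-3x range above and well under the 6x sometimes observed.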

Thanks,
Arun.
