Transforms in Voldemort

Chinmay Gupte

Jun 9, 2016, 1:12:43 PM
to project-voldemort
Hi community,

We are running into NPEs while doing proxy GETALLs, with the following stack trace:

java.lang.NullPointerException
        at com.google.protobuf.ByteString.copyFrom(ByteString.java:204)
        at voldemort.client.protocol.pb.ProtoBuffClientRequestFormat.writeGetAllRequest(ProtoBuffClientRequestFormat.java:136)
        at voldemort.store.socket.clientrequest.GetAllClientRequest.formatRequestInternal(GetAllClientRequest.java:53)
        at voldemort.store.socket.clientrequest.AbstractClientRequest.formatRequest(AbstractClientRequest.java:52)
        at voldemort.store.socket.clientrequest.BlockingClientRequest.formatRequest(BlockingClientRequest.java:84)
        at voldemort.store.socket.clientrequest.ClientRequestExecutor.addClientRequest(ClientRequestExecutor.java:124)
        at voldemort.store.socket.SocketStore.request(SocketStore.java:287)
        at voldemort.store.socket.SocketStore.getAll(SocketStore.java:217)
        at voldemort.store.rebalancing.RedirectingStore.proxyGetAll(RedirectingStore.java:588)
        at voldemort.store.rebalancing.RedirectingStore.proxyGetAllAndLocalPut(RedirectingStore.java:654)

AFAIK, this indicates that either the key or the value being read from the transforms map is null at ProtoBuffClientRequestFormat.java:136, the writeGetAllRequest frame in the trace above.


We are still trying to reproduce the issue with several test cases: putting nulls as values in the transforms map, putting only a partial set of keys in the transforms map of the GETALL request, and putting nulls as keys in the transforms map (which fails validation on the client side itself). No luck yet.
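
For reference, here is a minimal, self-contained sketch of the three cases listed above. It does not use Voldemort or protobuf: `copyFrom` is a stand-in for a byte-copying call such as `ByteString.copyFrom(byte[])`, and `serializes` is a hypothetical loop that copies every key and value in the transforms map. Whether the real request formatter walks the map this way is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class TransformNullRepro {

    // Stand-in for a byte-copying call: it dereferences its argument,
    // so passing a null array throws NullPointerException.
    static byte[] copyFrom(byte[] bytes) {
        return Arrays.copyOf(bytes, bytes.length);
    }

    // Hypothetical serializer loop (an assumption, not Voldemort's code):
    // copies every key and value in the transforms map.
    // Returns true if the loop completes, false if an NPE was thrown.
    static boolean serializes(Map<byte[], byte[]> transforms) {
        try {
            for (Map.Entry<byte[], byte[]> e : transforms.entrySet()) {
                copyFrom(e.getKey());
                copyFrom(e.getValue());
            }
            return true;
        } catch (NullPointerException npe) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<byte[], byte[]> nullValue = new HashMap<>();
        nullValue.put(new byte[] {1}, null);          // null transform value

        Map<byte[], byte[]> partial = new HashMap<>();
        partial.put(new byte[] {1}, new byte[] {9});  // transform for only one of the requested keys

        Map<byte[], byte[]> nullKey = new HashMap<>();
        nullKey.put(null, new byte[] {9});            // null key

        System.out.println("null value serializes: " + serializes(nullValue));
        System.out.println("partial map serializes: " + serializes(partial));
        System.out.println("null key serializes: " + serializes(nullKey));
    }
}
```

Under these assumptions, only the null-value and null-key maps trip the NPE; a partial map serializes without error because every entry it does contain is non-null.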


A few questions:


1) Anything obvious we are missing to reproduce the NPE?

2) Transforms seem to be a rarely used feature in Voldemort overall, with the exception of ViewStorageEngine. Since there are no proxy GETALLs for views, I am pretty sure views are not causing the problem. But what are the use cases for transforms other than views?


Thanks always for your help.


Cheers,

Chinmay




Arunachalam

Jun 9, 2016, 1:22:18 PM
to project-...@googlegroups.com
What version are you using for the client? The exception is happening inside the Protobuf API, which is weird.

Thanks,
Arun.




--
You received this message because you are subscribed to the Google Groups "project-voldemort" group.
To unsubscribe from this group and stop receiving emails from it, send an email to project-voldem...@googlegroups.com.
Visit this group at https://groups.google.com/group/project-voldemort.
For more options, visit https://groups.google.com/d/optout.

Chinmay Gupte

Jun 9, 2016, 1:34:06 PM
to project-voldemort
Hi Arun,

The client version is 1.10.3, but the Voldemort server version is 1.6.9. Since these exceptions occur with proxy GETALLs, should we be looking at the server version?

Can the version discrepancy between client and server cause this issue?

Thanks,
Chinmay

Arunachalam

Jun 9, 2016, 1:42:50 PM
to project-...@googlegroups.com
I see, you are right: this is the server using the client library, so it will be using 1.6.9.

Also, we have not had any bug fixes for this file since 1.6.9. Still, I would highly recommend upgrading the server, as newer versions have more fixes in other areas.

Looking at the code, the bug seems to be that if the store has a transform, the server doing the proxy may not initialize the transform correctly. Can you post your store definition?

At LinkedIn we don't use transforms, so they are not battle-tested there, but other people in the community do use them.

Thanks,
Arun.

Félix GV

Jun 9, 2016, 1:56:00 PM
to project-voldemort
You should also post your Protobuf version. Voldemort is usually unstable with any version of PB other than 2.3.
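
One quick way to confirm which Protobuf jar a JVM actually loads is to ask the class for its code source. The sketch below is a generic diagnostic using only the JDK; the class name `ClassOrigin` and the helper `describe` are made up for illustration and are not part of Voldemort.

```java
import java.security.CodeSource;

final class ClassOrigin {

    // Returns the location (typically a jar URL) the named class is loaded from,
    // or a short note when the class is missing or has no code source.
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "no code source (for example, a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // ByteString is the class at the top of the stack trace in this thread.
        System.out.println(describe("com.google.protobuf.ByteString"));
    }
}
```

Run with the same classpath as the server, this prints the jar path for the Protobuf classes, which usually includes the version in the file name.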

-F



--
--
Félix

Chinmay Gupte

Jun 9, 2016, 2:26:14 PM
to project-voldemort
Thanks, Arun and Felix, for the quick turnaround. Here is our store:

    <store>
        <name>TEST</name>
        <persistence>bdb</persistence>
        <routing>client</routing>
        <routing-strategy>zone-routing</routing-strategy>
        <replication-factor>9</replication-factor>
        <preferred-reads>2</preferred-reads>
        <required-reads>1</required-reads>
        <required-writes>2</required-writes>
        <zone-replication-factor>
            <replication-factor zone-id="0">3</replication-factor>
            <replication-factor zone-id="1">3</replication-factor>
            <replication-factor zone-id="2">3</replication-factor>
        </zone-replication-factor>
        <zone-count-reads>0</zone-count-reads>
        <zone-count-writes>0</zone-count-writes>
        <enable-hinted-handoff>true</enable-hinted-handoff>
        <hinted-handoff-strategy>proximity-handoff</hinted-handoff-strategy>
        <hint-preflist-size>2</hint-preflist-size>
        <key-serializer>
            <type>string</type>
        </key-serializer>
        <value-serializer>
            <type>identity</type>
            <compression>
                <type>lzf</type>
            </compression>
        </value-serializer>
    </store>

Clearly, we do not have a transforms serializer defined on the store. FWIW, there is also a view on it, though it is used only for operational needs, not for production purposes:

    <view>
        <name>TEST-view</name>
        <view-of>TEST</view-of>
        <view-class>VIEW_CLAZZ</view-class>
        <value-serializer>
            <type>protobuf</type>
            <schema-info>java=CLAZZ</schema-info>
            <compression>
                <type>lzf</type>
            </compression>
        </value-serializer>
        <transforms-serializer>
            <type>json</type>
            <schema-info version="0">["int32"]</schema-info>
        </transforms-serializer>
    </view>

@Felix, we are using protobuf 2.5.0. :/ Let me run a few tests along those lines.

Thanks,
Chinmay


Arunachalam

Jun 9, 2016, 2:35:20 PM
to project-...@googlegroups.com
Deleting the view during the migration will most likely make this problem go away. You can redefine the views later if required.

Thanks,
Arun.


Chinmay Gupte

Jun 9, 2016, 2:38:47 PM
to project-voldemort
Ok. Let me give it a shot.


Chinmay Gupte

Jun 10, 2016, 1:21:04 PM
to project-voldemort
We have tried the following tests to reproduce the NPEs (which we observed in production, FWIW) but have been unable to do so:

1) Add a view alongside the store definition and do a GETALL against the store on a node while it is rebalancing, so that it proxies the GETALL to the previous owner of the partition. (The data is returned successfully and the server logs show no NPEs.)

2) Recompile the protos used by the client with the protobuf 2.5.0 jar and add the 2.5.0 jar as a dependency on the classpath. (Still no issues: the data is returned successfully and the server logs are clean.)

3) Various combinations of passing nulls in the transforms map during the GETALL request.

Any other ideas to reproduce this issue?

Thanks for your help.

Chinmay

Arunachalam

Jun 10, 2016, 2:08:10 PM
to project-...@googlegroups.com
Chinmay,
      From the call stack, it is certain that the NPE is coming from the views and transforms. If the view is not production critical, can you remove the view, rebalance, and then add the view back?

This is just a philosophy, but there are known bugs, and investigating and fixing bugs has a cost. If the cost is not justifiable, we should probably just work around the bug.

Part of the problem is that views and transforms are not used at LinkedIn, so we don't have much experience with them. A few people in the community use them, but I am not sure whether any of them are active in the discussion groups.

The only way for you to narrow this down is to instrument the code, log more information when the NPE happens, and try to reproduce it. I don't see any other good way to do it.
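
As a rough illustration of that kind of instrumentation, a check like the one below could be called just before the request is formatted. It is only a sketch: `TransformAudit`, `findNulls`, and `logIfSuspicious` are hypothetical names, the map type is simplified to a generic key with `byte[]` values, and where exactly to hook it into the server code is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

final class TransformAudit {

    private static final Logger LOG = Logger.getLogger(TransformAudit.class.getName());

    // Describes every null key or null value found in the transforms map.
    // A null map is treated as "no transforms supplied" and yields no findings.
    static <K> List<String> findNulls(Map<K, byte[]> transforms) {
        List<String> problems = new ArrayList<>();
        if (transforms == null) {
            return problems;
        }
        for (Map.Entry<K, byte[]> e : transforms.entrySet()) {
            if (e.getKey() == null) {
                problems.add("null key");
            }
            if (e.getValue() == null) {
                problems.add("null transform for key " + e.getKey());
            }
        }
        return problems;
    }

    // Logs a warning with the store name and map size when anything is null,
    // so the offending request can be identified before the NPE is thrown.
    static <K> void logIfSuspicious(String storeName, Map<K, byte[]> transforms) {
        List<String> problems = findNulls(transforms);
        if (!problems.isEmpty()) {
            LOG.warning("Store " + storeName + ": " + problems
                        + " (transforms map size " + transforms.size() + ")");
        }
    }
}
```

Logging the store name alongside the findings should show whether the bad map belongs to the base store or to the view, which is the open question in this thread.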

Thanks,
Arun.
