JG recovery is not working with a 2 node scyllaDB cluster as backend


SAURABH VERMA

Apr 10, 2019, 1:11:51 PM
to JanusGraph developers
Hi all

I am trying to recover a JG cluster backed by scyllaDB using steps at https://groups.google.com/d/msg/aureliusgraphs/WyJpzZ4Wcuw/AW4-1GXRfI0J

I always get the following error:

Could not find type for id: 2313

These are the steps I am following:

- systemctl stop scylla-server
- rm -rf /var/lib/scylla/data/*
- rm -rf /var/lib/scylla/commitlog/*
- systemctl start scylla-server
- schema registration
- systemctl stop scylla-server
- data copy
- sudo chown -R scylla:scylla /var/lib/scylla/data/idgraph1
- systemctl start scylla-server
- nodetool repair
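The steps above can be sketched as a per-node script. This is a dry-run sketch: the DRY_RUN guard and the run() helper are my own additions, and the keyspace path is taken from the steps above.

```shell
#!/bin/sh
# Dry-run sketch of the per-node restore sequence above.
# DRY_RUN=1 (the default) prints each step instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}
run systemctl stop scylla-server
run rm -rf /var/lib/scylla/data/*        # wipe old data
run rm -rf /var/lib/scylla/commitlog/*   # and the commitlog
run systemctl start scylla-server
# ... register the JanusGraph schema against the empty keyspace here ...
run systemctl stop scylla-server
# ... copy the backed-up SSTables into /var/lib/scylla/data/idgraph1 ...
run chown -R scylla:scylla /var/lib/scylla/data/idgraph1
run systemctl start scylla-server
run nodetool repair
```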

Please guide me on the correct sequence of steps for recovery, or suggest any other way to recover the JG data.

Thanks
Saurabh

Ryan Stauffer

Apr 11, 2019, 9:48:59 AM
to janusgr...@googlegroups.com
Big picture, my understanding is that you're trying to back up and restore the underlying Scylla keyspace ("idgraph1"), right (the procedure described at https://docs.scylladb.com/operating-scylla/procedures/backup-restore/)?  If that restore isn't successful, or is incomplete, you can end up with some "interesting" behavior from JG.

The error you're getting implies that the underlying keyspace isn't fully intact, and rows are missing, so let's take a look at the underlying backup/restore procedure.  I'd also make sure that there are no JG instances trying to communicate with Scylla during this process.

Are you backing up and restoring the same keyspace on the same Scylla cluster, or is one of those variables changing?  Are you running the backup and restore on all nodes of the scylla cluster (backup and restore is a per-node operation)?

Thanks,
Ryan  


--
You received this message because you are subscribed to the Google Groups "JanusGraph developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-de...@googlegroups.com.
To post to this group, send email to janusgr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-dev/82ac6954-6bc7-4ae8-a06b-e82fbd3cc091%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Saurabh Verma

Apr 11, 2019, 1:03:06 PM
to janusgr...@googlegroups.com
Hi Ryan

Thanks for the response.

My answers to your questions:

1. Are you backing up and restoring the same keyspace on the same Scylla cluster, or is one of those variables changing? - The keyspace is the same, idgraph1.

2. Are you running the backup and restore on all nodes of the scylla cluster (backup and restore is a per-node operation)? - Yes I am running this as mentioned at https://docs.scylladb.com/operating-scylla/procedures/backup-restore/

3. Is the data intact? - I compared the names and sizes of the edgestore, graphindex, and janusgraph_ids tables on the source and destination machines; the table listings and sizes are exactly the same in both clusters.

Still getting the same error.

'Could not find type for id: 3597'

When I check g.V(3597) in the source cluster, it returns a vertex with the 'node' label, but in the destination cluster g.V(3597) returns nothing.

Thanks
Saurabh Verma 
PE

 



Saurabh Verma

Apr 11, 2019, 1:03:06 PM
to janusgr...@googlegroups.com
Hey Ryan

I missed the part of your question about whether it is the same Scylla cluster: I am moving the data to a different Scylla cluster, so the IPs have changed.
Does that matter?

Ryan Stauffer

Apr 11, 2019, 1:51:14 PM
to janusgr...@googlegroups.com
So you backed up data on each node of “cluster1”, and then restored data on each node of “cluster1”?


--
Ryan Stauffer
Founder, Enharmonic, Inc.

SAURABH VERMA

Apr 11, 2019, 1:53:58 PM
to janusgr...@googlegroups.com
Hey Ryan

I backed up data on each node of “cluster1”, and then restored data on each node of “cluster2”

Thanks 


--
Thanks & Regards,
Saurabh Verma,
India


Ryan Stauffer

Apr 11, 2019, 1:55:29 PM
to janusgr...@googlegroups.com
Missed your response about the different cluster. That’s the underlying issue then.

Because of the way Scylla works, you’ll need to make sure that the cluster on which the restore occurs has the same token distribution as the source cluster.  If you need help doing this let me know and I can send over some commands later once I get back to my computer. 

Saurabh Verma

Apr 11, 2019, 1:59:55 PM
to janusgr...@googlegroups.com
Hey Ryan

It would be really great if you could guide me along the lines you mentioned above.

I would be available if you need any more information.

Saurabh Verma

Apr 11, 2019, 4:06:24 PM
to janusgr...@googlegroups.com
Hi Ryan

Please send me the commands whenever possible

Thanks a lot 
--
Saurabh Verma 
Principal Engineer

m: +917976984604
skype: saurabh.verma-zeotap

Ryan Stauffer

Apr 11, 2019, 4:33:47 PM
to janusgr...@googlegroups.com
For terminology, I'm going to call your 2 clusters "source" and "replica".  "Source" is the Scylla cluster that you want to backup, and "replica" is the cluster that you want to copy the data to.  For this to work, you need a mapping of "source" node to exactly one "replica" node.

Ex:
Source cluster = 3 Nodes (source1, source2, source3)
Replica cluster = 3 Nodes (replica1, replica2, replica3)

Where we have a mapping:
source1 -> replica1
source2 -> replica2
source3 -> replica3

The end result of this process will be that the replica cluster tokens mirror those of the source cluster. 

1. Start with the replica cluster shutdown. ($ sudo systemctl stop scylla-server)

2. On each node of the source cluster, run the following:
HOST_IP=`grep -e '^listen_address' /etc/scylla/scylla.yaml | awk '{ print $NF }'`
nodetool ring | grep $HOST_IP | awk '{print $NF ","}' | xargs | sed 's/,$//g'

This produces a comma-separated list of tokens for that particular node...
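To make the parsing concrete, here is the same pipeline run against a mocked-up nodetool ring excerpt. The IPs and token values below are made up for illustration; what matters is that the token is always the last column of each row.

```shell
# Mocked 'nodetool ring' output (illustrative IPs and tokens).
HOST_IP=10.0.0.1
ring_output="10.0.0.1  rack1  Up  Normal  150 MB  33.3%  -9100000000000000001
10.0.0.2  rack1  Up  Normal  148 MB  33.3%  -3000000000000000002
10.0.0.1  rack1  Up  Normal  150 MB  33.3%  4500000000000000003"

# Same pipeline as above: keep this node's rows, take the last field,
# append a comma to each, join them on one line, strip the trailing comma.
tokens=$(printf '%s\n' "$ring_output" | grep "$HOST_IP" \
  | awk '{print $NF ","}' | xargs | sed 's/,$//g')
echo "$tokens"   # -9100000000000000001, 4500000000000000003
```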

3. Take this list of tokens and plug it into the corresponding replica node's scylla.yaml file under the initial_token: property.
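On the matching replica node, the resulting scylla.yaml entry would look something like this (the token values here are made-up examples, not real tokens):

```yaml
# /etc/scylla/scylla.yaml on the replica node (illustrative values)
initial_token: -9100000000000000001, 4500000000000000003
```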

Now finish the normal restore procedure on the replica cluster (replacing the keyspace data with the backed-up data from the source cluster).

4. Start up the replica cluster, run nodetool repair, and once everything's up, you should be good to go...

Good luck!




Saurabh Verma

Apr 12, 2019, 7:31:01 AM
to janusgr...@googlegroups.com
Hi Ryan

The above method worked successfully for JG recovery. Thank you so much, Ryan, for the timely assistance and for sharing this crucial and fundamental information.

Thanks a lot once again

Ryan Stauffer

Apr 12, 2019, 11:50:19 AM
to janusgr...@googlegroups.com
No problem at all, glad it worked out!

A good reference as well is the Scylla community Slack channel - the engineers there are very helpful in troubleshooting, and anyone using open source is welcome to join the conversation.

