Understanding No Host Available

Ian Stavros

Sep 18, 2015, 12:18:12 PM

to DataStax Python Driver for Apache Cassandra User Mailing List

Hey all,

I am doing a massive insert, and I apologize if this seems redundant. I get the error cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {}). I have attached two snippets of code, one of the connection to the host and one of my insert, plus a snapshot of the error.

I have been looking around for answers.

If you are interested, see the links below:

http://datastax.github.io/python-driver/getting_started.html

https://datastax.github.io/python-driver/api/cassandra/query.html

https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/

https://datastax-oss.atlassian.net/browse/PYTHON-42

https://www.mail-archive.com/us...@cassandra.apache.org/msg41747.html

http://stackoverflow.com/questions/29544110/unable-to-complete-the-operation-against-any-hosts

http://www.datastax.com/dev/blog/datastax-python-driver-multiprocessing-example-for-improved-bulk-data-throughput

http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/exceptions/NoHostAvailableException.html

Here is the Google search on the error:

https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1WPZB_enUS655US655&ion=1&espv=2&ie=UTF-8#q=no+host+available+exception+cassandra+python+datastax

It seems that I might be maxing out on prepared statements, and I saw that I may need to push these statements to the server. Is that right?

I would really appreciate someone taking a look at my code to see if I missed something (it usually is that), and expanding some more on why this happens.

insert.png

error.png

connection.png
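
A note on reading the error itself: in the Python driver, NoHostAvailable carries an `errors` attribute, a dict mapping each host that was tried to the exception raised for it. The `{}` in the message above is that dict, so no per-host error was recorded here. A minimal sketch of printing it when a write fails follows. `summarize_no_host_available` is a hypothetical helper name, not part of the driver, and it only assumes the exception exposes `.errors`.

```python
def summarize_no_host_available(exc):
    """Return a readable summary of a NoHostAvailable-style exception.

    Only assumes the exception exposes an `errors` dict (host -> error),
    as cassandra.cluster.NoHostAvailable does.
    """
    errors = getattr(exc, "errors", None) or {}
    if not errors:
        return "no per-host errors recorded"
    # One line per host, sorted by host so the output is stable.
    pairs = sorted(errors.items(), key=lambda kv: str(kv[0]))
    return "\n".join("{0}: {1!r}".format(host, err) for host, err in pairs)


# Usage with the real driver (needs cassandra-driver and a live cluster;
# `session`, `insert_statement` and `row` are placeholders):
#
#   from cassandra.cluster import NoHostAvailable
#   try:
#       session.execute(insert_statement, row)
#   except NoHostAvailable as exc:
#       print(summarize_no_host_available(exc))
```

When the dict is not empty, the per-host exceptions are usually the quickest hint as to whether the problem is timeouts, refused connections, or something else.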

Ryan Svihla

Sep 18, 2015, 12:19:01 PM

to python-dr...@lists.datastax.com

Probably just physics. How big is the insert?

--

Thanks,

Ryan Svihla

Ian Stavros

Sep 18, 2015, 12:32:13 PM

to python-dr...@lists.datastax.com

Well, honestly, I wish I could tell you.

There are 105 columns.

As for rows, it changes, so I estimated from the different numbers I have seen inserted in a count printout.

Number of inserts (it is a varying amount each time):

low end (based on one set's insert size): 11 x 142 = 1,562

middle: 246 x 142 = 34,932

high end: 465 x 142 = 66,030

And I did read somewhere that there is a limit, and that I should push once I get to that limit.

Ryan Svihla

Sep 18, 2015, 12:34:21 PM

to python-dr...@lists.datastax.com

There is a bottleneck that is really driven by your system performance. In other words, if you have a slow disk and a slow CPU, the 'largest write' will be smaller than on a system with a fast disk and a fast CPU. The best recommendation is to split large writes into many smaller writes. Then you have a data model that works on slow or fast hardware, and you can just speed it up as you add hardware.
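
As a rough illustration of "many smaller writes", here is a minimal sketch that sends rows in small groups and waits for each group to finish before sending the next, so the number of in-flight requests stays bounded. The helper names `chunked` and `insert_in_chunks` and the default `chunk_size=100` are assumptions for illustration, not tuned values. The `session` argument is only assumed to behave like the driver's Session, where `execute_async(statement, parameters)` returns a future with `.result()`.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(session, statement, rows, chunk_size=100):
    """Send rows in small groups, waiting for each group before the next.

    `session` is expected to behave like cassandra.cluster.Session:
    execute_async(statement, parameters) returns a future with .result().
    `chunk_size` is an arbitrary starting point, not a tuned value.
    """
    written = 0
    for chunk in chunked(rows, chunk_size):
        # At most `chunk_size` requests are in flight at any time.
        futures = [session.execute_async(statement, row) for row in chunk]
        for future in futures:
            future.result()  # blocks; raises if that write failed
        written += len(chunk)
    return written
```

With the real driver, `statement` would typically be prepared once with `session.prepare(...)` outside the loop and reused for every row. The driver also ships `cassandra.concurrent.execute_concurrent_with_args`, which handles this kind of bounded concurrency itself and may be the better fit.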

Ian Stavros

Sep 18, 2015, 12:34:25 PM

to python-dr...@lists.datastax.com

Ultimately, this is going to be for every gene in human DNA, but each gene gets its own column family (table).

Ryan Svihla

Sep 18, 2015, 12:38:32 PM

to python-dr...@lists.datastax.com

You're going to have to think about scale with that kind of data model, and the Cassandra user list is good for that (this list is more about the Python driver and probably not appropriate for that depth of data modeling).

I suggest reading http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling and rethinking the entire approach, because I can already tell you that data model will fail spectacularly. There is a great deal of genetics research around me, so I've done some data modeling with genetics and Cassandra, and I know you can get to a data model that fits your use cases, but you have to understand the basics first. That link should help you.

Ian Stavros

Sep 18, 2015, 1:47:09 PM

to python-dr...@lists.datastax.com

So the problem is that I am switching from SQL and a relational understanding,

and I did forget to change my primary key back down to what I really need,

which is now

PRIMARY KEY ((mutation_name), anchor_tm, ID)

which would limit the number of keys.

If I am understanding this properly:

Currently, I have one table per gene,

and I partition each table by mutation_name (or cds, as it is known by people in the field), of which there will be about 142 on average,

though that number varies.

Or

would it be better to separate into a gene table and then partition on the gene name, since this would limit the number of partitions, and move the mutation name to a clustering key?
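
To make the two layouts in the question concrete, here is a minimal sketch that builds the CQL for each. It does not recommend either one. The table names `some_gene` and `genes`, the `gene_name` column, and every column type are placeholders, since the thread gives neither the types nor the other columns, and `create_table_cql` is a hypothetical helper.

```python
def create_table_cql(table, columns, partition_key, clustering_keys):
    """Build a CREATE TABLE statement with a compound primary key."""
    cols = ",\n    ".join("{0} {1}".format(name, cql_type) for name, cql_type in columns)
    primary_key = "(({0}){1})".format(
        ", ".join(partition_key),
        "".join(", " + key for key in clustering_keys),
    )
    return "CREATE TABLE {0} (\n    {1},\n    PRIMARY KEY {2}\n)".format(
        table, cols, primary_key
    )


# Placeholder column types; the real ones are not stated in the thread.
BASE_COLUMNS = [("mutation_name", "text"), ("anchor_tm", "double"), ("id", "uuid")]

# Layout 1: one table per gene, partitioned by mutation_name.
per_gene_table = create_table_cql(
    "some_gene", BASE_COLUMNS, ["mutation_name"], ["anchor_tm", "id"]
)

# Layout 2: a single table partitioned by gene name, with
# mutation_name moved to a clustering key.
single_table = create_table_cql(
    "genes",
    [("gene_name", "text")] + BASE_COLUMNS,
    ["gene_name"],
    ["mutation_name", "anchor_tm", "id"],
)

print(per_gene_table)
print(single_table)
```

In the first layout each mutation is its own partition inside a gene's table. In the second, all rows for a gene share one partition and are ordered by mutation_name, then anchor_tm, then id.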

Ryan Svihla

Sep 18, 2015, 1:57:52 PM

to python-dr...@lists.datastax.com

You're going to find a bigger pool of data modeling expertise to pull from if you ask this instead on the Cassandra users group.

Thanks,

Ryan

Ian Stavros

Sep 18, 2015, 2:01:47 PM

to python-dr...@lists.datastax.com

Alright, I will try there.

Thanks for the help
