Understanding No Host Available


Ian Stavros

Sep 18, 2015, 12:18:12 PM
to DataStax Python Driver for Apache Cassandra User Mailing List
Hey all,

I am doing a massive insert, and I apologize if this seems redundant. I get the error cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {}). I have attached two snippets of code, one of the connection to the host and one of my insert, plus a snapshot of the error.

I have been looking around for some answers. If you are interested, see the links below.

http://datastax.github.io/python-driver/getting_started.html

Here is the Google search on the error:
https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1WPZB_enUS655US655&ion=1&espv=2&ie=UTF-8#q=no+host+available+exception+cassandra+python+datastax

It seems that I might be maxing out on prepared statements? I also saw that I need to push these statements to the server.

I would really appreciate it if someone could take a look at my code and see whether I missed something (I usually have), and expand a bit more on why this happens.
insert.png
error.png
connection.png
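
For readers following along: the second element of the quoted error, `{}`, is the per-host `errors` mapping that the driver attaches to `cassandra.cluster.NoHostAvailable`. Below is a minimal sketch of how that mapping could be printed when the exception is caught. The helper name `summarize_no_host_available` and the `session` and `insert_statement` names in the usage comment are made up for illustration and are not from the attached snippets.

```python
def summarize_no_host_available(exc):
    """Return one readable line per host from a NoHostAvailable-style exception.

    `exc.errors` is the per-host mapping carried by
    cassandra.cluster.NoHostAvailable. It is empty in the error quoted
    in this thread, so no per-host failure was recorded there.
    """
    errors = getattr(exc, "errors", None) or {}
    if not errors:
        return ["no per-host errors recorded"]
    # Sort by the host's string form so the output order is stable.
    pairs = sorted(errors.items(), key=lambda kv: str(kv[0]))
    return ["%s: %r" % (host, err) for host, err in pairs]


# Hypothetical usage with a live session (not run here):
#
#   from cassandra.cluster import NoHostAvailable
#   try:
#       session.execute(insert_statement, row)
#   except NoHostAvailable as exc:
#       for line in summarize_no_host_available(exc):
#           print(line)
```

When the mapping is not empty, each entry names a host and the exception the driver saw for it, which is usually a better starting point than the top-level message.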

Ryan Svihla

Sep 18, 2015, 12:19:01 PM
to python-dr...@lists.datastax.com
Probably just physics. How big is the insert?

--

Thanks,

Ryan Svihla

Ian Stavros

Sep 18, 2015, 12:32:13 PM
to python-dr...@lists.datastax.com
Well, honestly, I wish I could tell you.

There are 105 columns. As for rows, the number changes, so I estimated it from different counts I have seen printed out during inserts.

Number of inserts (based on one set of insert sizes):
low end: 11 times 142 = 1,562
middle: 246 times 142 = 34,932
high end: 465 times 142 = 66,030

It is a varying amount each time.

I did read somewhere that there is a limit, and that I should push once I get to that limit.



Ryan Svihla

Sep 18, 2015, 12:34:21 PM
to python-dr...@lists.datastax.com
There is a bottleneck, and it is really driven by your system performance. In other words, if you have a slow disk and a slow CPU, the 'largest write' will be smaller than on a system with a fast disk and a fast CPU. The best recommendation is to split up large writes into many smaller writes. Then you have a data model that works on slow or fast hardware, and you can just speed it up as you add hardware.
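
One reading of this advice at the driver level is to feed the rows to the cluster in bounded chunks rather than all at once. The sketch below assumes `session` comes from `Cluster.connect()` and `prepared` from `session.prepare()`. The helper names `chunked` and `insert_in_chunks` and the default sizes are placeholders, not values from this thread. It relies on the driver's `cassandra.concurrent.execute_concurrent_with_args` helper.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(session, prepared, rows, chunk_size=500, concurrency=50):
    """Write `rows` in bounded chunks instead of one massive burst.

    `rows` is an iterable of parameter tuples matching `prepared`.
    `chunk_size` and `concurrency` are placeholder values to tune
    against the actual hardware.
    """
    # Imported here so `chunked` stays usable without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    written = 0
    for chunk in chunked(rows, chunk_size):
        # Returns once this chunk's statements have finished, or raises
        # on the first failure (the helper's default behavior).
        execute_concurrent_with_args(
            session, prepared, chunk, concurrency=concurrency
        )
        written += len(chunk)
    return written
```

Lowering `concurrency` or `chunk_size` trades throughput for less pressure on the nodes, which is the knob a slower cluster would need.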

Ian Stavros

Sep 18, 2015, 12:34:25 PM
to python-dr...@lists.datastax.com
Ultimately, this is going to be for every gene in human DNA, but each gene gets its own column family (table).

Ryan Svihla

Sep 18, 2015, 12:38:32 PM
to python-dr...@lists.datastax.com
You're going to have to think about scale with that kind of data model, and the Cassandra user list is good for that (this list is more about the Python driver and probably not appropriate for that depth of data modeling).

I suggest reading http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling and rethinking the entire approach, because I can already tell you that data model will fail spectacularly. There is a great deal of genetics research around me, so I've done some data modeling with genetics and Cassandra, and I know you can get to a data model that fits your use cases, but you have to understand the basics first. That link should help you.

Ian Stavros

Sep 18, 2015, 1:47:09 PM
to python-dr...@lists.datastax.com
So the problem is that I am switching from SQL and a relational understanding.

I also forgot to change my primary key back down to what I really need, which is now

Primary Key ((mutation_name), anchor_tm, ID)

which would limit the number of keys.

If I am understanding this properly: currently, I have one table per gene, and I partition each table by mutation_name (or CDS, as it is known by people in the field). There are about 142 of those on average, though that number varies.

Or would it be better to separate into a gene table and then partition on the gene name? That would limit the number of partitions and move the mutation name to a clustering key.
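
To make the two layouts in this question concrete, here is a small sketch that only builds the CQL text for each. It does not say which one is better. The table names `gene_xyz` and `mutations_by_gene`, the `gene_name` column, and every column type are placeholders, since the thread never states them, and the second layout is just one reading of the question.

```python
def create_table_cql(table, columns, partition_key, clustering_keys):
    """Build a CREATE TABLE statement with the given primary key layout."""
    clustering = "".join(", " + name for name in clustering_keys)
    return "CREATE TABLE IF NOT EXISTS %s (%s, PRIMARY KEY ((%s)%s))" % (
        table,
        columns,
        partition_key,
        clustering,
    )


# Current layout: one table per gene, partitioned by mutation name.
# The column types are placeholders.
per_gene = create_table_cql(
    "gene_xyz",
    "mutation_name text, anchor_tm text, id text",
    "mutation_name",
    ["anchor_tm", "id"],
)

# Alternative in the question: partition on the gene name and move the
# mutation name to the first clustering column.
by_gene = create_table_cql(
    "mutations_by_gene",
    "gene_name text, mutation_name text, anchor_tm text, id text",
    "gene_name",
    ["mutation_name", "anchor_tm", "id"],
)

print(per_gene)
print(by_gene)
```

In the first layout every mutation is its own partition. In the second, all mutations of a gene share one partition and are ordered by mutation name inside it.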

Ryan Svihla

Sep 18, 2015, 1:57:52 PM
to python-dr...@lists.datastax.com
You're going to find a bigger pool of data modeling expertise to pull from if you ask this on the Cassandra users group instead.

Thanks,
Ryan

Ian Stavros

Sep 18, 2015, 2:01:47 PM
to python-dr...@lists.datastax.com
Alright, I will try there.

Thanks for the help