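[...] pandas_factory to query results, but how can it be combined with fetch_size?

import pandas as pd
from cassandra.query import SimpleStatement

def pandas_factory(colnames, rows):
    # Build one DataFrame from the rows the driver hands to the row factory
    return pd.DataFrame(rows, columns=colnames)

# 'session' is an already connected cassandra.cluster.Session
session.row_factory = pandas_factory
query = "SELECT * FROM my_table"
statement = SimpleStatement(query, fetch_size=10)
rslt = session.execute(statement, timeout=None)
df = rslt._current_rows  # holds the current (first) page only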
statement = SimpleStatement("SELECT * FROM users", fetch_size=10)
for user_row in session.execute(statement):
    process_user(user_row)
--
You received this message because you are subscribed to the Google Groups "DataStax Python Driver for Apache Cassandra User Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-user+unsub...@lists.datastax.com.
[...] process_user() do?
[...] pandas_factory, if I should?
[...] row_factory, then I can iterate through the fetches, but in that case the results are Row objects and not a DataFrame...
[...] dict_factory, then your iteration works and I get all my records as dictionaries. But with the above pandas_factory function, I get only the column names, as many times as (number of records // fetch size) + 1.
[...] pandas_factory to return a list of pandas.DataFrame() objects. The problem with the original pandas_factory and the PagedResult object in the for loop was that the loop did not iterate over the pages (each of which is a pandas.DataFrame() in my case, and there are (number of records // fetch size) + 1 of them) but over the pandas.DataFrame() object itself. Iterating over a DataFrame yields its column labels, which is why I got only the column names without the rows.

[...] pandas_factory in case of setting fetch size is:

import pandas as pd
from cassandra.query import SimpleStatement

def pandas_factory(colnames, rows):
    # Wrap the page's DataFrame in a list so that iterating the result
    # yields the DataFrame itself rather than its column labels
    return [pd.DataFrame(rows, columns=colnames)]

session.row_factory = pandas_factory
query = "SELECT * FROM my_table"
statement = SimpleStatement(query, fetch_size=10)
pages = []
for page_df in session.execute(statement):
    pages.append(page_df)
# DataFrame.append() was removed in pandas 2.0; concatenate once instead
df = pd.concat(pages, ignore_index=True)
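The behaviour described above can be reproduced without a Cassandra cluster. The sketch below uses only pandas, with column names and rows invented for illustration (they are not from the thread): iterating a DataFrame directly yields its column labels, while iterating a one-element list yields the DataFrame.

```python
import pandas as pd

def pandas_factory(colnames, rows):
    # Original factory: returns a bare DataFrame
    return pd.DataFrame(rows, columns=colnames)

def pandas_list_factory(colnames, rows):
    # Revised factory: wraps the DataFrame in a list
    return [pd.DataFrame(rows, columns=colnames)]

# Hypothetical page of data, for illustration only
colnames = ["id", "name"]
rows = [(1, "alice"), (2, "bob")]

bare_items = list(pandas_factory(colnames, rows))
wrapped_items = list(pandas_list_factory(colnames, rows))

print(bare_items)              # the column labels, not the rows
print(type(wrapped_items[0]))  # a DataFrame holding the whole page
```

With the bare DataFrame, a for loop over the result never sees a row, which matches the symptom of getting only column names once per page.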
I've run a test on a table with ~130,000 records, with the following fetch_size values and query execution times:

fetch_size = 1000; query execution time = 0:00:34.374
fetch_size = 5000; query execution time = 0:00:19.854
fetch_size = 10; query execution time = 0:29:56.726
fetch_size = 10000; cassandra.ReadFailure :(
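One way to read these timings is to count how many pages have to be fetched for each fetch_size. The sketch below is only arithmetic, using the page-count estimate given earlier in the thread, (number of records // fetch size) + 1. It assumes exactly 130,000 records (the thread says roughly that many) and does not model server-side cost or the cost of growing the DataFrame.

```python
TOTAL_RECORDS = 130_000  # assumption: the thread only says "~130,000"

def pages_estimate(total_records, fetch_size):
    # Page count as estimated in the thread: (records // fetch size) + 1
    return total_records // fetch_size + 1

for fetch_size in (10, 1000, 5000):
    print(fetch_size, pages_estimate(TOTAL_RECORDS, fetch_size))
```

Under these assumptions, fetch_size = 10 means on the order of 13,000 pages, against roughly 130 for fetch_size = 1000, which is at least consistent with the much longer run time reported for the smallest page size.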
[...] map objects / fields (I've struggled getting map fields into pandas before).

[...] pandas_factory function that converts a Cassandra map object into a Python dictionary so that the dict can be placed in a pandas.DataFrame() column:

import pandas as pd
from cassandra.util import OrderedMapSerializedKey

def pandas_factory(colnames, rows):
    # Convert tuple items of 'rows' into lists (elements of tuples cannot be replaced)
    rows = [list(i) for i in rows]
    # Convert only 'OrderedMapSerializedKey' type list elements into dict
    for idx_row, i_row in enumerate(rows):
        for idx_value, i_value in enumerate(i_row):
            if isinstance(i_value, OrderedMapSerializedKey):
                rows[idx_row][idx_value] = dict(i_value)
    # Place the pandas.DataFrame() result into a list to be able to iterate over PagedResult if number of records > fetch_size
    return [pd.DataFrame(rows, columns=colnames)]
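The conversion can be tried without a cluster. In the sketch below, collections.OrderedDict stands in for the driver's OrderedMapSerializedKey, and the column names and rows are invented for illustration. The check is widened to any Mapping, which is an assumption made for the sketch rather than what the code above does.

```python
from collections import OrderedDict
from collections.abc import Mapping

import pandas as pd

def pandas_factory(colnames, rows):
    # Convert every mapping-like value to a plain dict, row by row
    converted = [
        [dict(value) if isinstance(value, Mapping) else value for value in row]
        for row in rows
    ]
    # Wrap in a list so that iterating the result yields the DataFrame
    return [pd.DataFrame(converted, columns=colnames)]

# Hypothetical rows; OrderedDict plays the role of a Cassandra map value
colnames = ["id", "attributes"]
rows = [
    (1, OrderedDict([("color", "red"), ("size", "M")])),
    (2, OrderedDict([("color", "blue")])),
]

(page_df,) = pandas_factory(colnames, rows)
print(type(page_df.loc[0, "attributes"]))
print(page_df.loc[1, "attributes"]["color"])
```

Each cell in the map column then holds an ordinary dict, so the usual dictionary lookups work on the values stored in the DataFrame.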
[...] query.py as another option of row_factory