(2) Are there any benchmarks comparing pandas-based indexing with PyTables-based indexing? Is one faster than the other? Does one take up more disk space (i.e., produce a larger HDF5 file) than the other?
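There is no universal answer to the speed and file-size question. It depends on the data, how selective the queries are, and the compression settings, so measuring on one's own data is the reliable route. Below is a minimal sketch of such a measurement. It assumes pandas, NumPy and PyTables are installed. The row count, column names and file names are hypothetical, and no timings are claimed here. Note that pandas' HDF5 support is itself built on PyTables, so this compares two ways of driving the same library.

```python
import os
import time

import numpy as np
import pandas as pd
import tables as tb

# Hypothetical data: row count, column names and file names are assumptions.
N = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "COL1": rng.integers(0, 1000, N).astype("int16"),
    "COL7": rng.integers(0, 1_000_000, N).astype("int32"),
})

# pandas route: format="table" plus data_columns makes COL1 queryable with `where`.
df.to_hdf("via_pandas.h5", key="df", mode="w", format="table", data_columns=["COL1"])

# PyTables route: build the table from a NumPy record array, then index COL1.
with tb.open_file("via_pytables.h5", mode="w") as h5:
    tbl = h5.create_table("/", "t", obj=df.to_records(index=False))
    tbl.cols.COL1.create_index()

def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0

res_pd, t_pd = timed(lambda: pd.read_hdf("via_pandas.h5", "df", where="COL1 == 42"))

with tb.open_file("via_pytables.h5", mode="r") as h5:
    res_tb, t_tb = timed(lambda: h5.root.t.read_where("COL1 == 42"))

size_pd = os.path.getsize("via_pandas.h5")
size_tb = os.path.getsize("via_pytables.h5")
print(f"pandas:   {t_pd:.4f} s, {size_pd} bytes, {len(res_pd)} rows")
print(f"PyTables: {t_tb:.4f} s, {size_tb} bytes, {len(res_tb)} rows")
```

Both files are written uncompressed here. Passing `complevel`/`complib` to `to_hdf`, or `filters=tb.Filters(...)` to `create_table`, would change the sizes. A single query is also a noisy measurement, so repeating it several times is advisable.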
Thank you for any help, and apologies for so many questions recently.
--
Thanks, Evan
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+unsubscribe@googlegroups.com.
To post to this group, send email to pytables-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--Francesc Alted
Thank you for the prompt response.
So just to be clear, users should index the columns using
indexedrows = table.cols.identity.create_index()
before filling the tables with records?
That is, users first instantiate the `table` and `row` objects, then use
indexedrows = table.cols.identity.create_index(), e.g. with the example below
...
...Does the order in which columns are indexed matter?
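As far as correctness goes, an index can be created either before or after the rows are written. `Table.autoindex` is true by default, so rows appended after the index exists are added to it when the table is flushed. Indexes on different columns are independent, so the order in which they are created should not matter either. The sketch below builds the same table both ways and checks that a query returns the same rows. The `Rec` description, file names and data are hypothetical. The sketch says nothing about which approach is faster to build.

```python
import tables as tb

class Rec(tb.IsDescription):
    COL1 = tb.Int16Col()
    COL7 = tb.Int32Col()

def build(path, index_first):
    """Fill a small table, indexing COL1 either before or after the rows are written."""
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "t", Rec)
        if index_first:
            # Index on an empty table; with table.autoindex True (the default),
            # appended rows are added to the index when the table is flushed.
            table.cols.COL1.create_index()
        row = table.row
        for i in range(1000):
            row["COL1"] = i % 50
            row["COL7"] = i
            row.append()
        table.flush()
        if not index_first:
            # Index built once over rows that are already written.
            table.cols.COL1.create_index()
        hits = sorted(table.read_where("COL1 == 7")["COL7"].tolist())
        return hits, table.cols.COL1.is_indexed

before = build("index_before.h5", index_first=True)
after = build("index_after.h5", index_first=False)
print(before == after)
```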
Question 2:
So, `indexedrows` is not a traditional Python variable?
If I were to index three columns, 'col1', 'col2' and 'col3', I would use
`indexedrows` must somehow execute all of these commands. With a traditional variable, only the last value, from `table.cols3.identity.create_index()`, would be saved; the other two would have been overwritten.
table.cols.var1.create_index()
table.cols.var2.create_index()
table.cols.var3.create_index()
Hope this helps,
Francesc
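On Question 2: `create_index()` builds the index inside the HDF5 file as a side effect. The value it returns is an ordinary Python value; in recent PyTables versions it is the number of rows that were indexed. Rebinding `indexedrows` therefore does not undo earlier indexes. Below is a minimal sketch with a hypothetical three-column table. It indexes columns named in a list by looking each one up with `getattr(table.cols, name)`.

```python
import tables as tb

class ThreeCols(tb.IsDescription):
    col1 = tb.Int32Col()
    col2 = tb.Int32Col()
    col3 = tb.Int32Col()

names = ("col1", "col2", "col3")

with tb.open_file("three_cols.h5", mode="w") as h5:
    table = h5.create_table("/", "t", ThreeCols)
    row = table.row
    for i in range(100):
        row["col1"] = i
        row["col2"] = i * 2
        row["col3"] = i * 3
        row.append()
    table.flush()

    # Each call stores a separate index in the file. `indexedrows` only holds
    # the return value, so rebinding it does not remove the earlier indexes.
    for name in names:
        indexedrows = getattr(table.cols, name).create_index()

    indexed = {name: getattr(table.cols, name).is_indexed for name in names}

print(indexed)
```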
table.cols.idnumber.create_index()
table.cols.speed.create_index()
from tables import IsDescription, Int16Col, Int32Col, StringCol, open_file
class MyTable(IsDescription):
    COL1 = Int16Col()
    COL2 = Int16Col()
    COL3 = StringCol(64)
    COL4 = StringCol(64)
    COL5 = StringCol(64)
    COL6 = StringCol(64)
    COL7 = Int32Col()
# Open a file in write mode
h5file = open_file("file1.h5", mode="w")
my_key = "key"
# Create group
group = h5file.create_group("/", "my_table")
table = h5file.create_table(group, my_key, MyTable, "table of values")
row = table.row
# user decides which indices to create
field1 = "COL1"  # create index on column 1, COL1
field2 = "COL2"  # create index on column 2, COL2
# dictionary1 is imported elsewhere: an iterable of dicts, one per row
for record in dictionary1:
    row["COL1"] = record["COL1"]
    row["COL2"] = record["COL2"]
    row["COL3"] = record["COL3"]
    row["COL4"] = record["COL4"]
    row["COL5"] = record["COL5"]
    row["COL6"] = record["COL6"]
    row["COL7"] = record["COL7"]
    # look up the columns whose names are stored in field1 and field2
    getattr(table.cols, field1).create_index()
    getattr(table.cols, field2).create_index()
    # This injects the Record values
    row.append()
# Flush the table buffers
table.flush()
"""
Now, when I run this, I get the following error:
"ValueError: Index(6, medium, shuffle, zlib(1)).is_csi=False for column 'COL1' already exists. If you want to re-create it, please, try with reindex() method better"
Where above am I indexing all columns? Surely queries would be faster if I only indexed one or two columns and queried on those, right?
So I'm making a mistake somewhere.
Thanks for the help; we'll soon have this figured out.
Evan
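The `ValueError` comes from where `create_index()` is called, not from indexing too many columns. The calls sit inside the `for` loop, so they run once per record, and the second call on `COL1` finds an index that already exists. Separately, a column whose name is stored in a string must be looked up with `getattr(table.cols, name)`; `table.cols.field1` would look for a column literally called `field1`. Below is a corrected sketch. It assumes `dictionary1` is a list of dicts keyed by column name; the sample data is hypothetical. It fills the rows inside the loop and indexes each chosen column once, afterwards.

```python
import tables as tb

class MyTable(tb.IsDescription):
    COL1 = tb.Int16Col()
    COL2 = tb.Int16Col()
    COL3 = tb.StringCol(64)
    COL4 = tb.StringCol(64)
    COL5 = tb.StringCol(64)
    COL6 = tb.StringCol(64)
    COL7 = tb.Int32Col()

# Hypothetical stand-in for `dictionary1`: a list of dicts, one per row.
dictionary1 = [
    {"COL1": i % 10, "COL2": i % 3, "COL3": b"a", "COL4": b"b",
     "COL5": b"c", "COL6": b"d", "COL7": i}
    for i in range(50)
]

fields_to_index = ["COL1", "COL2"]  # the user's choice of indexed columns

with tb.open_file("file1.h5", mode="w") as h5file:
    group = h5file.create_group("/", "my_table")
    table = h5file.create_table(group, "key", MyTable, "table of values")
    row = table.row
    for record in dictionary1:
        for name in table.colnames:
            row[name] = record[name]
        row.append()  # only row filling happens inside the loop
    table.flush()

    # Index each chosen column exactly once, after all rows are written.
    for name in fields_to_index:
        column = getattr(table.cols, name)
        if not column.is_indexed:  # guard against "index already exists"
            column.create_index()

    indexed = sorted(n for n in table.colnames if getattr(table.cols, n).is_indexed)
    n_hits = len(table.read_where("COL1 == 3"))

print(indexed, n_hits)
```

If an existing index really does need rebuilding, the route is the one the error message suggests, `Column.reindex()`, rather than a second `create_index()`.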