Hello,
I wrote my first sample program to load a file of about 7,000 lines into Cassandra using the gocql interface. It uses the NewBatch, batch.Query, ExecuteBatch interface, very similar to the unit test cassandra_test.go in the gocql codebase.
The issue is that after a successful load (insert statements) of the whole file, I end up with just 40 rows. That is the result of "select count(*) from foobar" via cqlsh. I do understand it could be a mistake in the choice of row key, but I have double-checked everything. Moreover, when I load the file again with the 'batch' size reduced to 1, I get about 6,500 rows in the resulting table.
The table / column family is declared in cassandra as
create table foobar (
    ey ascii,
    u int,
    g ascii,
    et timestamp, -- COMMENT 'Event Time',
    [.. snip .. around 30+ columns]
    PRIMARY KEY ( (ey, g), et )
);
I am following the guide here on the schema design
http://www.opensourceconnections.com/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/
Here is what I have been able to troubleshoot / check.
1. The row key is unique. There are at least 5,000 unique combinations of (ey + g), so there is no way we should end up with 40. These are web logs and (ey + g) identifies a browser. The time component comes in mainly to store multiple events from the same browser.
2. I understand Cassandra attaches a special meaning to a 'TIMESTAMP' column, but here the timestamp is part of the data (it is the time the event happened). I also see references to timestamps and BATCH mode, but I am not sure where to control that.
3. ExecuteBatch does not report any errors.
4. This is a two node cassandra installation. Two hosts, both serving as seeds.
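For point 1, the uniqueness check can be done on the client before anything is sent. The sketch below counts distinct partitions (ey, g) and distinct full primary keys (ey, g, et), since an insert with an existing primary key overwrites that row rather than adding a new one. The key struct, the countDistinct helper and the sample values are made up for illustration, and et is assumed to be held as milliseconds since the epoch.

```go
package main

import "fmt"

// key mirrors PRIMARY KEY ((ey, g), et): two lines with the same
// ey, g and et end up as a single row in the table.
type key struct {
	ey, g string
	et    int64 // event time in milliseconds since the epoch (assumption)
}

// countDistinct returns how many distinct partitions (ey, g) and how
// many distinct full primary keys (ey, g, et) appear in rows.
func countDistinct(rows []key) (partitions, primaryKeys int) {
	parts := make(map[[2]string]struct{})
	full := make(map[key]struct{})
	for _, r := range rows {
		parts[[2]string{r.ey, r.g}] = struct{}{}
		full[r] = struct{}{}
	}
	return len(parts), len(full)
}

func main() {
	// Made-up sample values, not taken from the real log file.
	rows := []key{
		{"a", "x", 1000},
		{"a", "x", 2000},
		{"a", "x", 2000}, // same primary key as the previous line
		{"b", "y", 1000},
	}
	p, k := countDistinct(rows)
	fmt.Printf("%d lines, %d partitions, %d distinct primary keys\n", len(rows), p, k)
}
```

If the number of distinct primary keys is close to the number of lines in the file, the key design is probably not what collapses 7,000 lines into 40 rows.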
Is there a way to turn on logging to see what goes to cassandra?
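Short of server-side logging, one client-side option is to print each statement together with the values bound to it just before handing them to batch.Query. The sketch below is plain Go and does not touch gocql; describe is a hypothetical helper, and the three-column insert is a shortened stand-in for the real 30+ column statement. It only shows what the program believes it is sending, not what Cassandra actually receives.

```go
package main

import (
	"fmt"
	"strings"
)

// describe renders a statement and its bound values, including each
// value's Go type, so a wrong type or a repeated value is easy to spot.
func describe(stmt string, args ...interface{}) string {
	parts := make([]string, len(args))
	for i, a := range args {
		parts[i] = fmt.Sprintf("%v (%T)", a, a)
	}
	return stmt + " <- [" + strings.Join(parts, ", ") + "]"
}

func main() {
	// Shortened, made-up statement and values for illustration.
	stmt := "INSERT INTO foobar (ey, u, g) VALUES (?, ?, ?)"
	rowdata := []interface{}{"k1", 7, "g1"}
	fmt.Println(describe(stmt, rowdata...))
}
```

Printing the Go type next to each value also helps confirm that the int, float32 and bool columns are bound as those types and not as strings.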
Now the golang part of it
----------------------
As I said earlier, I am following the cassandra_test.go example, but of course with minor changes to accommodate the 30+ columns I have. Since not all columns are strings, I add them to a variable declared this way.
rowdata := make([]interface{}, len(columns))
Then, looping over each column value (from the line that is being processed), I fill rowdata with a string / int / float32 / bool as appropriate. Finally I call batch.Query(query, rowdata...)
Apart from using that variable-arguments golang feature, there is not much there. I am new to golang, so I thought I should mention that.
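One thing worth ruling out with the variable-arguments call: in Go, f(s...) passes the slice s itself, not a copy. If rowdata is allocated once and refilled for every line, and the batch keeps the slice it was given until ExecuteBatch runs, then every statement in a batch would carry the values of the last line added. That would fit the observation that a batch size of 1 works, but it is only a guess. The sketch below shows the effect in plain Go; fakeBatch is a hypothetical stand-in and not the gocql type, and whether gocql copies its arguments is an assumption not verified here.

```go
package main

import "fmt"

// fakeBatch is a hypothetical stand-in for a batch. Like any Go function
// called as f(s...), Add receives the caller's slice, not a copy of it.
type fakeBatch struct {
	entries [][]interface{}
}

func (b *fakeBatch) Add(stmt string, args ...interface{}) {
	b.entries = append(b.entries, args)
}

// loadShared reuses one rowdata slice for every line, so all entries
// end up pointing at the same backing array.
func loadShared(lines [][]string, ncols int) *fakeBatch {
	b := &fakeBatch{}
	rowdata := make([]interface{}, ncols)
	for _, cols := range lines {
		for i, c := range cols {
			rowdata[i] = c
		}
		b.Add("INSERT ...", rowdata...)
	}
	return b
}

// loadFresh allocates a new rowdata slice per line, so each entry
// keeps its own values.
func loadFresh(lines [][]string, ncols int) *fakeBatch {
	b := &fakeBatch{}
	for _, cols := range lines {
		rowdata := make([]interface{}, ncols)
		for i, c := range cols {
			rowdata[i] = c
		}
		b.Add("INSERT ...", rowdata...)
	}
	return b
}

func main() {
	// Made-up sample lines with two columns each.
	lines := [][]string{{"a", "1"}, {"b", "2"}, {"c", "3"}}
	fmt.Println("shared:", loadShared(lines, 2).entries)
	fmt.Println("fresh: ", loadFresh(lines, 2).entries)
}
```

In the shared variant all three entries hold the values of the last line, while the fresh variant keeps three different rows. If the real loader looks like loadShared, moving the make call inside the per-line loop is a cheap experiment.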
Any help/hints are appreciated. How do I debug further?
--
Harry