ScyllaDB Node.js driver


Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 17, 2024, 7:04:25 AM
to ScyllaDB users
Hello there,

I am using "cassandra-driver": "^4.7.2" and I am having an issue with it.
The problem is that I am not getting all the results for some queries: the driver returns only a portion of the results, or none at all, when used from Node. When I run the same query in cqlsh or with the Python driver, I get the correct results.

I have tried older versions such as 3.5.0 and 4.2.0, with no luck.

Thanks,

Avi Kivity

<avi@scylladb.com>
Apr 17, 2024, 11:04:39 AM
to scylladb-users@googlegroups.com
Hi,

You didn't mention which ScyllaDB version you are using, your schema, the query, and how replication is configured, so there is very little that can be done to help you.
--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/fb79378e-8295-4694-8e5b-708dfa3cf688n%40googlegroups.com.

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 17, 2024, 11:39:53 PM
to ScyllaDB users
Apologies,

Here is the info,

Running on 1 node on an Amazon i3 instance, self-hosted using the AMI provided by ScyllaDB.

Keyspace : 

CREATE KEYSPACE real_estate_properties WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

Table :

CREATE TABLE real_estate_properties.forsaleproperty (
    addressstate text,
    addresscity text,
    created_at timestamp,
    zpid text,
    address text,
    addressstreet text,
    addresszipcode text,
    area text,
    attributes text,
    availabilitycount text,
    availabilitydate text,
    badgeinfo text,
    baths text,
    beds text,
    brokername text,
    buildingname text,
    cansavebuilding boolean,
    carouselphotos text,
    countrycurrency text,
    detailurl text,
    has3dmodel boolean,
    hasadditionalattributions boolean,
    hasimage boolean,
    hasvideo boolean,
    hdpdata text,
    hoafee text,
    id text,
    imgsrc text,
    isbuilding boolean,
    isfeaturedlisting boolean,
    ishomerec boolean,
    issaved boolean,
    isshowcaselisting boolean,
    isundisclosedaddress boolean,
    isuserclaimingowner boolean,
    isuserconfirmedclaim boolean,
    iszillowowned boolean,
    latlong text,
    list boolean,
    lotid text,
    marketingstatussimplifiedcd text,
    parking text,
    pgapt text,
    pool text,
    price text,
    providerlistingid text,
    rawhomestatuscd text,
    relaxed boolean,
    rooms text,
    sgapt text,
    shouldshowzestimateasprice boolean,
    statustext text,
    statustype text,
    streetviewmetadataurl text,
    streetviewurl text,
    tracking text,
    unformattedprice text,
    units text,
    variabledata text,
    zestimate text,
    PRIMARY KEY ((addressstate, addresscity), created_at, zpid)
) WITH CLUSTERING ORDER BY (created_at DESC, zpid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Query : 

SELECT * FROM ForSaleProperty WHERE addressstate = 'California' AND addresscity = 'Cathedral City' AND created_at >= 1711909800000 AND created_at < 1714415400000 LIMIT 1000;

Avi Kivity

<avi@scylladb.com>
Apr 18, 2024, 5:32:57 AM
to scylladb-users@googlegroups.com
Please capture the CQL port with wireshark and examine the conversation between driver and server. You can attach the capture here if there's no private data in the transfer.

My guess is that the driver misinterprets a page with no rows as end-of-query instead of looking at the Has_more_pages flag. We've seen such bugs in other drivers.
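
To make the failure mode above concrete, here is a minimal sketch of a paging loop that decides whether to continue from the paging state, not from the number of rows in the page. It assumes a client whose `execute(query, params, options)` resolves to an object with `rows` and `pageState`, and that a `pageState` option resumes from that point, which is how the DataStax Node.js driver exposes paging as far as I know. `fetchAllRows` is a hypothetical helper, not part of any driver.

```javascript
// Hypothetical helper (not a driver API): keeps requesting pages until the
// server stops returning a paging state, even if some page holds zero rows.
async function fetchAllRows(client, query, params = [], options = {}) {
  const rows = [];
  let pageState; // undefined for the first request
  do {
    // Only pass pageState once the server has handed one back.
    const pageOptions = pageState ? { ...options, pageState } : options;
    const result = await client.execute(query, params, pageOptions);
    for (const row of result.rows) {
      rows.push(row);
    }
    // An empty page is not the end of the query; a missing pageState is.
    pageState = result.pageState;
  } while (pageState);
  return rows;
}
```

A loop that stopped on `result.rows.length === 0` instead would drop everything after the first empty page.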

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 18, 2024, 11:38:39 AM
to ScyllaDB users

Attaching the .pcapng file.
I noticed something weird:
when querying all the columns I get 174 results, but when querying only one column I get all 350 results.
Could this be a configuration issue?
query.pcapng

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 18, 2024, 12:11:10 PM
to ScyllaDB users
An addition:
I selected these columns -> addresscity, address, addressstate, zpid, tracking, carouselphotos
and now I get 211 results.
Can we conclude that the problem is related to some sort of size limit?

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 19, 2024, 12:31:57 PM
to ScyllaDB users
Hey there,
So what do you think?
I have also tried manually setting the "fetchSize" in query options. It did not seem to work in my case.
Thanks.

Avi Kivity

<avi@scylladb.com>
Apr 20, 2024, 1:13:41 PM
to scylladb-users@googlegroups.com
It's a long capture. Which packet contains the query?

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 20, 2024, 1:59:34 PM
to ScyllaDB users
I think it starts from packet number 29. I followed the stream and scrolled down a bit.
It is somewhere between 100 and 111.

Screenshot 2024-04-20 at 11.24.03 PM.png

Yaniv Kaul

<yaniv.kaul@scylladb.com>
Apr 21, 2024, 2:48:31 AM
to scylladb-users@googlegroups.com
On Sat, Apr 20, 2024 at 8:13 PM 'Avi Kivity' via ScyllaDB users <scyllad...@googlegroups.com> wrote:
It's a long capture. Which packet contains the query?

You can filter with 'cql'.

Wireshark has several bugs and missing dissections that keep it from dissecting this properly. I have already submitted one fix and will get to another soon, but I'm not sure I have covered them all.
Regardless, there's at least one query ('SELECT 1;') that is obviously failing and I'm not sure where it's coming from.
Y. 

Yaniv Kaul

<yaniv.kaul@scylladb.com>
Apr 21, 2024, 3:47:32 AM
to scylladb-users@googlegroups.com
Are you sure the driver handles paging correctly? I do see 174 results, and I don't see any packet after that; perhaps it's missing (packet 1911).

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 21, 2024, 4:03:27 AM
to ScyllaDB users
Shall I capture the packets when using the Python driver? If we compare the two captures, maybe we can see what's going on.

Yaniv Kaul

<yaniv.kaul@scylladb.com>
Apr 21, 2024, 4:25:15 AM
to scylladb-users@googlegroups.com
On Sun, Apr 21, 2024 at 11:03 AM Bhardwaj Thummar <bhardwa...@gmail.com> wrote:
Shall I capture the packets when using the Python driver? If we compare the two captures, maybe we can see what's going on.

Yes, it'd be great to compare them.
Y. 

Avi Kivity

<avi@scylladb.com>
Apr 21, 2024, 6:12:57 AM
to scylladb-users@googlegroups.com
It's likely an application or driver bug. The query executes in packet 127, and the response is in packet 132. The flags byte with Has_more_pages isn't decoded because Wireshark wasn't able to reassemble the entire response, but the response size of 982075 bytes indicates that the server split the result into pages. Either the driver isn't fetching the next page, or you're using it incorrectly.

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 21, 2024, 7:58:31 AM
to ScyllaDB users
Thanks,
Here's the test setup.


import cassandra from 'cassandra-driver'

async function connectToScyllaDB() {
  // Set up connection options
  const client = new cassandra.Client({
    contactPoints: process.env.SCYLLA_CONTACT_POINTS.split(','),
    localDataCenter: process.env.SCYLLA_DC,
    credentials: { username: process.env.SCYLLA_USER, password: process.env.SCYLLA_PASS },
    keyspace: process.env.SCYLLA_KEYSPACE,
    queryOptions: { prepare: true, fetchSize: 10000000 },
    socketOptions: { coalescingThreshold: 100, readTimeout: 0, keepAlive: true },
    pooling: {
      coreConnectionsPerHost: {
        [process.env.SCYLLA_CONTACT_POINTS]: 1
      }
    },
    encoding: {
      map: Map,
      set: Set,
      useBigIntAsLong: false
    }
  });

  // Connect to the cluster
  await client.connect();

  console.log('Connected to ScyllaDB');

  // Return the connected client
  return client;
}

connectToScyllaDB()
  .then(async client => {
    // Use the client to execute queries
    let response = await client.execute(
      `SELECT * FROM forsaleproperty WHERE addressstate = 'California' AND addresscity = 'Cathedral City' LIMIT 1000;`,
      [],
      { fetchSize: 1000 }
    );
    console.log(response.rows.length);
  })
  .catch(err => {
    console.error(err);
  });

Avi Kivity

<avi@scylladb.com>
Apr 21, 2024, 8:20:18 AM
to scylladb-users@googlegroups.com
I don't know enough about the Node.js driver to say whether this is the right way to use it.

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 22, 2024, 1:25:36 PM
to ScyllaDB users
Shall I raise an issue on the cassandra-driver Node repo?
Can you please guide me with this?
For now I am using a stopgap: a Python script handles the queries, and I execute that script in a shell from Node.

Yaniv Kaul

<yaniv.kaul@scylladb.com>
Apr 23, 2024, 10:32:44 AM
to scylladb-users@googlegroups.com
On Mon, Apr 22, 2024 at 8:25 PM Bhardwaj Thummar <bhardwa...@gmail.com> wrote:
Shall I raise an issue on cassandra-driver node repo? 

Yes, I think it makes sense. Keep us updated with the results.
 
can you please guide me with this?

https://github.com/datastax/nodejs-driver?tab=readme-ov-file#getting-help has the links to their mailing list and bug database.
Y.
 

Yaniv Kaul

<yaniv.kaul@scylladb.com>
Apr 24, 2024, 4:28:05 AM
to scylladb-users@googlegroups.com

Raphael S. Carvalho

<raphaelsc@scylladb.com>
Apr 24, 2024, 5:22:57 AM
to ScyllaDB users
A very wild guess: maybe this is related to the Node.js driver's auto-paging feature (try with it disabled)? I browsed through the driver code, and it sets the page state (for fetching the next page) if the has_more_pages flag is set, but perhaps it bails out if the last page is empty, as Avi suggested.

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 24, 2024, 5:40:06 AM
to ScyllaDB users
I am using the fetchSize: 1000 option in the query; please refer to this message:
https://groups.google.com/g/scylladb-users/c/weLgoO7PuK4/m/utfqgX-UAAAJ

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 24, 2024, 5:41:42 AM
to ScyllaDB users
My guess is that it's related to data size, because when I restricted the columns it fetched more rows. I have tried many combinations of fetchSize and column selections.

Avi Kivity

<avi@scylladb.com>
Apr 24, 2024, 5:56:23 AM
to scylladb-users@googlegroups.com
That's normal: when you change the column count, ScyllaDB changes the number of rows it puts in a page so that the page fills 1MB.

What's not normal is that the driver only fetched one page.
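
As a rough back-of-the-envelope check with the numbers reported in this thread (a 982075-byte response carrying 174 rows, 350 matching rows in total, pages capped at about 1MB), the full result would span two pages. The sketch assumes "1MB" means 1048576 bytes and ignores protocol framing overhead and variation in row size; the helper names are made up for illustration.

```javascript
// Assumed page cap: 1 MiB. The thread only says "1MB".
const PAGE_LIMIT_BYTES = 1024 * 1024;

// Average size of one row, given a response size and its row count.
function averageRowBytes(responseBytes, rowCount) {
  return responseBytes / rowCount;
}

// Estimated number of pages needed to return `totalRows` rows.
function estimatePages(totalRows, avgRowBytes, pageLimitBytes = PAGE_LIMIT_BYTES) {
  return Math.ceil((totalRows * avgRowBytes) / pageLimitBytes);
}

const avg = averageRowBytes(982075, 174);
console.log(Math.round(avg));         // 5644 bytes per row, roughly
console.log(estimatePages(350, avg)); // 2
```

This is consistent with the first page holding 174 of the 350 rows, and with narrower column selections fitting more rows into a page.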

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Apr 24, 2024, 5:58:05 AM
to ScyllaDB users
Understood.

Bhardwaj Thummar

<bhardwajthumber@gmail.com>
Jun 10, 2024, 11:58:24 AM
to ScyllaDB users
Hey there,
I have added a workaround on my side for now. I think I'm going to migrate to Python or build a gRPC server in Rust.

On second thought, should I use a different database?
I will be storing 1.5M new records every month. Which database would be a sensible choice: should I stick with ScyllaDB (doesn't support pagination/offset),
or should I pick Postgres?

Thanks in advance!!!