How to make Cassandra reads faster?


caribbean

Jun 10, 2010, 2:44:07 PM
to jassandra-user
Hi,

I am testing the performance of Cassandra. We write 200k records to the
database, each 1 KB in size, and then read those 200k records back.
The read takes more than 400 s to finish, which is much slower than
MySQL (around 20 s). I read some discussions online, and someone suggested
using multiple connections to make it faster. But I am not sure how
to do that: do I need to change my storage settings file, or just change
the Java client code?

Here is my read code,

// 1. Open a connection, reading at consistency level ONE
Properties info = new Properties();
info.put(DriverManager.CONSISTENCY_LEVEL,
        ConsistencyLevel.ONE.toString());

IConnection connection = DriverManager.getConnection(
        "thrift://localhost:9160", info);

// 2. Get a KeySpace by name
IKeySpace keySpace = connection.getKeySpace("Keyspace1");

// 3. Get a ColumnFamily by name
IColumnFamily cf = keySpace.getColumnFamily("Standard2");

// 4. Read numOfRecords randomly chosen keys, one request per key
ByteArray nameFirst = ByteArray.ofASCII("first");
ICriteria criteria = cf.createCriteria();
long readBytes = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < numOfRecords; i++) {
    int n = random.nextInt(numOfRecords);
    userName = keySet[n];

    criteria.keyList(Lists.newArrayList(userName))
            .columnRange(nameFirst, nameFirst, 10);
    Map<String, List<IColumn>> map = criteria.select();
    List<IColumn> list = map.get(userName);
    ByteArray bloc = list.get(0).getValue();
    byte[] byteArrayloc = bloc.toByteArray();
    loc = new String(byteArrayloc);
    System.out.println(userName + " " + loc);
    readBytes = readBytes + loc.length();
}

long finish = System.currentTimeMillis();

I also tried commenting out these lines:

ByteArray bloc = list.get(0).getValue();
byte[] byteArrayloc = bloc.toByteArray();
loc = new String(byteArrayloc);
System.out.println(userName+" "+loc);
readBytes = readBytes + loc.length();

The performance did not improve much.

Any suggestions are welcome. Thanks,

Dop Sun

Jun 10, 2010, 4:47:15 PM
to jassandra-user
For this question, I suggest you ask on us...@cassandra.apache.org,
since Jassandra is just a client API. Your question is directly
about Cassandra itself, so you will get a better answer from the
Cassandra user group.

Dop Sun

Jun 10, 2010, 4:51:09 PM
to jassandra-user
As I understand it, making multiple connections means running the
reads from multiple threads (each with a different key range) and/or
sending the queries to different nodes (set up more than one Cassandra
node in a cluster, and run parallel queries from multiple threads, each
with a different range, against different nodes).

That should help your read performance. But again, check with
us...@cassandra.apache.org. Cassandra developers actively respond to
requests there.
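
To make the multi-threading suggestion concrete: at more than 400 s for 200k reads, each sequential request costs roughly 2 ms, so the loop is probably limited by per-request latency rather than by data volume. Below is a minimal sketch of splitting the keys into ranges and reading each range on its own thread. It uses only the standard Java library; the class name ParallelReadSketch, the method readAll, and the reader function are hypothetical stand-ins, not part of Jassandra. In real use, reader would wrap the single-key criteria.select() lookup, and each worker would most likely need its own connection, since sharing one connection across threads is generally not safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelReadSketch {

    // Splits keys into up to `threads` contiguous ranges and reads each range
    // on its own worker thread. `reader` is a hypothetical stand-in for one
    // single-key lookup; it must be thread-safe and must not return null.
    static Map<String, String> readAll(List<String> keys, int threads,
                                       Function<String, String> reader) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, String> results = new ConcurrentHashMap<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            // Ceiling division so every key lands in exactly one range.
            int rangeSize = (keys.size() + threads - 1) / threads;
            for (int start = 0; start < keys.size(); start += rangeSize) {
                int end = Math.min(start + rangeSize, keys.size());
                List<String> range = keys.subList(start, end);
                pending.add(pool.submit(() -> {
                    for (String key : range) {
                        results.put(key, reader.apply(key));
                    }
                }));
            }
            // Wait for every range to finish and surface any worker failure.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("user" + i);
        }
        // Simulated lookup; a real reader would query Cassandra here.
        Map<String, String> values = readAll(keys, 8, key -> "value-of-" + key);
        System.out.println("records read: " + values.size());
    }
}
```

The thread count (8 here) is an arbitrary starting point; the useful value depends on the number of nodes and on how many concurrent requests each node handles well, so it is worth measuring a few settings.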

caribbean

Jun 10, 2010, 5:33:54 PM
to jassandra-user
Thanks, I have already sent it to that address.
