Memory leak problem with driver for c#


Jonathan D. Monroy

Jan 4, 2017, 2:13:17 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
Hi everyone, and thanks in advance for your responses. This is my first post, and I am a beginner with this amazing database engine.

I have published an application to my company's production environment. We use an App Service on Windows Azure, and I noticed that the service started to show high memory usage. We scaled the service out to more instances, but that was not enough.
Our main use case is saving large amounts of information daily, which is why we need high availability.

In summary, our save function is something like this:

        public void Save(T entity)
        {
            try
            {
                string insertQuery = SwCqlQueryBuilder.GenerateInsertQuery(entity, _conn._session.Keyspace, _table.ToString());
                _conn._session.Execute(insertQuery);
            }
            catch (Exception)
            {
            }
        }

I debugged the Cassandra DLL and found a class called MapperFactory (MapperFactory.cs) that keeps a couple of cache dictionaries:

        private readonly ConcurrentDictionary<Tuple<Type, string>, Delegate> _mapperFuncCache;
        private readonly ConcurrentDictionary<Tuple<Type, string>, Delegate> _valueCollectorFuncCache;

Also, there is a function called GetMapper<T>() which is executed every time we call the execute method on the session object:

        public Func<Row, T> GetMapper<T>(string cql, RowSet rows)
        {
            Tuple<Type, string> key = Tuple.Create(typeof(T), cql);
            Delegate mapperFunc = _mapperFuncCache.GetOrAdd(key, _ => CreateMapper<T>(rows));
            return (Func<Row, T>)mapperFunc;
        }

The origin of the problem was _mapperFuncCache: it stores an entry for every "insert query", even if it's not a prepared statement, so the dictionary kept growing until it consumed all the memory.
Our temporary solution was to clear the dictionary periodically, after which our memory consumption decreased. I would like to ask for some help in this context:

     • Is it necessary to use prepared statements as a rule?
     • What would you recommend? What are we doing wrong?
     • Could this be considered a bug in the Cassandra DLL?

Thanks in advance and regards.

Jonathan D. Monroy

Jan 5, 2017, 12:24:48 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
As a correction, the problem is with selects, not with inserts as I mentioned. The method is this:

   
        public CassyResult<T> GetPage(CassyQueryBag queryBag, CassySearchCriteria sc)
        {
            using (var mapper = new SwMapper(_conn._session))
            {
                Int64 total = _conn._session.Execute(queryBag.CountQuery).First().GetValue<Int64>("count");
                int totalPages = ((int)total + sc.PageSize - 1) / sc.PageSize;
                Cql cq = new Cql(queryBag.Queries.First());
                //cq.QueryOptions.DoNotPrepare();
                var items = mapper.Fetch<T>(cq);

                return new CassyResult<T>()
                {
                    Total = total,
                    Items = sc.CurrentPage < 1 ? new List<T>() : items.ToList(),
                    PagingStates = null,
                    Page = sc.CurrentPage,
                    TotalPages = totalPages,
                    Results = items.Count(),
                    PageSize = sc.PageSize
                };
            }
        }

We implement the Mapper class to make some validation queries during our processes.

Thanks.

Jorge Bay Gondra

Jan 9, 2017, 3:33:28 AM
to csharp-dr...@lists.datastax.com
Hi,
Sorry for the late reply. The mapper uses prepared statements by default, as that prevents the query from being parsed each time on the server side.
As stated in the docs, you should avoid stringifying the parameters in your queries; use query markers (`?`) instead:

Instead of (bad):
var users = mapper.Fetch<T>("SELECT id, email FROM users where country = 'UK'");

Use query markers:
var users = mapper.Fetch<T>("SELECT id, email FROM users where country = ?", "UK");

That way, the Mapper will internally prepare 1 query "SELECT id, email FROM users where country = ?" and bind it to different parameters each time, keeping memory consumption bounded.
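
A minimal, self-contained sketch of why this matters, assuming a cache keyed by result type and CQL text like the `_mapperFuncCache` quoted earlier in this thread. `QueryCacheDemo` and its `Fetch` helper are hypothetical stand-ins for illustration only; they do not call the driver or reproduce its code:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a cache keyed by (result type, CQL text).
// This is NOT the driver's implementation, only an illustration of key growth.
static class QueryCacheDemo
{
    static readonly ConcurrentDictionary<Tuple<Type, string>, object> Cache =
        new ConcurrentDictionary<Tuple<Type, string>, object>();

    // Simulates a lookup that adds one cache entry per distinct CQL string.
    // The bound values are deliberately ignored: they are not part of the key.
    static void Fetch<T>(string cql, params object[] values)
    {
        Cache.GetOrAdd(Tuple.Create(typeof(T), cql), _ => new object());
    }

    static void Main()
    {
        string[] countries = { "UK", "US", "ES", "AR" };

        // Inlined literals: every distinct value yields a distinct CQL string,
        // so the cache gains one entry per value.
        foreach (var c in countries)
            Fetch<string>("SELECT id, email FROM users WHERE country = '" + c + "'");
        Console.WriteLine("Entries with inlined values: " + Cache.Count);

        Cache.Clear();

        // Query markers: the CQL text never changes, so there is a single key.
        foreach (var c in countries)
            Fetch<string>("SELECT id, email FROM users WHERE country = ?", c);
        Console.WriteLine("Entries with query markers: " + Cache.Count);
    }
}
```

With inlined values the number of entries grows with the number of distinct values ever queried; with markers it stays at one entry per query shape.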

Maybe we should state it more clearly in the docs? The current docs state:

When using parameters, use query markers (?) instead of hardcoded stringified values, this improves serialization performance and lower memory consumption.

Saying something like

The Mapper maintains a dictionary of prepared queries, you should limit the amount of different queries sent through the Mapper: when using parameters, use query markers (?) instead of hardcoded stringified values, this improves serialization performance and lowers memory consumption.

What do you think?

Thanks,
Jorge


Jonathan D. Monroy

Jan 9, 2017, 1:22:28 PM
to DataStax C# Driver for Apache Cassandra User Mailing List
Hi Jorge, thanks for your response. We will try your solution with some stress tests to watch the memory consumption behaviour. We didn't use prepared statements at the beginning because we didn't see a difference when running our queries that way. It is also still not clear to us why dictionaries are used to store the queries (I would like to know a little more about how Cassandra works internally).

Thanks again, we will make some changes following your recommendation and I will write soon with our results.

Regards.