Potential concurrency issues with a global hash to reduce DB

1 view
Skip to first unread message

Frank Poo

unread,
Aug 17, 2009, 7:56:12 PM8/17/09
to rubyonra...@googlegroups.com

Hi folks:

I'm building something akin to Google Analytics and currently I'm doing
real time database updates. However this isn't really scaling anymore,
so I need to stop doing synchronous DB operations. In the short term,
I'm trying to reduce DB writes, so I'm thinking about a global hash (say
declared in environment.rb) that is accessible from my controllers and
models that I can write to in lieu of writing to the DB. Every so often
I can have a task write the updates that need to be written to the DB.

Question:
1) Does this sound reasonable?
2) Will I run into any concurrency problems? Which ones?
3) How does this compare with writing to the file system and importing
later?

Thanks so much in advance, you ROCK!
--
Posted via http://www.ruby-forum.com/.

Maurício Linhares

unread,
Aug 17, 2009, 8:12:19 PM8/17/09
to rubyonra...@googlegroups.com
If you're not running on rails threadsafe! mode you'll have no
concurrency issues. A messaging queue could also be a solution to your
problem.

-
Maurício Linhares
http://codeshooter.wordpress.com/ | http://twitter.com/mauriciojr

Marnen Laibow-Koser

unread,
Aug 17, 2009, 10:45:39 PM8/17/09
to rubyonra...@googlegroups.com
Frank Poo wrote:
>
>
> Hi folks:
>
> I'm building something akin to Google Analytics and currently I'm doing
> real time database updates. However this isn't really scaling anymore,
> so I need to stop doing synchronous DB operations. In the short term,
> I'm trying to reduce DB writes, so I'm thinking about a global hash (say
> declared in environment.rb) that is accessible from my controllers and
> models that I can write to in lieu of writing to the DB. [...]

> 1) Does this sound reasonable?

No! Databases should scale as much as you need them to -- that's what
they are designed for. If your DB is not scaling, there is something
wrong, either in your DB setup or in the way the app is using the DB.
Find the problem and fix it rather than ripping out the DB.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
mar...@marnen.org

Frank Poo

unread,
Aug 17, 2009, 11:12:53 PM8/17/09
to rubyonra...@googlegroups.com
Marnen: here are some more details. I need to be able to handle at least
10 API requests / second. Let met clarify what happens:

1) User makes a RESTful API request
2) I find a record in a database, return it as JSON
3) I record the request counter for the user in the database (i.e. if I
user makes 2 API calls, I record '2').

#1 and #2 are really fast in SQL - they are SELECTs. #3 is really slow,
because it's an UPDATE. In the real world, my database (MySQL) is NOT
scaling. According to New Relic, #3 is taking most of the time.

Any suggestions?

Marnen Laibow-Koser

unread,
Aug 17, 2009, 11:22:30 PM8/17/09
to rubyonra...@googlegroups.com
Frank Poo wrote:
> Marnen: here are some more details. I need to be able to handle at least
> 10 API requests / second. Let met clarify what happens:
>
> 1) User makes a RESTful API request
> 2) I find a record in a database, return it as JSON
> 3) I record the request counter for the user in the database (i.e. if I
> user makes 2 API calls, I record '2').
>
> #1 and #2 are really fast in SQL - they are SELECTs. #3 is really slow,
> because it's an UPDATE. In the real world, my database (MySQL) is NOT
> scaling.

Well, there's your first problem -- you're using mySQL. :) Try
PostgreSQL instead.

(That was admittedly snarky. Although PostgreSQL is certainly the
better choice, mySQL should still work.)

> According to New Relic, #3 is taking most of the time.

What does that query look like? A simple UPDATE query should not be a
major performance hit, so there must be something else going on.

Frank Poo

unread,
Aug 17, 2009, 11:39:28 PM8/17/09
to rubyonra...@googlegroups.com
The query is just an UPDATE with a COALESCE keyword in it. All it does
is increment counters:

UPDATE `statistics_api` SET `count_request` = COALESCE(`count_request`,
?) + ? WHERE (`id` = ?)

This one took 8,553 ms according to New Relic. The weird part is that
there is an index on 'id' (as well as a few other columns) on this
table. I just can't figure out why the query's taking so long ...

Marnen Laibow-Koser

unread,
Aug 17, 2009, 11:45:28 PM8/17/09
to rubyonra...@googlegroups.com
Frank Poo wrote:
> The query is just an UPDATE with a COALESCE keyword in it. All it does
> is increment counters:
>
> UPDATE `statistics_api` SET `count_request` = COALESCE(`count_request`,
> ?) + ? WHERE (`id` = ?)
>
> This one took 8,553 ms according to New Relic. The weird part is that
> there is an index on 'id' (as well as a few other columns) on this
> table. I just can't figure out why the query's taking so long ...

8.5 seconds? For that?!? You might want to ask someone who knows more
about mySQL configuration than I do, but clearly your DB setup has
problems. If you run that query when your DB is not otherwise being hit
by your app, how long does it take?

BTW, what does the first placeholder represent? I'd assume it would
always represent 0, in which case you can hard-code it.

Frank Poo

unread,
Aug 18, 2009, 1:08:13 AM8/18/09
to rubyonra...@googlegroups.com
This query is lightning fast when my server is NOT under load. Just dead
slow when I'm seeing a lot of load. You're right, the first placeholder
is probably redundant..

Greg Willits

unread,
Aug 18, 2009, 3:25:53 AM8/18/09
to rubyonra...@googlegroups.com
> Marnen Laibow-Koser wrote:
>> Frank Poo wrote:
>>> The query is just an UPDATE with a COALESCE keyword in it. All it
>>> does
>>> is increment counters:
>>>
>>> UPDATE `statistics_api` SET `count_request` =
>>> COALESCE(`count_request`,
>>> ?) + ? WHERE (`id` = ?)
>>>
>>> This one took 8,553 ms according to New Relic. The weird part is
>>> that
>>> there is an index on 'id' (as well as a few other columns) on this
>>> table. I just can't figure out why the query's taking so long ...
>>
>> 8.5 seconds? For that?!? You might want to ask someone who knows
>> more
>> about mySQL configuration than I do, but clearly your DB setup has
>> problems. If you run that query when your DB is not otherwise
>> being hit
>> by your app, how long does it take?
>>

> Frank Poo wrote:
> This query is lightning fast when my server is NOT under load. Just
> dead
> slow when I'm seeing a lot of load. You're right, the first
> placeholder
> is probably redundant..


That would likely indicate that the DB is pretty dern busy fulfilling
reads (though someone should analyse that to prove it). When doing
this kind of logging it often becomes necessary to have a second
database which is optimized for writes to take the continual updates
so that your read/write-infrequent data set is left alone. Those log
writes should also be fired off as an asyncronous task so that the
rest of the page/request is not slowed down waiting for the write to
occur.

I agree that yanking the DB in favor of in-memory workload doesn't
seem appropriate yet.

--
def gw
writes_at 'www.railsdev.ws'
end

Frank Poo

unread,
Aug 18, 2009, 8:25:46 AM8/18/09
to rubyonra...@googlegroups.com
Greg, what do you recommend to do async writes? I'm a newbie to async
tasks in Rails, but doesn't something like delayed_job ALSO us the
database to keep track of jobs?

Greg Willits wrote:
>
> That would likely indicate that the DB is pretty dern busy fulfilling
> reads (though someone should analyse that to prove it). When doing
> this kind of logging it often becomes necessary to have a second
> database which is optimized for writes to take the continual updates
> so that your read/write-infrequent data set is left alone. Those log
> writes should also be fired off as an asyncronous task so that the
> rest of the page/request is not slowed down waiting for the write to
> occur.
>
> I agree that yanking the DB in favor of in-memory workload doesn't
> seem appropriate yet.
--

Posted via http://www.ruby-forum.com/.

Marnen Laibow-Koser

unread,
Aug 18, 2009, 11:29:12 AM8/18/09
to rubyonra...@googlegroups.com
Frank Poo wrote:
> Greg, what do you recommend to do async writes? I'm a newbie to async
> tasks in Rails, but doesn't something like delayed_job ALSO us the
> database to keep track of jobs?

It may be possible to use the database's own clustering features to do
this and still let your app treat it as if it were one simple DB. If
not, would the masochism plugin help?

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
mar...@marnen.org

Wojciech Piekutowski

unread,
Aug 19, 2009, 8:42:28 AM8/19/09
to Ruby on Rails: Talk
Which MySQL engine do you use for this table? MyISAM is faster for
read operations, but locks the whole table for an update or an insert
and this might affect the performance a lot. InnoDB engine uses row-
level locking. Note that PostgreSQL does too.

Have you thought about using document DBs like Tokyo Cabinet, MongoDB
or others? They can handle huge loads.

On Aug 18, 5:39 am, Frank Poo <rails-mailing-l...@andreas-s.net>
wrote:

Frank Poo

unread,
Aug 19, 2009, 11:40:57 AM8/19/09
to rubyonra...@googlegroups.com
We use InnoDB. I did some more research on this, and it turns out that
my app was literally overloading the db (non-clustered) with UPDATEs.
Since all these updates do is increment counters, I cached the
increments in memory and flush them every once in a while. This
dramatically raised my app's performance, without the need for a
complicated database setup.

Wojciech Piekutowski wrote:
> Which MySQL engine do you use for this table? MyISAM is faster for
> read operations, but locks the whole table for an update or an insert
> and this might affect the performance a lot. InnoDB engine uses row-
> level locking. Note that PostgreSQL does too.
>
> Have you thought about using document DBs like Tokyo Cabinet, MongoDB
> or others? They can handle huge loads.
>
> On Aug 18, 5:39�am, Frank Poo <rails-mailing-l...@andreas-s.net>

--
Posted via http://www.ruby-forum.com/.

Reply all
Reply to author
Forward
0 new messages