rubyrep hangs with 100% CPU load and no longer replicates


lownoize

Mar 19, 2013, 4:47:09 AM
to rub...@googlegroups.com
Hello all,

I have a rubyrep setup that replicates two databases over a slow, high-latency WAN link.
Everything worked fine for a while, but now rubyrep occasionally hangs with 100% CPU load.
It looks to me like this happens mostly when I run big updates on the left database: rr_pending_changes fills up with 30,000 changes
and JRuby/Java just hangs.

Does anybody have an idea how I could debug this? If I kill the process and restart rubyrep, the same thing happens again after a short time.

Cheers

Christian



I run rubyrep with the following options and don't see any errors:

JAVA_MEM="-Xmx1280m" ./rubyrep replicate verbose -c rep_ProKom.conf
Verifying RubyRep tables
Checking for and removing rubyrep triggers from unconfigured tables
Verifying rubyrep triggers of configured tables
Starting replication




My configuration looks like this:

RR::Initializer::run do |config|
  config.left = {
    :adapter  => 'postgresql',
    :database => 'db1',
    :username => 'postgres',
    :password => 'postgres',
    :host     => '192.168.2.228',
    :port     => '5432',
    :encoding => 'utf-8'
  }

  config.right = {
    :adapter  => 'postgresql',
    :database => 'db1',
    :username => 'postgres',
    :password => 'postgres',
    :host     => '192.168.1.226',
    :port     => '5432',
    :encoding => 'utf-8',
    :proxy_host => '192.168.1.226',
    :proxy_port => '9876'
  }

  config.options[:database_connection_timeout] = 300
  config.include_tables /./
  config.options[:initial_sync] = :true
  config.options[:rep_prefix] = 'rr'
  config.options[:table_ordering] = :true
  config.options[:adjust_sequences] = :true
  config.options[:sequence_increment] = 2
  config.options[:left_sequence_offset] = 0
  config.options[:right_sequence_offset] = 1
  config.options[:replication_interval] = 1
  config.options[:auto_key_limit] = 2
  config.options[:replication_conflict_handling] = :left_wins
end
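With `sequence_increment` set to 2 and offsets of 0 (left) and 1 (right), the intent is that the two sides hand out disjoint primary-key values, so rows inserted on either side never collide. A small Ruby sketch of that arithmetic, assuming `adjust_sequences` leaves each side producing only values `v` with `v % increment == offset`; the helper `ids_for` is hypothetical and not part of rubyrep:

```ruby
# Hypothetical illustration of the sequence settings in the config above.
# Assumption: after adjust_sequences, each side only generates values v
# with v % INCREMENT == its offset, so left and right never overlap.

INCREMENT    = 2
LEFT_OFFSET  = 0
RIGHT_OFFSET = 1

# Return the next `count` sequence values at or above `start`
# that belong to the side with the given offset.
def ids_for(offset, start, count, increment = INCREMENT)
  # Ruby's % takes the sign of the divisor, so this is always in 0...increment.
  first = start + ((offset - start) % increment)
  Array.new(count) { |i| first + i * increment }
end

left  = ids_for(LEFT_OFFSET, 100, 4)   # => [100, 102, 104, 106]
right = ids_for(RIGHT_OFFSET, 100, 4)  # => [101, 103, 105, 107]

puts "left:    #{left.inspect}"
puts "right:   #{right.inspect}"
puts "overlap: #{(left & right).inspect}"
```

With an increment of 2 the left side gets even values and the right side odd ones; a larger increment would leave room for more replicated nodes.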

lownoize

Mar 25, 2013, 4:08:56 AM
to rub...@googlegroups.com
Hello,

Now I have some more information:
Today I started with a clean setup (I dumped the database on the left side and restored it on the right).

After that I started rubyrep replicate and everything worked fine for a few hours; then I got the following exception, and the process hung again with 100% CPU load.

Starting replication
2013-03-25T07:44:35+01:00 Exception caught: undefined method `next?' for nil:NilClass
                          (druby://192.168.1.226:9876) org/jruby/RubyKernel.java:2051:in `send'
                          (druby://192.168.1.226:9876) /usr/local/rubyrep-1.2.0/./jruby/lib/ruby/1.8/drb/drb.rb:1593:in `perform_without_block'
                          (druby://192.168.1.226:9876) /usr/local/rubyrep-1.2.0/./jruby/lib/ruby/1.8/drb/drb.rb:1553:in `perform'
                          (druby://192.168.1.226:9876) /usr/local/rubyrep-1.2.0/./jruby/lib/ruby/1.8/drb/drb.rb:1627:in `main_loop'
                          (druby://192.168.1.226:9876) org/jruby/RubyKernel.java:1418:in `loop'
                          (druby://192.168.1.226:9876) /usr/local/rubyrep-1.2.0/./jruby/lib/ruby/1.8/drb/drb.rb:1623:in `main_loop'
                          (druby://192.168.1.226:9876) org/jruby/RubyProc.java:268:in `call'
                          (druby://192.168.1.226:9876) org/jruby/RubyProc.java:232:in `call'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/noisy_connection.rb:21:in `next?'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/logged_change_loader.rb:112:in `update'
                          AbstractScript.java:41:in `(root)'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/logged_change_loader.rb:30:in `update'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/logged_change_loader.rb:30:in `each'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/logged_change_loader.rb:30:in `update'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/replication_run.rb:52:in `load_difference'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/replication_run.rb:85:in `run'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/replication_run.rb:83:in `loop'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/replication_run.rb:83:in `run'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/replication_run.rb:80:in `run'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/replication_runner.rb:123:in `execute_once'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/task_sweeper.rb:14:in `timeout'
                          /usr/local/rubyrep-1.2.0/lib/rubyrep/task_sweeper.rb:62:in `timeout'
                          org/jruby/RubyProc.java:268:in `call'
                          org/jruby/RubyProc.java:232:in `call'
2013-03-25T08:34:10+01:00 Exception caught: undefined method `next?' for nil:NilClass
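In the trace, the frames prefixed with `druby://192.168.1.226:9876` indicate that the `NoMethodError` is raised inside the proxy process, reached via `noisy_connection.rb:21` while `logged_change_loader.rb` is loading changes. In Ruby the message itself only means that `next?` was sent to `nil`, so some object the proxy expected to still exist was gone; that it is a cursor is an inference from the method name, not something the trace confirms. The sketch below reproduces the error class and shows a guard that fails with a clearer message; `FakeCursor` and `drain` are hypothetical stand-ins, not rubyrep code:

```ruby
# Hypothetical stand-in for a cursor obtained from a (possibly remote) connection.
class FakeCursor
  def initialize(rows)
    @rows = rows
  end

  # True while there are rows left to read.
  def next?
    !@rows.empty?
  end

  # Return and consume the next row.
  def next_row
    @rows.shift
  end
end

# Read all rows from a cursor, failing with a descriptive error if the
# cursor is missing instead of a bare NoMethodError on nil.
def drain(cursor)
  raise ArgumentError, "cursor is nil (was the connection or cursor lost?)" if cursor.nil?

  rows = []
  rows << cursor.next_row while cursor.next?
  rows
end

# Same error class as in the log: calling next? on nil.
begin
  nil.next?
rescue NoMethodError => e
  puts "#{e.class}: #{e.name}"   # prints "NoMethodError: next?"
end

p drain(FakeCursor.new([1, 2, 3]))  # prints [1, 2, 3]
```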


When I try to restart rubyrep afterwards, the process hangs, rr_pending_changes fills up, and nothing gets replicated.
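One way to make "rr_pending_changes fills up" measurable is to sample `SELECT count(*) FROM rr_pending_changes` on the database receiving the updates (the left one here) at regular intervals and check whether the backlog ever shrinks after a restart. A minimal sketch of the analysis step, assuming the samples have already been collected as `[seconds, row_count]` pairs by some external script; `backlog_trend` is a hypothetical helper, not part of rubyrep:

```ruby
# Hypothetical helper (not part of rubyrep): given periodic samples of
# SELECT count(*) FROM rr_pending_changes as [seconds_since_start, row_count]
# pairs, report whether the backlog is draining, growing, or flat.
# Only the first and last samples are compared.
def backlog_trend(samples)
  raise ArgumentError, "need at least two samples" if samples.size < 2

  t0, c0 = samples.first
  t1, c1 = samples.last
  elapsed = t1 - t0
  raise ArgumentError, "samples must span a positive time" unless elapsed > 0

  rate = (c1 - c0).to_f / elapsed # rows per second; negative means draining
  status =
    if rate < 0 then :draining
    elsif rate > 0 then :growing
    else :flat
    end
  { status: status, rows_per_second: rate }
end

samples = [[0, 30_000], [60, 30_000], [120, 30_450]]
p backlog_trend(samples)
```

A result that stays `:growing` while the process sits at 100% CPU would be consistent with the replicator not consuming changes at all, rather than merely falling behind on the slow link.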

I'm running PostgreSQL 9.2.1 and rubyrep 1.2.0.

Does anybody have a clue what's wrong here?


Cheers

Christian