csv export failure

37 views
Skip to first unread message

Sebastien Billard

unread,
Sep 17, 2015, 11:56:34 AM9/17/15
to All Our Ideas
Delayed job log looks like this:

2015-09-17T11:52:05-0400: [Worker(delayed_job host:ip-XXX.internal pid:2218)] MungeAndNotifyJob failed with RuntimeError: Export URL returned response code of 404 for http://domain.xxx/exports/wikisurvey_52_ideas_2015-09-17T15:50:39Z_BDQI - 1 failed attempts

2015-09-17T11:52:05-0400: [Worker(delayed_job host:ip-xxx.internal pid:2218)] MungeAndNotifyJob failed with RuntimeError: Export URL returned response code of 404 for http://domain.xxx/exports/wikisurvey_24_ideas_2015-09-17T15:42:12Z_BqVZ - 4 failed attempts

2015-09-17T11:52:05-0400: [Worker(delayed_job host:ip-xxx.internal pid:2218)] PERMANENTLY removing MungeAndNotifyJob because of 5 consecutive failures.

2015-09-17T11:52:05-0400: [Worker(delayed_job host:ip-xxx.internal pid:2218)] Running on_permanent_failure hook

2015-09-17T11:52:07-0400: [Worker(delayed_job host:ip-xxx.internal pid:2218)] 3 jobs processed at 1.3570 j/s, 3 failed ...

2015-09-17T11:53:07-0400: [Worker(delayed_job host:ip-xxx.ec2.internal pid:2218)] MungeAndNotifyJob failed with RuntimeError: Export URL returned response code of 404 for http://domain.xxx/exports/wikisurvey_52_ideas_2015-09-17T15:50:13Z_EerI - 2 failed attempts

2015-09-17T11:53:07-0400: [Worker(delayed_job host:ip-xxx.internal pid:2218)] MungeAndNotifyJob failed with RuntimeError: Export URL returned response code of 404 for http://domain.xxx/exports/wikisurvey_52_ideas_2015-09-17T15:50:39Z_BDQI - 2 failed attempts

2015-09-17T11:53:07-0400: [Worker(delayed_job host:ip-xxx.internal pid:2218)] 2 jobs processed at 5.7691 j/s, 2 failed ...


It was working fine, an it just stopped working on August 26th.  I can't figure out why, or what changed on that date.

I've restarted the services and he delayed jobs, and still nothing.


Sebastien Billard

unread,
Sep 17, 2015, 12:13:33 PM9/17/15
to All Our Ideas
Additional info:
 

Processing ExportsController#show (for  at 2015-09-17 12:00:10) [GET]

  Parameters: {"id"=>"wikisurvey_52_ideas_2015-09-17T15:50:39Z_BDQI"}

Completed in 2ms (View: 0, DB: 0) | 404 Not Found [http://domain.xxxexports/wikisurvey_52_ideas_2015-09-17T15:50:39Z_BDQI]



Processing ExportsController#show (for  at 2015-09-17 12:08:35) [GET]

  Parameters: {"id"=>"wikisurvey_52_ideas_2015-09-17T15:50:39Z_BDQI"}

Redirected to http://pairwise.gointengo.com/session/new

Filter chain halted as [:authenticate] rendered_or_redirected.

Completed in 1ms (DB: 0) | 302 Found [http://domain.xxx/exports/wikisurvey_52_ideas_2015-09-17T15:50:39Z_BDQI]


Luke Baker

unread,
Sep 17, 2015, 4:39:59 PM9/17/15
to allou...@googlegroups.com
Sebastien,

Several of those were failed attempts that will get retried. It looks
like there was only one that failed the maximum number of times (5)
and was removed. It's hard to tell from these logs, but it is possible
that some of them did succeed.

These jobs that are failing because they are unable to download the
CSV from the pairwise service, so you'll want to inspect the logs from
the pairwise service to figure out why it isn't generating the
expected CSV files.

Here's a quick overview of the architecture of the CSV export feature:

1. A request comes in from an admin on the AOI site to generate the CSV file.
2. The AOI site contacts the pairwise server and tells it to generate
a CSV file. The pairwise and AOI server agree on a name for this CSV
file.
3. The pairwise site creates a delayed job that will actually create
the CSV file.
4. The AOI site creates a delayed job in its database for fetching the
generated CSV from pairwise and modifying it before notifying the
user.

The delayed_job in #3 needs to complete before #4 can successfully
fetch the CSV file. The logs you've included are showing that job
created in #4 is unable to fetch the CSV file from the pairwise
service. This could be because the CSV was never generated by the
pairwise service or taking too long to be generated.

Does that help you in your work in debugging the problem?

Luke
> --
> You received this message because you are subscribed to the Google Groups
> "All Our Ideas" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to allourideas...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Luke Baker

unread,
Sep 18, 2015, 10:18:55 AM9/18/15
to allou...@googlegroups.com
Sebastien,

I've updated our wiki to include this information and possibly a few
more details. You can find there here:

https://github.com/allourideas/allourideas.org/wiki/CSV-Data-Exports

Luke

Kyle Burnam

unread,
Sep 22, 2015, 12:17:53 PM9/22/15
to All Our Ideas
Luke, thanks for the suggestions. I'm working with Sebastien to try and troubleshoot this on our end, but I'll warn you that I'm more of an an end user than an admin! Here's what I'm seeing:
  • All exports started to fail on 8/26/15 at 2:34pm MT. This was after a successful export at 2:25pm MT. There was no change to the codebase during this 9 minute interval - it just stopped working.
  • Since the issue occurred, two types of actions are consistently getting stuck in delayed_jobs. The first is the export (all types), the second is add_prompt_to_queue, which I believe is the method that increments prompts_on_left_count and prompts_on_right_count. Wins and losses are being incremented properly, but left/right prompts are not, and oddly seem to increment at random until both equal exactly (# of choices associated with the question minus 1). For example, in one experiment, 64 choices were associated with a question. After the votes came in, for all 64 choices the prompts_on_left_count and prompts_on_right_count was 63. In another experiment with 9 choices the prompts_on_left_count and prompts_on_right_count for each choice was 8.
  • All entries to pairwise-api/log/delayed_job.log ended on 8/26. The only entries since appear to be the result of Sebastien's attempts to restart the job worker.
It would appear, based on your previous note, that all four steps are happening, however the CSV file is not being generated. A delayed job is being created on the delayed_jobs table within allourideas_production for each request, but each attempt fails and nothing changes on the exports table.

Not sure if any of this helps? Very much appreciate your input.

Best,

Kyle

Paul Barrick

unread,
Sep 22, 2015, 1:53:30 PM9/22/15
to All Our Ideas


Hey Luke, 
I'm the lead developer on the team. Im trying to assist with debugging things. 

Is there an IRC channel or quicker way I can ping you on so I can help figure things out?

Luke Baker

unread,
Sep 23, 2015, 10:33:39 AM9/23/15
to allou...@googlegroups.com
It sounds like your delayed jobs are being processed properly on AOI
but not pairwise. I think you'll need to investigate why that process
isn't running the delayed_jobs properly. Perhaps you need to restart
the process or investigate what error it might be encountering when
running the delayed jobs. First, I'd check to see if the delayed_jobs
in the pairwise database show any indication that they were attempted.
There should be a column for attempts and a column that stores errors
that occurred when running the delayed job. If you don't see any
attempts, then your process isn't attempting those delayed jobs and
you should investigate why that is the case.

Luke

Luke Baker

unread,
Sep 23, 2015, 10:35:19 AM9/23/15
to allou...@googlegroups.com
Paul,

No, we don't have any IRC or chat system setup for supporting All Our
Ideas installations. This email group is the best place for that sort
of stuff.

Luke

Paul Barrick

unread,
Sep 23, 2015, 10:40:48 AM9/23/15
to All Our Ideas
Hi Luke, 

That does seem to be the case. The delayed jobs are in the db but I guess they arent running on PW. 
There were 5 attempts and this is the stack trace: 
(eval):3:in `block in invoke_job_with_newrelic_transaction_trace'
(eval):2:in `invoke_job_with_newrelic_transaction_trace'
/usr/local/rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/timeout.rb:58:in `timeout'
/usr/local/rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/benchmark.rb:310:in `realtime'
/usr/local/rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/benchmark.rb:310:in `realtime'
./script/delayed_job:5:in `<main>'

Luke Baker

unread,
Sep 23, 2015, 10:43:02 AM9/23/15
to allou...@googlegroups.com
That stack trace references code in /var/www/allourideas.org. That is
a stack trace from AOI not pairwise.

Luke

Paul Barrick

unread,
Sep 23, 2015, 11:01:02 AM9/23/15
to All Our Ideas

So we confirmed we restarted both the PW and the AOI workers. Is there a way to get more info or some 
other fix we should try?

Luke Baker

unread,
Sep 23, 2015, 12:21:16 PM9/23/15
to allou...@googlegroups.com
It depends on how you're starting / running the workers. There may be
a log file specifically for the delayed_job worker and you may see
logs in the production.log.

Luke

Kyle Burnam

unread,
Sep 23, 2015, 12:50:05 PM9/23/15
to All Our Ideas
Luke, not sure if this helps, but entries to the delayed_jobs log end on 8/26 (the same day exports stopped working).  Here are the last few entries, followed by our attempts to re-start the job worker:

2015-08-26T13:17:12-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2445)] Question#add_prompt_to_queue completed after 0.0062
2015-08-26T13:17:12-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2445)] Question#add_prompt_to_queue completed after 0.0058
2015-08-26T13:17:12-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2445)] 2 jobs processed at 84.6761 j/s, 0 failed ...
2015-08-26T13:17:17-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2445)] Question#add_prompt_to_queue completed after 0.0284
2015-08-26T13:17:17-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2445)] 1 jobs processed at 16.7299 j/s, 0 failed ...
2015-08-26T16:37:28-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2445)] Question#export completed after 6.9199
2015-09-11T12:53:39-0400: [Worker(host:ip-172-31-17-4.ec2.internal pid:2858)] New Relic Ruby Agent Monitoring DJ worker host:ip-172-31-17-4.ec2.internal pid:2858
2015-09-11T12:53:40-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2858)] Starting job worker
2015-09-11T13:34:58-0400: [Worker(host:ip-172-31-17-4.ec2.internal pid:3938)] New Relic Ruby Agent Monitoring DJ worker host:ip-172-31-17-4.ec2.internal pid:3938
2015-09-11T13:34:58-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:3938)] Starting job worker
2015-09-17T11:23:26-0400: [Worker(host:ip-172-31-17-4.ec2.internal pid:6074)] New Relic Ruby Agent Monitoring DJ worker host:ip-172-31-17-4.ec2.internal pid:6074
2015-09-17T11:23:27-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:6074)] Starting job worker
2015-09-17T11:36:40-0400: [Worker(host:ip-172-31-17-4.ec2.internal pid:2149)] New Relic Ruby Agent Monitoring DJ worker host:ip-172-31-17-4.ec2.internal pid:2149
2015-09-17T11:36:40-0400: [Worker(delayed_job host:ip-172-31-17-4.ec2.internal pid:2149)] Starting job worker

And below are the entries from the production.log when the 404 errors began:

Processing QuestionsController#export to xml (for 54.208.117.91 at 2015-08-26 16:37:17) [POST]
  Parameters: {"question"=>{"active"=>false, "choices_count"=>9, "created_at"=>2015-08-21 15:23:45 UTC, "creator_id"=>4527, "id"=>51, "inactive_choices_count"=>0, "information"=>nil, "it_should_autoactivate_ideas"=>false, "local_identifier"=>"1", "name"=>"[QUESTION REDACTED]", "prompts_count"=>72, "show_results"=>true, "site_id"=>1, "tracking"=>nil, "updated_at"=>2015-08-21 15:23:56 UTC, "uses_catchup"=>true, "version"=>2, "votes_count"=>71293, "item_count"=>9}, "key"=>"wikisurvey_51_ideas_2015-08-26T20:37:17Z_PySo", "type"=>"ideas", "id"=>"51"}


Processing QuestionsController#show to xml (for 54.208.117.91 at 2015-08-26 16:37:23) [GET]
  Parameters: {"id"=>"51"}
Completed in 23ms (View: 0, DB: 4) | 200 OK [http://pairwise.gointengo.com/questions/51.xml]


Processing QuestionsController#export to xml (for 54.208.117.91 at 2015-08-26 16:37:23) [POST]
  Parameters: {"question"=>{"active"=>false, "choices_count"=>9, "created_at"=>2015-08-21 15:23:45 UTC, "creator_id"=>4527, "id"=>51, "inactive_choices_count"=>0, "information"=>nil, "it_should_autoactivate_ideas"=>false, "local_identifier"=>"1", "name"=>"[QUESTION REDACTED]", "prompts_count"=>72, "show_results"=>true, "site_id"=>1, "tracking"=>nil, "updated_at"=>2015-08-21 15:23:56 UTC, "uses_catchup"=>true, "version"=>2, "votes_count"=>71293, "item_count"=>9}, "key"=>"wikisurvey_51_votes_2015-08-26T20:37:23Z_IdAn", "type"=>"votes", "id"=>"51"}


Processing ExportsController#show (for 54.208.117.91 at 2015-08-26 16:37:56) [GET]
  Parameters: {"id"=>"wikisurvey_51_ideas_2015-08-26T20:37:17Z_PySo"}
Sending data 


Processing ExportsController#show (for 54.208.117.91 at 2015-08-26 16:37:59) [GET]
  Parameters: {"id"=>"wikisurvey_51_votes_2015-08-26T20:37:23Z_IdAn"}


Processing ExportsController#show (for 54.208.117.91 at 2015-08-26 16:38:59) [GET]
  Parameters: {"id"=>"wikisurvey_51_votes_2015-08-26T20:37:23Z_IdAn"}


Processing ExportsController#show (for 54.208.117.91 at 2015-08-26 16:39:59) [GET]
  Parameters: {"id"=>"wikisurvey_51_votes_2015-08-26T20:37:23Z_IdAn"}


Processing ExportsController#show (for 54.208.117.91 at 2015-08-26 16:42:00) [GET]
  Parameters: {"id"=>"wikisurvey_51_votes_2015-08-26T20:37:23Z_IdAn"}


Processing ExportsController#show (for 54.208.117.91 at 2015-08-26 16:47:00) [GET]
  Parameters: {"id"=>"wikisurvey_51_votes_2015-08-26T20:37:23Z_IdAn"}


Processing QuestionsController#show to xml (for 54.208.117.91 at 2015-08-26 16:47:34) [GET]
  Parameters: {"id"=>"51"}
Completed in 4ms (View: 0, DB: 1) | 200 OK [http://pairwise.gointengo.com/questions/51.xml]


As you'll see, I ran a successful Ideas export at 2015-08-26 16:37:56 followed by a Votes export attempt at 2015-08-26 16:37:59. The Votes export failed, and all exports have failed since.

Does this help? We truly appreciate your input, Luke!

Thanks,

Kyle

Luke Baker

unread,
Sep 23, 2015, 12:52:52 PM9/23/15
to allou...@googlegroups.com
You'll need to figure out what the delayed_jobs process is doing. Does
it die? What is it doing if it is running but not taking jobs off the
delayed_job queue? Are there any entries in the production.log when
you start the delayed job worker?

Luke

Paul Barrick

unread,
Sep 23, 2015, 1:15:16 PM9/23/15
to All Our Ideas
Hey Luke, 
I was trying to see certain values inside of some of these methods. 
My route im trying to inspect is: http://ideas.gointengo.com/153/admin 

I think that it is calling the questions_controller admin method. So I did a puts and exit in there to try and see if that is 
whats being called and what certain values are. Instead of seeing values though I just get 

Error - Unknown Server Error

Whoops! Looks like an error occured. Sorry about that.

I tried abort('here') first but that didnt work either. 

Is there a way to see what's going on inside the controller? Thanks.


def admin
puts "here"
exit
    logger.info "@question = Question.find_by_name(#{params[:id]}) ..."
    @earl = Earl.find params[:id]

    unless ((current_user.owns?(@earl)) || current_user.admin?)
      logger.info("Current user is: #{current_user.inspect}")
      flash[:notice] = t('user.not_authorized_error')
      redirect_to( "/#{params[:id]}") and return
    end

    @question = @earl.question
    @partial_results_url = "#{@earl.name}/results"

    @choices = Choice.find(:all, :params => {:question_id => @question.id, :include_inactive => true})

  end

Luke Baker

unread,
Sep 23, 2015, 1:24:06 PM9/23/15
to allou...@googlegroups.com
You can send details to the log file via the logger variable. You can
see that logger.info is already being used in that method.

More details are available here:
http://guides.rubyonrails.org/v2.3.8/debugging_rails_applications.html#the-logger

Luke

Paul Barrick

unread,
Sep 23, 2015, 2:25:10 PM9/23/15
to All Our Ideas

Hi Luke, 

Thanks for all the direction, 
it's been super helpful!

I think we're closer to narrowing down the issue. 
Im trying to restart delayed_job using this syntax: 


It tells me it cant find some gems.

If I run bundle install it tells me the follwing issues: 

Thanks again. We really appreciate your guidance.
 
- Paul

Luke Baker

unread,
Sep 24, 2015, 11:48:42 AM9/24/15
to allou...@googlegroups.com
Paul,

Those errors appear to be in your AOI installation and not pairwise.

The installation requirements[1] suggest Ruby 1.9.3. It looks like
bundler says you're running something less than 1.9.2. You should be
running Ruby 1.9.3.

Luke

[1] https://github.com/allourideas/allourideas.org/wiki/Installing
Reply all
Reply to author
Forward
0 new messages