write_chunk


Allan Cochrane

Jun 5, 2015, 10:04:06 AM
to mojol...@googlegroups.com
Hi,

I'm having trouble understanding just how to make write_chunk work for me.

My issue is that I'm querying a large database and emitting JSON. Sometimes there are hundreds of thousands of records to write.

My algorithm looks like:

  # $query = DBIx::Simple query that returns a DB row as a hash
  while (my $row = $query->hash) {
    my $txt = ...; # JSON encode hash
    $controller->write_chunk($txt);
    $count++;
    $self->debug("Written $count/$row_count records") if ($count % 1000) == 0;
  }

But that buffers the JSON strings and never seems to drain, so my Perl process grows to a massive size. What I can't understand is how to restructure the code above to use a callback, which is supposed to drain the buffer and prevent the huge memory usage.

The documentation shows how to write "Hello world" using write_chunk with a callback but it's just a bit too terse for me to work out how to make use of it in the context above. I'm sure it's a simple concept but I just cannot grasp it at the moment.
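For reference, the example in question goes something like this (quoted from memory, so it may differ slightly from the current docs):

```perl
# Mojolicious::Lite "Hello World" with chunked transfer encoding;
# each callback fires once the previous chunk has been written
get '/' => sub {
  my $c = shift;
  $c->write_chunk('He' => sub {
    my $c = shift;
    $c->write_chunk('llo!' => sub {
      my $c = shift;
      $c->finish;
    });
  });
};
```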

Thanks,

Allan



Allan Cochrane

Jun 5, 2015, 1:33:04 PM
to mojol...@googlegroups.com
I should also say that the code really looks like this:


  # in the controller:
  $self->write_chunk($initial_json_stuff);
  ...
  # In a model, invoked by controller
  while (my $row = $query->hash) {
    my $txt = ... # JSON encode hash
    $controller->write_chunk($txt);
    $count++;
    $self->debug("Written $count/$row_count records") if ($count % 1000) == 0;
  }
  ... 
  # back in controller
  $self->write_chunk($final_json_stuff);

I have tried a bit, without much success, to turn this into callback driven code but cannot work it out!

As an aside, I don't think I need to actually write in chunks, since Mojolicious' write() method also takes a callback to cause the output buffer to flush.

Allan 

sri

Jun 5, 2015, 1:45:16 PM
to mojol...@googlegroups.com, allan.c...@gmail.com
> As an aside, I don't think I need to actually write in chunks, since Mojolicious' write() method also takes a callback to cause the output buffer to flush.

We don't ever flush, that would be a blocking operation. The callback only notifies you that the buffer is empty, so you have to use recursion.
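A minimal sketch of that recursive pattern, with hypothetical names ($query as in your code, encode_json from Mojo::JSON):

```perl
use Scalar::Util 'weaken';
use Mojo::JSON 'encode_json';

# Sketch only: each callback fires after the previous chunk has
# actually left the output buffer, so memory use stays flat.
my $cb;
$cb = sub {
  my $c = shift;
  if (my $row = $query->hash) {
    $c->write_chunk(encode_json($row), $cb);  # queue next row on drain
  }
  else {
    $c->finish;  # no more rows; end the chunked response
  }
};
$cb->($self);
weaken $cb;  # break the closure's self-reference cycle
```

The key point is that nothing after the initial `$cb->($self)` may touch the response; all further output has to happen from inside the callback.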

--
sebastian

Allan Cochrane

Jun 5, 2015, 1:57:08 PM
to mojol...@googlegroups.com, allan.c...@gmail.com
Hi,

is there a pattern that I should be following in the example above? Can you point me to example code perhaps?

Thanks,

Allan

sri

Jun 5, 2015, 2:05:22 PM
to mojol...@googlegroups.com, allan.c...@gmail.com
> is there a pattern that I should be following in the example above? Can you point me to example code perhaps?

Actually, I always wanted to add an example to the documentation. A nice one would use the __SUB__ feature from Perl 5.16. Sadly the community at large has voted against using new Perl features in the documentation, so I'm afraid there's currently no example.
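For the curious, such an example might look roughly like this (a sketch only, not actual documentation):

```perl
use feature 'current_sub';  # provides __SUB__, Perl 5.16+
use Mojo::JSON 'encode_json';

$c->write_chunk('[', sub {
  my $c = shift;
  my $row = $query->hash;
  return $c->finish(']') unless $row;
  # __SUB__ refers to the currently running sub, so there is no
  # $cb variable and therefore no reference cycle to weaken
  # (comma handling between rows omitted for brevity)
  $c->write_chunk(encode_json($row), __SUB__);
});
```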

--
sebastian 

Allan Cochrane

Jun 5, 2015, 2:21:43 PM
to mojol...@googlegroups.com, allan.c...@gmail.com
Ok, but if you _could_ add an example, what would it look like?  :-)

Allan

Allan Cochrane

Jun 8, 2015, 3:58:05 PM
to mojol...@googlegroups.com
Hi,

so my code looks like:

  # in the controller:
  $self->write_chunk($initial_json_stuff);
  ...
  # In a model, invoked by the controller
  $controller->write_chunk($some_json);
  $self->write_results($query);
  # return data to controller

  sub write_results {
    my $cb;
    $cb = sub {
      if (my $row = $query->hash) {
        my $txt = ...; # JSON encode hash
        $controller->write_chunk($txt, $cb);
      }
    };
    $cb->();
  }
  ...
  # back in the controller
  $self->write_chunk($final_json_stuff);
  $self->finish();


The problem is that although the $cb anonymous sub is called the correct number of times, the model's subroutine exits and the controller finishes the transaction, so only one record gets written to the output before the transaction completes.

I can get it to work by having the model write $final_json_stuff itself.

This is extremely annoying, as now my model has to know about controller-related stuff (e.g. whether this is a JSONP request), whereas I'd like the model to just write the raw JSON as an array to an output stream it's been given, without extra knowledge about the request or the controller.

Does anybody else have any ideas on how to write in a blocking, non-buffering manner to the output stream?

Thanks,

Allan

Roger Crew

Jun 8, 2015, 9:59:55 PM
to mojol...@googlegroups.com

Your problem (if I'm reading your code right) is that nothing catches the moment the query exhausts itself. Instead you're calling write_chunk($final_json_stuff) immediately after the first callback returns, which closes the JSON object being constructed. When the other rows get sent, the most they'll do is trigger a syntax error at the far end (because they arrive outside of the } that you sent in $final_json_stuff); and if the far end is sufficiently badly coded, it'll just snarf enough of the stream to see a complete JSON object and ignore what follows, and then rows 2..n get silently dropped on the floor.


Meaning the if in your write_results needs an else branch (for when $query->hash returns false, signalling that there are no more rows), and that is where you want to write your } or whatever it is you're doing to close out the JSON thingie.


However, I also agree that this stuff really shouldn't be in the model. If we're really doing MVC separation right, then generating JSON is really the job of the View, which the controller invokes after getting the result from the model....


At which point your problem is a model that returns partial results with a thunk to either get more or indicate we're done -- whether you want to just pass the Mojo::Pg::Results object around or have something more abstract wrapping it is up to you -- but, unless I'm missing something, Mojo doesn't seem to have a notion of "partial template" that can deal with these things (I'm not even sure what it would look like)...


... so you have to fake it in the controller, which I believe will go something like


  # in the model
  sub get_data {
     ...
     return $pg->db->query('SELECT whatever...');
  }

  # in the controller
  my $qresult = $model->get_data(...);
  $self->write_chunk('{ ..., "rows": [');
  my $cb;
  my $comma = '';
  $cb = sub {
      my $row = eval { $qresult->hash };
      if ($@) {
          $self->finish('], "status": "error", "msg": ' . to_json($@) . '}');
      }
      elsif ($row) {
          $self->write_chunk($comma . to_json($row), $cb);
          $comma = ',';
      }
      else {
          $self->finish('], "status": "ok" }');
      }
  };
  $cb->();
  $self->render_later;  # not sure if this is necessary
  # DO NOTHING FURTHER HERE

Allan Cochrane

Jun 8, 2015, 10:27:44 PM
to mojol...@googlegroups.com
Hi,

I eventually came up with almost the same code except I pushed the writing down to the model whereas you've pulled it up to the controller. I like the idea of a viewer (aka conductor or presenter) and will see if I can restructure the code to use such a construct.

Unfortunately my DB driver is not Mojo::Pg but ODBC-based, and in some cases I'm streaming millions of rows of data. Initially I bundled up a few thousand rows before calling to_json, to avoid having millions of callbacks on the stack, since it looks like a recursive call to the callback; but it doesn't seem to be called recursively.

It is annoying not to have a way to just write to the output stream without resorting to callbacks, in this scenario.

Thanks for your help.

Allan

Roger Crew

Jun 9, 2015, 1:03:02 AM
to mojol...@googlegroups.com
> will see if I can restructure the code to use such a construct.

It would actually be pretty straightforward, in my code, at least, to encapsulate everything from that first write_chunk call down to the initial $cb->() invocation as a helper function -- it's all pretty generic -- and then that's essentially your "viewer".

> Unfortunately my DB driver is not Mojo::Pg

Doesn't matter.  For our purposes here, Mojo::Pg::Result is just like a DBI statement handle, so if you're using DBI directly, just use those, call ->fetchrow_hashref instead of ->hash, and so on...

Where use of Mojo::Pg would matter is if you wanted to do things backwards: instead of the controller pulling data from the model, you have the model pushing data to the controller. (Sort of like in your code, but instead of explicitly passing the controller in, you just pass in a callback sub that accepts rows of data, or an "I'm done now", and issues the write_chunk/finish calls accordingly -- the point being that this sub then lives on the controller side, and once again you're keeping the write_chunk crap out of the model.) At that point you need a database driver that's able to send data back asynchronously, which is a PostgreSQL feature that Mojo::Pg takes advantage of. Other DB drivers may have something similar, but you'd have to dig that out yourself.
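Concretely, the controller side of my earlier sketch adapted to a plain DBI statement handle might go something like this (hypothetical names, untested):

```perl
use Scalar::Util 'weaken';
use Mojo::JSON 'encode_json';

# in the controller; $sth is an executed DBI statement handle
# that the model handed back
my $sth = $model->get_data(...);
$self->write_chunk('{ "rows": [');
my $comma = '';
my $cb;
$cb = sub {
  my $c = shift;
  if (my $row = $sth->fetchrow_hashref) {
    $c->write_chunk($comma . encode_json($row), $cb);
    $comma = ',';
  }
  else {
    $c->finish('] }');  # no more rows: close the array and the response
  }
};
$cb->($self);
weaken $cb;  # break the $cb <-> closure cycle, but only after the first call
$self->render_later;
```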

sri

Jun 9, 2015, 4:02:17 AM
to mojol...@googlegroups.com, allan.c...@gmail.com
> It is annoying not to have a way to just write to the output stream without resorting to callbacks, in this scenario.

The cost of scalability. If you could perform blocking writes to the output stream, you would only be able to handle one request at a time per worker process, instead of thousands.

--
sebastian 

Allan Cochrane

Jun 9, 2015, 8:46:55 AM
to mojol...@googlegroups.com, allan.c...@gmail.com
Hi,

can I suggest that something akin to Roger's comment:

# DO NOTHING FURTHER HERE, MAY BE IGNORED BY CLIENT

be added to the hello world example in the documentation as it might help folk understand that all subsequent output must be done in the callback to appear in the stream in the correct place.

The examples show what to do to write non-blocking via a callback but there's a higher level concept that's missing and IMHO needs further explanation.

Allan

sri

Jun 9, 2015, 8:51:16 AM
to mojol...@googlegroups.com, allan.c...@gmail.com
> can I suggest that something akin to Roger's comment:
>
> # DO NOTHING FURTHER HERE, MAY BE IGNORED BY CLIENT
>
> be added to the hello world example in the documentation as it might help folk understand that all subsequent output must be done in the callback to appear in the stream in the correct place.

That sounds wrong, you might be misunderstanding a few things.

--
sebastian 

Allan Cochrane

Jun 9, 2015, 9:51:37 AM
to mojol...@googlegroups.com, allan.c...@gmail.com
In what way does it sound wrong? Where is my misunderstanding?

Using https://gist.github.com/AllanCochrane/1dd34f0fd3649ece7689 as an example, the callback code looks recursive (and you alluded to that in a comment above) but it actually isn't. My naive interpretation is that the callback immediately invokes the callback upon writing the text, and the final write is not performed until the callbacks have all completed.

Hence some explanatory text would help the reader understand the real execution flow.

If I can misunderstand then I'm sure others do too (or am I the only one? :-)

Allan

sri

Jun 9, 2015, 10:01:42 AM
to mojol...@googlegroups.com, allan.c...@gmail.com
> Using https://gist.github.com/AllanCochrane/1dd34f0fd3649ece7689 as an example, the callback code looks recursive (and you alluded to that in a comment above) but it actually isn't.

Of course it is recursive. It appears you're not familiar with event loops yet, or that example would make perfect sense: the callback is only invoked once the data has actually been written. Also, your example is leaking memory; you might want to get familiar with Devel::Cycle as well.
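A quick way to spot such a leak (assuming PadWalker is also installed, so that variables closed over by subs are followed):

```perl
use Devel::Cycle;

# After setting up the self-referencing callback, dump any reference
# cycles reachable from it; a leaky $cb <-> closure loop shows up here.
find_cycle($cb);
```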

--
sebastian

Roger Crew

Jun 10, 2015, 6:51:35 PM
to mojol...@googlegroups.com


On Tuesday, June 9, 2015 at 5:46:55 AM UTC-7, Allan Cochrane wrote:
> Hi,
>
> can I suggest that something akin to Roger's comment:
>
> # DO NOTHING FURTHER HERE, MAY BE IGNORED BY CLIENT

No.  What I actually meant was this:

# DO NOTHING FURTHER HERE; WE HAVE TO WAIT FOR THE CALLBACKS TO ACTUALLY GET CALLED

Allan Cochrane

Jun 10, 2015, 10:47:11 PM
to mojol...@googlegroups.com
Hi,

just nit-picking here, but since the callbacks do not actually invoke themselves, is it really recursion?

To my mind the callback runs and, as part of its functionality, schedules itself for later execution; it doesn't invoke itself directly as in 'classical' recursion.

Can anyone explain where my thinking has gone wrong?

Allan

Allan Cochrane

Jun 10, 2015, 10:48:52 PM
to mojol...@googlegroups.com
Hi,


On Wednesday, 10 June 2015 17:51:35 UTC-5, Roger Crew wrote:

> No.  What I actually meant was this:
>
> # DO NOTHING FURTHER HERE; WE HAVE TO WAIT FOR THE CALLBACKS TO ACTUALLY GET CALLED


That sounds better than what I had written.

Thanks,

Allan 

Roger Crew

Jun 11, 2015, 2:33:03 AM
to mojol...@googlegroups.com
> To my mind the callback runs and as part of its functionality schedules itself for later execution, it doesn't invoke itself directly as 'classical' recursion.

This much is correct, i.e., you're only creating a closure once, and all calls to it except for the first are coming from the event loop (i.e., each time the write buffer is emptied, the callback is invoked). However, there is still recursion in the sense that the function is referenced from within its own body, which means there's a loop of pointers ($cb refers to the sub and the sub refers to $cb) that the reference-counting garbage collector will never free up unless you do something to break it explicitly (weaken($cb), but do it after that first call).
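Here's a tiny self-contained illustration of that cycle, using a DESTROY hook to show when the closure's environment gets freed (plain Perl, no Mojolicious required):

```perl
use strict;
use warnings;
use Scalar::Util 'weaken';

my $destroyed = 0;
{ package Guard; sub new { bless {}, shift } sub DESTROY { $destroyed++ } }

{
    my $guard = Guard->new;
    my $cb;
    $cb = sub { ($guard, $cb) };   # sub captures $cb, $cb refers to sub: a cycle
}                                  # scope exits, but the cycle keeps $guard alive
print $destroyed ? "freed\n" : "leaked\n";   # prints "leaked"

{
    my $guard = Guard->new;
    my $cb;
    $cb = sub { ($guard, $cb) };
    weaken($cb);                   # $cb becomes a weak ref; cycle broken,
}                                  # so $guard is destroyed at scope exit
print $destroyed ? "freed\n" : "leaked\n";   # prints "freed"
```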

Allan Cochrane

Jun 11, 2015, 9:48:51 AM
to mojol...@googlegroups.com


On Thursday, 11 June 2015 01:33:03 UTC-5, Roger Crew wrote:
> To my mind the callback runs and as part of its functionality schedules itself for later execution, it doesn't invoke itself directly as 'classical' recursion.

> This much is correct, i.e., you're only creating a closure once, and all calls to it except for the first are coming from the event loop (i.e., each time the write buffer is emptied, the callback is invoked). However, there is still recursion in the sense that the function is referenced from within its own body, which means there's a loop of pointers ($cb refers to the sub and the sub refers to $cb) that the reference-counting garbage collector will never free up unless you do something to break it explicitly (weaken($cb), but do it after that first call).



Ah, that's where the recursion is, thanks.

Allan
 

Daniel Mantovani

Jun 11, 2015, 6:59:39 PM
to mojol...@googlegroups.com
A few weeks ago I ran into a similar problem, trying to repeatedly run a group of steps for a simple get operation without blocking the controller.

After receiving some advice from the #mojo IRC channel folks I ended up with a plugin that should be able to repeat a set of steps several times without blocking the controller, controlled by a flag that allows you to stop and finish the rendering.


You should have a working PostgreSQL installation, and a database named "test" that your username can access. PostgreSQL is used just as an example here, to have some query to retrieve several times before the controller finishes the operation.

It has some tests; you should be able to pass them.

The important routes are:

get /blocking/number-of-records-to-retrieve, and
get /non-blocking/number-of-records-to-retrieve

so you should be able to run these two cases:

$> ./pg_nb2.pl get /blocking/10000

and

$> ./pg_nb2.pl get /non-blocking/10000

and you will note that the non-blocking version starts right away, while the blocking version needs to buffer everything before producing any output.

You should be able to get /non-blocking/40000, for instance, and your records will start flowing immediately. (Unfortunately, on my laptop, when I try to get 40000 blocking I get a timeout, even though I set the timeout to 300 seconds; I'll have to take a look at that.) With non-blocking I tried 100000 and it also works fine, but of course you will have to wait a lot longer to get the full 100k records. I also tried 1000000 records but didn't have the patience to wait for it :( ; it looked fine after the 300k-something records I'd seen by the time I interrupted it.

Hope the example helps. It prints some extra information with each record (PID of the Mojo daemon, PID of Mojo::Pg, some timers) that you probably don't need.