Process waiting for action from user

29 views
Skip to first unread message

Adrien Kohlbecker

unread,
Apr 4, 2013, 11:51:55 AM4/4/13
to openwfe...@googlegroups.com
Hello John,

We are still slowly migrating our receipt processing to ruote and we hit the following roadblock :

We have a master process, launched by an API call from the user
Right after that call, the user uploads one or more pictures asynchronously. Each picture is processed independently.
After each picture is processed, the master process should resume

I was looking at the way you implemented AlertParticipant in ruote-amqp but I don't think it works for our use case :
- The pictures could finish uploading in two month, if the user closes our app and does not reopen it right away, or in the worst case never, if the user never reopens the app, so we can't have each AlertParticipant polling a queue until uploaded
- If the user is on a fast connection, the upload could finish before our backend has had time to launch the master process, so we need a way to cache the notifications until they are ready to be processed.

The issue here is that the subprocess is not launched by ruote so there is no workitem to store and receive once the upload is done (especially since the reply could happen before any workitem exists)

How would you go about implementing something like that ?

Regards,
Adrien

John Mettraux

unread,
Apr 4, 2013, 5:38:41 PM4/4/13
to openwfe...@googlegroups.com
Hello Adrien,

how do you correlate the pictures and the main process? I guess there is some
kind of shared identifier.

You could start the process after the picture upload. Ok, it doesn't work if
you want the user to deal with the process while the pictures load...

Back to the equivalent of an AlertParticipant. How about placing a timeout
attribute on it?

http://ruote.rubyforge.org/common_attributes.html#timeout

This could help solve the "it took 2 months" issue. That doesn't solve the
"it's already here" one.

What about having a participant that looks like:

```
class ImageGrabber < Ruote::Participant

def on_workitem

# somehow a blocking call...
# returns as soon as the pictures are done uploading
# returns immediately if the pictures are already uploaded
#
workitem.fields['images'] = ImageUploadService.fetch_uris(case_id)

reply
end

# probably called on_timeout
#
def on_cancel

# kills the pending .fetch_uris if present
#
ImageUploadService.stop_fetching_uris(case_id)
end

protected

# use the ruote wfid as correlation id
#
def case_id

workitem.wfid
end
end
```
?

The pictures are uploaded somewhere in some service. Let's communicate with
that service.

You could also bypass the "let's wait for the pictures to upload" participant
and consider that the pictures are there (unless you do OCR, the pictures are
probably not worth much for the application). You could complain later on,
when you notice that the pictures are not here yet.

I guess the trick is not to wait for some notifications about the images, but
to poll for them. Could that participant be summarized as "wait until the
pictures are uploaded"?

<parentheses>

Ruote is not meant to build wizards, you'd better gather all the info for the
launch step and then launch. The same can be said for any "human" step,
Trying to piggyback a ruote process with a web wizard is hard. Having a
dedicated screen (set of screens) for a ruote "human" step is better.

```
sequence do
fill_in_form_details
end
```

is better than

```
sequence do
fill_in_form_name
fill_in_form_address
fill_in_form_whatever
# ...
end
```

Of course, that second flow makes sense if the human participant is different
at each step...

</parentheses>

Ah, a warning, do not put the pictures themselves in the workitem or in queue
messages, place the URIs or whatever id of the picture. Sorry, it may already
be obvious to you.


I hope that this confusing mix will help anyway, best regards,

--
John Mettraux - http://lambda.io/jmettraux

Adrien Kohlbecker

unread,
Apr 4, 2013, 6:38:25 PM4/4/13
to openwfe...@googlegroups.com
Hello John,

We are doing OCR, so we need the pictures to be there :) But besides the initial API call (receipt creation) and the upload of the pictures, there is no user interaction at this point in the process. 

I've given some thought about what you suggested, but I still see some flaws :
- If we have Ruote poll our database for the status of the upload, we may hit some performance problems, as we would have 10-20 threads polling our db during peak time. 
- If we implement a timeout in the participant, the master process will fail, and then if a user reopens the app and the upload restarts the master process will not pick it up (correct me if I'm wrong regarding the timeout implementation). This regularly happens, especially with our iOS app in which we can't upload in the background.

I guess the issue here is that we are trying to be as fast as possible with the OCR so we need to treat each picture as soon as they are uploaded and in parallel.

On the other hand, if a single thread is polling the db then it's possible, what about a special kind of Receiver that knows "these are the waiting participants with their picture id, and these are the uploaded pictures in the last X seconds" ?

From another angle, we could also do the synchronization after the processing of each each picture, which could be easier to implement ? If each upload launches a separate ruote process, is there a way for these subprocesses to communicate with a master process inside Ruote ? They are still not launched by the master process, but this time everything happens inside Ruote

I think that having a way to do inter-process communication inside ruote could be nice to have, what do you think ?

Best regards,
Adrien

John Mettraux

unread,
Apr 4, 2013, 8:43:29 PM4/4/13
to openwfe...@googlegroups.com

On Thu, Apr 04, 2013 at 03:38:25PM -0700, Adrien Kohlbecker wrote:
>
> We are doing OCR, so we need the pictures to be there :) But besides the
> initial API call (receipt creation) and the upload of the pictures, there
> is no user interaction at this point in the process.

Hello,

so why not start the process when the pictures are uploaded?


> I've given some thought about what you suggested, but I still see some
> flaws :
>
> - If we have Ruote poll our database for the status of the upload, we may
> hit some performance problems, as we would have 10-20 threads polling our
> db during peak time.
>
> - If we implement a timeout in the participant, the master process will
> fail, and then if a user reopens the app and the upload restarts the master
> process will not pick it up (correct me if I'm wrong regarding the timeout
> implementation). This regularly happens, especially with our iOS app in
> which we can't upload in the background.

What do you currently have in place for the uploads?

Can it be wrapped in a service that ruote talks to via a participant (and/or
a receiver as you suggested below)?


> I guess the issue here is that we are trying to be as fast as possible with
> the OCR so we need to treat each picture as soon as they are uploaded and
> in parallel.

OK.


> On the other hand, if a single thread is polling the db then it's possible,
> what about a special kind of Receiver that knows "these are the waiting
> participants with their picture id, and these are the uploaded pictures in
> the last X seconds" ?

Yes, much better than my Participant + ImageUploadService suggestion. It
would have to survive a restart though. That knowledge in the receiver is a
cache somehow. When the participant "arrives" it may be allowed a db query
(cache miss, pass through) to see if the images are really not there (the
other case would be the images are there, but the receiver forgot about
them).


> From another angle, we could also do the synchronization after the
> processing of each each picture, which could be easier to implement ? If
> each upload launches a separate ruote process, is there a way for these
> subprocesses to communicate with a master process inside Ruote ? They are
> still not launched by the master process, but this time everything happens
> inside Ruote
>
> I think that having a way to do inter-process communication inside ruote
> could be nice to have, what do you think ?

There is the "listen" expression that brings some kind of inter-process
communication, although participants [+ receivers] are probably better fits,
they are sitting between ruote processes and external systems (other ruote
processes are external systems somehow).

```
RuoteEngine = Ruote::Dashboard.new(...)

main_flow =
Ruote.define do
wait_for_images
# ... the rest
end

upload_flow =
Ruote.define do
wait_for_upload
perform_ocr
notify_main_flow
end

main_wfid =
RuoteEngine.launch(
main_flow, 'images' => images)

images.each do |image_info|
RuoteEngine.launch(
upload_flow, 'main_wfid' => main_wfid, 'image_info' => image_info)
end

#
# OR
#

main_flow =
Ruote.define do
iterator :on => 'f:images', :to => 'f:image_info' do
wait_for_upload
perform_ocr
notify_main_flow
end
# ... the rest
end

# ...
```

I prefer the first version, smaller processes interacting.

So many ways to skin a cat.

Though I still prefer launching the main flow when the images are uploaded
and OCRized.

Not sure if it's a case of "hey ruote needs to be adapted to our needs",
sounds more like the classical case of "let's clearly define services so that
ruote and other components in our architecture can leverage them".

Please remember that I know nothing of your
architecture/intentions/requirements, just letting my imagination loose.

Thanks for reminding me of the receivers.


Best regards,

Adrien Kohlbecker

unread,
Apr 5, 2013, 11:06:12 AM4/5/13
to openwfe...@googlegroups.com
Hello John, 

So I worked on this issue today and this is what I implemented :

A synchronization mechanism between two processes (1-1 relation) that only let the processes continue when both processes have reached the same synchronization participant. 
It works by storing the synchronization key along with the first workitem to reach this point. Once a second participant is called with the same key, we fetch the workitem from the storage, receive it and thus resume the first process, and we reply immediately to the second one, continuing the second process.

So there are no db hits when there is no activity, and it does not matter which process gets to the synchronization first :)

master process 

    pdef = Ruote.process_definition :name => 'receipt_processing' do

      concurrent_iterator :on_field => 'receipt_subpicture_uuids', :to_var => 'uuid' do
        synchronize :key => "processed_image-${v:uuid}"
        do_something_with_the_ocr_results
      end

      rest of master process...

    end

child processes

    pdef = Ruote.process_definition :name => 'receipt_subpicture_processing' do

      process_image
      synchronize :key =>"processed_image-${f:receipt_subpicture_uuid}"

    end


Best regards,
Adrien


--
--
you received this message because you are subscribed to the "ruote users" group.
to post : send email to openwfe...@googlegroups.com
to unsubscribe : send email to openwferu-use...@googlegroups.com
more options : http://groups.google.com/group/openwferu-users?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "ruote" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openwferu-users/Y75D_6cXf3M/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to openwferu-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



John Mettraux

unread,
Apr 5, 2013, 6:02:41 PM4/5/13
to openwfe...@googlegroups.com

On Fri, Apr 05, 2013 at 05:06:12PM +0200, Adrien Kohlbecker wrote:
>
> So I worked on this issue today and this is what I implemented :
>
> A synchronization mechanism between two processes (1-1 relation) that only
> let the processes continue when both processes have reached the same
> synchronization participant.
> It works by storing the synchronization key along with the first workitem
> to reach this point. Once a second participant is called with the same key,
> we fetch the workitem from the storage, receive it and thus resume the
> first process, and we reply immediately to the second one, continuing the
> second process.
>
> So there are no db hits when there is no activity, and it does not matter
> which process gets to the synchronization first :)

Hello Adrien,

> *master process *
>
> pdef = Ruote.process_definition :name => 'receipt_processing' do
>
> concurrent_iterator :on_field => 'receipt_subpicture_uuids', :to_var
> => 'uuid' do
> synchronize :key => "processed_image-${v:uuid}"
> do_something_with_the_ocr_results
> end
>
> rest of master process...
>
> end
>
> *child processes*
>
> pdef = Ruote.process_definition :name =>
> 'receipt_subpicture_processing' do
>
> synchronize :key =>"processed_image-${f:receipt_subpicture_uuid}"
>
> end

Which is equivalent to

```
# *master process *

pdef = Ruote.process_definition :name => 'receipt_processing' do

concurrent_iterator :on_field => 'receipt_subpicture_uuids', :to_var
=> 'uuid' do
process_image
do_something_with_the_ocr_results
end

# rest of master process...
end
```

can also be split into process and subprocess

```
# *master process *

pdef = Ruote.process_definition :name => 'process_receipts' do

define 'process_receipt_pictures' do
citerator :on_f => 'receipt_subpicture_uuids', :to => 'v:uuid' do
process_image
do_something_with_the_ocr_results
end
end

process_receipt_pictures

# rest of master process...
end
```

"So there are no db hits when there is no activity, and it does not matter
which process gets to the synchronization first"

And still, the master process doesn't really start until all the images got
OCRized. You could split into two processes but instead of synchronizing
them, you make the process_pictures process start the process_receipts
process once when all the pictures are ready. Maybe it's too simple...


> The code is available on GitHub :
>
> https://github.com/adrienkohlbecker/ruote-synchronize

That is nice, that deserves a README ;-)
Reply all
Reply to author
Forward
0 new messages