Feed and pick retry in OpenPnP 2.0

Jason von Nieda

Apr 16, 2020, 12:17:19 AM
to ope...@googlegroups.com
Hi all,

This thread is to discuss needed changes to the feed retry, pick retry, and maybe other retry systems in OpenPnP 2.0.

There are a number of bugs and problems, so I think it's not even that useful to discuss how it works now, or what is broken, and instead focus on how we want it to work.

After discussion with a few people on Discord today, I think we have a good plan. The flow would look like this:

alertOrDeferIfError
    alignRetry(part.alignRetryCount)
        pickRetry(part.pickRetryCount)
            feedRetry(feeder.feedRetryCount)
                feed
            repickRetry(part.repickRetryCount)
                pick
        align
    place

This is sort of how things are supposed to work now, but there are bugs. There are also a few changes to the current design:

1. pickRetryCount moves from feeder to part: Pick retry encapsulates everything needed to get a part on the nozzle. I think this belongs on part because there may be multiple feeders feeding the same part, and in my experience pick issues tend to be related to a specific part, e.g. too heavy, too slippery, picks up sideways, etc.

2. The addition of alignRetry and part.alignRetryCount: Align retry includes picking and feeding. The most common use case here is using bottom vision for good pick detection, with a size check, for instance. If the size check fails we will repeat the entire feed and pick process.

3. The addition of repickRetry and part.repickRetryCount: This covers the case where the feed succeeded but the vacuum check after pick failed. Some people may prefer to try to repick without feeding again on expensive parts in case the part was just stuck or something.

All of these counts can be set to 0 to disable the retry, of course.
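
A minimal Java sketch of the nested flow above, for discussion only. The names `RetryFlowSketch`, `Step`, `retry` and `processPlacement` are hypothetical and not part of the OpenPnP code base, and the outer alertOrDeferIfError wrapper is left to the caller, who would catch the exception that escapes once all retries are used up.

```java
/** Hypothetical sketch of the nested retry flow; these names are not OpenPnP API. */
class RetryFlowSketch {
    /** One job step (feed, pick, align, place) that may fail with an exception. */
    interface Step {
        void run() throws Exception;
    }

    /**
     * Runs the step once plus up to retryCount further attempts.
     * A retryCount of 0 disables retrying. Rethrows the last failure.
     */
    static void retry(int retryCount, Step step) throws Exception {
        int attempts = 1 + Math.max(0, retryCount);
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                step.run();
                return;
            } catch (Exception e) {
                last = e;
            }
        }
        throw last;
    }

    /** Nesting mirrors the indentation of the flow: align retry wraps pick retry, which wraps feed and repick. */
    static void processPlacement(int alignRetryCount, int pickRetryCount, int feedRetryCount,
            int repickRetryCount, Step feed, Step pick, Step align, Step place)
            throws Exception {
        retry(alignRetryCount, () -> {
            retry(pickRetryCount, () -> {
                retry(feedRetryCount, feed);
                retry(repickRetryCount, pick);
            });
            align.run();
        });
        place.run();
    }
}
```

With this nesting, a failed vacuum check is first repeated without feeding (repick), then with a fresh feed (pick retry), and a failed alignment repeats the whole feed and pick sequence.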

Additionally, I think I would like to move error handling (Alert / Defer) from placement to part. I don't think there is a reason that you'd want to treat, say, one R0402-10k placement differently from another R0402-10k placement, and additionally, there are really no "problems" that can happen during placement. Once we've passed alignment, the only thing left to do is move to the position and set the part down.

It could be argued that if we find something stuck to the nozzle after place, then that is a placement error, so maybe we don't mark the placement complete until that test has passed.

So, with this change, if a part ran out, instead of getting many errors for a given part, you'd get one error. You fix the feeder and resume the job. If you are using Defer then all placements of that part would get skipped and you'd be notified at the end. Then you can fix the feeder and restart the job to fix unplaced placements.
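
To make the part-level Alert / Defer behaviour concrete, here is a rough Java sketch. Everything in it (`PartErrorHandlingSketch`, `ErrorHandling`, `Placer`, `run`) is a hypothetical stand-in rather than the real JobProcessor, and placements are reduced to their part ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of part-level Alert / Defer handling; not the OpenPnP JobProcessor. */
class PartErrorHandlingSketch {
    enum ErrorHandling { ALERT, DEFER }

    /** Stand-in for feed + pick + align + place of one placement of the given part. */
    interface Placer {
        void place(String partId) throws Exception;
    }

    /**
     * Walks the placements (each represented by its part id) in order.
     * DEFER: the first failure of a part skips all remaining placements of that part.
     * ALERT (the default here): the first failure stops the job by rethrowing.
     * Returns the part ids of all placements left unplaced, for a report at the end.
     */
    static List<String> run(List<String> placements, Map<String, ErrorHandling> handlingByPart,
            Placer placer) throws Exception {
        Set<String> deferredParts = new HashSet<>();
        List<String> unplaced = new ArrayList<>();
        for (String partId : placements) {
            if (deferredParts.contains(partId)) {
                unplaced.add(partId); // this part already failed once, do not try it again
                continue;
            }
            try {
                placer.place(partId);
            } catch (Exception e) {
                if (handlingByPart.getOrDefault(partId, ErrorHandling.ALERT) == ErrorHandling.ALERT) {
                    throw e; // one error for the operator, job stops here
                }
                deferredParts.add(partId);
                unplaced.add(partId);
            }
        }
        return unplaced;
    }
}
```

With Defer, a part that ran out produces a single failed attempt and a list of skipped placements, instead of one error per placement.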

Now, let the games begin! :)

Thanks,
Jason

Matt Brocklehurst

Apr 16, 2020, 1:43:48 AM
to ope...@googlegroups.com
If I've got 3x feeders with 4K7 0603, and it tries to do a pick from one of these and for some reason fails (feeder jammed, tape ran out), will it automagically attempt to do a pick from the other two feeders holding the same component without nagging me first?



--
You received this message because you are subscribed to the Google Groups "OpenPnP" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openpnp+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openpnp/CA%2BQw0jyn6JyMM1xbmZ8cX%2Bn53qpj-UBZCtD%3DEDtTZfz2tjSMMQ%40mail.gmail.com.

Jason von Nieda

Apr 16, 2020, 1:59:47 AM
to ope...@googlegroups.com
Yep, that’s the plan.

Jason
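
The failover Matt asks about could look roughly like the following Java sketch. It is only an illustration under assumptions: `FeederFailoverSketch`, its simulated `Feeder` class and `feedFromAny` are hypothetical and much simpler than OpenPnP's real feeder classes.

```java
import java.util.List;

/** Hypothetical sketch of failing over between feeders loaded with the same part. */
class FeederFailoverSketch {
    /** Simulated feeder for illustration only; OpenPnP's real Feeder interface differs. */
    static class Feeder {
        final String name;
        final int feedRetryCount;
        int partsLeft;
        int feedAttempts;

        Feeder(String name, int feedRetryCount, int partsLeft) {
            this.name = name;
            this.feedRetryCount = feedRetryCount;
            this.partsLeft = partsLeft;
        }

        void feed() throws Exception {
            feedAttempts++;
            if (partsLeft == 0) {
                throw new Exception("Feeder " + name + " is empty");
            }
            partsLeft--;
        }
    }

    /**
     * Tries each feeder in turn, allowing feedRetryCount extra attempts per feeder.
     * Returns the feeder that fed successfully, or throws once every feeder has failed.
     */
    static Feeder feedFromAny(List<Feeder> feedersForPart) throws Exception {
        Exception last = new Exception("No feeder available for this part");
        for (Feeder feeder : feedersForPart) {
            int attempts = 1 + Math.max(0, feeder.feedRetryCount);
            for (int i = 0; i < attempts; i++) {
                try {
                    feeder.feed();
                    return feeder;
                } catch (Exception e) {
                    last = e; // remember the failure and keep trying
                }
            }
        }
        throw last;
    }
}
```

Only when the last feeder has used up its retries does the error reach the operator (Alert) or the part get skipped (Defer).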


--
Sent from my BeOS enabled toaster

Marek T.

Apr 16, 2020, 4:28:21 AM
to OpenPnP
Some thoughts came to mind, somewhat aside from the above, which overall sounds great. But we need to decide what happens next after an error occurs.

If, after all retries, the algorithm catches the exception and shows a message, we're in Alert and get a message with OK.
First, it would be good to display what has failed, vacuum or vision.
Further, pressing OK closes the window and we then need to press Run.
If we're in Alert, the entire connected step (pick and align) will be re-run.
If we change Alert to Defer before Run, shall we bypass the failed part immediately and do the next part from the list? Or will we re-run the failed part and defer (skip) it if it fails again? IMO an immediate bypass is the logical choice.

Really, is stepping back to the 1.0 style wrong, impossible, or difficult? I mean direct buttons in the message, Defer and Alert, to choose how the job continues, instead of closing with OK, changing something and running again. It's simply faster.

Maybe it could be useful to add a button somewhere to repeat only the failed operation, e.g. only vision if it failed, instead of the whole connected procedure. Not necessarily in the message, just somewhere in the panel. If we see the part on the nozzle but vision fails, we don't need to discard and pick again, do we?

Marek T.

Apr 16, 2020, 4:35:34 AM
to OpenPnP
And how many times should a manual re-run retry? According to the settings, or only once? I vote for "once" in this case: every sub-step once. If we repaired something but not well enough, it makes no sense to waste parts.

Sorry for the complications, but this will come back if we don't decide it now.

ma...@makr.zone

Apr 16, 2020, 10:24:51 AM
to ope...@googlegroups.com
On 16.04.2020 at 06:17, Jason von Nieda wrote:
> Hi all,
>
> This thread is to discuss needed changes to the feed retry, pick retry, and maybe other retry systems in OpenPnP 2.0.
>
> There are a number of bugs and problems, so I think it's not even that useful to discuss how it works now, or what is broken, and instead focus on how we want it to work.

This won't be very popular, but I can't sit on my mouth. Sorry. This is somewhat related to a discussion we already had (link later).

 :-/

Note, this is a monologue, i.e. I'm addressing myself, somewhat sarcastically, as a 30 year professional software engineer, in "reality check mode" (occupational disease). Everybody should try and step in my shoes, but please nobody feel personally addressed or even offended.

Basic assumptions:

  1. This should deal with the real physical world, and rather the ugly side of it, i.e. the exceptional case, the errors, shit happening, Murphy's Law, where stuff simply does not behave as I thought or hoped it would. I need to take off the rose-colored glasses!
  2. I already tried hard to make my machines as good and reliable as possible, I tried to prevent these errors, rather than having to deal with them. So these errors already are the ineradicable residue of my best engineering and problem solving efforts, within given constraints. Need to be very careful, to avoid the common fallacy of trying to resort to the same engineering ideas and principles that by definition will have been proven ineffective in case of an error.
  3. This means that I need to be humble and as unpresumptuous as possible about these errors. I don't know why and how exactly they will happen and when, because if I knew, I could probably improve the machine to prevent them. The emphasis is on "I don't know".
  4. Simplification and early application of some (imagined) pareto principle, just for the sake of simplification (or impatience, or because it is mega-trend-fashionable to dumb-down software) will not work here.
  5. Might as well just keep the good and honest Alert solution, if I'm unprepared to face this reality!
  6. Having realized, I don't know why and how exactly an error happens, I need to focus on what I do know, once it has happened.
  7. I can retrace my steps. I know what came before. Several objects have been involved before it happened.
  8. I believe it is completely inadequate to blame all errors on one origin alone. Be it the feeder, or be it the part.
  9. In fact the discussion about whether it should be the part or the feeder shows exactly how inadequate the single error origin model is.
  10. To show what I mean: the simplest error is a feeder running out of parts (reels have empty pocket trailers, so this is unavoidable) and as Matt Brocklehurst said, it should then fail over to the next feeder with the same part. I don't see how that could work inside the drafted system, when the retry count is no longer on the feeder, but on the part. 

    ... and now a bit more constructive  ...

  11. The solution is to handle all relevant OpenPNP objects as potential origins of errors.
  12. Once an error happens, penalty points are associated to all objects that were involved leading up to the error.
  13. These are Head (i.e. XY Motion), Pump, Nozzle, Valve & Sensor, Nozzle Tip, Feeder, Package, Part maybe Bottom Camera, etc.  and (most importantly) the Placement and even the individual Job Processor Steps.
  14. For each relevant object, we can instantiate an associated ErrorHandler. A central Error table for easy/tabular maintenance and unified GUI.
  15. Each ErrorHandler object has two counters: global and resettable error count.
  16. Plus the user can set limits, of how many resettable errors and how many global errors an object may collect, before being taken out of commission.
  17. The user can create and associate an ErrorHandler for any relevant object (see 13). OpenPNP's AbstractModelObject is nicely unified and very well prepared for such a generic use.
  18. Some ErrorHandlers will be mandatory and automatically created by OpenPNP up front. Obviously a good set of defaults will have to be developed in the field and proposed by OpenPNP.
  19. Some ErrorHandlers will be templates, where you can set (and reset) the default settings by type of an object, and filtering by properties such as package.
  20. Template based ErrorHandlers will be created on the fly when a new object of that type and filtering properties catches the first error.
  21. Once a placement was done with success, the resettable error counts of all (truly) involved objects are reset. But not the global counters.
  22. Because some errors will be intermittent, global counters are important to catch those.
  23. Once an ErrorHandler has reached the error limit, the associated object is taken out of commission and the Job Processor may no longer use it.
  24. As these include Placement and Job Processor Step, the Job Processor can "naturally" backtrack on those.
  25. No need to discriminate "Pick Retry", "Feed Retry" etc. on the GUI and in the code, the corresponding Job Processor Step ErrorHandlers will stand for those.
  26. Because it is universal, you'll also get "BoardLocationFiducialCheck Retry", "CalibrateNozzleTips Retry" for free, for instance.
  27. So you get a robust system all the way, as I can imagine it is very frustrating to have a job fail half way in, just because the nozzle tip calibration vision failed once due to a solitary glitch in the camera image.
  28. This is also automatically ready for all future types of Steps (and embraces a finer Step granularity, as is sometimes discussed).
  29. Outside the Placements and Steps, all the other objects are potential subjects too.
  30. It may disable a feeder, because it ran empty. It will fail over to the next enabled feeder, or skip that part.
  31. It may disable a nozzle tip, because it got full of cream and blocked and all picks with it failed, regardless of part, feeder or package. This will skip all parts with no alternative nozzle tip compatibility.
  32. It may try and limp on with one nozzle, because the vacuum sensor on the other is faulty.
  33. It may be restricted to parts without bottom vision, because a part has fallen on the bottom camera.
  34. It may disable one package, because a change in ambient light is spoiling bottom vision for that peculiar package.
  35. It may indeed disable a part, because it has failed from three feeders; maybe the PartOn vacuum level is faulty for that specific (porous) part.
  36. It may even disable the head (X, Y motion) because the Y stepper has skipped a step and it is now constantly mis-picking (this obviously halts the Job for good).
  37. Obviously, the error limits need to reflect the hierarchy of machine objects. A Step will fail quickly, a feeder follows soon, a nozzle or camera, or even the head (for X/Y motion) takes many more "red cards" to take out.
  38. It is important to have this hierarchy and also the global counters. Last time we discussed this, I could show that the drafted system would go on and waste all N feed retries times M pick retries of parts of all the feeders(!), if the error origin is global (e.g. the vacuum pump failed, or a stepper missed a step). The same is still true, I believe. This system would still need close supervision.
  39. See the past discussion and specifically the latest chart by Marek here.
  40. While this proposed system here is a radical departure from today's solution, it is IMHO simpler to implement than the one currently drafted, because everything works the same way on a generic Job Processor Step granularity level. 
  41. Once a user has the hang of one ErrorHandler, (s)he is also ready to use all the others. The template & filtering system would allow for very powerful presets. All discussions whether you want the error limit presets by part or by package or by feeder or by placement would be moot, because you can simply do all of the above.
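
As a rough illustration of points 12 to 23, the counting scheme could be sketched in Java like this. All names (`ErrorTableSketch`, `ErrorHandler`, `recordError`, `recordSuccess`, `isUsable`) and the string object ids are hypothetical, the template is reduced to one pair of default limits, and nothing here exists in OpenPnP today.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the penalty-point idea; ErrorHandler is not an existing OpenPnP class. */
class ErrorTableSketch {
    /** Error counters and limits for one object (feeder, nozzle tip, part, step, ...). */
    static class ErrorHandler {
        final int resettableLimit;
        final int globalLimit;
        int resettableCount;
        int globalCount;

        ErrorHandler(int resettableLimit, int globalLimit) {
            this.resettableLimit = resettableLimit;
            this.globalLimit = globalLimit;
        }

        /** True once either counter has reached its limit. */
        boolean outOfCommission() {
            return resettableCount >= resettableLimit || globalCount >= globalLimit;
        }
    }

    /** Central error table, keyed by an object id such as "feeder:F1". */
    private final Map<String, ErrorHandler> table = new HashMap<>();
    private final int defaultResettableLimit;
    private final int defaultGlobalLimit;

    ErrorTableSketch(int defaultResettableLimit, int defaultGlobalLimit) {
        this.defaultResettableLimit = defaultResettableLimit;
        this.defaultGlobalLimit = defaultGlobalLimit;
    }

    /** Returns the handler for an object, creating it from the default template on first use. */
    ErrorHandler handlerFor(String objectId) {
        return table.computeIfAbsent(objectId,
                id -> new ErrorHandler(defaultResettableLimit, defaultGlobalLimit));
    }

    /** An error penalizes every object that was involved leading up to it. */
    void recordError(List<String> involvedObjectIds) {
        for (String id : involvedObjectIds) {
            ErrorHandler handler = handlerFor(id);
            handler.resettableCount++;
            handler.globalCount++;
        }
    }

    /** A successful placement resets only the resettable counters of the involved objects. */
    void recordSuccess(List<String> involvedObjectIds) {
        for (String id : involvedObjectIds) {
            handlerFor(id).resettableCount = 0;
        }
    }

    /** The job processor would consult this before using an object. */
    boolean isUsable(String objectId) {
        return !handlerFor(objectId).outOfCommission();
    }
}
```

An intermittently failing feeder never reaches its resettable limit, because successes in between keep clearing it, but its global counter keeps growing until it is taken out of commission.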

_Mark


bert shivaan

Apr 16, 2020, 11:35:36 AM
to OpenPnP
I like Mark's idea above, though admittedly I have NO IDEA how difficult it would be to implement.

I know the system Marek implemented in V1.0 seems to work well for him.
I would love to see a system where, when using multiple nozzles, if one fails vision for example, the rest are placed and the failed one is trashed, then a new part is picked. This way there are fewer trips to the well, so to speak.


Marek T.

Apr 16, 2020, 11:51:13 AM
to OpenPnP
Bert, it will work exactly as you say if you choose Defer and the number of tries is exceeded. My implementation also works this way when auto-skip is selected.

I think the approach Jason described is relatively easy to implement (he wouldn't propose something impossible or requiring a total revolution), and the functionality is what you and I found to work best, plus more. It works very well for semi-mass production.
I don't want to say that nothing better exists. But if we have "only" this, it will be a heaven-and-hell difference compared to the old 1.0 and the current 2.0, which is hopeless as far as retries go.

It will never happen that everybody is happy; it's endless.

Jason von Nieda

Apr 19, 2020, 11:39:34 PM
to ope...@googlegroups.com
Hi Mark,

I think this is a very good idea. It matches reality well - anything that can throw an Exception is a possible source of trouble, and should be considered "suspect". I think it will be quite a bit of work, and I can't think of a good way to do it without massive amounts of boilerplate - but that is potentially okay too. Do you have any thoughts on how it could be actually implemented in code?

I'm moving forward with what I've described above, because these issues need to be fixed soon and I think it's a relatively small amount of work, but if you are interested in taking a swing at implementing your concept, I am all in favor. If we can come up with a well designed implementation of what you've described I think it will be a relatively easy conversion from what I'm working on to what you are proposing. I think it would be good to start with a single object implementing this concept, and the plumbing required to handle its errors.

Thanks,
Jason


ma...@makr.zone

Apr 20, 2020, 4:38:01 AM
to ope...@googlegroups.com

Hi Jason

Thanks for taking the time and patience to read my half-rant.

Your proposal of how to proceed seems perfect to me. I'm glad it's a relatively small amount of work and you're still open to other solutions after that.

> if you are interested in taking a swing at implementing your concept, I am all in favor.

I'm well aware that the JobProcessor is more or less the crown of it all, holding the other pieces together. So I am not ready to suggest I could do it alone.  But I could probably create some proof of concept on one or two Steps, and maybe you could then provide guidance or take over.

This is in no way urgent (for me). I don't think I'll ever produce so many boards that I would not want to watch the machine do every pick anyway ;-)  The important thing for me (why I made the post with my proposal) was to suggest a real deep rethink, before doing a complete overhaul of the current system with lots of work put into it. That's why I'm very glad you say it's a relatively small amount of work. So it will not be set in stone for all eternity, due to the sheer amount of blood, toil, tears and sweat that went into it.

I have other TODOs on my list first, but I'll always keep this in mind, collecting valuable knowledge about all the bits and pieces the JobProcessor is supposed to keep together.

_Mark

Marek T.

Apr 23, 2020, 3:09:45 AM
to OpenPnP
Hi Jason,

So, since you have tested it yourself: will you fix it, or remove it to apply what we talked about?

Jason von Nieda

Apr 23, 2020, 1:08:59 PM
to ope...@googlegroups.com
First, just fix the pick retry bug. Should be a small fix. Then work on implementing what we discussed in this thread.

Jason



John Plocher

Apr 23, 2020, 4:30:43 PM
to ope...@googlegroups.com
Mark,

As I read this, I'm struck by the user model conflation between errors that come out of machine structural settings, calibration issues and per-job settings - and wonder if things would simplify if they were detangled.  (I'm very aware that we may well be looking at the same problem from opposite ends, and we could be in violent agreement...)

I'm basing my response on your observation that "I already tried hard to make my machines as good and reliable as possible".  That is, there are demarcation lines between "the machine failed at something that should just work", "the machine isn't calibrated correctly, and can't work", and "the execution of my job failed because of something I did or didn't do".

Working down the "what hat am I wearing" chain from operator to maintainer to builder, IMO, there are a slew of normal, don't get your pants in a tizzy errors  - failures that happen when using a correctly functioning machine to set up and run jobs:
  • Ran out of parts in a feeder
  • Didn't specify the correct visual patterns for the part in use
  • Didn't choose the right nozzle for a part
  • Got a thumbprint on the camera lens
  • A nozzle tip got gummed up with solder paste
  • A tray got bumped and is now out of registration
Put these failures into a bucket called "job errors" because they have little to do with how well the machine is calibrated or constructed, and everything to do with how the operator used the machine.  This is where "Abort, Retry, Fail" dialogs can be very useful, though there is room for a discussion on how they fit into an operator workflow:
  • When setting up a new job, with its cycle of testing/debugging, the operator needs messaging to learn about the above failure(s) and how to remediate things - whether it is to add feeders and parts, change the visual part matching pipeline, ensure all the needed tips are available or whatever: a poorly specified job can't succeed.
  • After setup is complete, a manual placement validation is usual.  Here, the focus may be more on Retry (with associated job config changes) than on Defer or Ignore/Skip. 
  • Finally, in an Automatic/Production mode, the focus should be on choices that improve board throughput speeds & robustness and reduce interaction requirements.
Then there are the errors that fall into the machine's "configuration and calibration errors" bucket, and have a slightly different focus - this is where you would create and tune a new camera pipeline or install a new bank of feeders.  Here the error focus is on proving out functionality and debugging communications issues between subsystems.

Finally are the things I consider "core machine stuff" - if the X-axis encoder doesn't play well with the X-axis home and limit switches, or you can't talk G-Code to that 'duino controller, you need to know about it so you can fix things and iterate on the "make my machines as good and reliable as possible" ethos.

Does this mean there are 3 different error handling systems?  No, but it may mean that the errors need to be interpreted or handled in different ways in different contexts - or that the errors themselves need a taxonomy and handler context...  I just don't see the value of, say, the operator having to deal with the complexity of "set limits, of how many resettable errors and how many global errors an object may collect, before being taken out of commission".

  -John

ma...@makr.zone

Apr 24, 2020, 6:47:01 AM
to ope...@googlegroups.com

Hi John

thanks for wrapping your brain around this too! :)

I agree that errors can be classified into different bins and different diagnostic tools are needed. I'm all for pushing the notion that first there must be a good setup. In fact, I'm developing tools to that end for OpenPNP, see my vacuum setup graphical diagnostics (and please help testing it!):

https://groups.google.com/d/msg/openpnp/gL7uEjKmLzU/fksMRtLpAwAJ

I'm in the process of developing the same for the camera settle and boy have issues with my machine popped up through its diagnostic power (will announce soon)!!

https://groups.google.com/d/msg/openpnp/jnvon8elGzI/VhDvC8KnDQAJ

Having said that, once you're running the machine (and we're talking about "quite productive" running in this thread), I'm very much convinced that all attempts towards "precise single origin of error diagnostics" are doomed. Just don't try, you'll fail!

Yes it may be vision of that part that keeps failing, but maybe it's because a previous part has fallen on the camera and the next part will fail too. No point in counting errors on the part or feeder.

Or worse, the X stepper has missed a step, because it was momentarily stuck in the nozzle tip changer, and now it keeps not picking parts right. But this might lead to vacuum fails too, so the cumulative trickling-down of errors will then indicate something more "fundamental" is wrong.

Very bad to keep retrying with the next part/feeder/placement and the next, and the next...!

My proposal is just a simple heuristic and maybe it won't work out, but it's worth a try, IMHO.

All I'm actually saying is "don't blame the messenger", the failure reporting component might not be the true origin of the error.

Now I do agree that my counting system might be too ambitious and too complex for the user. Maybe one global error counter and limit in addition to the part or feeder counters, would cover the Pareto 80% of the problem already.

    :-)

_Mark


Mike Menci

Sep 7, 2020, 2:16:11 PM
to OpenPnP
Hello,
Is there any progress on this fix, which is a very old request and still not working in OpenPnP 2?
I read that it is a small bug, but where is the actual problem?
Thanks
Mike

Jason von Nieda

Sep 10, 2020, 11:07:57 AM
to ope...@googlegroups.com
There's been no progress on it in quite a while. I've got some unfinished code that I think is going in the right direction, but it's a complete rewrite of the JobProcessor. Looking back, I see I said I thought the retry bug would be a small fix. I think it turns out it's actually pretty complicated, but I will revisit it this weekend and see if I can come up with a quick fix to at least get the main bug fixed.

Jason


Mike Menci

Sep 19, 2020, 8:32:23 AM
to OpenPnP
Jason - this might be the right weekend ?
:-)
Mike

Nom Dinet

May 31, 2021, 7:03:07 AM
to OpenPnP
Dear All,

Any progress on this issue?

Regards

Nom

Marek T.

May 31, 2021, 7:29:41 AM
to OpenPnP
As far as I know:
When the pick fails because of vacuum, it is retried. When vision fails, the pick is still not retried, but it can be "deferred" and placed again after the job has finished (in another pass of the job).
So it's partially done, but still not the whole algorithm we discussed here.

Nom Dinet

May 31, 2021, 7:37:24 AM
to OpenPnP
Hi Marek,
Thanks for the info.  My machine has two heads and no vacuum sensing for pick failures.  If the first head picks OK and passes bottom vision and the second head fails bottom vision, I don't have the option to place the first component before the job needs to be stopped.  I have to discard both components and the job stops.
Regards
Nom

Marek T.

May 31, 2021, 7:50:53 AM
to OpenPnP
Hi!
By two heads you mean two nozzles, I guess?
If you have this part declared as "Defer" in the job list, it should place the part that passed vision with the first nozzle and defer the second one for later...
Or maybe it doesn't work that way. Sorry, I don't use 2.0 every day, I only test new features from time to time, so I may have something wrong :-(.

Chad Olson

May 31, 2021, 12:05:10 PM
to ope...@googlegroups.com
What type of pick and place machine are you using with 2 nozzles?




Nom Dinet

May 31, 2021, 7:19:07 PM
to OpenPnP
Hello Marek,
Sorry, yes I meant two nozzles.  I don't have a defer option in the job list.  I am running ver 2.
Colso: the machine is a DIY build - nothing extraordinary, but it has just developed a problem where the left nozzle passes bottom vision but the right fails with the same component and tip.
Regards

Nom

Marek Twarowski

May 31, 2021, 8:20:32 PM
to ope...@googlegroups.com
That's almost impossible. In version 2.0 in particular you must have two options to choose from for every position in the job list: Alert and Defer.


Nom Dinet

Jun 1, 2021, 3:10:14 AM
to OpenPnP
Thanks Marek, I found the Defer option.  I am away from home for 1 week so I will try then.

Nom

Marek Twarowski

Jun 1, 2021, 4:26:10 AM
to ope...@googlegroups.com
Holidays? Enjoy :-). 
Let me know if it fixes your issue (at least partially) when you check it.

ozzy_sv

Jun 2, 2021, 2:31:09 AM
to OpenPnP
I'm wondering, since I haven't updated for a long time: has the main problem from this topic been fixed?
Or is everything the same as it was?


Marek Twarowski

Jun 2, 2021, 3:11:07 AM
to ope...@googlegroups.com
It is as I said. It's been partially fixed: some of it works, but the final algorithm is still not applied.
