Dear Vincent,

I am writing to ask you something about ObjectRecognition.action. We are planning on using your standard for the definition of object recognition actions. The reason is that, although we plan to develop our own object recognition algorithms, perhaps later we could easily compare them with the ones that exist in ORK, or we could even integrate our algorithm into wg-perception or ORK. It's always nice to follow a standard :).

Anyway, I have a question about the definition of that action:

    # Optional ROI to use for the object detection
    bool use_roi
    float32[] filter_limits
    ---
    # Send the found objects, see the msg files for docs
    object_recognition_msgs/RecognizedObjectArray recognized_objects
    ---
    # no feedback

The question is why there is no information concerning the data used to produce a recognition, e.g. an image or a point cloud. I imagine your object recognition algorithms (i.e., the action servers) actually subscribe to the image or point cloud messages, get one image (or point cloud), and then process this information. But that means that the image that is used is arbitrary, in the sense that it is whatever image the action server received when it reached the point of subscribing for an image.

So, for example, if you have two recognition servers, say linemod and tod, and you call actions to recognize with these two:

    call action recognize_using_linemod
    call action recognize_using_tod

you cannot be sure that the images used by linemod and tod are exactly the same, right?

My proposal is to include the image or point cloud in the action goal message, i.e.:

    # Optional ROI to use for the object detection
    bool use_roi
    float32[] filter_limits
    bool use_point_cloud
    sensor_msgs/PointCloud2 point_cloud
    bool use_image
    sensor_msgs/Image image
    ---
    # Send the found objects, see the msg files for docs
    object_recognition_msgs/RecognizedObjectArray recognized_objects
    ---
    # no feedback

I think there are advantages to this change, namely:

1. I believe this change would have no effect on the existing algorithms: if the use_point_cloud or use_image flags are false, the behavior is to subscribe to the image or point cloud topic (as I believe is already done). So, to get exactly the same behavior as now, we just have to set use_point_cloud and use_image to false.
2. We could, if needed, have the recognition servers use exactly the same image for the recognition.
3. We could have the recognition servers operate on images (or point clouds) taken offline.
4. The overhead of adding these fields is not that large if the image and point_cloud fields are empty.

Suppose the following scenario: a manipulator with a camera in hand. The robot takes the camera to a location, takes an image, and then asks the recognition servers to recognize objects in the image. With the proposed changes, one could trigger the recognition and then start moving the manipulator immediately, without having to wait for the recognition result (currently one has to wait, because we cannot be sure when the image was captured by the action servers). With the proposed changes, the object recognition action would be more modular and cover a larger field of applications.

Could I have your opinion on this? Would you consider making these changes?

Thank you very much,
Miguel Oliveira
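For what it's worth, the backwards-compatible dispatch described in point 1 could be sketched as follows (a minimal Python sketch; the Goal class and select_input function are illustrative stand-ins for this idea, not real rospy/actionlib types):

```python
# Sketch of how a recognition server might honor the proposed
# use_image / use_point_cloud flags. Plain classes stand in for
# ROS messages; names here are illustrative assumptions.

class Goal:
    def __init__(self, use_image=False, image=None,
                 use_point_cloud=False, point_cloud=None):
        self.use_image = use_image
        self.image = image
        self.use_point_cloud = use_point_cloud
        self.point_cloud = point_cloud

def select_input(goal, subscribe_latest):
    """Return the data to recognize on.

    If the goal carries data, use it; otherwise fall back to the
    current behavior of grabbing the latest message from the topic.
    """
    if goal.use_image and goal.image is not None:
        return goal.image
    if goal.use_point_cloud and goal.point_cloud is not None:
        return goal.point_cloud
    # Backwards-compatible path: behave exactly as today.
    return subscribe_latest()

# Flags off -> same behavior as the current action definition.
latest = lambda: "live_cloud"
assert select_input(Goal(), latest) == "live_cloud"
# Flags on -> recognize on the data supplied by the client.
assert select_input(Goal(use_image=True, image="img_042"), latest) == "img_042"
```

The point of the sketch is only that servers ignoring the new fields behave exactly as before, which is why the change would be non-breaking.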
Hello Vincent,

Thanks for the quick reply. Here are some additional comments.

On Wed, Mar 25, 2015 at 5:41 PM, Vincent Rabaud wrote:
> That would be a good conversation to have on:
> https://groups.google.com/forum/#!forum/ros-sig-perception
>
> It could work to have those things as part, but then again, you should have an
> array of images / pcd. And you could have more data like sonars and so on. That's
> why it was chosen this way: configure your pipeline as you want, it plugs to
> whatever topics (using ORK or not). Just ask it to give an answer and that's it, no?

Yes, but which topics? For example, a linemod action server would need an image message, but on which topic? Suppose, for example, that there are three cameras on the robot. This decision of which sensor to use should be on the client side, not on the server side, I think.

Also, that way it is not possible to use data (images, point clouds, sonar) that is not streaming from the current sensor. One alternative would be for the client to publish an image on a given topic (and latch the message), then call the action, signaling that the image should be fetched from that topic. That would mean an additional string in the action goal to specify the topic name.

Anyway, my contact was just to see if we could use the same action definition in our architecture. If you don't see any advantage in adding this extra functionality to your action definition, we will create our own action definition (which should be very similar to yours except for this extra functionality).

Best regards, and thank you for the help.

I am all for having a standard, and let's modify whatever is needed. Now, is it something practical for the client? First, he would have to send an array of images and PointCloud2. But it should also work with compressed images, right? And probably laser scans? That's why I don't think it should be on the client side to send the data.

Now, I understand your point that the client should choose which sensors to process things on. The most flexible thing I see is a configuration string that the server would understand and from which it would decide which sensors to use and which method to use. Please start a thread on the SIG: other persons will face the same issue and we'll never come up with a standard if we don't make it public. Thx.

On Wed, Mar 25, 2015 at 9:18 PM, Miguel Armando Riem de Oliveira wrote:
> [quoted text trimmed]
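Vincent's configuration-string idea could look something like this (a hedged sketch; the string syntax and the parse_sensor_config helper are invented here for illustration and are not part of ORK):

```python
# Minimal sketch of the "configuration string" idea: the goal carries
# one string, and the server parses it to decide which sensor topics
# to use. The "kind:topic;kind:topic" syntax is an assumption.

def parse_sensor_config(config):
    """Parse e.g. "image:/head_cam/image;cloud:/kinect/points"
    into a {sensor_kind: topic_name} mapping."""
    sensors = {}
    for entry in filter(None, config.split(";")):
        kind, _, topic = entry.partition(":")
        sensors[kind.strip()] = topic.strip()
    return sensors

cfg = parse_sensor_config("image:/head_cam/image;cloud:/kinect/points")
assert cfg == {"image": "/head_cam/image", "cloud": "/kinect/points"}
```

This keeps the goal message small (one string) while still letting the client pick among, say, three cameras, which was the concern above.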
Hey Miguel,
On Thu, Mar 26, 2015 at 02:34:46AM -0700, Miguel Armando Riem de Oliveira wrote:
> My point is that, when designing action servers for the object recognition action,
> it should be on the client side to decide which data is used to recognize the object.
I don't agree with that statement as it is too general in my opinion.
Which *types* of sources to use for the recognition process should be left to the server,
because otherwise you would end up including quite a number of optional arrays of
sensor_msgs/* in the standard ObjectRecognitionRequest. (is there a magnetic field active
around the object? Is the liquid in the cup hot or cold?)
That aside, there indeed *is* a problem with timing the recognition action.
I have a problem similar to the one you described below:
...
I move my robot arm out of the FoV of the camera
to detect objects in front of the robot and then trigger the action.
However, I can't be sure that the recognition server already received a new point cloud,
so I had to add a sleep(2.0) in between which obviously is not safe either.
Instead, I would like to see a way to specify a time frame in the action to ensure
that all data used for the recognition is from within that time frame.
The easiest thing I can come up with would be to add an std_msgs/Header to the request
to specify a time (and, what the hell, a frame for the ROI - that has been driving me crazy for ages...)
and, if deemed necessary, a second time to specify the end point of such a time frame.
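The time-frame check proposed here might boil down to something like this (a sketch with plain floats standing in for ros::Time stamps; within_time_frame is a hypothetical helper, not existing API):

```python
# Sketch of the proposed time-frame gate: only accept sensor messages
# whose timestamp falls inside [start, end] supplied in the goal.
# Floats stand in for ros::Time; this replaces the unsafe sleep(2.0).

def within_time_frame(msg_stamp, start, end=None):
    """True if a message taken at msg_stamp may be used for recognition."""
    if msg_stamp < start:
        return False          # stale data captured before the trigger
    if end is not None and msg_stamp > end:
        return False          # data from after the requested window
    return True

assert not within_time_frame(9.5, start=10.0)          # old cloud rejected
assert within_time_frame(10.3, start=10.0, end=11.0)   # fresh cloud accepted
assert not within_time_frame(11.2, start=10.0, end=11.0)
```

A server applying this gate would simply discard queued messages until one passes, instead of the client guessing with a fixed sleep.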
--
You received this message because you are subscribed to the Google Groups "ROS Perception Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ros-sig-percept...@googlegroups.com.
To post to this group, send email to ros-sig-p...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ros-sig-perception/20150326154702.GA24048%40kuebelreiter.informatik.Uni-Osnabrueck.DE.
For more options, visit https://groups.google.com/d/optout.
1) Why do you want to tell the object recognition server what sensors to use?
- Are you getting poor accuracy?
- Do you want to limit processing?
- Something else?
I think the object recognition server should be able to handle multiple sensors appropriately, even if they don't all point in the same direction. If it can't, then that's something that could be improved upon, but not by explicitly telling the server which sensors to use. If you are seeing poor performance, telling the object recognition server which sensors to use seems more like a band-aid, and is probably better dealt with by having multiple servers.
2) Should you be able to ask for objects in the past/future?
- Asking for objects in the past to me doesn't jive well with ROS infrastructure of pub/sub. Is the object recognition server creating a database of all objects detected in the scene? Should it? How would you manage that? Is it better handled by something else? My gut feeling says that it is probably handled better by something else, giving you more control over how/how long/what gets saved.
- Future objects might be useful in object pose prediction, e.g. if you expect the object recognition to detect, say, a ball, and you want to know where it will be when it comes within range to catch it. Though this too is probably outside the scope of what the object recognition server was designed for.
- If you just want to find an object after you move, that should go through a state machine that monitors joint position, then the client should make a request when the robot is ready.
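The "request only when the robot is ready" pattern from the last bullet could be sketched like this (illustrative Python; the function names and tolerance are assumptions, not any real state-machine API):

```python
# Sketch of gating the recognition request on the robot being ready:
# the client monitors joint positions and only sends the goal once the
# arm has settled. All names and the tolerance are illustrative.

def joints_at_target(current, target, tol=0.01):
    """True once every joint is within tol radians of its target."""
    return all(abs(c - t) <= tol for c, t in zip(current, target))

def maybe_request_recognition(current, target, send_goal):
    """Fire the recognition goal only when the arm is where we want it."""
    if joints_at_target(current, target):
        return send_goal()
    return None  # not ready yet; keep monitoring

calls = []
result = maybe_request_recognition(
    [0.50, 1.00], [0.5, 1.0],
    send_goal=lambda: calls.append("goal") or "sent")
assert result == "sent" and calls == ["goal"]
assert maybe_request_recognition([0.2, 1.0], [0.5, 1.0], lambda: "sent") is None
```

In a real system the monitoring loop would sit in a state machine fed by joint_states, but the gating logic is this simple comparison.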
On Fri, Mar 27, 2015 at 02:41:19AM -0700, Miguel Armando Riem de Oliveira wrote:
> In addition to that, it is a matter of philosophy if you will: action
> servers are supposed to be "blind" wrt the robot in which they operate.

No, they don't have to be. In your case it would be quite easy
to add a check for whether or not the arms are tucked.
If anything, the *client* should be agnostic of the way the server
implements the action.
> > 2) *Should* you be able to ask for objects in the past/future?
> >
> > - Asking for objects in the past to me doesn't jive well with ROS
> > infrastructure of pub/sub. Is the object recognition server creating a
> > database of all objects detected in the scene? Should it? How would you
> > manage that? Is it better handled by something else? My gut feeling says
> > that it is probably handled better by something else, giving you more
> > control over how/how long/what gets saved.
> >
> Well, to the action server it would be the same, meaning it would not know
> the sensorial data is from the past. It would have to be the client the one
> to publish a message on a topic with past data (see my idea about the
> latched msgs, although I see the point that tfs could be a problem)

It's definitely not the same, as you would be missing transforms for old data.
Especially if you use more than one sensor, how are you going to fuse the
frames without tf?
> 3 action server:
> 3.1 start object recognition
> 3.2 collect sensorial data (say 1 sec)
> 3.3 process the data (say 5 sec, because processing usually takes long)
> 3.4 return result
>
> 4 action client: move hand somewhere else.
>
> My point is that, if it was possible to redesign the ObjectRecognition
> action in such a way that we could parallelize steps 3 and 4. Or perhaps
> steps 4 and 3.3/3.4.

This sounds like you want a status string to be added as feedback to the action.
The client could then wait until the status changes from ACQUIRING_DATA to
PROCESSING or something like that.
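The feedback-based overlap could work roughly like this on the client side (a minimal sketch; only the ACQUIRING_DATA/PROCESSING status names come from this thread, everything else is illustrative):

```python
# Sketch of the status-feedback idea: the server reports a status string
# as action feedback, and the client starts moving the arm as soon as
# data acquisition is done (ACQUIRING_DATA -> PROCESSING), overlapping
# steps 3.3/3.4 with step 4. The feedback stream is simulated as a list.

ACQUIRING_DATA, PROCESSING, DONE = "ACQUIRING_DATA", "PROCESSING", "DONE"

def run_client(feedback_stream):
    """Consume server feedback; return the log of client actions."""
    log = []
    for status in feedback_stream:
        if status == PROCESSING and "move_arm" not in log:
            log.append("move_arm")   # safe: sensor data already captured
        if status == DONE:
            log.append("use_result")
    return log

log = run_client([ACQUIRING_DATA, ACQUIRING_DATA, PROCESSING, DONE])
assert log == ["move_arm", "use_result"]
```

The key property is that the arm motion no longer waits for the (long) processing step, only for the (short) acquisition step.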