Discussion of the ObjectRecognition.action


Miguel Armando Riem de Oliveira

Mar 26, 2015, 5:34:46 AM
to ros-sig-p...@googlegroups.com
Hello,

I am writing to discuss a possible change in the object recognition action, defined in  https://github.com/wg-perception/object_recognition_msgs

I contacted Vincent Rabaud by email before, and he recommended that I start this discussion here.

Below is a transcript of the emails exchanged with Vincent, but first a summary of the discussion:

My point is that, when designing action servers for the object recognition action, it should be the client's decision which data is used to recognize the object.
In my opinion, this decision should include not only which sensor is used (e.g., RGB camera number 3), but also the time stamp of the sensor data (i.e., use the image from camera number 3 taken at time t). How this could be done is discussed below, but other ideas would be welcome.

The current ObjectRecognition.action does not support this functionality.

Do you see this functionality as interesting?
If so, how could it be implemented?

Please send your thoughts.

Best regards,

Miguel

---------------------------------------------------
Emails exchanged before with Vincent
---------------------------------------------------

** Vincent **
That would be a good conversation to have on:
https://groups.google.com/forum/#!forum/ros-sig-perception
It could work to have those things as part of it, but then again, you would have to have an array of images / point clouds. And you could have more data like sonars and so on. That's why it was chosen this way: configure your pipeline as you want, it plugs into whatever topics (using ORK or not). Just ask it to give an answer and that's it, no?

** Miguel **

On Wed, Mar 25, 2015 at 3:20 PM, Miguel Armando Riem de Oliveira wrote:
Dear Vincent,

I am writing to ask you something about the ObjectRecognition.action

We are planning on using your standard for the definition of object recognition actions.

The reason is that, although we plan to develop our own object recognition algorithms, perhaps later we could easily compare them with the ones that exist in ORK, or we could even integrate our algorithms into wg-perception or ORK.

It's always nice to follow a standard :).

Anyway, I have a question about the definition of that action:

# Optional ROI to use for the object detection
bool use_roi
float32[] filter_limits
---
# Send the found objects, see the msg files for docs
object_recognition_msgs/RecognizedObjectArray recognized_objects
---
#no feedback

The question is why there is no information concerning the data used to produce a recognition,
e.g. an image or a point cloud.

I imagine your object recognition algorithms (i.e., the action servers) actually subscribe to the 
image or point cloud messages, get one image (or point cloud), and then process this information.

But that means that the image that is used is arbitrary, in the sense that it's the image received by the action server when it reaches the point of subscribing to an image topic.

So, for example, if you have two recognition servers, say linemod and tod, and you call 
actions to recognize with these two:

call action recognize_using_linemod
call action recognize_using_tod

You cannot be sure that the images used by linemod and tod are exactly the same, right?

My proposal is to include the image or point cloud in the action goal message, i.e.:

# Optional ROI to use for the object detection
bool use_roi
float32[] filter_limits
bool use_point_cloud
sensor_msgs/PointCloud2 point_cloud
bool use_image
sensor_msgs/Image image
---
# Send the found objects, see the msg files for docs
object_recognition_msgs/RecognizedObjectArray recognized_objects
---
#no feedback

I think there are advantages to this change, namely:

1. I believe that this change would have no effect on the existing algorithms, because if the use_point_cloud or use_image flags are false, then the behavior is to subscribe to the image or point_cloud topics (as I believe is already done). So, to get exactly the same behavior as now, we just have to set use_point_cloud and use_image to false.

2. We could if needed have the recognition servers use exactly the same image for the recognition. 

3. We could have the recognition servers operate on images (or point clouds) taken offline.

4. The overhead of adding these fields is small when the image and point_cloud fields are empty.

Suppose the following scenario: a manipulator with a camera in hand. The robot takes the camera to a location, takes an image, then asks the recognition servers to recognize objects in the image.

With the proposed changes one could trigger the recognition and then start moving the manipulator immediately, without the need to wait for the recognition result (currently we cannot, because we would not be sure when the image would be captured by the action servers).

With the proposed changes the object recognition action would be more modular and cover a wider range of applications.

Could I have your opinion on this? Would you consider making these changes?

Thank you very much,

Miguel Oliveira


** Vincent **

I am all for having a standard, and let's modify whatever is needed.
Now, is it something practical for the client? First, it would have to send an array of images and of PointCloud2s. But it should also work with compressed images, right? And probably laser scans? That's why I don't think it should be on the client side to send the data.

Now, I understand your point that the client should choose which sensors to process things on. The most flexible thing I see is a string that would be a configuration string that the server would understand and from which it would decide which sensors to use and which method to use. Please, start a thread on the SIG: other persons will face the same issue and we'll never come up with a standard if we don't make it public. Thx.

** Miguel **

On Wed, Mar 25, 2015 at 9:18 PM, Miguel Armando Riem de Oliveira wrote:
Hello Vincent,

Thanks for the quick reply. Here are some additional comments.


On Wed, Mar 25, 2015 at 5:41 PM, Vincent Rabaud wrote:
That would be a good conversation to have on:
https://groups.google.com/forum/#!forum/ros-sig-perception
It could work to have those things as part of it, but then again, you would have to have an array of images / point clouds. And you could have more data like sonars and so on. That's why it was chosen this way: configure your pipeline as you want, it plugs into whatever topics (using ORK or not). Just ask it to give an answer and that's it, no?

Yes, but which topics? For example, a linemod action server would need an image message, but on which topic? Suppose, for example, that there are 3 cameras on the robot. The decision of which sensor to use should be on the client side, not on the server side, I think.

Also, that way it is not possible to use data (images, point clouds, sonar) which is not streaming from the current sensors.

One alternative would be that the client publishes an image on a given topic (and latches the message), then calls the action signaling that the image should be fetched from that topic. That would mean an additional string on the action goal to specify the topic name.

Anyway, my contact was just to see if we could use the same action definition in our architecture. 

If you don't see any advantages in adding this extra functionality to your action definition, we will create our own action definition (which should be very similar to yours except for this extra functionality).

Best regards and thank you for the help.

v4hn

Mar 26, 2015, 7:52:16 AM
to ros-sig-p...@googlegroups.com
Hey Miguel,

On Thu, Mar 26, 2015 at 02:34:46AM -0700, Miguel Armando Riem de Oliveira wrote:
> My question is that when designing action servers for the object recognition action
> it should be on the client side to decide which data is used to recognize the object.

I don't agree with that statement as it is too general in my opinion.
Which *types* of sources to use for the recognition process should be left to the server,
because otherwise you would end up including quite a number of optional arrays of
sensor_msgs/* in the standard ObjectRecognitionRequest. (is there a magnetic field active
around the object? Is the liquid in the cup hot or cold?)

That aside, there indeed *is* a problem with timing the recognition action.
I have a problem similar to the one you described below:

> Suppose the following scenario: a manipulator with a camera in hand. The
> robot takes the camera to a location, takes an image, then asks the
> recognition servers to recognize objects in the image.
>
> With the proposed changes one could trigger the recognition and then start
> moving the manipulator immediately, without the need to wait for the
> recognition result (because we would not be sure when the image was
> captured by the action servers).

I move my robot arm out of the FoV of the camera
to detect objects in front of the robot and then trigger the action.
However, I can't be sure that the recognition server already received a new point cloud,
so I had to add a sleep(2.0) in between which obviously is not safe either.

Instead, I would like to see a way to specify a time frame in the action to ensure
that all data used for the recognition is from within that time frame.

The easiest thing I can come up with would be to add an std_msgs/Header to the request
to specify a time (and, what the hell, a frame for the ROI - that has been driving me crazy for ages...)
and, if deemed necessary, a second time to specify the end point of such a time frame.
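Roughly, just as a sketch (the field names are only illustrative), the goal part could then become:

# Optional ROI to use for the object detection
bool use_roi
float32[] filter_limits
# stamp: only use sensor data newer than this; frame_id: the frame the ROI is expressed in
std_msgs/Header header
# optional end of the acceptable time window (zero = no upper bound)
time end_time

with the result and feedback sections unchanged.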

Thanks for reading,


v4hn

Miguel Armando Riem de Oliveira

Mar 26, 2015, 10:14:43 AM
to ros-sig-p...@googlegroups.com
Hi v4hn,

Thanks for the comments,

On Thursday, March 26, 2015 at 11:52:16 UTC, v4hn wrote:
Hey Miguel,

On Thu, Mar 26, 2015 at 02:34:46AM -0700, Miguel Armando Riem de Oliveira wrote:
> My question is that when designing action servers for the object recognition action
> it should be on the client side to decide which data is used to recognize the object.

I don't agree with that statement as it is too general in my opinion.
Which *types* of sources to use for the recognition process should be left to the server,
because otherwise you would end up including quite a number of optional arrays of
sensor_msgs/* in the standard ObjectRecognitionRequest. (is there a magnetic field active
around the object? Is the liquid in the cup hot or cold?)

Yes, I see your point. The type of sensor used should be decided on the server side: say, a texture-based algorithm should use a camera,
a 3D-based algorithm should use point clouds. I agree that it does not make sense to have the action client define the type of sensor.

But there is still the problem of defining which sensor (the sensor index), i.e., which of the 3 cameras onboard the robot is the action client going to use to recognize objects?
 

That aside, there indeed *is* a problem with timing the recognition action.
I have a problem similar to the one you described below:

...
 
I move my robot arm out of the FoV of the camera
to detect objects in front of the robot and then trigger the action.
However, I can't be sure that the recognition server already received a new point cloud,
so I had to add a sleep(2.0) in between which obviously is not safe either.

Yep, that's also the kind of problem I see coming ...
 
Instead, I would like to see a way to specify a time frame in the action to assure
all data that is used for the recognition is from within that time frame.

The most easy thing I can come up with would be to add an std_msgs/Header to the request
to specify a time (and, the hell, a frame for the ROI - that is driving me crazy for ages...)
and, if deemed necessary, a second time to specify the end point of such a time frame.

That solution does not solve the sensor index problem.

Furthermore, it still assumes that the recognition is made with "live" sensors, meaning the time
interval would have to lie in the future (some point after the action is called). What if you want to detect boxes in an image taken a week ago?
 
I think the idea of a string defining the topic to subscribe to would be more general (possibly combined with a time interval)... if you want to use live data, you give the topic streaming from a live sensor; if you want to use offline data, you publish and latch it on a topic and give that topic to the action server.
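
To make that concrete, just as a sketch (the field names are only illustrative, not part of the current definition), the goal could gain something like:

# in addition to the existing goal fields:
string input_topic   # topic the server should fetch the data from (empty = use its default topics)
time min_stamp       # only use data stamped within [min_stamp, max_stamp]
time max_stamp       # (zero = no constraint)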

Regards,

Miguel 

v4hn

Mar 26, 2015, 11:47:04 AM
to ros-sig-p...@googlegroups.com
On Thu, Mar 26, 2015 at 07:14:43AM -0700, Miguel Armando Riem de Oliveira wrote:
> But there is still the problem of defining which sensor (the sensor index),
> i.e., which of the 3 cameras onboard the robot is the action client going
> to use to recognize objects?

I don't think a "sensor index" would be useful.
If you want the server to recognize objects from these sources,
it should subscribe to all these sources anyway.
So why do you want to restrict the server to use only some of them?
If you want to recognize things from different sources separately,
then start multiple recognition servers.
http://en.wikipedia.org/wiki/YAGNI ?

> Furthermore, it still depends on the fact that the recognition is to be
> made with "live sensors", meaning the time
> interval will have to be defined in the future (some point after the action
> is called). What if you want to detect boxes in an image taken a week ago?

As far as I can see the ObjectRecognition.action is *meant* to be used
only with "live" sensors (or rosbags, for that matter).
With older images there is also the question of where the transforms
for that point in time come from, so there's more bookkeeping required
there, and I would say you should not expect the action server to handle that.

> I think the idea of a string defining the topic to subscribe to would be
> more general (eventually combined with a time interval) ... if you want to
> use live, you give the topic streaming from a live sensor, if you want to
> use offline data, you publish and latch it on a topic and give that topic
> to the action server.

I'm pretty sure it's not a good idea to subscribe to the sources only *after*
the server has received a request, as the relevant data might already be gone
by the time the server first receives anything. I don't like that idea.


v4hn

Vincent Rabaud

Mar 26, 2015, 6:24:56 PM
to ros-sig-p...@googlegroups.com
What we could add is a configuration parameter (like a string). This would define which inputs / topics / sensors to use, and it would be up to the server to interpret it.
I also agree that a timestamp could be a useful parameter.

Should we add those two then?
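
As a sketch, that would mean adding to the existing goal fields something like:

# interpreted by the server: which inputs / topics / sensors and which method to use
string configuration
# timestamp constraining the sensor data to use (zero = don't care)
time stamp

The exact semantics of the timestamp (earliest acceptable stamp vs. the exact stamp to use) would still need to be decided.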



Allison Thackston

Mar 26, 2015, 9:37:22 PM
to ros-sig-p...@googlegroups.com
The fundamental questions here to me are

1) Why do you want to tell the object recognition server what sensors to use?
  • Are you getting poor accuracy?
  • Do you want to limit processing?
  • Something else?
I think the object recognition server should be able to handle multiple sensors appropriately, even if they don't all point in the same direction.  If it's not, then that's something that could be improved upon, but not by explicitly telling the server which sensors to use.  If you are seeing poor performance, telling the object recognition server which sensors to use seems more like a band-aid, and is probably better dealt with by having multiple servers.

2) Should you be able to ask for objects in the past/future?
  • Asking for objects in the past doesn't, to me, jibe well with the ROS pub/sub infrastructure.  Is the object recognition server creating a database of all objects detected in the scene?  Should it? How would you manage that?  Is it better handled by something else?  My gut feeling says that it is probably handled better by something else, giving you more control over how/how long/what gets saved.
  • Future objects might be useful in object pose prediction: if you expect the object recognition to detect, say, a ball, you might want to know where it thinks the ball will be when it comes within range to catch it.  Though this too is probably outside the scope of what the object recognition server was designed for.
  • If you just want to find an object after you move, that should go through a state machine that monitors joint position, then the client should make a request when the robot is ready.

Miguel Armando Riem de Oliveira

Mar 27, 2015, 5:41:19 AM
to ros-sig-p...@googlegroups.com
Hello Allison,

Thanks for the comments.

1) Why do you want to tell the object recognition server what sensors to use?
  • Are you getting poor accuracy?
  • Do you want to limit processing?
  • Something else?
Not because I am getting poor accuracy, but yes, to limit the processing to some extent.

In addition to that, it is a matter of philosophy if you will: action servers are supposed to be "blind" with respect to the robot on which they operate.
That's the only way to have them work on many robots without changing the code. Because of that, I think it should be the client (i.e., a program with hardware platform awareness) that decides which sensors to use.

Imagine a PR2 robot with 2 cameras, one in the head, another in the hand. Suppose that the PR2's arms are tucked and we are sure the hand cameras will not capture any good images.
Now I would like (as an action client) to be able to say "recognize objects using the head camera". I don't think we should leave this decision to the action server, which should be hardware platform agnostic (well, this one in particular, perhaps not others).
 
I think the object recognition server should be able to handle multiple sensors appropriately.  Even if they don't all point to in the same direction.  If it's not, then that's something that could be improved upon, but not by explicitly telling the server which sensors to use.  If you are seeing poor performance, telling the object recognition server which sensors to use seems more like a band-aid, and is probably better dealt with by having multiple servers.

Well, as I said before, I disagree with you on this one. But note that I also think the action server should be able to handle multiple sensors. The difference is that I think it should only use all of them if the action client says "use all available sensors".
 

2) Should you be able to ask for objects in the past/future?
  • Asking for objects in the past to me doesn't jive well with ROS infrastructure of pub/sub.  Is the object recognition server creating a database of all objects detected in the scene?  Should it? How would you manage that?  Is it better handled by something else?  My gut feeling says that it is probably handled better by something else, giving you more control over how/how long/what gets saved.
Well, to the action server it would be the same, meaning it would not know the sensor data is from the past. It would be the client that publishes a message on a topic with past data (see my idea about the latched msgs, although I see the point that tfs could be a problem).
  • Future objects might be useful in object pose prediction.  If you expect the object recognition to detect say a ball and you want to know where it thinks it will be when it comes within range to catch it.  Though this too is probably outside the scope of what the object recognition server was designed for.  
That would be interesting but I agree it looks far fetched for now.
 
  • If you just want to find an object after you move, that should go through a state machine that monitors joint position, then the client should make a request when the robot is ready.

I am not sure I understood you, because in fact the client would not know when it's ready (see v4hn's solution, he waits for 2 secs??). My problem is very specific; I am not just preparing for something that will never happen, as v4hn suggests. Let me describe it:

Robot has a camera in hand.

1 action client: Robot hand goes to observe a shelf.

2 action client: call ObjectRecognition action

3 action server: 
   3.1 start object recognition
   3.2 collect sensorial data (say 1 sec)
   3.3 process the data (say 5 sec, because processing usually takes long)
   3.4 return result

4 action client: move hand somewhere else.

My point is that I would like the ObjectRecognition action to be redesigned in such a way that we could parallelize steps 3 and 4, or perhaps step 4 and steps 3.3/3.4.

With the current implementation it's not possible. With the suggested solution of adding a timestamp, I think it's also not possible.

 Regards,

Miguel

v4hn

Mar 27, 2015, 8:39:26 AM
to ros-sig-p...@googlegroups.com
On Fri, Mar 27, 2015 at 02:41:19AM -0700, Miguel Armando Riem de Oliveira wrote:
> In addition to that, it is a matter of philosophy if you will: action
> servers are supposed to be "blind" wrt the robot in which they operate.

No, they don't have to be. In your case it would be quite easy
to add a check for whether or not the arms are tucked.
If anything, the *client* should be agnostic of the way the server
implements the action.

> > I think the object recognition server* should* be able to handle multiple
> > sensors appropriately. Even if they don't all point to in the same
> > direction. If it's not, then that's something that could be improved upon,
> > but not by explicitly telling the server which sensors to use. If you are
> > seeing poor performance, telling the object recognition server which
> > sensors to use seems more like a band-aid, and is probably better dealt
> > with by having multiple servers.

agreed

> > 2) *Should* you be able to ask for objects in the past/future?
> >
> > - Asking for objects in the past to me doesn't jive well with ROS
> > infrastructure of pub/sub. Is the object recognition server creating a
> > database of all objects detected in the scene? Should it? How would you
> > manage that? Is it better handled by something else? My gut feeling says
> > that it is probably handled better by something else, giving you more
> > control over how/how long/what gets saved.
> >
> Well, to the action server it would be the same, meaning it would not know
> the sensorial data is from the past. It would have to be the client the one
> to publish a message on a topic with past data (see my idea about the
> latched msgs, although I see the point that tfs could be a problem)

It's definitely not the same as you would be missing transforms for old data.
Especially if you use more than one sensor, how are you going to fuse the
frames without tf?

> > - If you just want to find an object after you move, that should go
> > through a state machine that monitors joint position, then the client
> > should make a request when the robot is ready.

The problem, in my case, is that the ecto-Subscribers in the recognition plasm
might still have old images cached, even if the robot arm is outside the FoV *now*.
But now that I think about it, this does not necessarily require a change
in the action definition, as one could also use now() the moment the request
is accepted.

But how does one get a time stamp into the ecto plasm from the (python)
action server around it? Has anyone experienced/thought about/solved
that problem before?

> 3 action server:
> 3.1 start object recognition
> 3.2 collect sensorial data (say 1 sec)
> 3.3 process the data (say 5 sec, because processing usually takes long)
> 3.4 return result
>
> 4 action client: move hand somewhere else.
>
> My point is that, if it was possible to redesign the ObjectRecognition
> action in such a way that we could parallelize steps 3 and 4. Or perhaps
> steps 4 and 3.3/3.4.

This sounds like you want a status string to be added as feedback to the action.
The client could then wait until the status changes from ACQUIRING_DATA to
PROCESSING or something like that.
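
In the .action file, that would simply mean replacing the empty feedback section with something like (sketch):

# feedback
string status    # e.g. "ACQUIRING_DATA", "PROCESSING"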


v4hn

Martin Günther

Mar 27, 2015, 10:08:15 AM
to ros-sig-p...@googlegroups.com
From what I gather from the previous discussion, there are three
mostly orthogonal problems we're trying to solve:

1. How to synchronize the recognition client (which also controls robot
movement) and the recognition server to make sure that:

1a) the recognition server doesn't start data acquisition while the
robot was still moving (because the camera could still be
occluded by moving robot parts, not pointed at the right
direction and so on)

1b) the recognition client knows when data acquisition has finished,
so it can start moving again

2. How to get two parallel recognition servers to process the exact
same input data (for example, for comparison or voting).

3. How to run object recognition on old (recorded) data in a live
system (so, no rosbags).


Re 1.: I like v4hn's proposal: Make sure that the input data has a time
stamp later than the call to the recognition action (solves 1a), and
provide status feedback in the action to let the client know when
acquisition has finished (solves 1b).

Re 2.: Initially, I liked the idea of adding the input point cloud to
the action goal, but now I agree it would be too complex if we want to
support more or less arbitrary input data formats. Even if we would
only use point clouds and images, we'd also need the synchronized
CameraInfo messages, have problems with compressed image topics and so
on. I have no good solution here. Doesn't the underlying ecto framework
support exactly this (synchronized pipelines of input data,
distribution to multiple parallel recognizers, comparison, voting)?
Perhaps it would be a good idea to implement synchronization on that
level.

Re 3.: Working on data from vastly different times is a bit incompatible
with the basic ROS assumptions. If at all possible, I'd suggest working
around the need for two time lines by either:

- recording a rosbag, then running the whole system on the bag; or
- not only saving the raw data, but also saving the recognition results
with it directly.

In any case, I don't see a change to the action definition that would
help here.


On Thu, 26 Mar 2015 23:24:25 +0100
Vincent Rabaud <vincent...@gmail.com> wrote:

> What we could add is a configuration parameter (like a string). This
> would define which inputs / topics / sensors to use and it would be
> up to server to interpret it.

That sounds a bit underspecified to me. Just having a generic string
field in the message and allowing the client and server to interpret it
however they want kind of defeats the purpose of having an action
specification at all.

> I also agree that a timestamp could be a useful parameter.

Just for clarification: This time stamp is meant to allow the client to
select exactly which point cloud/image the server processes in order
to solve problem #2 (synchronization of two parallel servers), right?

Presumably the client would listen to a point cloud/image topic, pick
one of the time stamps and then send that to the servers.

I don't like that idea. It would require the servers to keep a cache of
past input data, and it would mean that the client has to make a lot of
assumptions about the internals of the servers (which input topic, how
long are their point cloud / image / tf caches etc.).

Cheers,
Martin

Miguel Armando Riem de Oliveira

Mar 27, 2015, 10:14:10 AM
to ros-sig-p...@googlegroups.com
Hi, 
 
On Fri, Mar 27, 2015 at 02:41:19AM -0700, Miguel Armando Riem de Oliveira wrote: 
> In addition to that, it is a matter of philosophy if you will: action 
> servers are supposed to be "blind" wrt the robot in which they operate. 

No, they don't have to be. In your case it would be quite easy 
to add a check for whether or not the arms are tucked. 
If anything, the *client* should be agnostic of the way the server 
implements the action. 

Perhaps I am missing something, but are these ObjectRecognition actions supposed to be
used on different robot platforms or not?

If so, they have to be configured somehow... topic names, etc.

How is this done? ROS parameters?


> > 2) *Should* you be able to ask for objects in the past/future? 
> > 
> >    - Asking for objects in the past to me doesn't jive well with ROS 
> >    infrastructure of pub/sub.  Is the object recognition server creating a 
> >    database of all objects detected in the scene?  Should it? How would you 
> >    manage that?  Is it better handled by something else?  My gut feeling says 
> >    that it is probably handled better by something else, giving you more 
> >    control over how/how long/what gets saved. 
> > 
> Well, to the action server it would be the same, meaning it would not know 
> the sensorial data is from the past. It would have to be the client the one 
> to publish a message on a topic with past data (see my idea about the 
> latched msgs, although I see the point that tfs could be a problem) 

It's definitely not the same as you would be missing transforms for old data. 
Especially if you use more than one sensor, how are you going to fuse the 
frames without tf? 

Yes, I agree with you that, if the recognition is going to use the tfs, that would be a problem.
One could play back a bagfile in a special namespace just to feed that recognition server, but
I think that would be too complex. So I agree, latched messages do not make sense.
 
> 3 action server: 
>    3.1 start object recognition 
>    3.2 collect sensorial data (say 1 sec) 
>    3.3 process the data (say 5 sec, because processing usually takes long) 
>    3.4 return result 

> 4 action client: move hand somewhere else. 

> My point is that, if it was possible to redesign the ObjectRecognition 
> action in such a way that we could parallelize steps 3 and 4. Or perhaps 
> steps 4 and 3.3/3.4. 

This sounds like you want a status string to be added as feedback to the action. 
The client could then wait until the status changes from ACQUIRING_DATA to 
PROCESSING or something like that. 

Yes, perhaps that would be a good solution. Especially if we don't need the functionality of using past data, the feedback would solve it.

Regards,

Miguel

Allison Thackston

Mar 27, 2015, 10:44:52 AM
to ros-sig-p...@googlegroups.com
They are supposed to be used on different robotic platforms.  The way this is done is typically through launch files and parameters.  The launch file is where topic names get remapped to what the server is expecting.  So if a particular object recognition server uses a stereo pair to do recognition, it would have two camera input topics that you would remap to your camera topic names.

Adding some sort of feedback would be nice.  An enum is quicker/more efficient to process/easier to standardize than an arbitrary string, though.  Perhaps we can use both, for both standardization and flexibility?  What feedback would be standard across all object recognition servers?
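
For example (just a sketch; the constants are only a starting point, not an agreed standard), the feedback section could combine both:

# feedback
uint8 ACQUIRING_DATA=0
uint8 PROCESSING=1
uint8 state            # one of the constants above
string status_detail   # optional free-form, server-specific detail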

Miguel Armando Riem de Oliveira

Mar 30, 2015, 6:27:55 AM
to ros-sig-p...@googlegroups.com
Hi Allison,

OK, I see, the action servers get configured at init time using ROS parameters and remappings defined in the launch file. Makes sense.

I am now convinced that it is not a good idea to include the sensor data in the ObjectRecognition action (because the sensor data might not be the only required information, e.g., tfs).

Thank you all for your valuable input.

Best regards,

Miguel 