OD&A


Brian Higgins

Jul 4, 2024, 6:10:43 AM
to hbrob...@googlegroups.com
Happy 4th!
What sensor (or combination of two sensors) would you recommend for object detection and avoidance?

Brian Higgins 
VA Researcher for blind mobility “Sensor Aided Navigation”





Alan Federman

Jul 4, 2024, 11:04:12 AM
to hbrob...@googlegroups.com, Brian Higgins
Indoor or outdoor?
 

Chris Albertson

Jul 4, 2024, 2:17:00 PM
to hbrob...@googlegroups.com, Brian Higgins
>> Happy 4th!
>> What sensor (or combination of two sensors) would you recommend for object detection and avoidance?

Elon Musk famously said that you can do everything with vision. His “proof” was that human drivers use only vision. People with good eyesight can run on uneven trails and not trip over large rocks. So in theory, all you should need is the video camera that is built into every cell phone.

But the problem with video is that it is VERY hard and expensive to process into usable data. On the other hand, 3D LIDAR is much easier to process, but the sensor itself is roughly 100X more expensive and 100X larger.

So it is a trade-off between the cost and physical size of the device and how hard it is to write the software.

Look at a “hard” scenario: there is a car approaching on my left and I want to know if I should step off the curb. If I see that the driver is looking at me and she is applying the brakes, I can step off the curb. I can see 50 feet away and look through the glass windshield, or I can even see the driver waving at me, telling me to go. An ultrasonic “ping” type device has no hope of doing this, but a video camera and very powerful AI software could work very well.

On the other hand, maybe we don’t need a computer and AI, because we have a human to do the processing. All we need is to somehow get the vision data to the blind person, and he can do the rest. One way might be to convert sight to sound. We can use pitch, volume, and stereo to put any kind of sound into 3D space. Objects could have pitch based on their distance, volume based on their size, and of course direction based on their direction. It would be easy to convert 3D scanning LIDAR data to 3D sound.
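
To make that concrete, here is a rough Python sketch of one possible mapping. The pitch/loudness/pan formulas and the example obstacles are made up for illustration, not a tested sonification scheme:

import numpy as np
import wave

SAMPLE_RATE = 44100

def obstacle_tone(distance_m, bearing_deg, size_m, duration_s=0.25):
    """Stereo (N, 2) cue for one obstacle: pitch from distance, gain from size, pan from bearing."""
    pitch_hz = 200.0 + 1800.0 / max(distance_m, 0.5)      # closer -> higher pitch (guessed mapping)
    gain = min(1.0, max(0.1, size_m / 2.0))               # bigger -> louder, clamped
    pan = float(np.clip(bearing_deg / 90.0, -1.0, 1.0))   # -90 deg = hard left, +90 = hard right
    angle = (pan + 1.0) * np.pi / 4.0                     # constant-power panning
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    tone = gain * np.sin(2.0 * np.pi * pitch_hz * t)
    return np.stack([np.cos(angle) * tone, np.sin(angle) * tone], axis=1)

def write_wav(path, stereo):
    data = (np.clip(stereo, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)               # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(data.tobytes())   # interleaved left/right

# Two made-up obstacles: something small 2 m away on the left, a wall 6 m straight ahead.
cue = np.concatenate([obstacle_tone(2.0, -40.0, 0.6), obstacle_tone(6.0, 0.0, 3.0)])
write_wav("cues.wav", cue)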

But the LIDAR-and-sound approach would never see the driver waving at you to cross the street, and could not verify that the driver has made eye contact with you.

A really big problem if you use a high-end sensor is how to communicate the sensor’s data to the human. For example, I tried running an object detector on a James Bond film during a chase scene; it detected dozens of objects and gave the location of each of them, and this was dozens of object detections per second. Even when driving a motorcycle at high speed through a crowded marketplace, the detector “saw” hundreds of objects. But what to do with this much data? A complex sensor will see thousands of objects in a minute.
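
One way to tame that firehose before it reaches the user is to filter hard: keep only the few nearest objects inside the travel corridor. A tiny hypothetical sketch of the idea (run_detector is a placeholder, not a real API):

from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    distance_m: float     # from depth or LIDAR fusion, or estimated from box size
    bearing_deg: float    # negative = left of heading, positive = right

def run_detector(frame):
    """Placeholder for whatever detector + range estimator is actually used."""
    raise NotImplementedError

def prioritize(detections, max_cues=3, corridor_deg=60.0):
    """Keep only the few nearest objects inside the user's travel corridor."""
    in_path = [d for d in detections if abs(d.bearing_deg) <= corridor_deg / 2.0]
    in_path.sort(key=lambda d: d.distance_m)
    return in_path[:max_cues]

# Per video frame, something like:
#   cues = prioritize(run_detector(frame))
#   ...turn each cue into a sound or haptic signal...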


I think you have to start with the user interface and specify that in detail. Then work out what sensor can drive that interface.

Stephen Williams

Jul 11, 2024, 2:15:38 AM
to hbrob...@googlegroups.com

I view things like Lidar & time of flight cameras as being proxies for future much better video processing & understanding.  At least until those types of cameras become very small & very inexpensive.

Vision will eventually be the main depth-sensing & scene-understanding mechanism, optionally using whatever else is available.  We can do a bit of processing now, and there are some chips that do stereo correlation in-camera now.  I worked at a startup, Pelican Imaging, that used a tight array of color-filtered monochrome cameras, 3x3 or 4x5 on the same circuit board, to get depth + super resolution by correlating images between tightly spaced cameras.  We could produce a pretty good depth map at 30fps on an Android phone.  (We used the CPU with NEON vector instructions, the GPU, and the Qualcomm DSP (which few get to program) to do so, and it used too much power: 12 watts at peak.  But it worked!)  I integrated the Movidius MV1 with that as a possible preprocessor: those chips have 16 or more processors running at full data rates on raw data for a few milliwatts.  And that was 10 years ago.

It should be possible to do pretty well with two or more cameras, or even one camera, but we have limits in our algorithms now.  And specialized hardware is probably required to get low latency + a reasonable frame rate.  At some point it should get much better, and we probably can keep the power reasonable.  Just maybe not right now.  On the other hand, right now we can ship 1-4 4k video streams at 72fps over WiFi with low enough latency (<60ms) without a lot of power.
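
For anyone who wants to try the two-camera case cheaply, here is a minimal OpenCV block-matching sketch. The baseline, focal length, and camera indices are assumed values, and real use needs calibrated, rectified images:

import cv2
import numpy as np

BASELINE_M = 0.06     # assumed distance between the two cameras
FOCAL_PX = 700.0      # assumed focal length in pixels (comes from calibration)

capL, capR = cv2.VideoCapture(0), cv2.VideoCapture(1)   # camera indices are guesses
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

okL, frameL = capL.read()
okR, frameR = capR.read()
if okL and okR:
    # A real system rectifies the pair first; skipped here for brevity.
    grayL = cv2.cvtColor(frameL, cv2.COLOR_BGR2GRAY)
    grayR = cv2.cvtColor(frameR, cv2.COLOR_BGR2GRAY)
    disp = stereo.compute(grayL, grayR).astype(np.float32) / 16.0   # StereoBM returns fixed-point
    depth_m = np.where(disp > 0, FOCAL_PX * BASELINE_M / np.maximum(disp, 1e-6), 0.0)
    h, w = depth_m.shape
    center = depth_m[h // 3 : 2 * h // 3, w // 3 : 2 * w // 3]
    valid = center[center > 0]
    if valid.size:
        print("median depth in central patch: %.2f m" % np.median(valid))
capL.release()
capR.release()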

For something not sharable yet, I've been working on 4k x 2 48-72fps HEVC (and alternatively the layered-codec LCEVC standard) video with hardware compression & decompression over WebRTC (which I've now abandoned because of latency & complexity issues) and WebTransport (HTTP3/TLS/QUIC).  For PC->Quest 3 (or 2), I'm getting 60ms round trip for 4k+ stereo (4k+ for each eye) over WiFi.  That includes fairly hefty rendering in Unreal Engine for each pair of frames.  The Qualcomm chipsets can encode HEVC, so Android->PC should be similar.

So that allows one or more hefty PCs + GPU, perhaps in the cloud, to process multicamera video for a robot.

(WebTransport is best for a wide range of network transport, but is very hard to obtain & use outside of what is built into web browsers.  I spent a LOT of time finding & using what seems to be the only reasonable solution.  Soonish, I'll publish an open source library that makes a C-based WebTransport library reasonable & easy to use.  And I have a way of using that in NAT/firewall-piercing peer-to-peer mode, completing the replacement of WebRTC.)


I have been thinking about how ML can be used to enhance camera-to-point-cloud scene understanding.  Training a model with single & dual cameras + ground-truth point clouds and labeling, probably using some intermediate representations like visual flow + motion vectors, it should be possible to end up with a visual processor that correctly infers a lot.  Maybe the result of that can be used to create an optimized processing block.  It seems like a combination of generic stereo processing + object / architectural feature representation (windows, doors, stop signs, etc.) could be on the right track to create a robust solution.
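
A rough sketch of how that supervision might look, assuming LIDAR points already transformed into the camera frame; the model, data loader, optimizer, and intrinsics matrix K are hypothetical stand-ins:

import torch
import torch.nn.functional as F

def project_lidar_to_depth(points_cam, K, h, w):
    """points_cam: (N, 3) LIDAR points already in the camera frame; K: 3x3 intrinsics.
    Returns an (h, w) sparse depth map; zeros mean 'no ground truth here'."""
    z = points_cam[:, 2]
    keep = z > 0.1                              # only points in front of the camera
    uvw = (K @ points_cam[keep].T).T            # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).long()
    v = (uvw[:, 1] / uvw[:, 2]).long()
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = torch.zeros(h, w)
    depth[v[inside], u[inside]] = z[keep][inside]
    return depth

def masked_l1(pred, sparse_gt):
    mask = sparse_gt > 0                        # supervise only where LIDAR actually hit
    return F.l1_loss(pred[mask], sparse_gt[mask])

# Training-loop outline (model, loader, optimizer, and K are assumed to exist):
# for image, lidar_points in loader:
#     gt = project_lidar_to_depth(lidar_points, K, image.shape[-2], image.shape[-1])
#     loss = masked_l1(model(image).squeeze(), gt)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()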

In searching for relevant papers, it seems evident that the best approach is probably to recognize objects in a scene, including the foreground (ground) & background (sky, trees, etc.), then estimate & track segmentation & depth of those in an informed way.


https://www.researchgate.net/publication/359390057_MonoDTR_Monocular_3D_Object_Detection_with_Depth-Aware_Transformer

"MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features."

https://github.com/kuanchihhuang/MonoDTR

https://www.researchgate.net/publication/362618833_MonoPoly_A_Practical_Monocular_3D_Object_Detector

"3D object detection plays a pivotal role in driver assistance systems and has practical requirements for small storage and fast inference. Monocular 3D detection alternatives abandon the complexity of LiDAR setup and pursues the effectiveness and efficiency of the vision scheme. In this work, we propose a set of anchor-free monocular 3D detectors called MonoPoly based on the keypoint paradigm. Specifically, we design a polynomial feature aggregation sampling module to extract multi-scale context features for auxiliary training and alleviate classification and localization misalignment through an attention-aware loss. Extensive experiments show that the proposed MonoPoly series achieves an excellent trade-off between performance and model size while maintaining real-time efficiency on KITTI and nuScenes datasets."


Keypoint / feature-point based systems are 'old school', from vision work before the LLM / transformer successes.  But those methods could be a way to focus on the parts of an image that are most likely to be interesting.  This group has many papers exploring keypoints + other methods.


Going from images to NeRF to a point cloud is an interesting approach:
https://arxiv.org/html/2404.04875v1

NeRF2Points: Large-Scale Point Cloud Generation From Street Views’ Radiance Field Optimization

"Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility of NeRF: the derivation of point clouds from aggregated urban landscape imagery. The transmutation of street-view data into point clouds is fraught with complexities, attributable to a nexus of interdependent variables. First, high-quality point cloud generation hinges on precise camera poses, yet many datasets suffer from inaccuracies in pose metadata. Also, the standard approach of NeRF is ill-suited for the distinct characteristics of street-view data from autonomous vehicles in vast, open settings. Autonomous vehicle cameras often record with limited overlap, leading to blurring, artifacts, and compromised pavement representation in NeRF-based point clouds. In this paper, we present NeRF2Points, a tailored NeRF variant for urban point cloud synthesis, notable for its high-quality output from RGB inputs alone. Our paper is supported by a bespoke, high-resolution 20-kilometer urban street dataset, designed for point cloud generation and evaluation. NeRF2Points adeptly navigates the inherent challenges of NeRF-based point cloud synthesis through the implementation of the following strategic innovations: (1) Integration of Weighted Iterative Geometric Optimization (WIGO) and Structure from Motion (SfM) for enhanced camera pose accuracy, elevating street-view data precision. (2) Layered Perception and Integrated Modeling (LPiM) is designed for distinct radiance field modeling in urban environments, resulting in coherent point cloud representations. (3) Geometric-aware consistency regularization to rectify geometric distortions in sparse street-view data, confirming superiority of NeRF2Points through empirical validation."


sdw

--

Stephen D. Williams
Founder: VolksDroid, Blue Scholar Foundation

Chris Albertson

Jul 11, 2024, 3:08:15 AM
to hbrob...@googlegroups.com
Have you seen the recent YouTube videos of Tesla cars with a LIDAR mounted above the windshield?  Tesla is not planning on selling cars with LIDAR, but they are using the LIDAR data to collect ground truth used to train the video processors.  I think the goal is to have a neural network trained to convert video to a point cloud.

On Jul 10, 2024, at 11:15 PM, 'Stephen Williams' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:

I view things like Lidar & time of flight cameras as being proxies for future much better video processing & understanding.  At least until those types of cameras become very small & very inexpensive….

Stephen Williams

Jul 11, 2024, 3:31:00 AM
to hbrob...@googlegroups.com

Gathering data for training makes sense.  And some applications can afford the cost, complexity, and space: heavy equipment for instance.

It might even make sense to have stationary lidar that vehicles or other robots can get a feed from as needed: Intersections, tunnels, pedestrian-heavy crosswalks.

sdw


Brian Higgins

Jul 11, 2024, 7:24:24 AM
to hbrob...@googlegroups.com
OD…(H)…A

With the smart cane, the human processes the avoidance…
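
For what it's worth, the detection half of that split can be very simple. Here is a hypothetical sketch for a cane-mounted ultrasonic sensor driving a vibration motor on a Raspberry Pi; the pin numbers and the HC-SR04 + motor hardware are assumptions:

import time
import RPi.GPIO as GPIO

TRIG, ECHO, MOTOR = 23, 24, 18      # hypothetical BCM pin assignments

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)
GPIO.setup(MOTOR, GPIO.OUT)
buzz = GPIO.PWM(MOTOR, 200)         # 200 Hz PWM driving a small vibration motor
buzz.start(0)

def read_distance_m():
    """One HC-SR04 ping; returns distance in meters."""
    GPIO.output(TRIG, True)
    time.sleep(10e-6)
    GPIO.output(TRIG, False)
    start = end = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        end = time.time()
    return (end - start) * 343.0 / 2.0     # speed of sound, round trip

try:
    while True:
        d = read_distance_m()
        # Closer obstacle -> stronger vibration; beyond 3 m, stay quiet.
        duty = 0.0 if d > 3.0 else min(100.0, 100.0 * (3.0 - d) / 3.0)
        buzz.ChangeDutyCycle(duty)
        time.sleep(0.1)
finally:
    buzz.stop()
    GPIO.cleanup()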

Brian Higgins 
VA Researcher for blind mobility “Sensor Aided Navigation”





Chris Albertson

Jul 11, 2024, 12:56:59 PM
to hbrob...@googlegroups.com

On Jul 11, 2024, at 12:30 AM, 'Stephen Williams' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:

Gathering data for training makes sense.  And some applications can afford the cost, complexity, and space: heavy equipment for instance.

It might even make sense to have stationary lidar that vehicles or other robots can get a feed from as needed: Intersections, tunnels, pedestrian-heavy crosswalks.


Yes, I have thought for a while that Tesla needs to first implement good car-to-car sensor sharing, or at least car-to-car situational-awareness sharing.  For example, if one car drives through an intersection and can briefly see down the cross street, why not broadcast what it sees to help the car behind?  At least here in the So Cal beach cities there are so many Tesla cars that there are always a few within sight.

Secondly, once you have this implemented, you can build a small box with just the camera and computer and hang that on traffic signals, and it can continuously broadcast what it sees.  Then every car that cares to listen can “see around corners”.

Cameras and computers on traffic lights are not so hard or expensive to do.  Here in Redondo Beach, they have started to fix the problem that the car detectors at intersections that trip the signals do not work for bicycles.  They are using video now with object detection.  The law in CA is that if a traffic light is “broken” you can drive through the red light after stopping.  The signals that don’t detect bikes are defined as broken, so bikes can legally run the red light.  So the city is fixing the sensors by going with video.  Once you have video on traffic signals, the next step seems like it should be inexpensive: broadcast the data.

The same thing could be applied to domestic robots.  I think data sharing between robots and fixed-mount security cameras could improve performance and reduce cost.




Stephen Williams

Jul 11, 2024, 2:17:53 PM
to hbrob...@googlegroups.com


On 7/11/24 9:56 AM, Chris Albertson wrote:
Yes, I have thought for a while that Tesla needs to first implement good car-to-car sensor sharing, or at least car-to-car situational-awareness sharing.  For example, if one car drives through an intersection and can briefly see down the cross street, why not broadcast what it sees to help the car behind?  At least here in the So Cal beach cities there are so many Tesla cars that there are always a few within sight.

Secondly, once you have this implemented, you can build a small box with just the camera and computer and hang that on traffic signals, and it can continuously broadcast what it sees.  Then every car that cares to listen can “see around corners”.

Yes, perfect.

Automakers are talking about things like this.  But they will likely take forever to decide on a solution.  There is an interesting opportunity to create something open source that works well in a variety of circumstances.

Pick a good protocol, standard data formats, and a global public decentralized registry for GPS location, status, appropriate use, etc.
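
As a strawman for the data-format piece, something as simple as a JSON message broadcast per sensor update might be enough to prototype with. The field names, port, and UDP transport below are assumptions, not any existing V2X standard:

import json
import socket
import time

BROADCAST_ADDR = ("255.255.255.255", 47800)      # made-up port

def detection_message(sensor_id, lat, lon, tracks):
    """Serialize one broadcast frame from a fixed sensor."""
    return json.dumps({
        "sensor_id": sensor_id,
        "lat": lat, "lon": lon,                  # the sensor's fixed position
        "timestamp": time.time(),
        "tracks": [                              # one entry per tracked object
            {"kind": t["kind"],                  # "car", "bike", "pedestrian", ...
             "bearing_deg": t["bearing_deg"],
             "range_m": t["range_m"],
             "speed_mps": t["speed_mps"]}
            for t in tracks
        ],
    }).encode("utf-8")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.sendto(detection_message("signal-17", 33.849, -118.388,
                              [{"kind": "bike", "bearing_deg": 275.0,
                                "range_m": 40.0, "speed_mps": 6.5}]),
            BROADCAST_ADDR)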

Have to solve the problem of rapid network acquisition as you pass through intersections or are near a vehicle.  Perhaps some mesh network techniques are lightweight enough.  Or eventually get cameras to send over local cellular links to avoid bottlenecks.

Or simply rely on existing mobile & fixed Internet connectivity, let the mobile companies figure out the fastest data path.  Probably need an alternative for remote areas or as a fallback, but dense areas should have good networking.

Might want cloud caching for various kinds of optimization.



Cameras and computers on traffic lights are not so hard or expensive to do. … Once you have video on traffic signals, the next step seems like it should be inexpensive: broadcast the data.

Nice.


The same thing could be applied to domestic robots.  I think data sharing between robots and fixed-mount security cameras could improve performance and reduce cost.


Yes.


sdw




