--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hbrobotics/2F4D7124-310C-40F8-8BB7-BE9C5CB3C987%40comcast.net.
I view things like Lidar & time of flight cameras as being
proxies for future much better video processing &
understanding. At least until those types of cameras become very
small & very inexpensive.
Vision will eventually be the main depth sensing & scene
understanding mechanism, optionally using whatever else is
available. We can do a bit of processing now, and there are some
chips that do stereo correlation in-camera now. I worked at a
startup, Pelican Imaging, that used a tight array of
color-filtered monochrome cameras, 3x3 or 4x5 on the same circuit
board, to get depth + super resolution by correlating images
between tightly spaced cameras. We could produce a pretty good
depth map at 30fps on an Android phone. (We used the CPU with
NEON vector instructions, GPU, and the Qualcomm DSP (which few get
to program) to do so, and it used too much power: 12 watts at
peak. But it worked!) I integrated the Movidius MV1 with that as
a possible preprocessor: Those chips have 16 or more processors
running at full data rates on raw data for a few milliwatts. And
that was 10 years ago.
It should be possible to do pretty well with two or more cameras or even 1 camera, but we have limits in our algorithms now. And specialized hardware is probably required to get low latency + a reasonable frame rate. At some point it should get much better, and we probably can keep the power reasonable. Just maybe not right now. On the other hand, right now we can ship 1-4 4k video streams at 72fps over wifi with low enough latency (<60ms) without a lot of power.
For something not sharable yet, I've been working on 4k x 2 48-72fps HEVC (and alternatively the layered-codec LCEVC standard) video with hardware compression & decompression over WebRTC (which I've now abandoned because of latency & complexity issues) and WebTransport (HTTP3/TLS/QUIC). For PC->Quest3 (or 2), I'm getting 60ms round trip for 4k+ stereo (4k+ for each eye) over wifi. That includes fairly hefty rendering in Unreal Engine for each pair of frames. The Qualcomm chipsets can encode HEVC, so Android->PC should be similar.
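As a quick sanity check on those numbers, the per-frame budget at 72fps versus a 60ms round trip works out like this (pure arithmetic):

```python
# Back-of-the-envelope frame/latency budget for the streaming numbers above:
# 72 fps stereo video over wifi with a ~60 ms measured round trip.
fps = 72
frame_time_ms = 1000 / fps            # ~13.9 ms available per frame pair
round_trip_ms = 60                    # measured PC -> Quest 3 -> PC latency
frames_in_flight = round_trip_ms / frame_time_ms
print(f"{frame_time_ms:.1f} ms per frame, ~{frames_in_flight:.1f} frames in flight")
```

So a 60ms round trip means roughly four frames are in the render/encode/transmit/decode pipeline at any moment, which is why hardware codecs on both ends matter so much.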
So that allows one or more hefty PCs + GPU, perhaps in the cloud, to process multicamera video for a robot. (WebTransport is best for a wide range of network transport, but is very hard to obtain & use outside of what is built into web browsers. I spent a LOT of time finding & using what seems to be the only reasonable solution. Soonish, I'll publish an open source library that makes a C-based WebTransport library reasonable & easy to use. And I have a way of using that in NAT/firewall-piercing peer-to-peer mode, completing the replacement of WebRTC.)
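Until that library is out, the round-trip-measurement idea itself can be illustrated with a plain loopback UDP echo (a stand-in for unreliable-datagram transport only, NOT actual WebTransport):

```python
import socket
import time

# Loopback UDP echo as a minimal stand-in for measuring datagram round trips.
# Real WebTransport adds QUIC/TLS handshakes and congestion control on top.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))           # OS picks a free port
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

start = time.perf_counter()
client.sendto(b"frame-0", server.getsockname())
data, addr = server.recvfrom(1500)      # server receives the "frame"
server.sendto(data, addr)               # ...and echoes it back
reply, _ = client.recvfrom(1500)
rtt_ms = (time.perf_counter() - start) * 1000
print(f"loopback RTT: {rtt_ms:.3f} ms for {reply!r}")
```

On loopback this is fractions of a millisecond; the interesting numbers only show up over real wifi, where the same pattern measures the transport share of the 60ms budget.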
I have been thinking about how ML can be used to enhance camera-to-point-cloud scene understanding. Training a model with single & dual cameras + ground-truth point clouds and labeling, probably using some intermediate representations like visual flow + motion vectors, it should be possible to end up with a visual processor that correctly infers a lot. Maybe the result of that can be used to create an optimized processing block. It seems like a combination of generic stereo processing + object / architectural feature representation (windows, doors, stop signs, etc.) could be on the right track to create a robust solution.
In searching for relevant papers, it seems evident that the best approach is probably to recognize objects in a scene, including the foreground (ground) & background (sky, trees, etc.), then estimate & track segmentation & depth of those in an informed way.
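Whatever estimates the depth (stereo correlation, a monocular network, or an informed per-object tracker), the last step of a camera-to-point-cloud pipeline is the same back-projection through the camera model; a minimal sketch with hypothetical pinhole intrinsics:

```python
import numpy as np

# Back-project an (H, W) depth map into 3D points with a pinhole camera model.
# Intrinsics below are hypothetical example values for a 640x480 image.
fx = fy = 500.0            # focal lengths in pixels
cx, cy = 320.0, 240.0      # principal point

def depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Return an (H*W, 3) array of XYZ points from an (H, W) depth map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

points = depth_to_points(np.full((480, 640), 2.0))  # flat wall 2 m away
print(points.shape)  # (307200, 3)
```

The per-object recognition and segmentation then decides which of those points belong together and how confidently to trust their depths.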
"MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features."
https://github.com/kuanchihhuang/MonoDTR
https://www.researchgate.net/publication/362618833_MonoPoly_A_Practical_Monocular_3D_Object_Detector
"3D object detection plays a pivotal role in driver assistance systems and has practical requirements for small storage and fast inference. Monocular 3D detection alternatives abandon the complexity of LiDAR setup and pursues the effectiveness and efficiency of the vision scheme. In this work, we propose a set of anchor-free monocular 3D detectors called MonoPoly based on the keypoint paradigm. Specifically, we design a polynomial feature aggregation sampling module to extract multi-scale context features for auxiliary training and alleviate classification and localization misalignment through an attention-aware loss. Extensive experiments show that the proposed MonoPoly series achieves an excellent trade-off between performance and model size while maintaining real-time efficiency on KITTI and nuScenes datasets."
Keypoint / feature-point based systems are 'old school', from vision research before the LLM / transformer successes. But those methods could be a way to focus on the parts of an image that are most likely to be interesting. This group has many papers exploring keypoints + other methods.
Going from images to NeRF to a point cloud is an interesting
approach:
https://arxiv.org/html/2404.04875v1
NeRF2Points: Large-Scale Point Cloud Generation From Street
Views’ Radiance Field Optimization
"Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility of NeRF: the derivation of point clouds from aggregated urban landscape imagery. The transmutation of street-view data into point clouds is fraught with complexities, attributable to a nexus of interdependent variables. First, high-quality point cloud generation hinges on precise camera poses, yet many datasets suffer from inaccuracies in pose metadata. Also, the standard approach of NeRF is ill-suited for the distinct characteristics of street-view data from autonomous vehicles in vast, open settings. Autonomous vehicle cameras often record with limited overlap, leading to blurring, artifacts, and compromised pavement representation in NeRF-based point clouds. In this paper, we present NeRF2Points, a tailored NeRF variant for urban point cloud synthesis, notable for its high-quality output from RGB inputs alone. Our paper is supported by a bespoke, high-resolution 20-kilometer urban street dataset, designed for point cloud generation and evaluation. NeRF2Points adeptly navigates the inherent challenges of NeRF-based point cloud synthesis through the implementation of the following strategic innovations: (1) Integration of Weighted Iterative Geometric Optimization (WIGO) and Structure from Motion (SfM) for enhanced camera pose accuracy, elevating street-view data precision. (2) Layered Perception and Integrated Modeling (LPiM) is designed for distinct radiance field modeling in urban environments, resulting in coherent point cloud representations. 
(3) Geometric-aware consistency regularization to rectify geometric distortions in sparse street-view data, confirming superiority of NeRF2Points through empirical validation."
sdw

Stephen D. Williams
Founder: VolksDroid, Blue Scholar Foundation
On Jul 10, 2024, at 11:15 PM, 'Stephen Williams' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:
I view things like Lidar & time of flight cameras as being proxies for future much better video processing & understanding. At least until those types of cameras become very small & very inexpensive….
Gathering data for training makes sense. And some applications can afford the cost, complexity, and space: heavy equipment for instance.
It might even make sense to have stationary lidar that vehicles
or other robots can get a feed from as needed: Intersections,
tunnels, pedestrian-heavy crosswalks.
sdw
On Jul 11, 2024, at 3:31 AM, 'Stephen Williams' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:
On 7/11/24 12:07 AM, Chris Albertson wrote:
Have you seen the recent YouTube videos of Tesla cars with a LIDAR mounted above the windshield? Tesla is not planning on selling cars with LIDAR, but they are using the LIDAR data to collect ground truth used to train the video processors. I think the goal is to have a neural network trained to convert video to point cloud.
On Jul 11, 2024, at 12:30 AM, 'Stephen Williams' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:
Gathering data for training makes sense. And some applications can afford the cost, complexity, and space: heavy equipment for instance.
It might even make sense to have stationary lidar that vehicles or other robots can get a feed from as needed: Intersections, tunnels, pedestrian-heavy crosswalks.
Yes, I have thought for a while that Tesla needs to first implement good car-to-car sensor sharing, or at least car-to-car situational-awareness sharing. For example, if one car drives through an intersection and can briefly see down the cross street, why not broadcast what it sees to help the car behind? At least here in the So Cal beach cities there are so many Tesla cars that there are always a few within sight.
Secondly, once you have this implemented, you can build a small box with just the camera and computer and hang that on traffic signals, and it can continuously broadcast what it sees. Then every car that cares to listen can “see around corners”.
Yes, perfect.
Automakers are talking about things like this. But they will
likely take forever to decide on a solution. There is an
interesting opportunity to create something open source that works
well in a variety of circumstances.
Pick a good protocol, standard data formats, global public decentralized registry for GPS location, status, appropriate use, etc.
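To make the registry idea concrete, here is a hypothetical sketch of what one entry might contain; every field name here is invented for illustration, not any existing standard:

```python
import json

# Hypothetical entry in a public, decentralized registry of shared
# roadside/vehicle sensor feeds: location, sensor type, transport, and
# appropriate-use metadata so clients can discover and filter feeds.
entry = {
    "feed_id": "us-ca-example-intersection-001",   # invented identifier scheme
    "location": {"lat": 33.8361, "lon": -118.3890},
    "sensor": "lidar",                  # or "camera", "radar"
    "transport": "webtransport",        # protocol used to fetch the feed
    "status": "active",
    "appropriate_use": ["vehicle", "robot"],
}
print(json.dumps(entry, indent=2))
```

A client approaching an intersection would query the registry by bounding box around its GPS position, then connect directly to any active feeds it is permitted to use.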
Have to solve the problem of rapid network acquisition as you pass through intersections or are near a vehicle. Perhaps some mesh network techniques are lightweight enough. Or eventually get cameras to send over local cellular links to avoid bottlenecks.
Or simply rely on existing mobile & fixed Internet
connectivity, let the mobile companies figure out the fastest data
path. Probably need an alternative for remote areas or as a
fallback, but dense areas should have good networking.
Might want cloud caching for various kinds of optimization.
Cameras and computers on traffic lights are not so hard or expensive to do. Here in Redondo Beach, they have started to fix the problem that the car detectors at intersections that trip the signals do not work for bicycles. They are using video now with object detection. The law in CA is that if a traffic light is “broken” you can drive through the red light, after stopping. The signals that don’t detect bikes are defined as broken, so bikes can legally run the red light. So the city is fixing the sensors by going with video. Once you have video on traffic signals, the next step, broadcasting the data, seems like it should be inexpensive.
Nice.
The same thing could be applied to domestic robots. I think data sharing between robots and fixed-mount security cameras could improve performance and reduce cost.
Yes.
sdw