
Camera interfaces


Don Y

Dec 29, 2022, 8:17:07 AM
ISTR playing with de-encapsulated DRAMs as image sensors
back in school (DRAM being relatively new technology, then).

But, most cameras seem to have (bit- or word-) serial interfaces
nowadays. Are there any (mainstream/high volume) devices that
"look" like a chunk of memory, in their native form?

Dimiter_Popoff

Dec 29, 2022, 8:33:53 AM
Hah, Don, consider yourself lucky if you find a camera you have
enough documentation to use at all, serial or whatever.

The MIPI standards are only for politburo members (last time I looked
you need to make several millions annually to be able to *apply*
for membership, which of course costs thousands, annually again).

Not sure about USB; perhaps USB cameras are covered in the standard
(yet to deal with that one).

Richard Damon

Dec 29, 2022, 12:06:40 PM
Using a DRAM in that manner would only give you a single bit value for
each pixel (maybe some more modern memories store multiple bits in a
cell so you get a few grey levels).

There are some CMOS sensors that let you address pixels individually and
in a random order (like you got with the DRAM), but by its nature such a
readout method tends to be slow and space-inefficient, so these
interfaces tend to be available only on smaller sensor arrays.

That is why most sensors read out via row/column shift registers to a
pixel serial (maybe multiple pixels per clock) output, and if the camera
includes its own A/D conversion, might serialize the results to minimize
interconnect.

Richard Damon

Dec 29, 2022, 12:21:50 PM
On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
> On 12/29/2022 15:16, Don Y wrote:
>> ISTR playing with de-encapsulated DRAMs as image sensors
>> back in school (DRAM being relatively new technology, then).
>>
>> But, most cameras seem to have (bit- or word-) serial interfaces
>> nowadays.  Are there any (mainstream/high volume) devices that
>> "look" like a chunk of memory, in their native form?
>>
>
> Hah, Don, consider yourself lucky if you find a camera you have
> enough documentation to use at all, serial or whatever.
>
> The MIPI standards are only for politburo members (last time I looked
> you need to make several millions annually to be able to *apply*
> for membership, which of course costs thousands, annually again).

If you are looking for the very latest standards, yes. But enough data is
out there to handle a lot of basic MIPI operations. The small player
isn't going to be implementing the low-level interface themselves (or at
least shouldn't be trying to), so unless you are trying to work with a
bleeding-edge camera (which you probably can't actually buy if you are a
small player anyway), you can usually find enough information to use the
camera.

My experience is that if you can actually buy the camera through normal
channels, the data needed to use it will be available. The big problem is
"grey market" cameras: buying via unauthorized distributors, you are at
the mercy of the distributor to get you the needed data.

>
> Not use about USB, perhaps USB cameras are covered in the standard
> (yet to deal with that one).

There is a USB video standard, and many USB cameras can just be plugged
in and used.
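
To illustrate how far "plug in and use" goes, a minimal V4L2 capture
sketch for a UVC camera on Linux might look like the following. The
device path, resolution and pixel format are assumptions, and error
handling is omitted; it is a sketch, not a finished driver.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);        /* typical UVC node */

    struct v4l2_format fmt = {0};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;                     /* assumed mode */
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    struct v4l2_requestbuffers req = {0};
    req.count = 2;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = 0;
    ioctl(fd, VIDIOC_QUERYBUF, &buf);
    void *frame = mmap(NULL, buf.length, PROT_READ, MAP_SHARED,
                       fd, buf.m.offset);

    ioctl(fd, VIDIOC_QBUF, &buf);
    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);

    ioctl(fd, VIDIOC_DQBUF, &buf);    /* blocks until one frame arrives */
    /* frame[0 .. buf.bytesused-1] now holds one YUYV image */

    ioctl(fd, VIDIOC_STREAMOFF, &type);
    munmap(frame, buf.length);
    close(fd);
    return 0;
}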

Dimiter_Popoff

Dec 29, 2022, 12:45:44 PM
On 12/29/2022 19:21, Richard Damon wrote:
> On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
>> On 12/29/2022 15:16, Don Y wrote:
>>> ISTR playing with de-encapsulated DRAMs as image sensors
>>> back in school (DRAM being relatively new technology, then).
>>>
>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>> nowadays.  Are there any (mainstream/high volume) devices that
>>> "look" like a chunk of memory, in their native form?
>>>
>>
>> Hah, Don, consider yourself lucky if you find a camera you have
>> enough documentation to use at all, serial or whatever.
>>
>> The MIPI standards are only for politburo members (last time I looked
>> you need to make several millions annually to be able to *apply*
>> for membership, which of course costs thousands, annually again).
>
> If you are looking for the very latest standards, yes. Enough data is
> out there to handle a lot of basic MIPI operations. Since the small
> player isn't going to be trying to implement the low level interface
> themselves (or at least shouldn't be trying to),

So how does one use a MIPI camera without using the low level interface?

> unless you are trying
> to work with a bleeding edge camera (which you probably can't actually
> buy if you are a small player) you can tend to find enough information
> to use the camera.

That is fair enough, as long as we are talking about some internal
sensor specifics of the "bleeding edge" cameras.

>
> My experiance is if you can actually buy the camera normally, there will
> be the data available to use it.

That's really reassuring. I am more interested in talking to MIPI
display modules than to cameras (at least that is the order I will
tackle them in), but still.

> The big problem is "Grey Market"
> cameras, via unauthorized distributors you are at the mercy of the
> distributor to get you the needed data.

Don't they conform to the MIPI standard? (which I have no access to).

>
>>
>> Not use about USB, perhaps USB cameras are covered in the standard
>> (yet to deal with that one).
>
> There is a USB video standard, and many USB cameras can just be plugged
> in and used.

OK, I thought I had seen that some years ago. Might be an escape (though
cameras found in phones and tablets etc. are probably all MIPI).

Rick C

Dec 29, 2022, 1:20:56 PM
On Thursday, December 29, 2022 at 12:06:40 PM UTC-5, Richard Damon wrote:
> On 12/29/22 8:16 AM, Don Y wrote:
> > ISTR playing with de-encapsulated DRAMs as image sensors
> > back in school (DRAM being relatively new technology, then).
> >
> > But, most cameras seem to have (bit- or word-) serial interfaces
> > nowadays. Are there any (mainstream/high volume) devices that
> > "look" like a chunk of memory, in their native form?
> >
> Using a DRAM in that manner would only give you a single bit value for
> each pixel (maybe some more modern memories store multiple bits in a
> cell so you get a few grey levels).

You could probably modulate the timing of the scans to get a range of grey scale, even if small. Let the chip integrate for 1 unit, 2 units, 4 units, etc. of time. I'm assuming light responsiveness of the human eye is logarithmic, rather than linear. If not, then 1, 2, 3, 4 units of time. Even 16 levels of grey is much better than black and white.

It would be a bit of processing to translate the thermometer codes into pixel values, but just time consuming, not hard.
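
A sketch of that translation (the 16-exposure count, geometry and names
are all illustrative): capture K single-bit frames with increasing
integration times and fold them into a per-pixel count of how many
exposures tripped, which is exactly the thermometer-to-binary step.

/* bits[k] is the frame captured with the k-th (increasing) integration
 * time, one byte per pixel holding 0 or 1.  A bright pixel trips even on
 * the short exposures, so its count of tripped frames is high; the count
 * is the decoded thermometer code.  Illustrative sizes only. */
#include <stdint.h>

#define EXPOSURES 16
#define PIXELS    (320 * 240)

void fold_exposures(const uint8_t bits[EXPOSURES][PIXELS],
                    uint8_t grey[PIXELS])
{
    for (int p = 0; p < PIXELS; p++) {
        int level = 0;
        for (int k = 0; k < EXPOSURES; k++)
            level += bits[k][p];      /* count exposures that tripped */
        grey[p] = (uint8_t)level;     /* 0..EXPOSURES grey levels     */
    }
}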

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209

Richard Damon

Dec 29, 2022, 1:48:17 PM
On 12/29/22 12:45 PM, Dimiter_Popoff wrote:
> On 12/29/2022 19:21, Richard Damon wrote:
>> On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
>>> On 12/29/2022 15:16, Don Y wrote:
>>>> ISTR playing with de-encapsulated DRAMs as image sensors
>>>> back in school (DRAM being relatively new technology, then).
>>>>
>>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>>> nowadays.  Are there any (mainstream/high volume) devices that
>>>> "look" like a chunk of memory, in their native form?
>>>>
>>>
>>> Hah, Don, consider yourself lucky if you find a camera you have
>>> enough documentation to use at all, serial or whatever.
>>>
>>> The MIPI standards are only for politburo members (last time I looked
>>> you need to make several millions annually to be able to *apply*
>>> for membership, which of course costs thousands, annually again).
>>
>> If you are looking for the very latest standards, yes. Enough data is
>> out there to handle a lot of basic MIPI operations. Since the small
>> player isn't going to be trying to implement the low level interface
>> themselves (or at least shouldn't be trying to),
>
> So how does one use a MIPI camera without using the low level interface?

You use a chip that has a MIPI interface: either a CPU or FPGA with a
built-in MIPI interface, or a MIPI bridge/converter chip that converts
the MIPI interface into something you can deal with.

>
>> unless you are trying to work with a bleeding edge camera (which you
>> probably can't actually buy if you are a small player) you can tend to
>> find enough information to use the camera.
>
> That is fair enough, as long as we are talking about some internal
> sensor specifics of the "bleeding edge" cameras.

Bleeding-edge cameras/displays may need newer versions of MIPI than are
easy to find in the consumer market. They may need bleeding-edge
processors.

As I mention below, more important are the configuration registers,
which might be harder to get for bleeding edge parts. This is often
proprietary, as knowing what is adjustable is often part of the secret
sauce for those cameras.

>
>>
>> My experiance is if you can actually buy the camera normally, there
>> will be the data available to use it.
>
> That's really reassuring. I am more interested in talking to MIPI
> display modules than to cameras (at least the sequence is this) but
> still.

So you want a chip with MIPI DSI capability built in, or a converter chip.

>
>> The big problem is "Grey Market" cameras, via unauthorized
>> distributors you are at the mercy of the distributor to get you the
>> needed data.
>
> Don't they conform to the MIPI standard? (which I have no access to).

Yes, but MIPI doesn't define the more important configuration registers
you need to set up for the device.

MIPI is a video data protocol; it has limited configuration capability.
Generally there will be something like an I2C bus to the camera that is
used to configure it.

(There might be a way to tunnel the configuration over the MIPI lines,
but I doubt MIPI defines a configuration protocol.)
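
For a feel of what that I2C configuration path typically looks like,
here is a sketch using Linux i2c-dev. The bus number, device address,
and register addresses/values are placeholders; the real ones come from
exactly the sensor documentation being discussed here.

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c-dev.h>

/* Many sensors use 16-bit register addresses with 8-bit data; that
 * convention is assumed here. */
static int sensor_write(int fd, uint16_t reg, uint8_t val)
{
    uint8_t msg[3] = { reg >> 8, reg & 0xff, val };
    return write(fd, msg, sizeof msg) == sizeof msg ? 0 : -1;
}

int main(void)
{
    int fd = open("/dev/i2c-1", O_RDWR);   /* hypothetical bus          */
    ioctl(fd, I2C_SLAVE, 0x36);            /* hypothetical 7-bit address */

    sensor_write(fd, 0x0100, 0x00);        /* e.g. enter standby        */
    /* ... exposure, gain, output-format registers per the datasheet ... */
    sensor_write(fd, 0x0100, 0x01);        /* e.g. start streaming      */

    close(fd);
    return 0;
}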

>
>>
>>>
>>> Not use about USB, perhaps USB cameras are covered in the standard
>>> (yet to deal with that one).
>>
>> There is a USB video standard, and many USB cameras can just be
>> plugged in and used.
>
> OK, I thought I had seen that some years ago. Might be an escape (though
> cameras found in phones and tablets etc. are probably all MIPI).
>

Yes, the cameras in phones will almost always be MIPI. There is no need
for them to use a USB video connection; that is just way too much overhead.

In fact, a lot of standalone USB cameras might have a MIPI-based camera
core internally, with a MIPI-to-USB interface to send the data to the host.

Richard Damon

Dec 29, 2022, 1:58:46 PM
Yes, at a drastic reduction in frame rate.

Power-of-two spacing will show a lot of banding in the image, but 16
levels spaced at about 1.4x exposure steps might be acceptable. (2**16
dynamic range is extreme and well beyond even what "HDR" video can deal
with.)
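
A quick sketch of what those two exposure ladders work out to (purely
illustrative numbers, not tied to any real sensor):

#include <stdio.h>

int main(void)
{
    double t = 1.0;            /* shortest integration time, arbitrary units */
    for (int level = 0; level < 16; level++) {
        printf("level %2d: integrate %.1f units\n", level, t);
        t *= 1.4;              /* ~1.4x steps; 1.4**15 ~= 155:1 total span   */
    }
    /* Power-of-two steps over the same 16 exposures would instead span
     * 2**15 ~= 33000:1 from shortest to longest, hence the banding concern. */
    return 0;
}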

Don Y

Dec 29, 2022, 2:27:10 PM
On 12/29/2022 10:06 AM, Richard Damon wrote:
> On 12/29/22 8:16 AM, Don Y wrote:
>> ISTR playing with de-encapsulated DRAMs as image sensors
>> back in school (DRAM being relatively new technology, then).
>>
>> But, most cameras seem to have (bit- or word-) serial interfaces
>> nowadays.  Are there any (mainstream/high volume) devices that
>> "look" like a chunk of memory, in their native form?
>
> Using a DRAM in that manner would only give you a single bit value for each
> pixel (maybe some more modern memories store multiple bits in a cell so you get
> a few grey levels).

I mentioned the DRAM reference only as an exemplar of how a "true"
parallel, random access interface could exist.

> There are some CMOS sensors that let you address pixels individually and in a
> random order (like you got with the DRAM) but by its nature, such a readout
> method tends to be slow, and space inefficient, so these interfaces tend to be
> only available on smaller camera arrays.

But, if you are processing the image, such an approach can lead to
higher throughput than having to transfer a serial data stream into
memory (thus consuming memory bandwidth).

> That is why most sensors read out via row/column shift registers to a pixel
> serial (maybe multiple pixels per clock) output, and if the camera includes its
> own A/D conversion, might serialize the results to minimize interconnect.

Yes, but then you have to store it in memory in order to examine it.
I.e., if your goal isn't just to pass the image out to a display,
then having to unpack the serial stream into RAM is an added cost.

Don Y

Dec 29, 2022, 2:29:58 PM
I built my prototypes (proof-of-principle) using COTS USB cameras.
But, getting the data out of the serial data stream and into RAM so
it can be analyzed consumes memory bandwidth.

I'm currently trying to sort out an approximate cost factor "per
camera" (per video stream) and looking for ways that I can cut costs
(memory bandwidth requirements) to allow greater numbers of
cameras or higher frame rates.

Dimiter_Popoff

Dec 29, 2022, 3:11:42 PM
On 12/29/2022 20:48, Richard Damon wrote:
> On 12/29/22 12:45 PM, Dimiter_Popoff wrote:
>> On 12/29/2022 19:21, Richard Damon wrote:
>>> On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
>>>> On 12/29/2022 15:16, Don Y wrote:
>>>>> ISTR playing with de-encapsulated DRAMs as image sensors
>>>>> back in school (DRAM being relatively new technology, then).
>>>>>
>>>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>>>> nowadays.  Are there any (mainstream/high volume) devices that
>>>>> "look" like a chunk of memory, in their native form?
>>>>>
>>>>
>>>> Hah, Don, consider yourself lucky if you find a camera you have
>>>> enough documentation to use at all, serial or whatever.
>>>>
>>>> The MIPI standards are only for politburo members (last time I looked
>>>> you need to make several millions annually to be able to *apply*
>>>> for membership, which of course costs thousands, annually again).
>>>
>>> If you are looking for the very latest standards, yes. Enough data is
>>> out there to handle a lot of basic MIPI operations. Since the small
>>> player isn't going to be trying to implement the low level interface
>>> themselves (or at least shouldn't be trying to),
>>
>> So how does one use a MIPI camera without using the low level interface?
>
> You use a chip that has a mipi interface, either a CPU or FPGA with a
> built in MIPI interface or a MIPI converter chip that converts the MIPI
> interface into something you can deal with.

An FPGA with MIPI would do; I have not looked for one yet.

>
>>
>>> unless you are trying to work with a bleeding edge camera (which you
>>> probably can't actually buy if you are a small player) you can tend
>>> to find enough information to use the camera.
>>
>> That is fair enough, as long as we are talking about some internal
>> sensor specifics of the "bleeding edge" cameras.
>
> Bleeding Edge cameras/displays may need newer versions of MIPI than may
> be easy to find in the consumer market. They may need bleeding edge
> processors.

Well, a 64-bit, GHz-range, 4- or 8-core Power Architecture part should be
plenty. But I am not after bleeding-edge cameras; a decent one I
can control will do.

>
> As I mention below, more important are the configuration registers,
> which might be harder to get for bleeding edge parts. This is often
> proprietary, as knowing what is adjustable is often part of the secret
> sauce for those cameras.

Do you get that sort of data for decent cameras? Sort of like how
to focus them, etc.? Or do you have to rely on black-box "converters",
like with Wi-Fi modules which won't let you get around their TCP/IP
stack?

>>>
>>> My experiance is if you can actually buy the camera normally, there
>>> will be the data available to use it.
>>
>> That's really reassuring. I am more interested in talking to MIPI
>> display modules than to cameras (at least the sequence is this) but
>> still.
>
> So you want a chip with MIPI DSI capability built in, or a convert chip.

Not really, no. I want to be able to put the framebuffer data into
the display like I have been doing with RGB, hsync, vsync etc., via
a parallel or LVDS interface. Is there enough info out there on how to
do this with an FPGA? I think I have enough info to do HDMI this way,
but not for MIPI. Well, my guess is that pixel data will still be pixel
data etc.; it can't be that hard.


Richard Damon

Dec 29, 2022, 4:09:51 PM
On 12/29/22 2:26 PM, Don Y wrote:
> On 12/29/2022 10:06 AM, Richard Damon wrote:
>> On 12/29/22 8:16 AM, Don Y wrote:
>>> ISTR playing with de-encapsulated DRAMs as image sensors
>>> back in school (DRAM being relatively new technology, then).
>>>
>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>> nowadays.  Are there any (mainstream/high volume) devices that
>>> "look" like a chunk of memory, in their native form?
>>
>> Using a DRAM in that manner would only give you a single bit value for
>> each pixel (maybe some more modern memories store multiple bits in a
>> cell so you get a few grey levels).
>
> I mentioned the DRAM reference only as an exemplar of how a "true"
> parallel, random access interface could exist.

Right, and cameras based on parallel random access do exist, but tend to
be on the smaller and slower end of the spectrum.

>
>> There are some CMOS sensors that let you address pixels individually
>> and in a random order (like you got with the DRAM) but by its nature,
>> such a readout method tends to be slow, and space inefficient, so
>> these interfaces tend to be only available on smaller camera arrays.
>
> But, if you are processing the image, such an approach can lead to
> higher throughput than having to transfer a serial data stream into
> memory (thus consuming memory bandwidth).

My guess is that in almost all cases, the need to send the address to
the camera and then get back the pixel value is going to use up more
total bandwidth than getting the image in a stream. The one exception
would be if you need just a very small percentage of the array data, and
it is scattered over the array so a Region of Interest operation can't
be used.

>
>> That is why most sensors read out via row/column shift registers to a
>> pixel serial (maybe multiple pixels per clock) output, and if the
>> camera includes its own A/D conversion, might serialize the results to
>> minimize interconnect.
>
> Yes, but then you have to store it in memory in order to examine it.
> I.e., if your goal isn't just to pass the image out to a display,
> then having to unpack the serial stream into RAM is an added cost.
>

Unless you make sure you get a camera with the same image format and
timing as your display.

Don Y

Dec 29, 2022, 5:57:14 PM
No, you're missing the nature of the DRAM example.

You don't "send" the address of the memory cell desired *to* the DRAM.
You simply *address* the memory cell, directly. I.e., if there are
N locations in the DRAM, then N addresses in your address space are
consumed by it; one for each location in the array.

I'm looking for *that* sort of "direct access" in a camera.

I could *emulate* it by building a module that implements <whatever>
interface to <whichever> camera and deserializes the data into a
RAM. Then, mapping that *entire* RAM into the address space of the
host processor.

(Keeping the RAM updated would require a pseudo dual-ported architecture;
possibly toggling between an "active" RAM and an "updated" RAM so that
the full bandwidth of the RAM was available to the host)

Having the host processor (DMA, etc.) perform this task means it loses
bandwidth to the "deserialization" activity.
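
For a feel of what that buys on the host side, a minimal sketch of
peeking at arbitrary pixels in such an emulator's mapped frame might
look like this. The physical address, geometry and /dev/mem mapping are
assumptions about a hypothetical emulator, not any real device.

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define FRAME_PHYS  0x40000000UL   /* hypothetical: where the emulator sits */
#define WIDTH       1280
#define HEIGHT      720

int main(void)
{
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    volatile uint8_t *frame = mmap(NULL, (size_t)WIDTH * HEIGHT,
                                   PROT_READ, MAP_SHARED, fd, FRAME_PHYS);

    /* Examine only the pixels of interest: no DMA, no per-frame copy. */
    uint8_t centre = frame[(HEIGHT / 2) * WIDTH + (WIDTH / 2)];
    (void)centre;

    munmap((void *)frame, (size_t)WIDTH * HEIGHT);
    close(fd);
    return 0;
}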

>>> That is why most sensors read out via row/column shift registers to a pixel
>>> serial (maybe multiple pixels per clock) output, and if the camera includes
>>> its own A/D conversion, might serialize the results to minimize interconnect.
>>
>> Yes, but then you have to store it in memory in order to examine it.
>> I.e., if your goal isn't just to pass the image out to a display,
>> then having to unpack the serial stream into RAM is an added cost.
>
> Unless you make sure you get a camera with the same image format and timing as
> your display.

I typically don't "display" the images captured. Rather, I use the
cameras as sensors: is there anything in the path of the closing
(or opening) garage door that should cause me to inhibit/abort
those actions? has the mail truck appeared at the mailbox, yet,
today? *who* is standing at the front door?

Dimiter_Popoff

Dec 29, 2022, 6:14:49 PM
Well, of course, but are you sure you can really win much? At first
glance you'd be able to halve the memory bandwidth. But then you may
run into problems with "Doppler"-like effects (clearly not Doppler,
but you get the idea) if you access the frame being acquired; so you'll
want that double buffering you are talking about elsewhere (one frame
being acquired and one having been acquired prior to that). Which would
mean that somewhere, something will have to do the copying you want to
avoid...

Since you have already done it with USB cameras, I think the practical
way is to just keep doing it this way, maybe not USB if you can
find some more economical way to do it, MIPI or whatever.

Richard Damon

Dec 29, 2022, 7:32:10 PM
Not bleeding edge in processor power, but in MIPI interfaces. I don't
know if the latest cameras are using a faster version of the MIPI
interface to move the pixels faster. If so, you need a chip with that
faster grade of MIPI interface.

>
>>
>> As I mention below, more important are the configuration registers,
>> which might be harder to get for bleeding edge parts. This is often
>> proprietary, as knowing what is adjustable is often part of the secret
>> sauce for those cameras.
>
> Do you get that sort of data for decent cameras? Sort of like how
> to focus it etc.? Or do you have to rely on black box "converters",
> like with wifi modules which won't let you get around their tcp/ip
> stack?

I haven't heard of team members having trouble getting specs for
actually available product.

No, we are a bit bigger than the "hobbyist" market, but nowhere near
the big boys. Our volumes would be in the 1000s in some cases.

>
>>>>
>>>> My experiance is if you can actually buy the camera normally, there
>>>> will be the data available to use it.
>>>
>>> That's really reassuring. I am more interested in talking to MIPI
>>> display modules than to cameras (at least the sequence is this) but
>>> still.
>>
>> So you want a chip with MIPI DSI capability built in, or a convert chip.
>
> Not really, no. I want to be able to put the framebuffer data into
> the display like I have been doing with RGB, hsync, vsync etc., via
> a parallel or lvds interface. Is there enough info out there how to
> do this with an fpga? I think I have enough info to do hdmi this way,
> but no MIPI. Well, my guess is that pixel data will still be pixel
> data etc., can't be that hard.
>
>

(DSI is Display Serial Interface; that is the version of MIPI that a
MIPI display would use.)

I have used Lattice CrossLink FPGAs to do that sort of work. They are
small gate arrays designed for protocol conversion.

Richard Damon

Dec 29, 2022, 7:41:05 PM
No, look at your DRAM timing again: the transaction begins with the
sending of the address, typically over two clock edges with RAS and CAS,
and then, a couple of clock cycles later, you get the answer back on the
data bus.

Yes, the addresses come from an address bus, using address space out of
the processor, but it is a multi-cycle operation. Typically, you read
back a "burst" with some minimal caching on the processor side, but that
is more a minor detail.


> I'm looking for *that* sort of "direct access" in a camera.

It's been a while, but I thought some CMOS cameras could work on a
similar basis: strobe a row/column address onto pins of the camera, and
a few clock cycles later you get a burst out of the camera starting at
the addressed cell.

Don Y

Dec 29, 2022, 10:32:47 PM
But it's a single memory reference. Look at what happens when you
deserialize a USB video stream into that same DRAM. The DMAC has
tied up the bus for the same amount of time that the processor
would have if it read those same N locations.

> Yes, the addresses come from an address bus, using address space out of the
> processor, but it is a multi-cycle operation. Typically, you read back a
> "burst" with some minimal caching on the processor side, but that is more a
> minor detail.
>
>> I'm looking for *that* sort of "direct access" in a camera.
>
> Its been awhile, but I thought some CMOS cameras could work on a similar basis,
> strobe a Row/Column address from pins on the camera, and a few clock cycles
> later you got a burst out of the camera starting at the address cell.

I don't want the camera to decide which pixels *it* thinks I want to see.
It sends me a burst of a row -- but the next part of the image I may have
wanted to access may have been down the same *column*. Or, in another
part of the image entirely.

Serial protocols inherently deliver data in a predefined pattern
(often intended for display). Scene analysis doesn't necessarily
conform to that same pattern.

E.g., if I've imposed a mask on the field to indicate portions that
are not important, then any bandwidth the camera spends delivering that
data to memory is wasted. If the memory was "just there", then there
would be no associated bandwidth impact.
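
A sketch of that mask-driven access, assuming the frame really is "just
there" (tile size, geometry and names are illustrative); only the tiles
the mask marks as interesting ever consume bus cycles:

#include <stdbool.h>
#include <stdint.h>

#define WIDTH   1280
#define HEIGHT  720
#define TILE    16

/* Call this only for tiles the mask marks as worth examining (e.g. the
 * door opening); frame points into the emulator's memory-mapped image. */
bool tile_changed(const volatile uint8_t *frame, const uint8_t *reference,
                  int tx, int ty, int threshold)
{
    int diff = 0;
    for (int y = ty * TILE; y < (ty + 1) * TILE; y++)
        for (int x = tx * TILE; x < (tx + 1) * TILE; x++) {
            int i = y * WIDTH + x;
            int d = (int)frame[i] - (int)reference[i];
            diff += d < 0 ? -d : d;    /* sum of absolute differences */
        }
    return diff > threshold;
}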

Don Y

Dec 29, 2022, 10:38:08 PM
I'd save one memory reference, per pixel, per frame; the data is "just
there" instead of having to be streamed in from a USB device and DMA'ed
into memory.

> But then you may
> run into problems with "doppler" kind of effects (clearly not Doppler
> but you get the idea) if you access the frame being acquired; so you'll
> want that double buffering you are talking about elsewhere (one frame
> being acquired and one having been acquired prior to that). Which would
> mean that somewhere something will have to do the copying you want to
> avoid...

No. The "deserializer" could (conceivably) just toggle two (or N) pointers
to "captured frame" and "frame being captured". You do this when synthesizing
video for similar reasons: if you update the area of the frame buffer that
is being painted to the visual display AS it is being painted, objects
that are "in motion" appear to "tear" (visual artifacts).
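
A minimal sketch of that pointer toggle (buffer sizes and names are
illustrative); the swap happens only at a frame boundary, so a reader
never sees a half-updated, "torn" image:

#include <stdatomic.h>
#include <stdint.h>

#define FRAME_BYTES (1280 * 720)

static uint8_t frame_a[FRAME_BYTES], frame_b[FRAME_BYTES];
static uint8_t *_Atomic visible = frame_a;   /* what readers see            */
static uint8_t *filling = frame_b;           /* what the deserializer fills */

/* Called by the deserializer when a complete frame has been written. */
void frame_complete(void)
{
    uint8_t *done = filling;
    filling = atomic_exchange(&visible, done);  /* swap the two roles */
}

/* Readers take a snapshot of the pointer and index into that frame. */
const uint8_t *current_frame(void)
{
    return atomic_load(&visible);
}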

> Since you have already done it with USB cameras I think the practical
> way is to just keep doing it this way, may be not USB if you can
> find some more economic way to do it, MIPI or whatever.

That was the issue I was exploring. I want to see the sort of performance
and cost associated with different approaches.

USB (and some of the camera protocols) are supported on much silicon.
But, when you start wanting to run multiple cameras from the same
host.... <frown>


George Neuner

Dec 30, 2022, 12:30:01 AM
Hi Don,
You aren't going to find anything low cost ... if you want bandwidth
for multiple cameras, you need to look into bus based frame grabbers.
They still exist, but are (relatively) expensive and getting harder to
find.

George

Don Y

Dec 30, 2022, 2:28:03 AM
Hi George!

[Hope you are faring well... enjoying the COLD! ;) ]

On 12/29/2022 10:29 PM, George Neuner wrote:
>>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>>> nowadays.  Are there any (mainstream/high volume) devices that
>>>> "look" like a chunk of memory, in their native form?

>> I built my prototypes (proof-of-principle) using COTS USB cameras.
>> But, getting the data out of the serial data stream and into RAM so
>> it can be analyzed consumes memory bandwidth.
>>
>> I'm currently trying to sort out an approximate cost factor "per
>> camera" (per video stream) and looking for ways that I can cut costs
>> (memory bandwidth requirements) to allow greater numbers of
>> cameras or higher frame rates.
>
> You aren't going to find anything low cost ... if you want bandwidth
> for multiple cameras, you need to look into bus based frame grabbers.
> They still exist, but are (relatively) expensive and getting harder to
> find.

So, my options are:
- reduce the overall frame rate such that N cameras can
be serviced by the USB (or whatever) interface *and*
the processing load
- reduce the resolution of the cameras (a special case of the above)
- reduce the number of cameras "per processor" (again, above)
- design a "camera memory" (frame grabber) that I can install
multiply on a single host
- develop distributed algorithms to allow more bandwidth to
effectively be applied

Richard Damon

Dec 30, 2022, 11:24:19 AM
The fact that you are starting from the concept of using "USB cameras"
sort of starts you with that sort of limit.

My personal thought on your problem is you want to put a "cheap"
processor right on each camera using a processor with a direct camera
interface to pull in the image and do your processing and send the
results over some comm-link to the center core.

It is unclear what your actual image requirements per camera are, so it
is hard to say what level of camera and processor you will need.

My first feeling is you seem to be assuming a fairly cheap camera and
then doing some fairly simple processing over the partial image, in
which case you might even be able to live with a camera that uses a
crude SPI interface to bring the frame in, and a very simple processor.

Don Y

Dec 30, 2022, 12:04:25 PM
If I went the frame-grabber approach, that would be how I would address the
hardware. But, it doesn't scale well. I.e., at what point do you throw in
the towel and say there are too many concurrent images in the scene to
pile them all onto a single "host" processor?

ISTM that the better solution is to develop algorithms that can
process portions of the scene, concurrently, on different "hosts".
Then, coordinate these "partial results" to form the desired result.

I already have a "camera module" (host+USB camera) that has adequate
processing power to handle a "single camera scene". But, these all
assume the scene can be easily defined to fit in that camera's field
of view. E.g., point a camera across the path of a garage door and have
it "notice" any deviation from the "unobstructed" image.

When the scene gets too large to represent in enough detail in a single
camera's field of view, then there needs to be a way to coordinate
multiple cameras to a single (virtual?) host. If those cameras were just
"chunks of memory", then the *imagery* would be easy to examine in a single
host -- though the processing power *might* need to increase geometrically
(depending on your current goal)

Moving the processing to "host per camera" implementation gives you more
MIPS. But, makes coordinating partial results tedious.

> It is unclear what you actual image requirements per camera are, so it is hard
> to say what level camera and processor you will need.
>
> My first feeling is you seem to be assuming a fairly cheep camera and then
> doing some fairly simple processing over the partial image, in which case you
> might even be able to live with a camera that uses a crude SPI interface to
> bring the frame in, and a very simple processor.

I use A LOT of cameras. But, I should be able to swap the camera
(upgrade/downgrade) and still rely on the same *local* compute engine.
E.g., some of my cameras have IR illuminators; it's not important
in others; some are PTZ; others fixed.

Watching for an obstruction in the path of a garage door (open/close)
has different requirements than trying to recognize a visitor at the front
door. Or, identify the locations of the occupants of a facility.

Richard Damon

Dec 30, 2022, 1:02:44 PM
That's why I didn't suggest that method. I was suggesting each camera has
its own tightly coupled processor that handles the needs of THAT camera.

>
> ISTM that the better solution is to develop algorithms that can
> process portions of the scene, concurrently, on different "hosts".
> Then, coordinate these "partial results" to form the desired result.
>
> I already have a "camera module" (host+USB camera) that has adequate
> processing power to handle a "single camera scene".  But, these all
> assume the scene can be easily defined to fit in that camera's field
> of view.  E.g., point a camera across the path of a garage door and have
> it "notice" any deviation from the "unobstructed" image.

And if one camera can't fit the full scene, you use two cameras, each
with their own processor, and they each process their own image.

The only problem is if your image processing algorithm needs to compare
parts of the images between the two cameras, which seems unlikely.

It does mean that if you are trying to track something across the cameras,
you need enough overlap to allow them to hand off the object while it is
in the overlap.
>
> When the scene gets too large to represent in enough detail in a single
> camera's field of view, then there needs to be a way to coordinate
> multiple cameras to a single (virtual?) host.  If those cameras were just
> "chunks of memory", then the *imagery* would be easy to examine in a single
> host -- though the processing power *might* need to increase geometrically
> (depending on your current goal)

Yes, but your "chunks of memory" model just doesn't exist as a viable
camera model.

The CMOS cameras with addressable pixels have "access times"
significantly longer than your typical memory (and the pixel is read
once), so they don't really meet that model. Some of them do allow for
sending multiple small regions of interest and downloading just those
regions, but this then starts to require moderate processor overhead to
keep loading all these regions and updating the grabber to put them where
you want.

And yes, it does mean that there might be some cases where you need a
core module that has TWO cameras connected to a single processor, either
to get a wider field of view, or to combine two different types of
camera (maybe a high res black and white to a low res color if you need
just minor color information, or combine a visible camera to a thermal
camera). These just become another tool in your tool box.

>
> Moving the processing to "host per camera" implementation gives you more
> MIPS.  But, makes coordinating partial results tedious.

Depends on what sort of partial results you are looking at.

>
>> It is unclear what you actual image requirements per camera are, so it
>> is hard to say what level camera and processor you will need.
>>
>> My first feeling is you seem to be assuming a fairly cheep camera and
>> then doing some fairly simple processing over the partial image, in
>> which case you might even be able to live with a camera that uses a
>> crude SPI interface to bring the frame in, and a very simple processor.
>
> I use A LOT of cameras.  But, I should be able to swap the camera
> (upgrade/downgrade) and still rely on the same *local* compute engine.
> E.g., some of my cameras have Ir illuminators; it's not important
> in others; some are PTZ; others fixed.

Doesn't sound reasonable. If you downgrade a camera, you can't count on
it being able to meet the same requirements, or you over-specced the
initial camera.

You put on a camera a processor capable of handling the tasks you expect
out of that set of hardware. One type of processor can likely handle a
variety of different camera setups.

>
> Watching for an obstruction in the path of a garage door (open/close)
> has different requirements than trying to recognize a visitor at the front
> door.  Or, identify the locations of the occupants of a facility.
>

Yes, so you don't want to "Pay" for the capability to recognize a
visitor in your garage door sensor, so you use different levels of
sensor/processor.

Don Y

Dec 30, 2022, 4:59:54 PM
On 12/30/2022 11:02 AM, Richard Damon wrote:
>>>> So, my options are:
>>>> - reduce the overall frame rate such that N cameras can
>>>>    be serviced by the USB (or whatever) interface *and*
>>>>    the processing load
>>>> - reduce the resolution of the cameras (a special case of the above)
>>>> - reduce the number of cameras "per processor" (again, above)
>>>> - design a "camera memory" (frame grabber) that I can install
>>>>    multiply on a single host
>>>> - develop distributed algorithms to allow more bandwidth to
>>>>    effectively be applied
>>>
>>> The fact that you are starting for the concept of using "USB Cameras" sort
>>> of starts you with that sort of limit.
>>>
>>> My personal thought on your problem is you want to put a "cheap" processor
>>> right on each camera using a processor with a direct camera interface to
>>> pull in the image and do your processing and send the results over some
>>> comm-link to the center core.
>>
>> If I went the frame-grabber approach, that would be how I would address the
>> hardware.  But, it doesn't scale well.  I.e., at what point do you throw in
>> the towel and say there are too many concurrent images in the scene to
>> pile them all onto a single "host" processor?
>
> Thats why I didn't suggest that method. I was suggesting each camera has its
> own tightly coupled processor that handles the need of THAT

My existing "module" handles a single USB camera (with a fairly heavy-weight
processor).

But, being USB-based, there is no way to look at *part* of an image.
And, I have to pay a relatively high cost (capturing the entire
image from the serial stream) to look at *any* part of it.

*If* a "camera memory" was available, I would site N of these
in the (64b) address space of the host and let the host pick
and choose which parts of which images it wanted to examine...
without worrying about all of the bandwidth that would have been
consumed deserializing those N images into that memory (which is
a continuous process)

>> ISTM that the better solution is to develop algorithms that can
>> process portions of the scene, concurrently, on different "hosts".
>> Then, coordinate these "partial results" to form the desired result.
>>
>> I already have a "camera module" (host+USB camera) that has adequate
>> processing power to handle a "single camera scene".  But, these all
>> assume the scene can be easily defined to fit in that camera's field
>> of view.  E.g., point a camera across the path of a garage door and have
>> it "notice" any deviation from the "unobstructed" image.
>
> And if one camera can't fit the full scene, you use two cameras, each with
> there own processor, and they each process their own image.

That's the above approach, but...

> The only problem is if your image processing algoritm need to compare parts of
> the images between the two cameras, which seems unlikely.

Consider watching a single room (e.g., a lobby at a business) and
tracking the movements of "visitors". It's unlikely that an individual's
movements would always be constrained to a single camera field. There will
be times when he/she is "half-in" a field (and possibly NOT in the other,
HALF in the other or ENTIRELY in the other). You can't ignore cases where
the entire object (or, your notion of what that object's characteristics
might be) is not entirely in the field as that leaves a vulnerability.

For example, I watch our garage door with *four* cameras. A camera is
positioned on each side ("door jamb"?) of the door "looking at" the other
camera. This because a camera can't likely see the full height of the door
opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
and I'll watch *its* side!).

[The other two cameras are similarly positioned on the overhead *track*
onto which the door rolls, when open]

An object in (or near) the doorway can be visible in one (either) or
both cameras, depending on where it is located. Additionally, one of
those manifestations may be only "partial" as regards to where it is
located and intersects the cameras' fields of view.

The "cost" of watching the door is only the cost of the actual *cameras*.
The cost of the compute resources is amortized over the rest of the system
as those can be used for other, non-camera, non-garage related activities.

> It does say that if trying to track something across the cameras, you need
> enough overlap to allow them to hand off the object when it is in the overlap.

And, objects that consume large portions of a camera's field of view
require similar handling (unless you can always guarantee that cameras
and targets are "far apart")

>> When the scene gets too large to represent in enough detail in a single
>> camera's field of view, then there needs to be a way to coordinate
>> multiple cameras to a single (virtual?) host.  If those cameras were just
>> "chunks of memory", then the *imagery* would be easy to examine in a single
>> host -- though the processing power *might* need to increase geometrically
>> (depending on your current goal)
>
> Yes, but your "chunks of memory" model just doesn't exist as a viable camera
> model.

Apparently not -- in the COTS sense. But, that doesn't mean I can't
build a "camera memory emulator".

The downside is that this increases the cost of the "actual camera"
(see my above comment wrt amortization).

And, it just moves the point at which a single host (of fixed capabilities)
can no longer handle the scene's complexity. (when you have 10 cameras?)

> The CMOS cameras with addressable pixels have "access times" significantly
> lower than your typical memory (and is read once) so doesn't really meet that
> model. Some of them do allow for sending multiple small regions of intererst
> and down loading just those regions, but this then starts to require moderate
> processor overhead to be loading all these regions and updating the grabber to
> put them where you want.

You would, instead, let the "camera memory emulator" capture the entire
image from the camera and place the entire image in a contiguous
region of memory (from the perspective of the host). The cost of capturing
the portions that are not used is hidden *in* the cost of the "emulator".

> And yes, it does mean that there might be some cases where you need a core
> module that has TWO cameras connected to a single processor, either to get a
> wider field of view, or to combine two different types of camera (maybe a high
> res black and white to a low res color if you need just minor color
> information, or combine a visible camera to a thermal camera). These just
> become another tool in your tool box.

I *think* (uncharted territory) that the better investment is to develop
algorithms that let me distribute the processing among multiple
(single) "camera modules/nodes". How would your "two camera" exemplar
address an application requiring *three* cameras? etc.

I can, currently, distribute this processing by treating the
region of memory into which a (local) camera's imagery is
deserialized as a "memory object" and then exporting *access*
to that object to other similar "camera modules/nodes".

But, the access times of non-local memory are horrendous, given
that the contents are ephemeral (if accesses could be *cached*
on each host needing them, then these costs diminish).

So, I need to come up with algorithms that let me export abstractions
instead of raw data.
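
One hypothetical shape for such an exported abstraction, a partial
detection report rather than pixels (all names and fields here are
illustrative, not an existing format):

#include <stdint.h>

/* What a camera node might publish to its peers instead of image data. */
struct detection_report {
    uint32_t camera_id;      /* which node saw it                        */
    uint64_t timestamp_us;   /* when the frame was captured              */
    uint16_t x, y, w, h;     /* bounding box in that camera's frame      */
    uint8_t  label;          /* e.g. HEAD, TORSO, LEGS, UNKNOWN          */
    uint8_t  confidence;     /* 0..255                                   */
    uint8_t  truncated;      /* nonzero if the object exits the field    */
};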

>> Moving the processing to "host per camera" implementation gives you more
>> MIPS.  But, makes coordinating partial results tedious.
>
> Depends on what sort of partial results you are looking at.

"Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible"

"Ah! I was wondering whose legs those were in *my* image!"

>>> It is unclear what you actual image requirements per camera are, so it is
>>> hard to say what level camera and processor you will need.
>>>
>>> My first feeling is you seem to be assuming a fairly cheep camera and then
>>> doing some fairly simple processing over the partial image, in which case
>>> you might even be able to live with a camera that uses a crude SPI interface
>>> to bring the frame in, and a very simple processor.
>>
>> I use A LOT of cameras.  But, I should be able to swap the camera
>> (upgrade/downgrade) and still rely on the same *local* compute engine.
>> E.g., some of my cameras have Ir illuminators; it's not important
>> in others; some are PTZ; others fixed.
>
> Doesn't sound reasonable. If you downgrade a camera, you can't count on it
> being able to meet the same requirements, or you over speced the initial camera.

Sorry, I was using up/down relative to "nominal camera", not "specific camera
previously selected for the application".  I'd *really* like to just have a
single "camera module" (module = CPU+I/O) instead of one for camera type A
and another for camera type B, etc.

> You put on a camera a processor capable of handling the tasks you expect out of
> that set of hardware.  One type of processor likely can handle a variaty of
> different camera setup with

Exactly. If a particular instance has an Ir illuminator, then you include
controls for that in *the* "camera module". If another instance doesn't have
this ability, then those controls go unused.

>> Watching for an obstruction in the path of a garage door (open/close)
>> has different requirements than trying to recognize a visitor at the front
>> door.  Or, identify the locations of the occupants of a facility.
>
> Yes, so you don't want to "Pay" for the capability to recognize a visitor in
> your garage door sensor, so you use different levels of sensor/processor.

Exactly. But, the algorithms that do the scene analysis can be the same;
you just parameterize the image and the objects within it that you seek.

There will likely be some combinations that exceed the capabilities of
the hardware to process in real-time. So, you fall back to lower
frame rates or let the algorithms drop targets ("You watch Bob, I'll
watch Tom!")


Richard Damon

Dec 31, 2022, 12:39:23 AM
Yep, having chosen USB as your interface, you have limited yourself.

Since you say you have a fairly heavy-weight processor, that frame grab
likely isn't your limiting factor.

>
> *If* a "camera memory" was available, I would site N of these
> in the (64b) address space of the host and let the host pick
> and choose which parts of which images it wanted to examine...
> without worrying about all of the bandwidth that would have been
> consumed deserializing those N images into that memory (which is
> a continuous process)

But such a camera would almost certainly be designed for the processor
to be on the same board as the camera (or be VERY slow to access), so it
is much less apt to allow you to add multiple cameras to one processor.

>
>>> ISTM that the better solution is to develop algorithms that can
>>> process portions of the scene, concurrently, on different "hosts".
>>> Then, coordinate these "partial results" to form the desired result.
>>>
>>> I already have a "camera module" (host+USB camera) that has adequate
>>> processing power to handle a "single camera scene".  But, these all
>>> assume the scene can be easily defined to fit in that camera's field
>>> of view.  E.g., point a camera across the path of a garage door and have
>>> it "notice" any deviation from the "unobstructed" image.
>>
>> And if one camera can't fit the full scene, you use two cameras, each
>> with there own processor, and they each process their own image.
>
> That's the above approach, but...
>
>> The only problem is if your image processing algoritm need to compare
>> parts of the images between the two cameras, which seems unlikely.
>
> Consider watching a single room (e.g., a lobby at a business) and
> tracking the movements of "visitors".  It's unlikely that an individual's
> movements would always be constrained to a single camera field.  There will
> be times when he/she is "half-in" a field (and possibly NOT in the other,
> HALF in the other or ENTIRELY in the other).  You can't ignore cases where
> the entire object (or, your notion of what that object's characteristics
> might be) is not entirely in the field as that leaves a vulnerability.

Sounds like you aren't overlapping your cameras enough or have
insufficient coverage. Maybe your problem is the wrong field of view for
your lens. Maybe you need fewer but better cameras with wider fields of
view.

This might be a consequence of trying to use "stock" inexpensive USB
cameras.
>
> For example, I watch our garage door with *four* cameras.  A camera is
> positioned on each side ("door jam"?) of the door "looking at" the other
> camera.  This because a camera can't likely see the full height of the door
> opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
> and I'll watch *its* side!).

Right, and if ANY see a problem, you stop. So no need for inter-camera
coordination.
>
> [The other two cameras are similarly positioned on the overhead *track*
> onto which the door rolls, when open]
>
> An object in (or near) the doorway can be visible in one (either) or
> both cameras, depending on where it is located.  Additionally, one of
> those manifestations may be only "partial" as regards to where it is
> located and intersects the cameras' fields of view.

But since you aren't trying to ID, only detect, there still isn't a need
for camera-to-camera processing, just camera-to-door-controller.

>
> The "cost" of watching the door is only the cost of the actual *cameras*.
> The cost of the compute resources is amortized over the rest of the system
> as those can be used for other, non-camera, non-garage related activities.
>
>> It does say that if trying to track something across the cameras, you
>> need enough overlap to allow them to hand off the object when it is in
>> the overlap.
>
> And, objects that consume large portions of a camera's field of view
> require similar handling (unless you can always guarantee that cameras
> and targets are "far apart")
>
>>> When the scene gets too large to represent in enough detail in a single
>>> camera's field of view, then there needs to be a way to coordinate
>>> multiple cameras to a single (virtual?) host.  If those cameras were
>>> just
>>> "chunks of memory", then the *imagery* would be easy to examine in a
>>> single
>>> host -- though the processing power *might* need to increase
>>> geometrically
>>> (depending on your current goal)
>>
>> Yes, but your "chunks of memory" model just doesn't exist as a viable
>> camera model.
>
> Apparently not -- in the COTS sense.  But, that doesn't mean I can't
> build a "camera memory emulator".
>
> The downside is that this increases the cost of the "actual camera"
> (see my above comment wrt ammortization).

Yep, implementing this likely costs more than giving the camera a
dedicated moderate processor to do the major work. It might not handle the
actual ID problem of your doorbell, but it could likely process the live
video, take a snapshot of a region with a good view of the visitor
coming, and send just that to your master system for ID.

>
> And, it just moves the point at which a single host (of fixed capabilities)
> can no longer handle the scene's complexity.  (when you have 10 cameras?)
>
>> The CMOS cameras with addressable pixels have "access times"
>> significantly lower than your typical memory (and is read once) so
>> doesn't really meet that model. Some of them do allow for sending
>> multiple small regions of intererst and down loading just those
>> regions, but this then starts to require moderate processor overhead
>> to be loading all these regions and updating the grabber to put them
>> where you want.
>
> You would, instead, let the "camera memory emulator" capture the entire
> image from the camera and place the entire image in a contiguous
> region of memory (from the perspective of the host).  The cost of capturing
> the portions that are not used is hidden *in* the cost of the "emulator".

Yep, you could build your system with a two-port memory buffer between
the frame grabber loading through one port, and the decoding processor on
the other.

The most cost-effective way to do this is likely a commercial
frame grabber with built-in "two-port" memory that sits in a slot of a
PC-type computer. These would likely not work with a "USB camera" (why
would you need a frame grabber with a camera that has one built in?), so
it would totally change your cost models.

IF your current design method is based on using USB cameras, trying to
do a fully custom interface may be out of your field of operation.

>
>> And yes, it does mean that there might be some cases where you need a
>> core module that has TWO cameras connected to a single processor,
>> either to get a wider field of view, or to combine two different types
>> of camera (maybe a high res black and white to a low res color if you
>> need just minor color information, or combine a visible camera to a
>> thermal camera). These just become another tool in your tool box.
>
> I *think* (uncharted territory) that the better investment is to develop
> algorithms that let me distribute the processing among multiple
> (single) "camera modules/nodes".  How would your "two camera" exemplar
> address an application requiring *three* cameras?  etc.

The first question is: what processing are you thinking of that needs
images from 3 cameras?

Note, my two camera example was a case where the processing needed to be
done did need data from two cameras.

If you have another task that needs a different camera, you just build a
system with one two camera model and one 1 camera module, relaying back
to a central control, or you nominate one of the modules to be central
control if the load there is light enough.

Your garage doer example would be built from 4 seperate and independent
1 camera modules, either going to one as the master, or to a 5th module
acting as the master.

The cases I can think of for needing to process three cameras together
would be:

1) A system stitching images from 3 cameras and generating a single image
out of them. But that totally breaks your concept of needing only bits of
the images; it inherently uses most of each camera, and does some
stitching processing on the overlaps.

2) A multi-spectrum system, where again you are taking the ENTIRE scene
from the three cameras and producing a merged "false-color" image from
them. Again, this also breaks your partial-image model.

>
> I can, currently, distribute this processing by treating the
> region of memory into which a (local) camera's imagery is
> deserialized as a "memory object" and then exporting *access*
> to that object to other similar "camera modules/nodes".
>
> But, the access times of non-local memory are horrendous, given
> that the contents are ephemeral (if accesses could be *cached*
> on each host needing them, then these costs diminish).
>
> So, I need to come up with algorithms that let me export abstractions
> instead of raw data.

Sounds like your current design is very centralized. This limits its
scalability.

>
>>> Moving the processing to "host per camera" implementation gives you more
>>> MIPS.  But, makes coordinating partial results tedious.
>>
>> Depends on what sort of partial results you are looking at.
>
> "Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible"
>
> "Ah!  I was wondering whose legs those were in *my* image!"
>
>>>> It is unclear what you actual image requirements per camera are, so
>>>> it is hard to say what level camera and processor you will need.
>>>>
>>>> My first feeling is you seem to be assuming a fairly cheep camera
>>>> and then doing some fairly simple processing over the partial image,
>>>> in which case you might even be able to live with a camera that uses
>>>> a crude SPI interface to bring the frame in, and a very simple
>>>> processor.
>>>
>>> I use A LOT of cameras.  But, I should be able to swap the camera
>>> (upgrade/downgrade) and still rely on the same *local* compute engine.
>>> E.g., some of my cameras have Ir illuminators; it's not important
>>> in others; some are PTZ; others fixed.
>>
>> Doesn't sound reasonable. If you downgrade a camera, you can't count
>> on it being able to meet the same requirements, or you over speced the
>> initial camera.
>
> Sorry, I was using up/down relative to "nominal camera", not "specific
> camera
> previously selected for application".  I'd 8really* like to just have a
> single "camera module" (module = CPU+I/O) instead of one for camera type A
> and another for camera type B, etc.
>

That only works if you are willing to spend for the sports car, even if
you just need it to go around the block.

It depends a bit on how much span of capability you need. A $10 camera
likely has a very different interface than a $30,000 camera, so will
need a different board. Some boards might handle multiple camera
interface types if it doesn't add a lot to the board, but you are apt to
find that you need to make some choice.

Then some tasks will just need a lot more computer power than others.
Yes, you can just put too much computer power on the simple tasks (and
early in the design it might make sense to use only the higher-end
processor), but ultimately you are going to want the less expensive
lower-end processors.

>> You put on a camera a processor capable of handling the tasks you
>> expect out of that set of hardware.  One type of processor likely can
>> handle a variety of different camera setups with
>
> Exactly.  If a particular instance has an Ir illuminator, then you include
> controls for that in *the* "camera module".  If another instance doesn't
> have
> this ability, then those controls go unused.

Yes, auxiliary functionality is often cheap to include the hooks for.

>
>>> Watching for an obstruction in the path of a garage door (open/close)
>>> has different requirements than trying to recognize a visitor at the
>>> front
>>> door.  Or, identify the locations of the occupants of a facility.
>>
>> Yes, so you don't want to "Pay" for the capability to recognize a
>> visitor in your garage door sensor, so you use different levels of
>> sensor/processor.
>
> Exactly.  But, the algorithms that do the scene analysis can be the same;
> you just parameterize the image and the objects within it that you seek.

Actually, "Tracking" can be a very different type of algorithm than
"Detecting". You might be able to use a Tracking-based algorithm to
Detect, but likely a much simpler algorithm can be used (needing fewer
resources) to just detect.

Don Y

unread,
Dec 31, 2022, 3:27:26 AM12/31/22
to
Doesn't matter. Any serial interface poses the same problem;
I can't examine the image until I can *look* at it.

> Since you say you have a fairly heavy-weight processor, that frame grab likely
> isn't your limiting factor.

It becomes an issue when the number of cameras increases
significantly on a single host. I have one scene that requires
11 cameras to capture, completely.

>> *If* a "camera memory" was available, I would site N of these
>> in the (64b) address space of the host and let the host pick
>> and choose which parts of which images it wanted to examine...
>> without worrying about all of the bandwidth that would have been
>> consumed deserializing those N images into that memory (which is
>> a continuous process)
>
> But such a camera would almost certainly be designed for the processor to be on
> the same board as the camera (or be VERY slow in access), so much less apt
> to allow you to add multiple cameras to one processor.

Yes. But, if the module is small, then siting the assembly "someplace
convenient" isn't a big issue. I.e., my modules are smaller than most
webcams/dashcams.

>>>> ISTM that the better solution is to develop algorithms that can
>>>> process portions of the scene, concurrently, on different "hosts".
>>>> Then, coordinate these "partial results" to form the desired result.
>>>>
>>>> I already have a "camera module" (host+USB camera) that has adequate
>>>> processing power to handle a "single camera scene".  But, these all
>>>> assume the scene can be easily defined to fit in that camera's field
>>>> of view.  E.g., point a camera across the path of a garage door and have
>>>> it "notice" any deviation from the "unobstructed" image.
>>>
>>> And if one camera can't fit the full scene, you use two cameras, each with
>>> their own processor, and they each process their own image.
>>
>> That's the above approach, but...
>>
>>> The only problem is if your image processing algorithm needs to compare parts
>>> of the images between the two cameras, which seems unlikely.
>>
>> Consider watching a single room (e.g., a lobby at a business) and
>> tracking the movements of "visitors".  It's unlikely that an individual's
>> movements would always be constrained to a single camera field.  There will
>> be times when he/she is "half-in" a field (and possibly NOT in the other,
>> HALF in the other or ENTIRELY in the other).  You can't ignore cases where
>> the entire object (or, your notion of what that object's characteristics
>> might be) is not entirely in the field as that leaves a vulnerability.
>
> Sounds like you aren't overlapping your cameras enough or have insufficient
> coverage.  Maybe your problem is wrong field of view for your lens. Maybe you
> need fewer but better cameras with wider fields of view.

Distance from camera to target means you have to play games with optics
that can distort images.

I also can't rely on "professional installers", *or* on the cameras remaining
aimed in their original configurations.

> This might be due to try to use "stock" inexpensive USB cameras.
>
>> For example, I watch our garage door with *four* cameras.  A camera is
>> positioned on each side ("door jam"?) of the door "looking at" the other
>> camera.  This because a camera can't likely see the full height of the door
>> opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
>> and I'll watch *its* side!).
>
> Right, and if ANY see a problem, you stop. So no need for inter-camera
> coordination.

But you don't know there is a problem until you can identify *where*
the obstruction exists and if that poses a problem for the vehicle
or the "obstructing item". Doing so requires knowing what the
object likely is.

E.g., SWMBO frequently stands in the doorway as I pull the car in or
out (not enough room between vehicles *in* the garage to allow for
ease of entry/egress). I'd not want this to be flagged as a
problem (signalling an alert in the vehicle).

Likewise, an obstruction on one vehicle-side of the garage shouldn't
interfere with access to the other side.

>> [The other two cameras are similarly positioned on the overhead *track*
>> onto which the door rolls, when open]
>>
>> An object in (or near) the doorway can be visible in one (either) or
>> both cameras, depending on where it is located.  Additionally, one of
>> those manifestations may be only "partial" as regards to where it is
>> located and intersects the cameras' fields of view.
>
> But since you aren't trying to ID, only Detect, there still isn't a need for
> camera-camera processing, just camera-door controller

The cameras need to coordinate to resolve the location of the object.
A "toy wagon" would present differently, visually, than a tall person.

>>>> When the scene gets too large to represent in enough detail in a single
>>>> camera's field of view, then there needs to be a way to coordinate
>>>> multiple cameras to a single (virtual?) host.  If those cameras were just
>>>> "chunks of memory", then the *imagery* would be easy to examine in a single
>>>> host -- though the processing power *might* need to increase geometrically
>>>> (depending on your current goal)
>>>
>>> Yes, but your "chunks of memory" model just doesn't exist as a viable camera
>>> model.
>>
>> Apparently not -- in the COTS sense.  But, that doesn't mean I can't
>> build a "camera memory emulator".
>>
>> The downside is that this increases the cost of the "actual camera"
>> (see my above comment wrt ammortization).
>
> Yep, implementing this likely costs more than giving the camera a dedicated
> moderate processor to do the major work. Might not handle the actual ID problem
> of your doorbell, but could likely process the live video, take a snapshot of
> a region with a good view of the visitor coming, and send just that to your
> master system for ID.

But, then I could just use one of my existing "modules". If the
target fits entirely within its field of view, then it has everything
that it needs for the assigned functionality. If not, then it
needs to consult with other cameras.

>>> The CMOS cameras with addressable pixels have "access times" significantly
>>> lower than your typical memory (and is read once) so doesn't really meet
>>> that model. Some of them do allow for sending multiple small regions of
>>> interest and downloading just those regions, but this then starts to
>>> require moderate processor overhead to be loading all these regions and
>>> updating the grabber to put them where you want.
>>
>> You would, instead, let the "camera memory emulator" capture the entire
>> image from the camera and place the entire image in a contiguous
>> region of memory (from the perspective of the host).  The cost of capturing
>> the portions that are not used is hidden *in* the cost of the "emulator".
>
> Yep, you could build your system with a two-port memory buffer between the frame
> grabber loading with one port, and the decoding processor on the other.

Yes. But large *true* dual-port memories are costly. Instead, you would
emulate such a device either by time-division multiplexing a single
physical memory *or* sharing alternate memories (fill one, view the other).
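
To make that alternation concrete, here is a minimal sketch of the
"fill one, view the other" scheme (C; fill_from_camera() and the frame
size are placeholders, not my actual camera interface):

#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES (640u * 480u)

static uint8_t frame[2][FRAME_BYTES];
static volatile int visible = 0;      /* buffer the "host" side may read */

extern void fill_from_camera(uint8_t *dst, size_t len);  /* placeholder */

void grab_one_frame(void)
{
    int filling = visible ^ 1;        /* deserialize into the hidden buffer */

    fill_from_camera(frame[filling], FRAME_BYTES);
    visible = filling;                /* expose it; the other becomes fillable */
}

const uint8_t *current_frame(void)
{
    return frame[visible];            /* the host's "dual-port" view */
}

The host always sees a complete, static frame; the price is one frame
of latency and twice the buffer RAM.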

> The most cost effective way to do this is likely a commercial frame-grabber
> with built-in "two-port" memory that sits in a slot of a PC-type computer. These
> would likely not work with a "USB Camera" (why would you need a frame grabber
> with a camera that has it built in) so would be totally changing your cost models.

Yes, I have a few of these intended for medical imaging apps.
Way too big; way too expensive. Designed for the wrong type of "host"

> IF your current design method is based on using USB cameras, trying to do a
> full custom interface may be out of your field of operation.
>
>>> And yes, it does mean that there might be some cases where you need a core
>>> module that has TWO cameras connected to a single processor, either to get a
>>> wider field of view, or to combine two different types of camera (maybe a
>>> high res black and white to a low res color if you need just minor color
>>> information, or combine a visible camera to a thermal camera). These just
>>> become another tool in your tool box.
>>
>> I *think* (uncharted territory) that the better investment is to develop
>> algorithms that let me distribute the processing among multiple
>> (single) "camera modules/nodes".  How would your "two camera" exemplar
>> address an application requiring *three* cameras?  etc.
>
> The first question is: what processing are you thinking of that needs images
> from 3 cameras?
>
> Note, my two camera example was a case where the processing needed to be done
> did need data from two cameras.
>
> If you have another task that needs a different camera, you just build a system
> with one two-camera module and one one-camera module, relaying back to a central
> control, or you nominate one of the modules to be central control if the load
> there is light enough.
>
> Your garage door example would be built from 4 separate and independent 1
> camera modules, either going to one as the master, or to a 5th module acting as
> the master.

Yes, but they have to share image data (either raw or abstracted)
to make deductions about the targets present.

> The cases I can think of for needing to process three cameras together would be:
>
> 1) a system stitching images from 3 cameras and generating a single image out of
> it, but that totally breaks your concept of needing only bits of the images,
> that inherently is using most of each camera, and doing some stitching
> processing on the overlaps.
>
> 2) A Multi-spectrum system, where again, you are taking the ENTIRE scene from
> the three cameras and producing a merged "false-color" image from them. Again,
> this also breaks your partial image model.

Or, tracking multiple actors in an "arena" -- visitors in a business,
occupants in a home, etc. In much the same way that the two garage
door cameras conspire to locate the obstruction's position along the
line from the left doorjamb to the right, pairs of cameras can resolve
a target in an arena and *sets* of cameras (freely paired, as needed)
can track all locations (and targets) in the arena.
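
(The underlying geometry is just a two-bearing fix. A sketch, with
made-up numbers, assuming each of a pair of opposing cameras reports
the bearing at which it sees the object off the line joining them:)

#include <math.h>
#include <stdio.h>

typedef struct { double x, y; } point;   /* x along the baseline, y off it */

/* Cameras W apart at either end of the baseline (e.g. the doorjambs),
 * each facing the other and reporting its bearing to the obstruction
 * (radians off that baseline).  Intersect the two rays. */
static point locate(double W, double bearing_left, double bearing_right)
{
    double tl = tan(bearing_left);
    double tr = tan(bearing_right);
    point p;

    p.x = W * tr / (tl + tr);            /* distance from the left camera */
    p.y = p.x * tl;                      /* offset from the baseline      */
    return p;
}

int main(void)
{
    const double deg = 3.14159265358979 / 180.0;

    /* 5m opening; object seen 30deg off-axis by the left camera,
     * 45deg off-axis by the right one */
    point p = locate(5.0, 30.0 * deg, 45.0 * deg);
    printf("object %.2fm from the left jamb, %.2fm off the line\n", p.x, p.y);
    return 0;
}

In an arena you'd do the same intersection for whichever pair of
cameras currently has the target in view.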

>> I can, currently, distribute this processing by treating the
>> region of memory into which a (local) camera's imagery is
>> deserialized as a "memory object" and then exporting *access*
>> to that object to other similar "camera modules/nodes".
>>
>> But, the access times of non-local memory are horrendous, given
>> that the contents are ephemeral (if accesses could be *cached*
>> on each host needing them, then these costs diminish).
>>
>> So, I need to come up with algorithms that let me export abstractions
>> instead of raw data.
>
> Sounds like your current design is very centralized. This limits its scalability.

The current design is completely distributed. The only "shared component"
is the network switch through which they converse and the RDBMS that acts
as the persistent store.

If a site realizes that it needs additional coverage to track <whatever>
it just adds another camera module and lets the RDBMS know about its general
location/functionality (i.e., how it can relate to any other cameras
covering the same arena)

>>>>> My first feeling is you seem to be assuming a fairly cheap camera and then
>>>>> doing some fairly simple processing over the partial image, in which case
>>>>> you might even be able to live with a camera that uses a crude SPI
>>>>> interface to bring the frame in, and a very simple processor.
>>>>
>>>> I use A LOT of cameras.  But, I should be able to swap the camera
>>>> (upgrade/downgrade) and still rely on the same *local* compute engine.
>>>> E.g., some of my cameras have Ir illuminators; it's not important
>>>> in others; some are PTZ; others fixed.
>>>
>>> Doesn't sound reasonable. If you downgrade a camera, you can't count on it
>>> being able to meet the same requirements, or you over-specced the initial
>>> camera.
>>
>> Sorry, I was using up/down relative to "nominal camera", not "specific camera
>> previously selected for application".  I'd *really* like to just have a
>> single "camera module" (module = CPU+I/O) instead of one for camera type A
>> and another for camera type B, etc.
>
> That only works if you are willing to spend for the sports car, even if you
> just need it to go around the block.

If the "extra" bits of the sports car can be used by other elements,
then those costs aren't directly borne by the camera module, itself.
E.g., when the garage door is closed, there's no reason the modules
in the garage can't be busy training speech models or removing
commercials from recorded broadcast content.

If, OTOH, you detect objects with a photo-interrupter across the door's
path, there's scant little it can do when not needed.

> It depends a bit on how much span of capability you need. A $10 camera
> likely has a very different interface than a $30,000 camera, so will need a
> different board. Some boards might handle multiple camera interface types if it
> doesn't add a lot to the board, but you are apt to find that you need to make
> some choice.

I don't ever see a need for a $30,000 camera. There may be a need for a
PTZ model. Or, a low lux model. Or, one with longer focal length. Or,
shorter (I'd considered putting one *in* the mailbox to examine its
contents instead of just detecting that it had been "visited").

Instead of a 4K device, I'd opt for multiple simpler devices better
positioned.

But, not radically different in terms of cost, size, etc.

If you walk into a bank lobby, you don't see *one* super-high-resolution,
wide-field camera surveilling the lobby but, rather, half a dozen or more
watching specific portions of the lobby. Similarly, if you use the
self-check at the store, there is a camera per checkout station instead
of one "really good" camera located centrally trying to take it all in.

This gives installers more leeway in terms of how they cover an arena.

> Then some tasks will just need a lot more computer power than others. Yes, you
> can just put too much computer power on the simple tasks (and early in the
> design it might make sense to use only the higher-end processor), but
> ultimately you are going to want the less expensive lower-end processors.

I can call on surplus processing power from other nodes in the system
in much the same way that they can call on surplus capabilities from
a camera module that isn't "seeing" anything interesting, at the moment.

There will always be limits on what can be done; I'm not going
to be able to VISUALLY verify that you have the right wrench in
your hand as you set about working on the car. Or, that you
are holding an eating utensil instead of a random piece of
plastic as you traverse the kitchen.

But, I'll know YOU are in the kitchen and likely the person whose
voice I hear (to further reinforce the speaker identification
algorithms).

>>> You put on a camera a processor capable of handling the tasks you expect out
>>> of that set of hardware.  One type of processor likely can handle a variety
>>> of different camera setups with
>>
>> Exactly.  If a particular instance has an Ir illuminator, then you include
>> controls for that in *the* "camera module".  If another instance doesn't have
>> this ability, then those controls go unused.
>
> Yes, auxiliary functionality is often cheap to include the hooks for.

But, it often requires looking at your TOTAL needs instead of designing
for specific (initial) needs. E.g., my camera modules now include
audio capabilities as there are instances where I want an audio
pickup in the same arena that I am monitoring. Silly to have to add
an "audio module" just because I didn't have the foresight to
include it with the camera!

>>>> Watching for an obstruction in the path of a garage door (open/close)
>>>> has different requirements than trying to recognize a visitor at the front
>>>> door.  Or, identify the locations of the occupants of a facility.
>>>
>>> Yes, so you don't want to "Pay" for the capability to recognize a visitor in
>>> your garage door sensor, so you use different levels of sensor/processor.
>>
>> Exactly.  But, the algorithms that do the scene analysis can be the same;
>> you just parameterize the image and the objects within it that you seek.
>
> Actually, "Tracking" can be a very different type of algorithm than
> "Detecting". You might be able to use a Tracking-based algorithm to Detect, but
> likely a much simpler algorithm can be used (needing fewer resources) to just
> detect.

My current detection algorithm (e.g., garage) just looks for deltas between
"clear" and "obstructed" imagery, conditioned by masks. There is some
image processing required as things look different at night vs. day, etc.
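
In outline it is little more than this (a sketch; the names and
thresholds are illustrative, and the "clear" reference is chosen to
match the current lighting condition):

#include <stdint.h>
#include <stdlib.h>

/* Compare the live frame against a stored "clear" reference, skip the
 * pixels the mask excludes, and call it obstructed if enough of the
 * remaining pixels deviate. */
int obstructed(const uint8_t *live, const uint8_t *clear,
               const uint8_t *mask, size_t npixels,
               int pixel_thresh, size_t count_thresh)
{
    size_t hits = 0;

    for (size_t i = 0; i < npixels; i++) {
        if (!mask[i])                    /* don't-care region             */
            continue;
        if (abs((int)live[i] - (int)clear[i]) > pixel_thresh)
            hits++;                      /* pixel deviates from "clear"   */
    }
    return hits > count_thresh;          /* nonzero means "obstructed"    */
}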

I don't have to "get it right". All I have to do is demonstrate "proof of
concept". And, be able to indicate why a particular approach is superior
to others/existing ones.

E.g., if you drive a "pickup-on-steroids", you'd need to locate a
photointerrupter "obstruction detector" pretty high up off the ground
to catch the case where the truck bed was in the way of the door.
Or, some lumber overhanging the end of the bed that you forgot you'd
brought home! And, you'd likely need *another* detector down low
to catch toddlers or toy wagons in the path of the door.

OTOH, doing the detection with a camera catches these use conditions
in addition to the "nominal" one for which the photointerrupter was
designed.

Tracking two/four occupants of a home *suggests* that you can track
6 or 8. Or, dozens of employees in a business conference room, etc.

I have no desire to spend my time perfecting any of these
technologies (I have other goals); just lay the groundwork and the
framework to make them possible.

Dimiter_Popoff

unread,
Dec 31, 2022, 6:15:54 AM12/31/22
to
Isn't there a camera doing a protocol which allows you to request
a specific area only to be transferred? RFB like, VNC does that
all the time.

Don Y

unread,
Dec 31, 2022, 1:17:00 PM12/31/22
to
On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>> Serial protocols inherently deliver data in a predefined pattern
>> (often intended for display).  Scene analysis doesn't necessarily
>> conform to that same pattern.
>
> Isn't there a camera doing a protocol which allows you to request
> a specific area only to be transferred? RFB like, VNC does that
> all the time.

That only makes sense if you know, a priori, which part(s) of the
image you might want to examine. E.g., it would work for
"exposing" just the portion of the field that "overlaps" some
other image. I can get fixed parts of partial frames from
*other* cameras just by ensuring the other camera puts that
portion of the image in a particular memory object and then
export that memory object to the node that wants it.

But, if a target can move into or out of the exposed area, then
you have to make a return trip to the camera to request MORE of
the field.

When your targets are "far away" (like a surveillance camera
monitoring a parking lot), targets don't move appreciably from their
previously noted positions from one frame to the
next.

But, when the camera and targets are in close proximity,
there's greater (apparent) relative motion in the same
frame-interval. So, knowing where (X,Y+H,W) the portion of
the image of interest lay, previously, is less predictive
of where it may lie currently.

Having the entire image available means the software
can look <wherever> and <whenever>.

Dimiter_Popoff

unread,
Dec 31, 2022, 3:13:21 PM12/31/22
to
Well yes, obviously so, but this is valid whatever the interface.
Direct access to the sensor cells can't be double buffered so
you will have to transfer anyway to get the frame you are analyzing
static.
Perhaps you could find a way to make yourself some camera module
using an existing one, MIPI or even USB, since you are looking for low
overall cost; and add some MCU board to it to do the buffering and
transfer areas on request. Or maybe put enough CPU power together with
each camera to do most if not all of the analysis... Depending on
which achieves the lowest cost. But I can't say much on cost, that's
pretty far from me (as you know).

Don Y

unread,
Dec 31, 2022, 4:29:44 PM12/31/22
to
I would assume the devices would have evolved an "internal buffer"
(as I said, my experience with *DRAM* in this manner was 40+ years
ago)

> Perhaps you could find a way to make yourself some camera module
> using an existing one, MIPI or even USB, since you are looking for low
> overall cost; and add some MCU board to it to do the buffering and
> transfer areas on request. Or maybe put enough CPU power together with
> each camera to do most if not all of the analysis... Depending on
> which achieves the lowest cost. But I can't say much on cost, that's
> pretty far from me (as you know).

My current approach gives me that -- MIPS, size, etc. But, the cost
of transferring parts of the image (without adding a specific mechanism)
is a "shared page" (DSM). So, a host (on node A) references part of
node *B*'s frame buffer and the page (on B) containing that memory
address gets shipped back to node A and mapped into A's memory.

An agency on A could "touch" a "pixel-per-block" and cause the entire
frame to be transferred to A, from B (or, I can treat the entire frame
as a coherent object and arrange for ALL of it to be transferred when
ANY of it is referenced). Some process on B could alternate between
multiple such "memory objects" ("this one is complete, but I'm busy
filling this OTHER one with data from the camera interface")
to give me a *virtual* "camera memory device".

But, transport delays make this unsuitable for real-time work;
a megabyte of imagery would require 100ms to transfer, in "raw"
form. (I could encode it on the originating host; transfer it
and then decode it on the receiving host -- at the expense of MIPS.
This is how I "record" video without saturating the network)
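
(Back of the envelope, assuming ~100Mb/s on the wire: 1MB is 8Mb, so
~80ms of raw transfer time plus protocol/paging overhead gets you to
roughly 100ms. A 30fps camera hands you a new frame every ~33ms, so
raw transfers can never keep pace.)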

So, you (B) want to "abstract" the salient features of the image
while it is on B and then transfer just those to A. *Use*
them, on A, and then move on to the next set of features
(that B has computed while A was busy chewing on the last set)

Or, give A direct access to the native data (without A having
to capture video streams from each of the cameras that it wants
to potentially examine)

George Neuner

unread,
Dec 31, 2022, 5:40:15 PM12/31/22
to

> [Hope you are faring well... enjoying the COLD! ;) ]

Not very. Don't think I have your latest email.
That's the way all cameras work - at least low level. The camera
captures a field (or a frame, depending) on its CCD, and then the CCD
pixel data is read out serially by a controller.

What you are looking for is some kind of local frame buffering at the
camera. There are some "smart" cameras that provide that ... and also
generally a bunch of image analysis functions that you may or may not
find useful. I haven't played with any of them in a long time, and
when I did the image functions were too primitive for my purpose, so I
really can't recommend anything.


>>> ISTM that the better solution is to develop algorithms that can
>>> process portions of the scene, concurrently, on different "hosts".
>>> Then, coordinate these "partial results" to form the desired result.
>>>
>>> I already have a "camera module" (host+USB camera) that has adequate
>>> processing power to handle a "single camera scene".  But, these all
>>> assume the scene can be easily defined to fit in that camera's field
>>> of view.  E.g., point a camera across the path of a garage door and have
>>> it "notice" any deviation from the "unobstructed" image.
>>
>> And if one camera can't fit the full scene, you use two cameras, each with
>> their own processor, and they each process their own image.
>
>That's the above approach, but...
>
>> The only problem is if your image processing algorithm needs to compare parts of
>> the images between the two cameras, which seems unlikely.
>
>Consider watching a single room (e.g., a lobby at a business) and
>tracking the movements of "visitors". It's unlikely that an individual's
>movements would always be constrained to a single camera field. There will
>be times when he/she is "half-in" a field (and possibly NOT in the other,
>HALF in the other or ENTIRELY in the other). You can't ignore cases where
>the entire object (or, your notion of what that object's characteristics
>might be) is not entirely in the field as that leaves a vulnerability.

I've done simple cases following objects from one camera to another,
but not dealing with different angles/points of view - the cameras had
contiguous views with a bit of overlap. That made it relatively easy.

Following a person, e.g., seen quarter-behind in one camera, and
tracking them to another camera that sees a side view - from the
/other/ side -

Just following a person is easy, but tracking a specific person,
particularly when multiple people are present, gets very complicated
very quickly.

Don Y

unread,
Dec 31, 2022, 8:35:25 PM12/31/22
to
On 12/31/2022 3:40 PM, George Neuner wrote:
>
>> [Hope you are faring well... enjoying the COLD! ;) ]
>
> Not very. Don't think I have your latest email.

Hmmm... I wondered why I hadn't heard from you! (I trashed a bunch
of email aliases trying to shake off spammers -- you know, the
businesses that "need" your email address in order for you to
place an order... and then feel like you will be DELIGHTED to
receive an ongoing stream of solicitations! The problem with
aliases is that you can't "undelete" them -- they get permanently
excised from the mail domain's name space, for obvious reasons!)
Exactly. And, bring that buffer out to a set of pins for random
access -- like a DRAM (memory). In that way, I could explore whatever
parts of the image I deemed necessary -- without paying a price
(bandwidth) to pull the image data into "my" memory.
Yes. Each camera needs to grok the physical space in order to
understand "references" provided by another observer into that
space.

For the garage door cameras, it's relatively simple: you're
looking at a very narrow strip of 2-space (the plane of the
door) from opposing ends. You *know* that the door opening
has the same physical dimensions as seen at each door jamb
by the opposing cameras, even if it "appears differently" to
the two observers. And, you know that anything seen by a
camera is located between that camera and its counterpart
(though it may not be visible to the counterpart).

What you don't know is how "thick" (along the vision axis)
the object might be (e.g., a person vs. a vehicle). But, I
don't see that knowledge adding much value to warrant further
complicating the design.

> Following a person, e.g., seen quarter-behind in one camera, and
> tracking them to another camera that sees a side view - from the
> /other/ side -
>
> Just following a person is easy, but tracking a specific person,
> particularly when multiple people are present, gets very complicated
> very quickly.

Yes. You need enough detail to be able to distinguish *easily*
between candidates. You're not just "counting bodies".

In the *home* environment, the actors are likely not malevolent;
it's in their best interest for the system to know who/where they
are. But, I don't think that's necessarily true in commercial and
industrial environments. Even though it is similarly in THEIR best
interests, I think the actors, there, are more likely to express
hostility towards their overlords in that way.



Dimiter_Popoff

unread,
Jan 1, 2023, 9:04:56 AM1/1/23
to
I assume A and B are connected over Ethernet via tcp/ip? Or are they
just two cores on the same chip or something?

>
> ....
>
> But, transport delays make this unsuitable for real-time work;
> a megabyte of imagery would require 100ms to transfer, in "raw"
> form.  (I could encode it on the originating host; transfer it
> and then decode it on the receiving host -- at the expense of MIPS.
> This is how I "record" video without saturating the network)

100 ms latency would be an issue if you face say A-Train
(for you and the rest who have not watched "The Boys" - he is a
super (a "sup" as they have it) who can run fast enough to not
be seen by normal humans...) :-).

>
> So, you (B) want to "abstract" the salient features of the image
> while it is on B and then transfer just those to A.  *Use*
> them, on A, and then move on to the next set of features
> (that B has computed while A was busy chewing on the last set)
>
> Or, give A direct access to the native data (without A having
> to capture video streams from each of the cameras that it wants
> to potentially examine)
>

In RFB, the server can - and should - decide which parts of the
framebuffer have changed and send across only them. Which works
fine for computer generated images - plenty of single colour areas,
no noise etc. In your case you might have to resort to jpeg
the image downgrading its quality so "small" changes would
disappear, I think those who write video encoders do something
like that (for my vnc server lossless RLE was plenty, but it it
is not very efficient when the screen is some real life photo,
obviously).

Don Y

unread,
Jan 1, 2023, 4:28:27 PM1/1/23
to
On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
>>> Perhaps you could find a way to make yourself some camera module
>>> using an existing one, MIPI or even USB, since you are looking for low
>>> overall cost; and add some MCU board to it to do the buffering and
>>> transfer areas on request. Or maybe put enough CPU power together with
>>> each camera to do most if not all of the analysis... Depending on
>>> which achieves the lowest cost. But I can't say much on cost, that's
>>> pretty far from me (as you know).
>>
>> My current approach gives me that -- MIPS, size, etc.  But, the cost
>> of transferring parts of the image (without adding a specific mechanism)
>> is a "shared page" (DSM).  So, host (on node A) references part of
>> node *B*'s frame buffer and the page (on B) containing that memory
>> address gets shipped back to node A and mapped into A's memory.
>
> I assume A and B are connected over Ethernet via tcp/ip? Or are they
> just two cores on the same chip or something?

If A had direct access to the camera on B, then they'd be the same node,
right? :>

A maps a memory object that has been defined on B into A's
memory space at a particular address (range). So, A *pretends*
it has a local copy of the frame buffer.

When A references ANY address in that range, a fault causes a
request to be sent to B for a copy of the datum referenced.

Of course, it would be silly to just return that one value;
any subsequent references would also have to cause a page fault
(because you couldn't allow A to reference an adjacent address
for which data is not yet available, locally). So, B ships a copy
of the entire page over to A and A instantiates that copy as a local
page marked as "present" (so, subsequent references incur no faults).

The application has fine-grained control over the *policy* that is
used, here. So, if he knows that an access to address N will then
be followed by N+3000r16, he can arrange for the page containing
N *and* N+3000r16 to both be shipped over (so there isn't a
fault triggered when the N+3000r16 reference occurs).

[There are also provisions that allow multiple *writers*
to shared regions so the memory behaves, functionally (but
not temporally!) like LOCAL "shared memory"]
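
Roughly, the fault path looks like this (pseudo-C; the names are mine,
standing in for the actual mechanisms):

#include <stdint.h>

#define PAGE_SIZE 4096u

struct mapping { int owner; int object; };  /* node/object backing this range */

/* placeholders for the real machinery; names are illustrative only */
extern void *request_page(int owner, int object, uintptr_t page);
extern void  map_present(struct mapping *m, uintptr_t page, void *data);
extern void  policy_prefetch(struct mapping *m, uintptr_t addr);

void dsm_fault_handler(struct mapping *m, uintptr_t fault_addr)
{
    uintptr_t page = fault_addr & ~(uintptr_t)(PAGE_SIZE - 1);

    /* ask the owning node (B) for the page holding the referenced datum */
    void *data = request_page(m->owner, m->object, page);

    /* instantiate it locally, marked "present", so later references
       within the page incur no further faults */
    map_present(m, page, data);

    /* the application-chosen policy may pull additional pages it expects
       to touch next (e.g. the one holding N+3000r16) in the same trip */
    policy_prefetch(m, fault_addr);

    /* on return, the faulting access restarts against local memory */
}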

But, that's a shitload of overhead if you want to treat
the remote frame buffer AS IF it was local.

>> But, transport delays make this unsuitable for real-time work;
>> a megabyte of imagery would require 100ms to transfer, in "raw"
>> form.  (I could encode it on the originating host; transfer it
>> and then decode it on the receiving host -- at the expense of MIPS.
>> This is how I "record" video without saturating the network)
>
> 100 ms latency would be an issue if you face say A-Train
> (for you and the rest who have not watched "The Boys" - he is a
> super (a "sup" as they have it) who can run fast enough to not
> be seen by normal humans...) :-).

The bigger problem is throughput. You don't care if all of your
references are skewed 100ms in time; add enough buffering to
ensure every frame remains available for that full 100ms and
just expect the results to be "late".

The problem happens when there's another frame coming before
you've finished processing the current frame. And so on.

So, while it is "slick" and eliminates a lot of explicit remote
access code being exposed to the algorithm (e.g., "get me location
X,Y of the remote frame buffer"), it's just not practical for the
application.

>> So, you (B) want to "abstract" the salient features of the image
>> while it is on B and then transfer just those to A.  *Use*
>> them, on A, and then move on to the next set of features
>> (that B has computed while A was busy chewing on the last set)
>>
>> Or, give A direct access to the native data (without A having
>> to capture video streams from each of the cameras that it wants
>> to potentially examine)
>>
>
> In RFB, the server can - and should - decide which parts of the
> framebuffer have changed and send across only them. Which works
> fine for computer generated images - plenty of single colour areas,

Yes, but if the receiving end has no interest in those areas
of the image, then you're just wasting effort (bandwidth)
transferring them -- esp if the areas of interest will need
that bandwidth!

> no noise etc.  In your case you might have to resort to jpeg
> the image downgrading its quality so "small" changes would
> disappear, I think those who write video encoders do something
> like that (for my vnc server lossless  RLE was plenty, but it
> is not very efficient when the screen is some real life photo,
> obviously).

I think the solution is to share abstractions. Design the
algorithms so they can address partial "objects of interest"
and report on those. Then, coordinate those partial results
to come up with a unified concept of what's happening in
the observed scene.

But, this is a fair bit harder than just trying to look at
a unified frame buffer and detect objects/motion!

OTOH, if it was easy, it would be boring ("What's to be learned
from doing something that's already been done?")


Dimiter_Popoff

unread,
Jan 1, 2023, 4:53:11 PM1/1/23
to
On 1/1/2023 23:28, Don Y wrote:
> On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
>> ....
>....
>> In RFB, the server can - and should - decide which parts of the
>> framebuffer have changed and send across only them. Which works
>> fine for computer generated images - plenty of single colour areas,
>
> Yes, but if the receiving end has no interest in those areas
> of the image, then you're just wasting effort (bandwidth)
> transferring them -- esp if the areas of interest will need
> that bandwidth!

But nothing is stopping the receiving end from requesting a particular area
and the sending side from sending just the changed parts of it.
I am not suggesting you use RFB, I use it just as an example.

>
>> no noise etc.  In your case you might have to resort to jpeg
>> the image downgrading its quality so "small" changes would
>> disappear, I think those who write video encoders do something
>> like that (for my vnc server lossless  RLE was plenty, but it
>> is not very efficient when the screen is some real life photo,
>> obviously).
>
> I think the solution is to share abstractions.  Design the
> algorithms so they can address partial "objects of interest"
> and report on those.  Then, coordinate those partial results
> to come up with a unified concept of what's happening in
> the observed scene.

Well I think this is the way to go, too. This implies enough
CPU horsepower per camera, which nowadays might be practical.

> But, this is a fair bit harder than just trying to look at
> a unified frame buffer and detect objects/motion!

Well yes but you lose the framebuffer transfer problem, no
need to do your "remote virtual machine" for that etc.

>
> OTOH, if it was easy, it would be boring ("What's to be learned
> from doing something that's already been done?")
>

Not only that; if it were easy everyone else would be doing it :-).


George Neuner

unread,
Jan 1, 2023, 8:59:20 PM1/1/23
to
On Sun, 1 Jan 2023 14:28:20 -0700, Don Y <blocked...@foo.invalid>
wrote:

>On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
>
>The bigger problem is throughput. You don't care if all of your
>references are skewed 100ms in time; add enough buffering to
>ensure every frame remains available for that full 100ms and
>just expect the results to be "late".
>
>The problem happens when there's another frame coming before
>you've finished processing the current frame. And so on.
>
>So, while it is "slick" and eliminates a lot of explicit remote
>access code being exposed to the algorithm (e.g., "get me location
>X,Y of the remote frame buffer"), it's just not practical for the
>application.

All cameras have a free-run "demand" mode in which (between resets)
the CCD is always accumulating - waiting to be read out. But many
also have a mode in which they do nothing until commanded.

In any event, without command the controller will just service the CCD
- it won't transfer the image anywhere unless asked.

Many "smart" cameras can do ersatz stream compression by double
buffering internally and performing image subtraction to remove
unchanging (to some threshold) images. In a motion activated
environment this can greatly cut down on the number of images YOU have
to process.

Better ones also offer a suite of onboard image processing functions:
motion detection, contrast expansion, thresholding, line finding ...
now even some offer pattern object recognition. If the functions they
provide are useful, it can pay to take advantage of them.

I know you are (thinking of) designing your own ... you should maybe
think hard about what smarts you want onboard.



>> In RFB, the server can - and should - decide which parts of the
>> framebuffer have changed and send across only them. Which works
>> fine for computer generated images - plenty of single colour areas,
>
>Yes, but if the receiving end has no interest in those areas
>of the image, then you're just wasting effort (bandwidth)
>transferring them -- esp if the areas of interest will need
>that bandwidth!

That's true, but protocols like VNC's "copyrect" encoding essentially
divide the image into a large checkerboard, and transfer only those
"squares" where the underlying image has changed. What is considered a
"change" could be further limited on the sending side by
pre-processing: erosion and/or thresholding.
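
Per frame pair, that amounts to something like this (a sketch; tile
size and thresholds are arbitrary):

#include <stdint.h>
#include <stdlib.h>

#define W      640
#define H      480
#define TILE   16
#define THRESH 8          /* per-pixel delta that counts as a change */

/* Mark the tiles of 'cur' that differ from 'prev'; return how many.
 * The caller then ships only the marked tiles (plus coordinates). */
size_t diff_tiles(uint8_t prev[H][W], uint8_t cur[H][W],
                  uint8_t changed[H / TILE][W / TILE])
{
    size_t n = 0;

    for (int ty = 0; ty < H / TILE; ty++) {
        for (int tx = 0; tx < W / TILE; tx++) {
            int dirty = 0;

            for (int y = ty * TILE; y < (ty + 1) * TILE && !dirty; y++)
                for (int x = tx * TILE; x < (tx + 1) * TILE; x++)
                    if (abs((int)cur[y][x] - (int)prev[y][x]) > THRESH) {
                        dirty = 1;
                        break;
                    }
            changed[ty][tx] = (uint8_t)dirty;
            n += dirty;
        }
    }
    return n;
}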

The biggest problem always is how much extra buffering you need for
as-yet-unprocessed images in the stream - while you're working on one
thing, you easily can lose something else.


>> no noise etc.  In your case you might have to resort to jpeg
>> the image downgrading its quality so "small" changes would
>> disappear, I think those who write video encoders do something
>> like that (for my vnc server lossless  RLE was plenty, but it
>> is not very efficient when the screen is some real life photo,
>> obviously).

And RLE or copyrect can be combined further with lossless LZ
compression.


For really good results, wavelet compression is the best - it
basically reduces the whole image to a set of equation coefficients,
and you can preserve (or degrade) detail in the reconstructed image by
altering how many coefficients are calculated from the original.

But it is compute intensive: you really need a DSP or SIMD CPU to do
it efficiently.


>I think the solution is to share abstractions. Design the
>algorithms so they can address partial "objects of interest"
>and report on those. Then, coordinate those partial results
>to come up with a unified concept of what's happening in
>the observed scene.
>
>But, this is a fair bit harder than just trying to look at
>a unified frame buffer and detect objects/motion!
>
>OTOH, if it was easy, it would be boring ("What's to be learned
>from doing something that's already been done?")

As I said previously, smart cameras can do things like motion
detection onboard, and report the AOI along with the image.


George

Don Y

unread,
Jan 2, 2023, 1:50:56 AM1/2/23
to
On 1/1/2023 2:53 PM, Dimiter_Popoff wrote:
> On 1/1/2023 23:28, Don Y wrote:
>> On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
>>> ....
> >....
>>> In RFB, the server can - and should - decide which parts of the
>>> framebuffer have changed and send across only them. Which works
>>> fine for computer generated images - plenty of single colour areas,
>>
>> Yes, but if the receiving end has no interest in those areas
>> of the image, then you're just wasting effort (bandwidth)
>> transferring them -- esp if the areas of interest will need
>> that bandwidth!
>
> But nothing is stopping the receiving end from requesting a particular area
> and the sending side from sending just the changed parts of it.
> I am not suggesting you use RFB, I use it just as an example.

I'm trying to hide the fact that there are bits of code (and I/O's)
operating on different processors. I.e., a single processor would
<somehow> have all of these images accessible to it. I'd like to
maintain that illusion by hiding any transfers/mapping "under the
surface" so the main algorithm can concentrate on the problem at
hand, and not the implementation platform.

It's like having virtual memory instead of forcing the application
to drag in "overlays" due to hardware constraints on the address
space. Or, having to push data out to disk when the amount of
local memory is exceeded.

These are just nuisances that interfere with the design of the algorithm.

But, I may be able to use the shared memory mechanism as a way to "hint"
to the OS as to which parts of the image are of interest to the remote
node. Then, arrange for the pager to only send the differences over
the wire -- counting on the local pager to instantiate a duplicate
copy of the previous image (which is likely still available on that
host).

I.e., bastardize CoW for the purpose.

>>> no noise etc.  In your case you might have to resort to jpeg
>>> the image downgrading its quality so "small" changes would
>>> disappear, I think those who write video encoders do something
>>> like that (for my vnc server lossless  RLE was plenty, but it
>>> is not very efficient when the screen is some real life photo,
>>> obviously).
>>
>> I think the solution is to share abstractions.  Design the
>> algorithms so they can address partial "objects of interest"
>> and report on those.  Then, coordinate those partial results
>> to come up with a unified concept of what's happening in
>> the observed scene.
>
> Well I think this is the way to go, too. This implies enough
> CPU horsepower per camera, which nowadays might be practical.

I've got enough for a single camera. But, if I had to handle a
multicamera *scene* (completely) with that processor, I'd be
running out of MIPS.

>> But, this is a fair bit harder than just trying to look at
>> a unified frame buffer and detect objects/motion!
>
> Well yes but you lose the framebuffer transfer problem, no
> need to do your "remote virtual machine" for that etc.

The question will be where the effort pays off quickest. E.g.,
dropping the effective frame rate may make simpler solutions
more practical.

>> OTOH, if it was easy, it would be boring ("What's to be learned
>> from doing something that's already been done?")
>
> Not only that; if it were easy everyone else would be doing it :-).

I have no problem letting other people invent wheels that I
can freely use. Much of my current architecture is pieced
together from ideas gleaned over the past several decades
(admittedly, on bigger iron than "MCUs"). It's only now
that it is economically feasible for me to exploit some of these
technologies.


Don Y

unread,
Jan 2, 2023, 2:28:03 AM1/2/23
to
On 1/1/2023 6:59 PM, George Neuner wrote:
> On Sun, 1 Jan 2023 14:28:20 -0700, Don Y <blocked...@foo.invalid>
> wrote:
>
>> On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
>>
>> The bigger problem is throughput. You don't care if all of your
>> references are skewed 100ms in time; add enough buffering to
>> ensure every frame remains available for that full 100ms and
>> just expect the results to be "late".
>>
>> The problem happens when there's another frame coming before
>> you've finished processing the current frame. And so on.
>>
>> So, while it is "slick" and eliminates a lot of explicit remote
>> access code being exposed to the algorithm (e.g., "get me location
>> X,Y of the remote frame buffer"), it's just not practical for the
>> application.
>
> All cameras have a free-run "demand" mode in which (between resets)
> the CCD is always accumulating - waiting to be read out. But many
> also have a mode in which they do nothing until commanded.

The implication in my comments was that you would want to target a
certain frame rate as a performance metric. Whether that has to
be the camera's nominal rate or something slower than that would
likely depend on the scene being analyzed.

> In any event, without command the controller will just service the CCD
> - it won't transfer the image anywhere unless asked.
>
> Many "smart" cameras can do ersatz stream compression by double
> buffering internally and performing image subtraction to remove
> unchanging (to some threshold) images. In a motion activated
> environment this can greatly cut down on the number of images YOU have
> to process.
>
> Better ones also offer a suite of onboard image processing functions:
> motion detection, contrast expansion, thresholding, line finding ...
> now even some offer pattern object recognition. If the functions they
> provide are useful, it can pay to take advantage of them.

I don't yet know what will be useful. So far, my algorithms have been
two-dimensional versions of photo-interrupters. I don't care what I'm
seeing, just that I'm seeing it in a certain place under certain
conditions.

Visually tracking targets will be considerably harder.

Previously, I required the targets to wear a beacon that I could
locate wirelessly. This works because it gives the user a means
of interacting with the system audibly without having to clutter
the space with utterances (and sort out what's intentional and
what is extraneous).

But, that only makes sense for folks using "personal audio".
Anyone without such a device would be invisible to the system.

Switching to vision will (?) let me allow anyone in the arena
to interact/participate. And, can potentially let nonverbal
users interact without having to wear a "transducer".

> I know you are (thinking of) designing your own ... you should maybe
> think hard about what smarts you want onboard.

Thinking hard is easy. Knowing WHAT to think about is hard!

>>> In RFB, the server can - and should - decide which parts of the
>>> framebuffer have changed and send across only them. Which works
>>> fine for computer generated images - plenty of single colour areas,
>>
>> Yes, but if the receiving end has no interest in those areas
>> of the image, then you're just wasting effort (bandwidth)
>> transferring them -- esp if the areas of interest will need
>> that bandwidth!
>
> That's true, but protocols like VNC's "copyrect" encoding essentially
> divide the image into a large checkerboard, and transfer only those
> "squares" where the underlying image has changed. What is considered a
> "change" could be further limited on the sending side by
> pre-processing: erosion and/or thresholding.

I would assume you could adaptively size regions so you looked at the
cost of sending the contents AND size information for multiple smaller
regions vs. just the contents (size implied) for larger ones -- which
may only contain a small amount of deltas.

> The biggest problem always is how much extra buffering you need for
> as-yet-unprocessed images in the stream - while you're working on one
> thing, you easily can lose something else.

Yes. The "easy" approach is to treat it as HRT and plan on processing
every frame in a single frame time. Latency can be large-ish -- as long
as throughput is guaranteed.

>>> no noise etc.  In your case you might have to resort to jpeg
>>> the image downgrading its quality so "small" changes would
>>> disappear, I think those who write video encoders do something
>>> like that (for my vnc server lossless  RLE was plenty, but it
>>> is not very efficient when the screen is some real life photo,
>>> obviously).
>
> And RLE or copyrect can be combined further with lossless LZ
> compression.
>
> For really good results, wavelet compression is the best - it
> basically reduces the whole image to a set of equation coefficients,
> and you can preserve (or degrade) detail in the reconstructed image by
> altering how many coefficients are calculated from the original.
>
> But it is compute intensive: you really need a DSP or SIMD CPU to do
> it efficiently.

Time spent compressing and decompressing equates with time on the
wire, transferring UNcompressed data. There's a point at which it's
probably smarter to just use more network bandwidth than waste
MIPS trying to conserve it.
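
(The break-even is simple enough: compression only pays when
t_compress + t_decompress + size_compressed/bandwidth is less than
size_raw/bandwidth, i.e., when the CPU time burned at both ends is
less than the wire time saved. For a 1MB frame on a ~100Mb/s link the
raw wire time is only ~80ms, which isn't much of a budget for live
use; for *recording*, where latency doesn't matter, the tradeoff
changes.)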

My immediate concern is making a "wise" (not necessarily "optimal")
HARDWARE implementation decision so I can have some boards cut.
And, start planning the sort of capabilities/features that I can develop
with those facilities.