Doesn't matter. Any serial interface poses the same problem;
I can't examine the image until I can *look* at it.
> Since you say you have a fairly heavy-weight processor, that frame grab likely
> isn't your limiting factor.
It becomes an issue when the number of cameras increases
significantly on a single host. I have one scene that requires
11 cameras to capture, completely.
>> *If* a "camera memory" was available, I would site N of these
>> in the (64b) address space of the host and let the host pick
>> and choose which parts of which images it wanted to examine...
>> without worrying about all of the bandwidth that would have been
>> consumed deserializing those N images into that memory (which is
>> a continuous process)
>
> But such a camera would almost certainly be designed for the processor to be on
> the same board as the camera, (or be VERY slow in access), so much less apt
> to allow you to add multiple cameras to one processor.
Yes. But, if the module is small, then siting the assembly "someplace
convenient" isn't a big issue. I.e., my modules are smaller than most
webcams/dashcams.
>>>> ISTM that the better solution is to develop algorithms that can
>>>> process portions of the scene, concurrently, on different "hosts".
>>>> Then, coordinate these "partial results" to form the desired result.
>>>>
>>>> I already have a "camera module" (host+USB camera) that has adequate
>>>> processing power to handle a "single camera scene". But, these all
>>>> assume the scene can be easily defined to fit in that camera's field
>>>> of view. E.g., point a camera across the path of a garage door and have
>>>> it "notice" any deviation from the "unobstructed" image.
>>>
>>> And if one camera can't fit the full scene, you use two cameras, each with
>>> their own processor, and they each process their own image.
>>
>> That's the above approach, but...
>>
>>> The only problem is if your image processing algorithm needs to compare parts
>>> of the images between the two cameras, which seems unlikely.
>>
>> Consider watching a single room (e.g., a lobby at a business) and
>> tracking the movements of "visitors". It's unlikely that an individual's
>> movements would always be constrained to a single camera field. There will
>> be times when he/she is "half-in" a field (and possibly NOT in the other,
>> HALF in the other or ENTIRELY in the other). You can't ignore cases where
>> the entire object (or, your notion of what that object's characteristics
>> might be) is not entirely in the field as that leaves a vulnerability.
>
> Sounds like you aren't overlapping your cameras enough or have insufficient
> coverage. Maybe your problem is wrong field of view for your lens. Maybe you
> need fewer but better cameras with wider fields of view.
Distance from camera to target means you have to play games with optics
that can distort images.
I also can't rely on "professional installers" *or* on the cameras remaining
aimed in their original configurations.
> This might be due to trying to use "stock" inexpensive USB cameras.
>
>> For example, I watch our garage door with *four* cameras. A camera is
>> positioned on each side ("door jamb"?) of the door "looking at" the other
>> camera. This because a camera can't likely see the full height of the door
>> opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
>> and I'll watch *its* side!).
>
> Right, and if ANY see a problem, you stop. So no need for inter-camera
> coordination.
But you don't know there is a problem until you can identify *where*
the obstruction exists and if that poses a problem for the vehicle
or the "obstructing item". Doing so requires knowing what the
object likely is.
E.g., SWMBO frequently stands in the doorway as I pull the car in or
out (not enough room between vehicles *in* the garage to allow for
ease of entry/egress). I'd not want this to be flagged as a
problem (signalling an alert in the vehicle).
Likewise, an obstruction on one vehicle-side of the garage shouldn't
interfere with access to the other side.
>> [The other two cameras are similarly positioned on the overhead *track*
>> onto which the door rolls, when open]
>>
>> An object in (or near) the doorway can be visible in one (either) or
>> both cameras, depending on where it is located. Additionally, one of
>> those manifestations may be only "partial" as regards to where it is
>> located and intersects the cameras' fields of view.
>
> But since you aren't trying to ID, only Detect, there still isn't a need for
> camera-camera processing, just camera-door controller
The cameras need to coordinate to resolve the location of the object.
A "toy wagon" would present differently, visually, than a tall person.
>>>> When the scene gets too large to represent in enough detail in a single
>>>> camera's field of view, then there needs to be a way to coordinate
>>>> multiple cameras to a single (virtual?) host. If those cameras were just
>>>> "chunks of memory", then the *imagery* would be easy to examine in a single
>>>> host -- though the processing power *might* need to increase geometrically
>>>> (depending on your current goal)
>>>
>>> Yes, but your "chunks of memory" model just doesn't exist as a viable camera
>>> model.
>>
>> Apparently not -- in the COTS sense. But, that doesn't mean I can't
>> build a "camera memory emulator".
>>
>> The downside is that this increases the cost of the "actual camera"
>> (see my above comment wrt amortization).
>
> Yep, implementing this likely costs more than giving the camera a dedicated
> moderate processor to do the major work. Might not handle the actual ID problem
> of your Door bell, but could likely process the live video, take a snapshot of
> a region with a good view of the visitor coming, and send just that to your
> master system for ID.
But, then I could just use one of my existing "modules". If the
target fits entirely within its field of view, then it has everything
that it needs for the assigned functionality. If not, then it
needs to consult with other cameras.
>>> The CMOS cameras with addressable pixels have "access times" significantly
>>> slower than your typical memory (and are read once) so don't really meet
>>> that model. Some of them do allow for sending multiple small regions of
>>> interest and downloading just those regions, but this then starts to
>>> require moderate processor overhead to be loading all these regions and
>>> updating the grabber to put them where you want.
>>
>> You would, instead, let the "camera memory emulator" capture the entire
>> image from the camera and place the entire image in a contiguous
>> region of memory (from the perspective of the host). The cost of capturing
>> the portions that are not used is hidden *in* the cost of the "emulator".
>
> Yep, you could build your system with a two-port memory buffer between the frame
> grabber loading with one port, and the decoding processor on the other.
Yes. But large *true* dual-port memories are costly. Instead, you would
emulate such a device either by time-division multiplexing a single
physical memory *or* sharing alternate memories (fill one, view the other).
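
As a rough sketch of the "fill one, view the other" option (Python purely for illustration; the class and method names are made up, and a real "camera memory emulator" would do this in hardware with the swap tied to the frame boundary):

```python
class PingPongFrameBuffer:
    """Two frame-sized buffers: the grabber fills one while the host views the other."""

    def __init__(self, frame_bytes):
        self._buffers = [bytearray(frame_bytes), bytearray(frame_bytes)]
        self._fill_index = 0  # index of the buffer currently owned by the grabber

    def fill_buffer(self):
        """Writable buffer that the deserializer is loading the next frame into."""
        return self._buffers[self._fill_index]

    def view_buffer(self):
        """Read-only view of the last complete frame, as seen by the host."""
        return memoryview(self._buffers[1 - self._fill_index]).toreadonly()

    def swap(self):
        """Exchange roles; assumed to be called only once a frame is complete."""
        self._fill_index = 1 - self._fill_index
```

The host never sees a partially loaded frame, at the cost of twice the memory and of always looking at an image that is up to one frame old. Locking between the two sides is omitted here.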
> The most cost effective way to do this is likely a commercial frame-grabber
> with built-in "two-port" memory, that sits in a slot of a PC type computer. These
> would likely not work with a "USB Camera" (why would you need a frame grabber
> with a camera that has it built in) so would be totally changing your cost models.
Yes, I have a few of these intended for medical imaging apps.
Way too big; way too expensive. Designed for the wrong type of "host".
> IF your current design method is based on using USB cameras, trying to do a
> full custom interface may be out of your field of operation.
>
>>> And yes, it does mean that there might be some cases where you need a core
>>> module that has TWO cameras connected to a single processor, either to get a
>>> wider field of view, or to combine two different types of camera (maybe a
>>> high res black and white to a low res color if you need just minor color
>>> information, or combine a visible camera to a thermal camera). These just
>>> become another tool in your tool box.
>>
>> I *think* (uncharted territory) that the better investment is to develop
>> algorithms that let me distribute the processing among multiple
>> (single) "camera modules/nodes". How would your "two camera" exemplar
>> address an application requiring *three* cameras? etc.
>
> The first question comes, what processing are you thinking of that needs images
> from 3 cameras.
>
> Note, my two camera example was a case where the processing needed to be done
> did need data from two cameras.
>
> If you have another task that needs a different camera, you just build a system
> with one two-camera module and one one-camera module, relaying back to a central
> control, or you nominate one of the modules to be central control if the load
> there is light enough.
>
> Your garage door example would be built from 4 separate and independent 1
> camera modules, either going to one as the master, or to a 5th module acting as
> the master.
Yes, but they have to share image data (either raw or abstracted)
to make deductions about the targets present.
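
To give a feel for what "abstracted" might mean, here is a sketch of a per-target record one module could hand to another in place of pixels. The record layout, the field names and the JSON encoding are all assumptions for illustration, not what my modules actually exchange:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TargetObservation:
    """Hypothetical summary of one target as seen by one camera."""
    camera_id: str
    timestamp: float
    bbox: tuple          # (left, top, right, bottom) in pixels
    touches_edge: bool   # True if the target is clipped by the field of view


def encode(obs):
    """Serialize an observation for transmission to a peer module."""
    return json.dumps(asdict(obs)).encode("utf-8")


def decode(payload):
    """Rebuild an observation from a peer's payload."""
    fields = json.loads(payload.decode("utf-8"))
    fields["bbox"] = tuple(fields["bbox"])  # JSON turns tuples into lists
    return TargetObservation(**fields)
```

A record like this is on the order of a hundred bytes, versus hundreds of kilobytes for even a small raw frame, and the `touches_edge` flag is the cue that another camera's view needs to be consulted.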
> The cases I can think of for needing to process three cameras together would be:
>
> 1) a system stitching images from 3 cameras and generating a single image out of
> it, but that totally breaks your concept of needing only bits of the images,
> that inherently is using most of each camera, and doing some stitching
> processing on the overlaps.
>
> 2) A Multi-spectrum system, where again, you are taking the ENTIRE scene from
> the three cameras and producing a merged "false-color" image from them. Again,
> this also breaks your partial image model.
Or, tracking multiple actors in an "arena" -- visitors in a business,
occupants in a home, etc. In much the same way that the two garage
door cameras conspire to locate the obstruction's position along the
line from left doorjamb to right, pairs of cameras can resolve
a target in an arena and *sets* of cameras (freely paired, as needed)
can track all locations (and targets) in the arena.
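
A sketch of how one such pair might resolve a target in the arena, assuming each module knows its own position on a shared floor plan and can report a bearing to the target in that same frame (both are assumptions, as are the names used here):

```python
import math


def locate(cam_a, bearing_a, cam_b, bearing_b):
    """Intersect two bearing rays; return (x, y), or None if no usable fix.

    cam_a, cam_b: (x, y) camera positions on the floor plan.
    bearing_a, bearing_b: direction to the target in radians, floor-plan frame.
    """
    ax, ay = cam_a
    bx, by = cam_b
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)
    denom = dax * dby - day * dbx      # cross product of the two directions
    if abs(denom) < 1e-9:
        return None                    # rays (nearly) parallel: no fix
    # Solve cam_a + t*da == cam_b + s*db for the distances t and s.
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    s = ((bx - ax) * day - (by - ay) * dax) / denom
    if t < 0 or s < 0:
        return None                    # intersection lies behind a camera
    return (ax + t * dax, ay + t * day)
```

For example, cameras at (0, 0) and (4, 0) reporting bearings of 45 and 135 degrees place the target at (2, 2). Any two cameras that both see the target can be paired this way, which is what lets the pairing be chosen freely.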
>> I can, currently, distribute this processing by treating the
>> region of memory into which a (local) camera's imagery is
>> deserialized as a "memory object" and then exporting *access*
>> to that object to other similar "camera modules/nodes".
>>
>> But, the access times of non-local memory are horrendous, given
>> that the contents are ephemeral (if accesses could be *cached*
>> on each host needing them, then these costs diminish).
>>
>> So, I need to come up with algorithms that let me export abstractions
>> instead of raw data.
>
> Sounds like your current design is very centralized. This limits its scalability,
The current design is completely distributed. The only "shared component"
is the network switch through which they converse and the RDBMS that acts
as the persistent store.
If a site realizes that it needs additional coverage to track <whatever>
it just adds another camera module and lets the RDBMS know about its general
location/functionality (i.e., how it can relate to any other cameras
covering the same arena).
>>>>> My first feeling is you seem to be assuming a fairly cheap camera and then
>>>>> doing some fairly simple processing over the partial image, in which case
>>>>> you might even be able to live with a camera that uses a crude SPI
>>>>> interface to bring the frame in, and a very simple processor.
>>>>
>>>> I use A LOT of cameras. But, I should be able to swap the camera
>>>> (upgrade/downgrade) and still rely on the same *local* compute engine.
>>>> E.g., some of my cameras have Ir illuminators; it's not important
>>>> in others; some are PTZ; others fixed.
>>>
>>> Doesn't sound reasonable. If you downgrade a camera, you can't count on it
>>> being able to meet the same requirements, or you over-specced the initial
>>> camera.
>>
>> Sorry, I was using up/down relative to "nominal camera", not "specific camera
>> previously selected for application". I'd *really* like to just have a
>> single "camera module" (module = CPU+I/O) instead of one for camera type A
>> and another for camera type B, etc.
>
> That only works if you are willing to spend for the sports car, even if you
> just need it to go around the block.
If the "extra" bits of the sports car can be used by other elements,
then those costs aren't directly borne by the camera module, itself.
E.g., when the garage door is closed, there's no reason the modules
in the garage can't be busy training speech models or removing
commercials from recorded broadcast content.
If, OTOH, you detect objects with a photo-interrupter across the door's
path, there's scant little it can do when not needed.
> It depends a bit on how much span you need of capability. A $10 camera is
> likely having a very different interface to a $30,000 camera, so will need a
> different board. Some boards might handle multiple camera interface types if it
> doesn't add a lot to the board, but you are apt to find that you need to make
> some choice.
I don't ever see a need for a $30,000 camera. There may be a need for a
PTZ model. Or, a low lux model. Or, one with longer focal length. Or,
shorter (I'd considered putting one *in* the mailbox to examine its
contents instead of just detecting that it had been "visited").
Instead of a 4K device, I'd opt for multiple simpler devices better
positioned.
But, not radically different in terms of cost, size, etc.
If you walk into a bank lobby, you don't see *one* super-high resolution,
wide field camera surveilling the lobby but, rather half a dozen or more
watching specific portions of the lobby. Similarly, if you use the
self-check at the store, there is a camera per checkout station instead
of one "really good" camera located centrally trying to take it all in.
This gives installers more leeway in terms of how they cover an arena.
> Then some tasks will just need a lot more computer power than others. Yes, you
> can just put too much computer power on the simple tasks, (and that might make
> sense to design the higher end processor early), but ultimately you are going
> to want the less expensive lower end processors.
I can call on surplus processing power from other nodes in the system
in much the same way that they can call on surplus capabilities from
a camera module that isn't "seeing" anything interesting, at the moment.
There will always be limits on what can be done; I'm not going
to be able to VISUALLY verify that you have the right wrench in
your hand as you set about working on the car. Or, that you
are holding an eating utensil instead of a random piece of
plastic as you traverse the kitchen.
But, I'll know YOU are in the kitchen and likely the person whose
voice I hear (to further reinforce the speaker identification
algorithms).
>>> You put on a camera a processor capable of handling the tasks you expect out
>>> of that set of hardware. One type of processor likely can handle a variety
>>> of different camera setup with
>>
>> Exactly. If a particular instance has an Ir illuminator, then you include
>> controls for that in *the* "camera module". If another instance doesn't have
>> this ability, then those controls go unused.
>
> Yes, Auxiliary functionality is often cheap to include the hooks for.
But, it often requires looking at your TOTAL needs instead of designing
for specific (initial) needs. E.g., my camera modules now include
audio capabilities as there are instances where I want an audio
pickup in the same arena that I am monitoring. Silly to have to add
an "audio module" just because I didn't have the foresight to
include it with the camera!
>>>> Watching for an obstruction in the path of a garage door (open/close)
>>>> has different requirements than trying to recognize a visitor at the front
>>>> door. Or, identify the locations of the occupants of a facility.
>>>
>>> Yes, so you don't want to "Pay" for the capability to recognize a visitor in
>>> your garage door sensor, so you use different levels of sensor/processor.
>>
>> Exactly. But, the algorithms that do the scene analysis can be the same;
>> you just parameterize the image and the objects within it that you seek.
>
> Actually, "Tracking" can be a very different type of algorithm than
> "Detecting". You might be able to use a Tracking-based algorithm to Detect, but
> likely a much simpler algorithm can be used (needing less resources) to just
> detect.
My current detection algorithm (e.g., garage) just looks for deltas between
"clear" and "obstructed" imagery, conditioned by masks. There is some
image processing required as things look different at night vs. day, etc.
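
In outline, that amounts to something like the following sketch. Images are plain lists of rows of grey levels here, the two thresholds are made-up values, and the day/night conditioning is left out entirely:

```python
def obstruction_score(reference, current, mask, pixel_threshold=30):
    """Fraction of masked-in pixels that differ noticeably from the "clear" image.

    reference, current: equal-sized lists of rows of grey levels (0-255).
    mask: same shape; a truthy entry means "this pixel matters".
    """
    changed = 0
    considered = 0
    for ref_row, cur_row, mask_row in zip(reference, current, mask):
        for ref, cur, keep in zip(ref_row, cur_row, mask_row):
            if not keep:
                continue               # outside the region of interest
            considered += 1
            if abs(cur - ref) > pixel_threshold:
                changed += 1
    return changed / considered if considered else 0.0


def is_obstructed(reference, current, mask,
                  pixel_threshold=30, area_threshold=0.05):
    """True if enough of the masked region has changed to call it obstructed."""
    score = obstruction_score(reference, current, mask, pixel_threshold)
    return score > area_threshold
```

The mask is what keeps changes outside the region of interest from being counted as deltas.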
I don't have to "get it right". All I have to do is demonstrate "proof of
concept". And, be able to indicate why a particular approach is superior
to others/existing ones.
E.g., if you drive a "pickup-on-steroids", you'd need to locate a
photointerrupter "obstruction detector" pretty high up off the ground
to catch the case where the truck bed was in the way of the door.
Or, some lumber overhanging the end of the bed that you forgot you'd
brought home! And, you'd likely need *another* detector down low
to catch toddlers or toy wagons in the path of the door.
OTOH, doing the detection with a camera catches these use conditions
in addition to the "nominal" one for which the photointerrupter was
designed.
Tracking two/four occupants of a home *suggests* that you can track
6 or 8. Or, dozens of employees in a business conference room, etc.
I have no desire to spend my time perfecting any of these
technologies (I have other goals); just lay the groundwork and the
framework to make them possible.