kOfxImageEffectPropRenderWindow
The region to be rendered.
The order of the values is x1, y1, x2, y2.
This will be in pixel coordinates.
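As a minimal sketch of how a plugin typically reads that window in the render action (not part of the spec text above; it assumes an OfxPropertySuiteV1 pointer was already fetched from the host at load time, and that OfxRectI's x1,y1,x2,y2 members are laid out contiguously as in ofxCore.h):

#include "ofxCore.h"
#include "ofxImageEffect.h"
#include "ofxProperty.h"

extern OfxPropertySuiteV1 *gPropSuite;  /* assumed fetched from the host at load time */

/* Read the render window (x1, y1, x2, y2, in pixel coordinates) from the
   render action's inArgs property set. */
static void getRenderWindow(OfxPropertySetHandle inArgs, OfxRectI *renderWindow)
{
    gPropSuite->propGetIntN(inArgs, kOfxImageEffectPropRenderWindow, 4,
                            &renderWindow->x1);
}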
topic 1: PropRenderWindow
OK, part 1 - in this email I cover 3 of the 6 host types on my list, so the simpler ones are cleared out first before it gets more fancy, and also so it's not overwhelming.
My initial hypothesis was that we needed an additional definition for what I call Spatial Format, but that is probably too simple. Upon thinking some more, it seems we actually need to somehow enumerate hosts (a host typology), or distinguish them (often it's really a mode/place in the application rather than the host itself), via a set of strings that clarify at the plugin level what is possible. Some of this might be a defect of our Contexts, but that's a different discussion. I start from an incremental amount of information about the situation available, as opposed to starting from a theoretical general situation and degrading from there...
So here's a first draft, simpler pass first. The net result, I think, would be a really simple chart or table where each row is essentially a type of host architecture. Anyhow, the best idea now is probably to create a "Spatial format supported" master property, so we can avoid inventing prop names or "if this and that then this means this" logic, and add a set of ofxStrings with specified interpretations. It took me a number of hours to run down in my head the different processing architectures I have needed to work with over the years :) - I hope it looks really simple in the end. Anyhow, here's a first draft of where I am.
So before we ofxStringify this, we start with what we have: first a "nominal" Project Space (resolution). I assume here that Project Space is at least a default reference usable somehow by all host types. Even that is not a given - I remember, when this was first proposed, someone at ILM saying their internal compositor did not have that concept.
kOfxImageEffectPropProjectExtent
    for the extent of the current project, always rooted at 0,0, so it's 0,0 to X,Y (this defines W and H; a 2D double)

kOfxImageEffectPropProjectSize
    for the size of the current project, a single scale value of 1.0 or lower XY // why lower? and it's a 2D double value

kOfxImageEffectPropProjectOffset
    for the offset of the current project, an offset from 0,0 XY // a 2D double

(768,576) ---------------------------------------
          |                                     |
          |                BLACK                |
          |.....................................| (768, 504)
          |                                     |
          |                                     |
          |        LETTER BOXED IMAGERY         |
          |                                     |
          |                                     |
   (0,72) |.....................................|
          |                                     |
          |                BLACK                |
          |                                     |
          ---------------------------------------
    (0,0)
This is ASCII art copied and pasted from the current doc. Comment: this should probably really be two windows - but anyway, Size + Offset is our second window. For backwards compatibility it's split across 3 properties and stays like that.
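For reference, a minimal sketch (assuming gPropSuite and gEffectSuite pointers were fetched at load time, the same includes as the earlier sketch, and error checks omitted) of reading those three properties off the effect instance:

extern OfxImageEffectSuiteV1 *gEffectSuite;  /* assumed fetched at load time */

static void getProjectSpace(OfxImageEffectHandle effect,
                            double extent[2], double size[2], double offset[2])
{
    OfxPropertySetHandle effectProps;
    gEffectSuite->getPropertySet(effect, &effectProps);
    gPropSuite->propGetDoubleN(effectProps, kOfxImageEffectPropProjectExtent, 2, extent);
    gPropSuite->propGetDoubleN(effectProps, kOfxImageEffectPropProjectSize,   2, size);
    gPropSuite->propGetDoubleN(effectProps, kOfxImageEffectPropProjectOffset, 2, offset);
}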
In practice this is also just like layer and comp for example. Right? (next figure)
For a fit-all-host-types solution, we need to clarify that PropProject is the Parent of our Clip. That is, whether one calls it a comp, a timeline or a sequence... there is spatially a parent reference. Similarly, whether one calls the container of the frame a video track, a clip or a layer (or even a node) - when you stack effects, it's rooted at that second window somehow.
(768,576) ---------------------------------------
          |                                     |
          |                COMP                 |
     ................................................. (818, 504)
     .    |                                     |    .
     .    |               LAYER                 |    .
     .    |                                     |    .
(-50,72)..................................................
          |                                     |
          |                COMP                 |
          |                                     |
          ---------------------------------------
    (0,0)
[ These Project Props, invariant at least within the scope of a clip's frame range, are also important for Overlay Drawing anchoring, which we assume happens in COMP space, not LAYER space, in the graphic above ]
Then follows (this is not an incremental level of difficulty or sophistication of the host):
Host Type 1: The two windows coincide. All inputs and outputs are the same size as the Project. Most systems are not uniformly like that, but can have a mode/page etc. where they are effectively that (or, a bit like Opaque for premult, can be like that when sourcing a fileIn input). There is no RoI, RoD, render window etc. - it's all the same as the main input clip bounds, and it's 0 to ProjectExtent (fully covered by the Project properties). OT: their counterpart at the effect level would be pixel-independent / point processing / pixel shaders, where a pixel in an effect does not need the rest of the image. We could theoretically have something tinier than tiling where such effects can be concatenated by the host... For ImageEffect this might be the simplest host type (I guess one could come up with a plugin that just collects/generates data in UI-thread actions and is a NO-OP render-wise, for an even simpler case).
Host Type 2: Next, we have essentially only 3 possibilities to handle here: overscan (LAYER does not fit in COMP), LAYER fits in COMP, and LAYER = COMP window (which is then Type 1).
In other words: small clip in large project, large clip in small project.
To avoid confusion I re-paste with changed labels - meaning the same thing (Timeline is the parent of the Clip):
(768,576) ---------------------------------------
          |                                     |
          |              TIMELINE               |
     ................................................. (818, 504)
     .    |                                     |    .
     .    |                CLIP                 |    .
     .    |                                     |    .
(-50,72)..................................................
          |                                     |
          |              TIMELINE               |
          |                                     |
          ---------------------------------------
    (0,0)
No per-frame windowing yet - (that's Type 3 and above).
If someone doesn't get it yet: ASCII art below, where the EDIT module has very specific hard-coded supported resolutions... and the compositor is sandwiched in between.
(768,576) ---------------------------------------
          |                                     |
          |             EDIT MODULE             |
     ................................................. (818, 504)
     .    |                                     |    .
     .    |            CONNECT LINK             |    .
     .    |                                     |    .
(-50,72)..................................................
          |                                     |
          |             EDIT MODULE             |
          |                                     |
          ---------------------------------------
    (0,0)
Such a host, in the OVERSCAN case, can create an output buffer sized as the union of both windows ("pad with real pixels"), or in the UNDERSCAN case "pad with black".
Essentially we have one window which is the defined pixel data and one window which is just reference geometry (a crop window).
In the UNDERSCAN case the reference geometry tells the plugin it's a smaller image in a larger comp...
This sort of host hopefully has some form of nesting/compounding so the user can work around edge cases.
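A minimal sketch of the window arithmetic involved (plain OfxRectI helpers, not existing API calls): OVERSCAN takes the union of the two windows, UNDERSCAN clips against the reference geometry:

#include "ofxCore.h"   /* OfxRectI */

static OfxRectI rectUnion(OfxRectI a, OfxRectI b)       /* OVERSCAN: pad with real pixels */
{
    OfxRectI r;
    r.x1 = (a.x1 < b.x1) ? a.x1 : b.x1;
    r.y1 = (a.y1 < b.y1) ? a.y1 : b.y1;
    r.x2 = (a.x2 > b.x2) ? a.x2 : b.x2;
    r.y2 = (a.y2 > b.y2) ? a.y2 : b.y2;
    return r;
}

static OfxRectI rectIntersect(OfxRectI a, OfxRectI b)   /* UNDERSCAN: clip, pad the rest with black */
{
    OfxRectI r;
    r.x1 = (a.x1 > b.x1) ? a.x1 : b.x1;
    r.y1 = (a.y1 > b.y1) ? a.y1 : b.y1;
    r.x2 = (a.x2 < b.x2) ? a.x2 : b.x2;
    r.y2 = (a.y2 < b.y2) ? a.y2 : b.y2;
    if (r.x2 < r.x1) r.x2 = r.x1;   /* empty overlap collapses to zero area */
    if (r.y2 < r.y1) r.y2 = r.y1;
    return r;
}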
Practical example: an animator might render OVERSCAN (+20% the number of pixels) not to reframe, but to get correct image processing at frame edges... the final is assumed smaller. (Also note, for even versus odd sizes, the geometry window's 0.5, 0.5 is the center of the image and not necessarily aligned with a pixel.)
Also, for example, transcoding in QuickTime plugin parlance (like a filter with extra powers regarding spatial, temporal and color conversion) would fit in this type of host (or, typically more appropriately, this sort of mode within a host). It's not expected here to have effects manipulate the spatial format, just to let the user define the format implicitly. It's useful to have regular filters in this context, as one probably wants to denoise before resizing (order is important)... If one remembers Shake, for example: when we did Twixtor (the temporal aspect of the analogous discussion) we needed to derive it as a FileIn (file reader)... sort of the same as being a transcoder type. Even every node-based compositor has non-animatable input and output nodes... We are not even at pan-and-scan here; typically, aside from resizing and cropping, one would flip the image, normalize the aspect ratio to 1.0 (in NTSC/PAL days, probably at the same time as deinterlacing), handle anamorphic squeeze... etc. We are then sort of in ffmpeg (or no-UI TuttleOFX import/export extensions) plugin territory here.
[...]
Host Type 3: Then we get to the next type of host (or host mode): an additional window is expected, one that can change every frame. Such a host typically starts at the LAYER root and essentially can only grow the LAYER size in terms of real pixel memory allocation. They start at the LAYER root and advance effect by effect (not knowing in advance where they are going). One optimization implementation seen would be a pre-render call that goes through the effect stack and just adds to the defined-pixels window for the output buffer at the end [this is slightly different from how we document RoI and RoD right now]. So when the first effect is rendered, the defined pixel bounds are already inside a large window, and then it's possible the input/source is already padded with black. Some sort of extent hint can (should) be book-kept in such an architecture as you move through the stack, but it will always be clipped by the output size (pixel data will never exist beyond that).
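A host-side sketch of that pre-render accumulation, under the assumptions above (the EffectInStack struct and its grow hint are hypothetical, not OFX API; OfxRectI comes from ofxCore.h):

typedef struct { OfxRectI growHint; } EffectInStack;   /* hypothetical per-effect extent hint */

static OfxRectI accumulateDefinedWindow(OfxRectI defined, OfxRectI outputSize,
                                        const EffectInStack *stack, int n)
{
    for (int i = 0; i < n; ++i) {                 /* union with each effect's hint */
        OfxRectI h = stack[i].growHint;
        if (h.x1 < defined.x1) defined.x1 = h.x1;
        if (h.y1 < defined.y1) defined.y1 = h.y1;
        if (h.x2 > defined.x2) defined.x2 = h.x2;
        if (h.y2 > defined.y2) defined.y2 = h.y2;
    }
    /* clip: pixel data never exists beyond the output size */
    if (defined.x1 < outputSize.x1) defined.x1 = outputSize.x1;
    if (defined.y1 < outputSize.y1) defined.y1 = outputSize.y1;
    if (defined.x2 > outputSize.x2) defined.x2 = outputSize.x2;
    if (defined.y2 > outputSize.y2) defined.y2 = outputSize.y2;
    return defined;
}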
Yes, these brute-force systems have issues with some types of effects, e.g. a corner pin in normalized coords 0,0 to 0.5, 0.5 creates an output window 2X in each dimension for the size of the output buffer. So in such a host it's typical for an effect to add some sort of max-grow-output-buffer menu (e.g. 2X max)... This is why an effect must know it's in this sort of host context, and why this sort of architecture is its own type. The host-type definition is also important here so that, for example, a point parameter on a distort node does not move relative to its location in the image when a blur is added before it.
Similarly, in such a system, when an effect on another layer requests a frame from that layer - and that layer often has a post-effect transform in this sort of application - it's expected to get some compromise, e.g. that input cropped to the COMP size (implementations vary). Some specification might be useful here to understand the host's behavior or options with regard to that: get the layer without effects applied, get the layer with effects applied but no post-effect transform (details TBA, trying to avoid naming hosts)...
[...]
I stop here for today - "Spatial format supported" would be a state added to the API; interpretations could then be added to it based on what is supported.
OK, completing my listing, so I have rows to check against properties.
Just sharing notes. My overall hypothesis is now that we need to define a sort of transcoding/parenting model where spatial, temporal and pixel/color variables are clearer - while creating the minimal amount of new properties - so a plugin, in some place in a host, has a clear picture of what it can know and do. Then, when we get to "dynamic" metadata, we have somewhere to sit things like transforms, timeline metadata (e.g. a plugin that marks cuts in video), AWB/AE-over-time metadata from a video stream... so all of this is mappable to the OFX universe.
Host types 4-6 can be summarized quickly. Overall, recommended reading if the purpose of this reflection is not clear would be the latest Resolve help manual, chapter 9, about spatial formatting. (Task: insert plugin here! :) )
Host Type 4: These support, on a main input clip/frame basis, full image/frame region processing - including, for example, if you stabilize and the clip is now e.g. 500 pixels left on frame 12 and 400 on frame 11, all is still good (at least until one inserts a diagonal blur before you - again, note that a stabilization is a transform, not a render region; a render window can optimize the render from the transform, but it's a different kind of information). However, these Type 4 hosts share the issue of additional inputs on an effect not being fully evaluated (see end of Type 3). These sorts of hosts can be unfolded from a layer-based view to a nodal connection view, so we avoid the nodal compositing language here. To continue using examples of hosts that don't exist anymore: remember Combustion had that dual-view mode, still having comps inside the nodal views, and in their case 32-bit float only existed within a layer scope... (so at the "color" level). I use that example to remind us that the parent domain of an effect can be different for time, spatial and color (or maybe I should say pixel) attributes.
Host Type 5: Generally we would describe these as nodal compositors that evaluate the full graph (typically a DAG, not a tree - perhaps the above is a tree with "symbolic links" - not necessarily a doubly linked list), although at a theoretical level not all nodal compositors evaluate the same way. A Type 5 host relies, when evaluating, on rules about how to handle A vs B clips with different attributes when there are multiple inputs. Some hosts vary in the purity of certain attributes, which in this sub-context amounts to whether we can separate a change of pixel size (scaling) from a change of the defined-pixel-data window size. Whether one wants such convenience maintained at the node level, or forces the same via special nodes that break the graph (making it up via parameters on special nodes), will vary per implementation.
To keep banging on defunct hosts: for example, initially Shake did not support 3:2 pulldown on outputs (or on anything other than a FileIn), the simplest case of temporal processing and formatting.
Eventually, as a workaround to simplify the discussion, they added a cache node one could insert in the middle of a graph (write out - read in). Same with temporal processing: one host might carry video interlacing information along the graph until it meets a node that resets it to deinterlaced frames, while another will only handle that at the image input level... Or, if one remembers FCP Legacy, it was internally YUVA and initially supported AE plugins in RGBA, so when one selected, say, a certain codec as output, you could get terrible banding, as every plugin's processing when rendering (the whole project) was now conformed to the selected output format... If the effect of that is not clear: say you have a threshold value slider - it might not behave as the same value because the pixel precision changed... etc.
The example I used from the Nuke NDK (the UI menu as in Nuke is optional; a host could even respond by automatically adding a node before or after to conform to this - that part is irrelevant); note the NDK Format also supports a static window, not just a spatial resolution (back to our letterbox window):
Stores: FormatPair (as defined in Format.h). Proxy adaptive: no. Notes: Presents a single-select drop-down of all formats currently available in the script. FormatPair stores both full size and proxy adapted versions, and offers a useful 'format()' function for querying and setting the current script format list programmatically.
Now: example MultiResolution prop
kOfxImageEffectPropSupportsMultiResolution
Note: I'm trying not to create a situation where tons of new props need to be made up for each kind of host - but SupportsMultiResolution might be missing some cases here. E.g.: a plugin supports inputs of different resolutions, but not different resolutions of the same clip across time. However, we cannot just add a value 2 below, as this would likely break something; if we want that, it needs to be gated on a new SpatialFormat master switch existing - where it might then be an option, but only if... Also, we don't want to start having prop versions... So the meanings of 0 and 1 below cannot really be changed if the intent is non-destructive/evolutive.
Indicates whether a plugin or host supports multiple resolution images.
Type - int X 1
Property Set - host descriptor (read only), plugin descriptor (read/write)
Default - 1 for plugins
Valid Values - This must be one of
0 - the plugin or host does not support multiple resolutions
1 - the plugin or host does support multiple resolutions
Multiple resolution images mean…
input and output images can be of any size
input and output images can be offset from the origin
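As a sketch of the plugin side (assuming gPropSuite and gEffectSuite were fetched at load time, as in the earlier sketches): a filter that cannot handle differently sized or offset images would turn the flag off in its describe action:

static OfxStatus describe(OfxImageEffectHandle effect)
{
    OfxPropertySetHandle props;
    gEffectSuite->getPropertySet(effect, &props);
    /* default is 1; 0 = all images must share size and origin */
    gPropSuite->propSetInt(props, kOfxImageEffectPropSupportsMultiResolution, 0, 0);
    return kOfxStatOK;
}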
Host Type 6: I am not sure what to call these, but for simplicity let's call it "3D compositing" as a grab-all - we have discussed before "deep" compositing, multi-channels, concatenation of transforms... a lot more columns in our then not-so-simple chart. Concatenation of transforms is itself a whole topic, as it relies on geometry, not on the defined-pixel-data window...
For clarity, an example: we used to make Softimage plugins; in their FXTree you either piped the image processing graph into a Mental Ray texture, or you went to a FileOut. In the latter case you were, if you like, post-render (this is related to e.g. DOF on a render layer, like the 3ds Max RPF DOF effect right in the render settings of the app - the render builds up the deep channels and a post-render process is called once they are filled)... In Softimage XSI you couldn't be both at once (you needed to cache to disk in two steps if needed). I am not going to discuss this today, as it would take a whole new chapter, but we could even have a host that collects samples per pixel in such a context (a step beyond layers with Z per pixel)...
Overall, what you can do in a host is not unique; for example, one host we support lets the user apply effects as an image processing tree attached to the main input, and in another mode feeding their 3D compositor, where we lose the power of temporal processing (accessing frames at other times). NET: what we have seen in this episode is that, in a "meta-model" trying to accommodate different developers, it is probably not a good idea to confuse render areas (integer dimensions) with imaging geometry (a transform that has a variable number of attributes - do we like "attributes" as the word here, as USD uses? we use "parameters" to mean something else, and we switch between preferences, hints and properties). Although, when resolving a render graph, the render areas are derived from the imaging/rendering geometry. This affects not only supporting a random host without operating it first and discovering how it works in certain user configurations, but also probably limits the host in terms of representing itself for best behavior by effects.
OK making progress
Here's a spreadsheet I started
https://1drv.ms/u/s!Agx0V05pGlbqg-xJj4KFWrBBUZCbLQ?e=8gdPYJ
Has multiple purposes
1. Cheat Sheet for all properties
2. Make sure we can easily represent all hosts (and particularly a given context within a host)
Not finished
- missing a diagram of the order in which things are called
- I moved everything related to Interact out of there for now
- I already have an erratum - ProjectSize is intersected by ProjectExtent in the doc now, so it cannot act, as my list describes, as an effect container/parent
Project/Effect/Clip/Frame
- I am still leaning toward the view that we need a SpatialFormat envelope over Effect that is a static spatial reference, and that it would be a good thing to have a per-host checklist that is programmatic instead of an online interview questionnaire. So my host typology (there would be different host types/columns for the temporal and pixel/color components etc.) is right now missing, per property per host type, RO, RW and NA (host might or might not support) - and some properties, as the host simplifies in some context, collapse into being the same (which one do we use?). This way a host could report with a single string what is possible for an effect in the host right now...
- Bonus: I noticed we have continuous sampling for temporal but not for
spatial (this is related to transform concatenation in a way).
Note, this is not a new discussion: scroll down - Peter Loveday in 2005 suggested the term 'target clip resolution' - sort of the same thing I call Spatial Format here. Rereading it, there is definitely an issue with Project Space - aside perhaps from being the reference for interaction (point locator...) - that we never resolved.
https://sourceforge.net/p/openfx/mailman/openfx-developer/?viewmonth=200502
Reposting - I copied and pasted duplicate info the first time...
Ok next pass up: updated spreadsheet (now has colors too)
https://1drv.ms/x/s!Agx0V05pGlbqg-xK_pQx2tsDx3Hv0g?e=DOkvkX
Context: step-by-step implementation of Host Type per property / action. We have a lot of isolated definitions to allow different kinds of hosts to support the API as best they can; however, for a plug-in developer it becomes overall a bit mysterious. The hypothesis is that we could create a simple "enumeration" - strings in OpenFX - to allow hosts to qualify themselves.
Errata first: I typed the data type without looking at the doc; it turns out (already noted before) that kOfxImagePropRegionOfDefinition is an integer window, unlike the others. Now I have to look at what that means against render scale...
I removed contexts as a column in the spreadsheet, and I also realized the proper way to do this is to isolate what actually affects spatial formatting from what is connected to it but is not formatting per se. It is now just spatial formatting/processing properties. For example, tiling is orthogonal to this; it's a render model. Similarly for spatial parameter props.
I now came up with a very simple way to qualify Type 1 vs Type 2 in my host typology.
Host Type 1. The simplest is easy - it just needs PropBounds to get W and H, and nothing else.
It assumes no windowing - you get all the pixels and you write all the pixels.
Everything is the same size (render scale is probably still an option).
On the plug-in side, anything that is just pixel/point processing, like a lookup or a transfer function - e.g. invert RGB - is safe here. Theoretically we could (or not) add a Type 0, per-pixel processing (allowing concatenation down to per-pixel), as a plug-in descriptor for this. These plug-ins should never have to look for RoD, RoI, FramesNeeded etc., and be guaranteed not to crash.
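A minimal sketch of that pixel/point-processing case (assuming 8-bit RGBA packed pixels, gPropSuite fetched at load time, and error checks omitted): read kOfxImagePropBounds for the window, walk every pixel, never touch RoD/RoI/FramesNeeded. Source and output bounds are assumed to coincide, per the Type 1 definition.

#include <stddef.h>   /* ptrdiff_t: rowBytes may be negative if the image is flipped */

static void invertRGBA8(OfxPropertySetHandle srcImg, OfxPropertySetHandle dstImg)
{
    OfxRectI bounds;
    int srcRowBytes, dstRowBytes;
    void *srcData, *dstData;

    gPropSuite->propGetIntN(dstImg, kOfxImagePropBounds, 4, &bounds.x1);
    gPropSuite->propGetInt(srcImg, kOfxImagePropRowBytes, 0, &srcRowBytes);
    gPropSuite->propGetInt(dstImg, kOfxImagePropRowBytes, 0, &dstRowBytes);
    gPropSuite->propGetPointer(srcImg, kOfxImagePropData, 0, &srcData);
    gPropSuite->propGetPointer(dstImg, kOfxImagePropData, 0, &dstData);

    for (int y = bounds.y1; y < bounds.y2; ++y) {
        const unsigned char *s = (const unsigned char *)srcData
                                 + (ptrdiff_t)(y - bounds.y1) * srcRowBytes;
        unsigned char *d = (unsigned char *)dstData
                           + (ptrdiff_t)(y - bounds.y1) * dstRowBytes;
        for (int x = bounds.x1; x < bounds.x2; ++x) {
            d[0] = 255 - s[0];  /* R */
            d[1] = 255 - s[1];  /* G */
            d[2] = 255 - s[2];  /* B */
            d[3] = s[3];        /* A unchanged */
            s += 4; d += 4;
        }
    }
}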
Host Type 2. For now, one buffer in and one buffer out (the default Filter context scenario).
To express this we need two windows on each side:
For the input, the pixel data bounds and the imaging geometry window.
For the output, the same (in practice, whether that is a display window, i.e. a viewer node, or a fileOut, is irrelevant).
The proto-case here is anything that has a frame-edge condition (e.g. an edge detector, a blur (fill edges or not)...) and the situation where a small clip is brought into a larger timeline (what happens, for example, in the Resolve edit page - but not only in Resolve, to be clear - if you set the mismatched-resolution menu not to scale). Then the small image is in a sea of black pixels and we don't know where the edges of the source image are. So in this scheme the host sub-context would switch to Type 1 (to be named with a string) when we are in that situation, if there is no source spatial format info provided. The plugin can still accept to work, but would need to either parametrize Width and Height for user entry or turn off some options.
We may or may not be over-expressive API-wise here. I'll wait until I fill in Type 3 to update the spreadsheet.
A simple example: the input is a film scan which also has the audio soundtrack scanned; the input window is then the actual pixel data to use internally from the frame's defined-pixels buffer. On the output side, with the same example, one might want to output, say, a 16:9 crop (the render window). Within the Type 2 sub-context, not processing outside the output geometry window is not destructive, it's just render-time saving.
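One hedged way a plugin could sense that second window (assuming gPropSuite is fetched, as before; both props are integer windows on the fetched image's property set) is to compare the defined-pixel bounds against the image's region of definition:

static int imageIsPadded(OfxPropertySetHandle img)
{
    OfxRectI bounds, rod;
    gPropSuite->propGetIntN(img, kOfxImagePropBounds, 4, &bounds.x1);
    gPropSuite->propGetIntN(img, kOfxImagePropRegionOfDefinition, 4, &rod.x1);
    /* bounds strictly larger than the RoD means the host allocated/padded
       beyond the defined imagery (black or replicated pixels) */
    return bounds.x1 < rod.x1 || bounds.y1 < rod.y1 ||
           bounds.x2 > rod.x2 || bounds.y2 > rod.y2;
}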
Next one, Type 3 - (for people familiar with AE, Type 3 is like the old AE API, as maybe implemented by Premiere now, and Type 4 is like the AE SmartFX API), and Type 5 is like Fusion the app (Type 5 differs from Type 4 in how it deals with additional inputs)...
Pierre -
Forgive me if some of this has been fogged by history, but there isn't much of a summary or use-cases in the GH issue as for why this feature is needed. I'm not entirely convinced you can't do what you need with the existing actions/properties. Could you add some more info to the issue as to what you want, and we can weigh in again?
Cheers,
-Paul