'Remaining Dimension' examples


Lucian Smith
May 6, 2021, 5:29:03 PM
to sed-ml-...@googlegroups.com
I created a couple SED-ML examples that I think illustrate how the 'RemainingDimension' class is supposed to be used:

A simple one for 'average':

More involved for average, max, min, and stdev:

These both have corresponding outputs from Tellurium:



Basically, both use a combination of the 'symbol' and 'target' in a Variable:

        <variable id="task1_____S2" symbol="urn:sedml:function:average" target="/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id=&apos;S2&apos;]" taskReference="task1" modelReference="model0">
          <listOfRemainingDimensions>
            <remainingDimension target="task0"/>
          </listOfRemainingDimensions>
        </variable>

Here, task0 is the time course, and task1 is the stochastic repeat of that time course.  So when 'task0' is the remaining dimension, that means that the result has the same dimensions as the base time course.
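As a sketch of the intended semantics (the shapes and values here are hypothetical, just for illustration), this is what an interpreter would do in NumPy terms:

```python
import numpy as np

# Hypothetical result of task1 (the stochastic repeat): a 2-D array of
# S2 values, one row per repeat, one column per time point of task0.
task1_S2 = np.array([
    [1.0, 2.0, 3.0],   # repeat 0
    [3.0, 4.0, 5.0],   # repeat 1
])

# symbol="urn:sedml:function:average" with task0 as the remaining
# dimension: reduce over every axis *except* the task0 (time) axis,
# i.e. average across the repeats.
task1_____S2 = task1_S2.mean(axis=0)

print(task1_____S2)  # one value per time point: [2. 3. 4.]
```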

Does this seem reasonable to people?  Is it what you would expect?

-Lucian

Jonathan Karr
May 6, 2021, 5:45:57 PM
to sed-ml-...@googlegroups.com
Thanks for sharing examples. This is really important for driving consistency.

I'm new to looking at dimension reduction. I was (perhaps naively) expecting MathML to be involved. To confirm, does this have to be separate from MathML?

Whatever we align on, we'd be happy to add this to our SED-ML test suite so software tools have concrete examples to check against and feature support can be clearly communicated.

Jonathan


Lucian Smith
May 6, 2021, 6:15:34 PM
to sed-ml-...@googlegroups.com
I assumed MathML would be involved when I first started thinking about this as well.  And if we wanted to implement support for this inside MathML, I think in theory we could do it that way, but I also think it would involve using MathML in a non-standard way.

Essentially, what we want is something akin to "numpy.average(foo, axis=0)", i.e. a function with multiple arguments.  But the 'mean' function (or max, min, etc.) in MathML has no such 'second argument':
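To make the gap concrete (the data here is made up, purely for illustration): MathML's 'mean' corresponds to the no-axis form, while what we need corresponds to the two-argument form:

```python
import numpy as np

data = np.arange(6).reshape(2, 3)  # 2 repeats x 3 time points

# MathML's <mean/> is effectively np.mean(data): one scalar,
# with no way to say which dimension should survive.
overall = np.mean(data)             # 2.5

# What we want is the second, 'axis' argument:
per_time = np.average(data, axis=0)  # [1.5 2.5 3.5] -- keeps the time dimension
```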


(I suppose MathML v3 might have something we could use more cleanly here, but that would clearly involve a larger change to SED-ML than we can afford right now.)

The other problem with this approach is that in general, any MathML that we use in SED-ML is assumed to work on an element-by-element basis.  In essence, we say that each symbol in the MathML is supposed to be interpreted as a scalar, and the vectors/matrices that are built from those are built up from the scalar level, and never work 'backwards' from a vector back to a scalar again.
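A small illustration of that element-by-element assumption (the variable names and values are invented for the example):

```python
import numpy as np

# Two SED-ML variables from the same task: each symbol in the MathML
# is treated as a scalar at each point of the result.
S1 = np.array([1.0, 2.0, 3.0])
S2 = np.array([10.0, 20.0, 30.0])

# A DataGenerator math of "S1 + S2" is evaluated pointwise, so the
# output always has the same shape as the inputs.
result = S1 + S2  # [11. 22. 33.]

# Nothing in scalar-level math can 'work backwards' and collapse
# that vector into a single number.
```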

There are ways to 'annotate' elements in MathML, so it would be possible to annotate the 'mean' function to say 'leave this dimension'.  So this is in theory a viable possibility, but we've struggled to make the concepts work in our previous meetings.

This way of encoding 'dimension-reducing' functions allows us to 'work backwards' in a clean separate way from the MathML, which can remain at the scalar level for all symbols.  I don't remember who came up with this scheme other than to say it wasn't me--I remember the discussion that led to the concept of the 'RemainingDimensions' class, but not how it ended up getting used outside of MathML instead of inside it.  But having implemented (basic) support for it at this point, I do like it.

-Lucian

Jonathan Karr
May 6, 2021, 7:00:37 PM
to sed-ml-...@googlegroups.com
Using a combination of symbols/targets to encode reductions seems like a way to add a small amount of data reduction outside of MathML. I agree this works around some of the issues with vector functions and MathML, keeping variables in MathML to scalars.

A couple additional thoughts:
  • SED-ML L1V3 already includes vector MathML functions and more are being added in L1V4. I think it would be best to avoid two ways of doing similar things.
    • I don't love using combinations of symbols/targets to avoid vector functions while simultaneously having them. I would vote for (a) removing vector MathML functions or (b) fully embracing them, including for use with remaining dimensions.
    • I'm hesitant on the current vector functions with MathML. I brought this up in a series of GitHub issues (152-156). A key issue is that these vector functions require SED-ML variables to have different values depending on their context (in or not in vector functions). While this can be made to work, this seems ripe for confusion and mistakes, plus this seems unnecessarily complex to implement.
  • I think more flexibility is needed than what combination of symbols/targets would enable. 
    • Without mirroring MathML in SED URNs (or KiSAO terms), the combination of symbols/targets would only allow limited reductions. This will lead to the temptation to create more URNs, which would increase the implementational complexity of SED-ML further. 
    • If a MathML scheme could be worked out, it could be much more expressive. While I'm hesitant on the current vector functions, I think more flexibility in math is essential.
Another potential way to approach this is to allow a second, matrix-style type of mathematical expression in which the values of variables are matrices (e.g., vectors for time courses or matrices for repeated time courses). The math style could be captured by an additional attribute on each Computation object. I think this avoids the need to add annotation inside MathML expressions by having separate contexts for element-wise and matrix computations. To me, this would operate similarly to MATLAB or NumPy. This would also require the introduction of matrix operators, e.g. to distinguish between element-wise multiplication and inner and outer products. Implementing this could be facilitated by adding functionality to evaluate mathematical expressions to libSED-ML or another separate library.
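In NumPy terms, the distinction that second context would have to make looks something like this (the `mathStyle` attribute name below is hypothetical, just to illustrate the idea):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise style (current SED-ML semantics): "*" acts per element.
elementwise = A * B            # [[ 5. 12.] [21. 32.]]

# Matrix style (the proposed second context) needs distinct operators:
inner = A @ B                  # matrix (inner) product
outer = np.outer(A[0], B[0])   # outer product of two vectors

# A hypothetical per-Computation attribute (e.g. mathStyle="matrix")
# would select which set of operator semantics applies.
```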

Jonathan

Lucian Smith
May 6, 2021, 7:23:01 PM
to sed-ml-...@googlegroups.com
I agree that only having One Way To Do Things is preferable.  At this point, I would vote for taking vector functions out of MathML.  I don't see a good way to use them at all, and certainly don't see a way to use them to duplicate the functionality in the examples.  (And unless I'm mistaken, they have zero implementations.)

I agree that it seems like it should be possible, but we've been discussing this for years and have had no proposals that actually solve the problem.  We could come up with another one, but having now used the version that's in the L1v4 draft spec, I think it actually does a good job of solving the immediate problem in a clean way.

I don't actually think that this will result in an explosion of SED-ML 'symbol' URNs.  There are a limited number of MathML vector/matrix functions, and if we give each one a SED-ML URN, we're done.  The ID for the result (the Variable id) can then be used in MathML as before, with the advantage that everything is now guaranteed to be a scalar.

You are correct that this scheme does not allow vector operations like dot products and the like.  Is there call for this?  Is anyone supporting dot products in their simulators that is crying out for a way to store this functionality in SED-ML?  If so, then the current L1v4 draft spec won't help them, and we'll need to figure out another option, or push it out to SED-ML L2.

-Lucian

Jonathan Karr
May 6, 2021, 9:57:27 PM
to sed-ml-...@googlegroups.com
I agree the current vector MathML functions are tough. I agree with removing them (or fixing the issues).

I'm not familiar with previous discussions about matrix algebra. What's the hang-up on this? This seems like a well-defined topic with clear semantics.

My point about explosion is about applying multiple reductions together. For example, if one had a multiply nested repeated task, I could imagine doing an average over an inner dimension and then calculating a variance on top of that. Since URNs aren't easily composable like mathematical expressions, a solution could be to create more URNs for combinations, which would get complicated. 
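In mathematical notation the composition is trivial; as a sketch (with invented shapes for a hypothetical doubly-nested repeated task):

```python
import numpy as np

# Hypothetical doubly-nested repeated task: axes are
# (outer repeat, inner repeat, time point).
results = np.random.default_rng(0).normal(size=(4, 10, 50))

# Composed as expressions this is one line each: average over the
# inner repeats, then the variance of those averages over the outer
# repeats...
inner_mean = results.mean(axis=1)   # shape (4, 50)
outer_var = inner_mean.var(axis=0)  # shape (50,)

# ...but encoded as symbols, each composition would need its own URN
# (average, variance-of-average, ...), which does not scale.
```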

Lucian Smith
May 6, 2021, 11:28:19 PM
to sed-ml-...@googlegroups.com
The hangups are pretty much what we've discussed here (though I could be forgetting others).  The main ones being 'how to define whether a symbol should be used as a matrix/vector/scalar' and 'how to define mean/max/etc. on particular dimensions'.

-Lucian

Jonathan Karr
May 7, 2021, 1:18:52 AM
to sed-ml-...@googlegroups.com
I see the issues with the current MathML scheme, with mixed scalars and matrices depending on the local context in which each variable appears. This creates expressions that break from the norms established by MATLAB, NumPy, etc. This seems guaranteed to cause confusion and mistakes.

Expanding MathML into matrix computations seems like a reasonable path, although it would take some work. But, it sounds like extending MathML hasn't been well received.

I'm struggling not to arrive at this conclusion:
  • Arbitrary computations on results are necessary for some studies
  • Such computations are beyond the scope of SED-ML due to the limitations of MathML
  • Expanding MathML seems like a fair amount of work, both for specification design and implementation
  • Given the need to do computations outside SED-ML, I'm hesitant to introduce small bits of functionality into SED-ML that won't remove the need for computations beyond SED-ML
  • Instead, I think SED-ML should focus on defining a clear interface for exporting simulation results. Other tools such as NumPy, MATLAB, Scilab, and R can take over beyond this. The key thing is for the interface to be clear. This will enable results to be used with various other tools.
  • This keeps SED-ML's focus on the things unique to this domain, avoids increasing the complexity of SED-ML, and opens a path to a much broader range of visualizations without forcing that complexity into SED-ML.
  • Entire projects, including SED-ML plus reductions and plotting beyond it, can be organized into workflows (e.g., CWL or WDL)
