Hi Sébastien,
Thanks for your interest in the Trace Query Language!
It is possible to print each and every event of a trace using the following query:
match e return (time[e], rule[e])
The two other queries you mentioned are currently not supported, although it would be easy to handle them by making minor additions to the TQL.
For those interested, I am now starting a long explanation on how this could happen and why it involves nontrivial design choices.
--------
We currently support
match e:{ K(x{u/p}) } return (time[e], rule[e])
to match events that phosphorylate the x site of a K, but you cannot replace x and K with wildcards. Adding support for "name wildcards" in patterns may be an interesting addition.
The third query you provided raises a different question. First, there are several interpretations of what it may mean for an event to "involve" an agent of type K:
- Does it mean that an agent of type K gets modified?
- Does it mean that there is a path between an agent of type K and an agent that gets modified?
- Does it mean that the rule r that e is an instance of features an agent of type K?
- Even the formulation above is somewhat ambiguous. Indeed, a rule could be written without explicitly mentioning an agent of type K and yet require that two agents A and B be connected in order to apply. If this connection is realized by an agent K, should we say that K is "involved"?
Second, the following query is syntactically valid:
match t: { K } return rule[t]
However, it has a different semantics from what you may expect. Indeed, the pattern between curly braces is called a "transition pattern" (see the TQL paper) and it matches any transition t such that the mixture before t features a K agent (not necessarily affected by the rule). However, if you run the query, you'll get the following error message:
[Fatal error] Ambiguity detected.
This is because the query engine adds an additional requirement that every agent that appears in a pattern has to be dynamically resolvable in a non-ambiguous way. This is not the case here as a mixture can feature multiple K agents. One of the (many) reasons for this requirement can be understood by looking at the following query:
match t: { k:K } return int_state[.t]{k, "x"}
You may say that k should be resolved to the kinase that is "involved" in transition t, if any. However:
- In general, a rule may involve several kinases
- More importantly, it is very useful for a transition pattern to be able to match agents that are not directly affected by a transition (see examples on the documentation page), so I would not change the default semantics.
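To make the non-ambiguity requirement concrete, here is a toy Python sketch. It is not how the query engine is implemented: the mixture representation (a dict from agent ids to agent types) and the resolve_agent helper are made up for illustration. It only shows why a bare K in a pattern cannot be bound when the mixture contains several K agents.

```python
class AmbiguityError(Exception):
    """Raised when a pattern agent could be bound to several mixture agents."""


def resolve_agent(mixture, agent_type):
    """Return the id of the unique agent of type `agent_type` in `mixture`.

    `mixture` is a hypothetical dict mapping agent ids to agent types.
    Returns None when no agent matches; raises AmbiguityError when several do.
    """
    candidates = [aid for aid, ty in mixture.items() if ty == agent_type]
    if len(candidates) > 1:
        raise AmbiguityError(
            f"{len(candidates)} agents of type {agent_type!r} match the pattern"
        )
    return candidates[0] if candidates else None
```

With a single K in the mixture the binding is unique; with two, there is no principled way to decide which one k refers to.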
--------------
There is another reason why transition patterns have limited expressiveness. Indeed, although many people want to use the TQL to perform very simple queries that match single transitions (in which case the TQL is only a convenient replacement of a script that does a single pass through a trace file), the original main design goal of the TQL was to handle complex "trace patterns" featuring several transition patterns with shared variables. Therefore, making transition patterns too expressive quickly makes the problem of evaluating complex queries intractable. This is the reason I do not allow arbitrary predicates to appear in transition patterns.
However, arbitrary predicates are allowed in the "when" clause of a query. As a reminder, the query:
match e when P return E
is somewhat equivalent to the (invalid) query
match e return (if P then E else nothing)
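In Python terms, the equivalence can be sketched as follows. This is only an analogy: events are modelled as plain dicts, and NOTHING, query_with_when and query_desugared are names invented for the example, not part of the TQL.

```python
NOTHING = object()  # stand-in for the "nothing" of the pseudo-query


def query_with_when(events, pred, expr):
    """Analogue of: match e when P return E"""
    return [expr(e) for e in events if pred(e)]


def query_desugared(events, pred, expr):
    """Analogue of: match e return (if P then E else nothing)"""
    results = [expr(e) if pred(e) else NOTHING for e in events]
    # Rows that evaluated to "nothing" are dropped from the output.
    return [r for r in results if r is not NOTHING]
```

Both functions return the same rows for any predicate and expression that have no side effects; the "when" clause simply filters matchings after the pattern has been matched.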
Therefore, I think that the best way to express your third query would be something like this
match e when involves(rule[e], "K") return (time[e], rule[e])
where involves is a function that takes the name of a rule R along with an agent type A and returns whether or not the rule R "involves" an agent of type A. Such a function does not currently exist but it could be trivially added.
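As a rough Python sketch of what such a function could compute, here is one version that uses the "the rule features an agent of type A" interpretation. The rule_agents table (rule name to the set of agent types written in the rule) is a hypothetical input that would have to be extracted from the model; nothing here reflects existing TQL code.

```python
def involves(rule_name, agent_type, rule_agents):
    """Return True if the rule named `rule_name` mentions an agent of `agent_type`.

    `rule_agents` is a hypothetical table mapping each rule name to the set
    of agent types that appear in the text of that rule.
    """
    return agent_type in rule_agents.get(rule_name, set())


def events_involving(events, agent_type, rule_agents):
    """Analogue of: match e when involves(rule[e], "K") return (time[e], rule[e])"""
    return [
        (e["time"], e["rule"])
        for e in events
        if involves(e["rule"], agent_type, rule_agents)
    ]
```

Note that this version says nothing about agents that are only implicitly required (for instance a K bridging two other agents), so a different interpretation of "involves" would need a different table or a different function.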
---------
Ideally, it should be very easy to add custom functions to the TQL such as the involves function above. Currently, you would need to add a couple of lines to three different files in the TQL sources (the grammar, the AST and the interpreter). What I think would be cool would be to allow external calls to arbitrary Python scripts. This way, your script would not have to handle the gory details of streaming a JSON trace file, computing matchings, capturing event and agent names... You would just have to specify the core computation to be performed on every matched pattern and the TQL would take care of the rest. I don't know if the current Python interface is mature enough to allow a clean implementation of this idea.
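To give an idea of what the user side could look like, here is a purely hypothetical sketch. No such interface exists today: the shape of a match (a dict with "time" and "rule" keys), the on_match callback and the run_callback driver are all assumptions, with run_callback standing in for the part the TQL would handle.

```python
from collections import Counter

# Per-rule event counts, filled in by the callback below.
rule_counts = Counter()


def on_match(match):
    """User-supplied core computation, called once per matched pattern."""
    rule_counts[match["rule"]] += 1


def run_callback(matches, callback):
    """Stand-in for the TQL side: feed every match to the user's callback."""
    for match in matches:
        callback(match)
```

The user script only contains on_match; reading the trace and computing the matchings would stay inside the TQL.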
Don't hesitate to give me feedback on how you think the TQL should evolve. Also, I am happy to help anyone willing to contribute to the TQL. :-)
Best,
Jonathan