Dear Arya, dear all,
thanks for pointing that out. In my original error table, the ?? were not meant as a recommendation but just as a mark that the scheme required something that isn't there. As for *, it unfortunately comes with the naive interpretation as a wildcard, i.e., an arbitrary sequence of characters, and this can be problematic because it means losing position information if multiple features are concatenated. So if multiple sub-features involve the same ASCII character and we cannot recover their position, we cannot tell them apart. A placeholder that doesn't come with the sequential connotation would be better. In Perl regular expressions, such a placeholder would be ".", which matches exactly one character.
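To illustrate the difference between the two readings, here is a minimal Python sketch. The three-slot tags (person, number, role) and their values are hypothetical and chosen only so that the character "S" can occur in more than one slot; they are not taken from the Unimorph schema.

```python
import fnmatch
import re

# Hypothetical concatenated tags with three one-character slots:
# person (1/2/3), number (S/P), role (S = subject, O = object).
# The last three entries are malformed on purpose (wrong length).
candidates = ["1SS", "1PS", "SS", "S", "1SSS"]

# Reading "*" as a wildcard (arbitrary sequence of characters):
# "*S" only says "ends in S", so the position of S is not recoverable.
glob_hits = [t for t in candidates if fnmatch.fnmatchcase(t, "*S")]

# Reading "." as a one-character placeholder: "..S" fixes S in slot 3.
regex_hits = [t for t in candidates if re.fullmatch(r"..S", t)]

print(glob_hits)
print(regex_hits)
```

Under the wildcard reading, every candidate matches, including the strings of the wrong length; under the one-character reading, only the two well-formed tags with "S" in the third slot match.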
In fact, a ranking-based encoding for polyvalent verbs in head-marking languages has a number of advantages: It prevents unrestricted tagset explosion (which is inevitable if a compositional scheme is used), it allows encoding different features independently rather than by concatenation (and thus in a way that is more consistent with the annotation for languages without double agreement), and it allows accounting for different language-specific rankings (e.g., either based on expected morphological case as currently recommended or based on grammatical roles as necessary for Kartvelian languages).
Moreover, if the highest-ranking argument (say, subject, if defined as such for a particular language) goes unmarked, and all others get numerical indices (say, features of direct object marked by -1, indirect object by -2, etc.), the annotations for this argument are identical to the annotations we would have in a language without double agreement. This is ideal for projection experiments.
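The indexing idea can be sketched in a few lines of Python. The function name, the feature labels, and the exact surface form (a feature followed by "-1", "-2", ...) are assumptions made for illustration; only the principle of leaving the top-ranking argument unmarked comes from the description above.

```python
def encode_arguments(ranked_args):
    """Encode agreement features for arguments given in rank order.

    ranked_args: list of feature lists, highest-ranking argument first.
    The top-ranking argument stays unmarked; every other feature gets
    its argument's rank appended as a numerical index.
    """
    tags = []
    for rank, feats in enumerate(ranked_args):
        for feat in feats:
            tags.append(feat if rank == 0 else f"{feat}-{rank}")
    return ";".join(tags)


# A verb agreeing with its subject only (no double agreement).
mono = encode_arguments([["3", "SG"]])

# A polyvalent verb: subject, direct object, indirect object.
poly = encode_arguments([["3", "SG"], ["1", "PL"], ["2", "SG"]])

# Dropping all indexed features recovers the subject annotation.
unmarked = [t for t in poly.split(";") if "-" not in t]

print(mono)  # 3;SG
print(poly)  # 3;SG;1-1;PL-1;2-2;SG-2
```

The unindexed part of the polyvalent annotation is exactly the annotation of the verb without double agreement, which is what makes projection straightforward. Each feature is also a separate tag, so the tagset grows with the number of features and ranks rather than with their combinations.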
Our suggestion is downward-compatible in the sense that *not a single* Unimorph data set at the time used the ARG schema as it had been de(/pre)scribed. Very different ideas had been implemented, including some resembling our own (e.g., ARG-encoding of individual features rather than by concatenation, non-marking of top-ranking arguments). All of this is documented in our GitHub, with a mapping for all languages with ARG-encoding.
Some other suggestions (fully downward-compatible, in that case) have been implemented as well, e.g., for recursive inflectional morphology as necessary for nominal morphology in Sumerian and languages from different language families in the Caucasus.
My idea was and is that the Unimorph community takes a thorough look at our fork and, if it finds approval (or if it inspires an alternative extension), that we merge our fork into the original repos.
All the best,
Christian