Several levels.
Level 1 (essentially Alexi's original question): How do you assess that your reference shows the ancestral situation?
Level 2: When you treat insertions and deletions as different processes and optimise via RAxML's multistate option, RAxML will optimise one rate to insert and another to delete. As a geneticist, I'd argue insertions and deletions cannot be taken apart. I'd further argue that each part of the genome with length-polymorphism has a different probability for indels. In all sequence data I have looked at, I could observe regions attracting indels (insertions and/or deletions), and others that did not. That is trivial for regions with structural constraints (e.g. ribosomal DNAs, tRNA genes, protein-coding regions in general), but it can also be observed in non-coding, non-transcribed regions. And since it's "and/or", there is little reason to separate insertions and deletions beyond binary absence/presence.
Level 3: You have the same issue with treating inserts and deletions as two different states as you have with treating ambiguous base calls as additional states. If one of them is rare, it will have a greater effect than the other. Which brings us back to level 2 (and 1): are your insertions (with respect to the reference) the consequence of a different process than your deletions?
But it's easy to test: you code them just binary (absence/presence), run the analysis, and compare it with the analysis using ternary coding: deletion – no modification – insertion.
E.g., let's say we code the entire data for indels either binary: 0 = gap, 1 = no-gap; or ternary with respect to the reference: 0 = deletion, 1 = no modification, 2 = insertion.
reference: GGGG - ---- - AGAT - ATCT - CCCC -> binary: 1 - 0 - 1 - 1 - 1 -> ternary: 1 - 1 - 1 - 1 - 1 (or only the variable positions 2 and 4: 0 - 1 / 1 - 1)
type 1: GGGG - ---- - AGAT - ---- - CCCC -> binary: 1 - 0 - 1 - 0 - 1 -> ternary: 1 - 1 - 1 - 0 - 1 (0 - 0 / 1 - 0)
type 2: GGGG - GGGG - AGAT - ATCT - CCCC -> binary: 1 - 1 - 1 - 1 - 1 -> ternary: 1 - 2 - 1 - 1 - 1 (1 - 1 / 2 - 1)
In both cases you have two distinct alignment patterns at the same positions (2 and 4), and both codings capture the same variation. The only differences: the binary model has one parameter less (so it's speedier), and the topological effect (and also the data bias) may increase for ternary coding. For binary, RAxML just notices that there are two sites that changed between absence and presence, with a certain probability (I can't calculate that off-hand). For ternary, RAxML notices that there's an equal chance to have a deletion OR an insertion, so the probability for either one may be halved. So you get more decisiveness from ternary than from binary coding, with the risk of over-weighting (the Level 3 issue above). Ternary is a double-edged sword.
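To make the two coding schemes concrete, here is a minimal Python sketch (all function and variable names are my own, not any RAxML input convention) that derives both codings for the two variable regions of the example above, with each region's content given as a string and "" meaning the region is absent:

```python
# Binary vs. ternary indel coding of length-variable regions,
# relative to a chosen reference taxon.
# "regions" maps each taxon to a tuple of region contents ("" = gap).

def code_binary(regions):
    """0 = gap (absent), 1 = no-gap (present), per region."""
    return {tax: tuple(0 if seg == "" else 1 for seg in segs)
            for tax, segs in regions.items()}

def code_ternary(regions, ref):
    """0 = deletion, 1 = no modification, 2 = insertion,
    each judged against the reference taxon."""
    ref_segs = regions[ref]
    coded = {}
    for tax, segs in regions.items():
        states = []
        for seg, ref_seg in zip(segs, ref_segs):
            if seg == ref_seg:
                states.append(1)   # same as the reference
            elif seg == "":
                states.append(0)   # reference has it, this taxon lost it
            else:
                states.append(2)   # this taxon has it, the reference does not
        coded[tax] = tuple(states)
    return coded

# Variable positions 2 and 4 of the first example:
regions = {
    "reference": ("",     "ATCT"),
    "type1":     ("",     ""),
    "type2":     ("GGGG", "ATCT"),
}

print(code_binary(regions))                    # 0-1 / 0-0 / 1-1
print(code_ternary(regions, ref="reference"))  # 1-1 / 1-0 / 2-1
```

The output reproduces the patterns above: the binary coding is reference-free, while the ternary coding needs the reference to decide which state is the "modification".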
Another example (inspired by a non-coding, non-transcribed dataset I recently worked with):
reference: GGGG - AGAT - ---- - ATCT - CCCC [a hairpin-like sequence]
type 1: GGGG - AGAT - AGAT - ATCT - CCCC [simple 5' duplication]
type 2: GGGG - AGAT - AGAT - ---- - CCCC [a length-compensating deletion]
Using the ternary coding (variable positions only), we would code this as
ref: 1 - 1
type 1: 2 - 1
type 2: 2 - 0
But only if we realise the mutational sequence: the AGAT is duplicated, and the ATCT is lost. In binary coding it would be
ref: 0 - 1
type 1: 1 - 1
type 2: 1 - 0
Again, the patterns are equivalent per se, just their weighting differs.
But what if your reference already carries a modification itself? For ternary coding this would need to be taken into account:
[actual ancestor: GGGG - AGAT - ---- - ---- - CCCC -> ternary: (1 - 1 -) 1 - 1 (- 1) -> binary: 0 - 0; this is the sequence of the all-ancestor, which you don't have in your sample]
reference: GGGG - AGAT - ---- - ATCT - CCCC -> ternary: 1 - 2 -> binary: 0 - 1
type 1: GGGG - AGAT - AGAT - ATCT - CCCC -> ternary: 2 - 2 -> binary: 1 - 1
type 2: GGGG - AGAT - AGAT - ---- - CCCC -> ternary: 2 - 1 -> binary: 1 - 0
The pattern is still the same, but now you have no deletion and two insertion events instead; this changes the matrix you code, and it would affect the model you optimise for how likely an insertion is (quite likely) versus a deletion (unlikely). Quite a difference to above, where we had equal probabilities for insertion and deletion.
But the binary coding will be the same (your level 1 issue).
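This reference-dependence can be checked directly. A small sketch (the ancestor and taxa are hand-coded from the hairpin example; all names are mine): coding ternary against the sampled reference yields a deletion state (0) where coding against the true ancestor yields only insertions (2) and no-modification (1), while the binary coding is identical either way.

```python
# Ternary coding of the hairpin example against two different baselines.
# "" = region absent; the two columns are the variable regions
# (the AGAT-duplication slot and the ATCT slot).

def ternary(segs, ref_segs):
    """0 = deletion, 1 = as baseline, 2 = insertion."""
    return tuple(1 if s == r else (0 if s == "" else 2)
                 for s, r in zip(segs, ref_segs))

def binary(segs):
    """0 = gap, 1 = no-gap; needs no baseline at all."""
    return tuple(0 if s == "" else 1 for s in segs)

taxa = {
    "reference": ("",     "ATCT"),
    "type1":     ("AGAT", "ATCT"),
    "type2":     ("AGAT", ""),
}
ancestor = ("", "")  # GGGG - AGAT - ---- - ---- - CCCC, not in the sample

for name, segs in taxa.items():
    print(name, binary(segs),
          ternary(segs, taxa["reference"]),  # ref-based: mixes 0s and 2s
          ternary(segs, ancestor))           # ancestor-based: only 1s and 2s
```

Running this shows, e.g., type 2 coded 2 - 0 against the reference but 2 - 1 against the ancestor, while its binary coding stays 1 - 0.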
Cheers, Guido
PS: My data doesn't stop there; I have additional types. The original duplication appears inverted in some sequences of the same clade, forming a pseudo-loop in the pseudo-hairpin, and deletions/additional duplications are found in both the 5' and 3' parts of the pseudo-hairpin (pseudo, because it's a non-coding plastid spacer). By reconstructing the patterns within terminal clades and comparing them across clades, I have a good idea of what it originally looked like. But usually indels really are simple absence/presence and rarely genetic synapomorphies (in a classic, strict Hennigian sense). And if your reference is not the actual ancestral sequence, or from a known slow-evolving taxon (a genetic living fossil), you have no way to decide whether it includes insertions (compared to the best reference: the LCA, the last common ancestor), which you will then code as deletions (or deletions, which you will score as insertions). But as I said, you can just code both options and see what comes up. In the best case, it doesn't matter at all. And if there is a difference, you just compare the models RAxML optimised (in the RAxML_info files). If the probability for insertion vs. deletion is very different, you may have an insertion/deletion sample bias, which you can cross-check by estimating the proportion of reference-based insertions vs. deletions. If they occur in about the same numbers but have different substitution probabilities, then one may really be more probable than the other.
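That last cross-check can be sketched as a simple count (a toy example; the 0/1/2 matrix would come from your own ternary coding, and the rate comparison against the RAxML_info file is done by eye, not by any parser I know of):

```python
from collections import Counter

# Toy ternary matrix: rows = taxa, columns = length-variable regions,
# 0 = deletion, 1 = no modification, 2 = insertion (vs. the reference).
matrix = [
    [1, 2, 1, 0],
    [2, 2, 1, 1],
    [2, 1, 0, 0],
]

counts = Counter(state for row in matrix for state in row)
n_del, n_ins = counts[0], counts[2]
print(f"deletions: {n_del}, insertions: {n_ins}, "
      f"ratio ins/del: {n_ins / n_del:.2f}")
# If this observed ratio is far from the insertion/deletion rate ratio
# RAxML optimised (see the RAxML_info file), suspect a reference-induced
# sample bias rather than a genuine rate difference.
```

Here the toy matrix gives 3 deletions vs. 4 insertions, i.e. roughly balanced counts; a strongly skewed optimised rate despite balanced counts would then point at a real rate difference.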