Aloha Robert,
Just to clarify, the xAPI spec itself is actively managed by a loose community including ADL, LRS vendors, and community members across industry and academia, who do take backwards compatibility and breaking changes into consideration. Where the rubber meets the road in your post is in the application of the specification across Learning Record Providers (LRPs). I think you know that, but I want to clarify for anyone else reading who may not.
It's true that right now there is still relatively little consistency among LRPs in their use of vocabulary. This is not how xAPI is designed to work; rather, it is likely just a by-product of an open specification moving through growing pains.
Ideally, LRPs would use 'Profiles' to define a common vocabulary for their applications that is versioned and can assist with backwards compatibility:
Even better, it would be great if LRPs that represent a common group would get together and agree on a common profile. For example, it would be wonderful if videojs, mediaelement, YouTube, Vimeo, etc. would all come together and adopt or define a common vocabulary (as has already been done in the video profile):
If that happened, we would be seeing something much closer to the API approach. Realistically, it is unlikely these LRPs will do so until there is sufficient business need (or an internal grassroots movement) to justify the time and effort. In the meantime, the community continues to raise awareness, teach about xAPI, and expand profile utilization while creating more tools/processes to ease adoption by LRPs.
To answer some of your specific concerns:
"The vocabulary could be changed or added onto at any time such that previously unseen statements appear, possibly in addition to what was previously designed for, or worse, replacing some of the previous statements."
"Changing" the vocabulary is definitely a problem if the xAPI consumer is attempting to use data from multiple sources and interpret them collectively without transformation. I am not sure "adding" is really a problem so much as a feature: LRPs should be able to append data to an xAPI statement that clarifies details about their learning environment/intervention as needed. So yes, if you do not define your vocabulary or use a profile with a predefined vocabulary, the probability is that the data will require some sort of transformation, or you may break whatever functionality you are trying to implement. The fact that profile adoption is still rare among LRPs makes it challenging to have interoperable data 'out of the box'.
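To make the transformation point concrete, here is a minimal sketch of normalizing statements from LRPs that use different verb vocabularies into one common vocabulary before analysis. The `example.com` IRIs and the mapping itself are made up for illustration; the target IRI follows the w3id.org pattern xAPI profiles use, but your actual profile's verb IDs may differ.

```python
# Illustrative mapping from two hypothetical LRPs' verb IRIs to a
# common profile vocabulary. None of these mappings are prescribed
# anywhere; a real consumer would build this from the profiles involved.
VERB_MAP = {
    "http://example.com/lrp-a/verbs/watched": "https://w3id.org/xapi/video/verbs/played",
    "http://example.com/lrp-b/verbs/viewed": "https://w3id.org/xapi/video/verbs/played",
}

def normalize(statement: dict) -> dict:
    """Return a copy of the statement with its verb mapped to the common vocabulary."""
    out = dict(statement)
    verb = statement.get("verb", {})
    verb_id = verb.get("id")
    if verb_id in VERB_MAP:
        out["verb"] = {**verb, "id": VERB_MAP[verb_id]}
    return out
```

With a shared profile in place, this mapping step disappears entirely, which is the whole argument for profiles.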
I don't think this will always be the case. For example, the video heatmap visualization my group at UH made ALWAYS checks that a statement conforms to the video profile before attempting to generate the visualization. If it does not have the video profile IRI in the category list under contextActivities, we do not even attempt to visualize the data. We can now consume data from any video-profile-compliant tool for our visualizations, though! This has been great for our own internal learning science research.
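The conformance check described above can be sketched like this. The profile IRI shown is what I understand the video profile declares as its ID; treat it, and the `visualize` wrapper, as illustrative assumptions rather than our exact implementation.

```python
VIDEO_PROFILE_IRI = "https://w3id.org/xapi/video"  # assumed profile ID

def conforms_to_video_profile(statement: dict) -> bool:
    """True if the video profile IRI appears in context.contextActivities.category."""
    categories = (
        statement.get("context", {})
        .get("contextActivities", {})
        .get("category", [])
    )
    return any(activity.get("id") == VIDEO_PROFILE_IRI for activity in categories)

def visualize(statements: list) -> list:
    # Only statements that declare video-profile conformance are considered;
    # everything else is silently skipped rather than guessed at.
    return [s for s in statements if conforms_to_video_profile(s)]
```

The design choice here is deliberate: refusing non-conforming data up front is what lets the same visualization consume output from any profile-compliant tool.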
"The underlying learning experience could be altered so we are trying to track something completely different than what was originally designed for. It is possible that the vocab and statements could remain the same so that the LRS is blissfully unaware."
I am not totally sure I understand the event you are describing here.
If you mean you want to track a learning intervention that's completely different from what the xAPI statements were originally designed for... yes, I suppose you could keep the vocab and statements the same. The LRS has no idea what you are trying to do and only checks that the statements conform to the spec and its ingest requirements. I am not sure why you would keep the statements the same if the change was that significant, though. If you can't change them because your LRP all of a sudden changed the xAPI statements on you, then that would be unfortunate. That's really on the LRP though... not the spec.
I can give you something of an example of this. When we were modifying H5P to send xAPI video profile compliant statements, we ran into an issue around the word 'completed'. H5P defined it as 'reaching the end of the video timeline', while we defined it as 'playing X% of the video'. Their trigger was the slider hitting the end of the timeline, while ours was a continuously updated percentage based on tracking the segments being played through. We ultimately came to a compromise that kept both statements, so people relying on the old definition could keep that event. We are still waiting on them to adopt all of our changes, but you can see the trail here:
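A rough sketch of how the two 'completed' definitions can coexist. The threshold value, the segment bookkeeping, and the event labels are all illustrative, not the actual H5P code; the point is just that both triggers fire independently so neither audience loses its event.

```python
COMPLETION_THRESHOLD = 0.95  # illustrative "X%" value; the real one is a design choice

def watched_fraction(segments: list, duration: float) -> float:
    """Fraction of the timeline covered by played segments.
    segments: (start, end) pairs in seconds, assumed merged and non-overlapping."""
    return sum(end - start for start, end in segments) / duration

def completion_events(segments: list, duration: float, playhead: float) -> list:
    """Return which 'completed' definitions fire, keeping both for compatibility."""
    events = []
    if playhead >= duration:  # original H5P definition: slider reached the end
        events.append("completed(end-of-timeline)")
    if watched_fraction(segments, duration) >= COMPLETION_THRESHOLD:
        events.append("completed(percent-watched)")  # segment-based definition
    return events
```

Note the two definitions genuinely diverge: scrubbing straight to the end fires only the first, while watching 95% and stopping short of the end fires only the second.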
On another note, the spec requires that the IRI associated with the verb in an xAPI statement link to a definition of the word. You'll see many examples where 'fired' is used as a verb that could mean discharging a gun or removing an employee; the IRIs for the verbs should distinguish the two. This may not always happen in practice, though. I suppose LRS vendors could force the IRI to be resolvable before storing the data, but as far as I know that is not the case.
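The 'fired' ambiguity looks like this in statement form. The two verb IRIs below are invented for illustration; the point is that meaning lives in the `id`, so a consumer must compare IRIs, never display strings.

```python
# Two verbs that share the human-readable display "fired" but mean
# different things. The example.com IRIs are hypothetical.
discharged = {
    "verb": {
        "id": "http://example.com/verbs/fired-weapon",
        "display": {"en-US": "fired"},
    }
}
terminated = {
    "verb": {
        "id": "http://example.com/verbs/fired-employee",
        "display": {"en-US": "fired"},
    }
}

# Identical display text, distinct verbs: only the IRI disambiguates.
same_verb = discharged["verb"]["id"] == terminated["verb"]["id"]  # False
```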
"Both of the above could happen. Often when there is a system change, there will be a vocab and/or statement shift that accompanies it."
This would ideally result in a new version of the profile.
Questions
1) Is this a viable cost-effective way to operate?
I don't know. This is certainly a common approach taken in 'open' projects, with results ranging from projects completely failing to running the majority of the world's server infrastructure.
The other common model is what IMS Global is doing with Caliper. They fund and control a select group of people to develop profiles on behalf of various communities. This certainly has the benefit of improved control, and probably consistency, compared to the Bazaar approach. It also excludes a lot of people from weighing in. Whether or not that is more cost-effective... I have no idea.
2) Won't this be very disruptive to any longitudinal analysis of learner data?
Assuming the LRP is using a profile, or at the very least defining/tracking the differences they introduce into xAPI statements across their software versions over time, then at worst this would require some data transformation prior to analysis. It really depends on your research study, though. If you are going to do a longitudinal study in a context in which you do not control the technology sending, collecting, and consuming your xAPI data, then that's something you would have to adjust your research design for. For example, if you wanted to look at LMS interaction rates for students over the course of 4 years to answer some question, you don't know for sure that the LMS will be the same for 4 years unless you have direct control over it. It's possible that midway through, the principal decides the school should switch to a non-xAPI LMS and you will need to change to log data (assuming that data even exists). These are the risks of longitudinal studies.
I think the community has a lot of work to do when it comes to improving profile implementation. Sadly, I don't think this shift is going to happen suddenly, so expect there to be a lot of inconsistency in vocabularies for a while. It is getting a lot better, though, and I think I see a light at the end of the tunnel. Someday, many of the concerns you listed will be the outlier and not the norm. I hope so, at least.
Hope that helps!
- Jon