By all means, yes!!!
I noted in crown Aves matrices that character scoring is often too subjective or at least too context-sensitive. Some analyses go to great lengths to precisely describe characters and delimit scorings, but many use ambiguous/relational terminology (e.g. "large" vs "small" - what constitutes "large" or "small" is often dependent on taxon sample in a particular analysis). Thus, a character that is "relationally" scored one way in a specific taxon when compared to one particular taxon set may be scored differently when using a different set of comparison taxa. This a) limits the ability to compare different analyses[*], and b) introduces analytical error by re-applying scores within altered taxon sets and without verifying that the "ambiguous" scorings are unaffected by the change in analyzed taxa.
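To make the taxon-sample dependence concrete, here is a minimal Python sketch. It assumes, purely for illustration, that "large" means "above the median of whatever taxa are in the sample"; real analyses rarely state their cutoff this explicitly. The taxon names and measurements are hypothetical and not taken from any published matrix.

```python
from statistics import median

# Hypothetical measurements (arbitrary units); not from any real matrix.
TUBERCLE_SIZE = {
    "taxon_A": 2.0,
    "taxon_B": 3.0,
    "taxon_C": 5.0,
    "taxon_D": 8.0,
    "taxon_E": 9.0,
}


def score_relative(sizes, sample):
    """Score 1 ("large") if a taxon exceeds the sample median, else 0 ("small").

    The median cutoff is an assumption standing in for the implicit,
    sample-dependent judgement behind relational character states.
    """
    cutoff = median(sizes[t] for t in sample)
    return {t: int(sizes[t] > cutoff) for t in sample}


# The same taxon, scored against two different comparison sets.
first = score_relative(TUBERCLE_SIZE, ["taxon_A", "taxon_B", "taxon_C"])
second = score_relative(TUBERCLE_SIZE, ["taxon_C", "taxon_D", "taxon_E"])
print(first["taxon_C"], second["taxon_C"])
```

In this sketch taxon_C comes out as "large" (1) in the first sample and "small" (0) in the second, although its measurement has not changed. Carrying the first score over into the second taxon set would be exactly the kind of unverified re-application described above.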
Your proposal is absolutely sensible and perhaps the only way to tackle this problem (or at least I cannot think of any better, FWIW): Rather than scoring characters using whatever limited comparative material is at hand, and subsequent authors having to drag these scores from one study to the next without being able to verify them, there should be a database documenting the actual material evidence in question, so that ambiguous scorings can be argued for or against based on the next best thing after directly inspecting the actual material: specimen photos, with sufficient resolution, and documenting a wide taxon set. And if those are not available, the third-best thing is high-quality drawings; digitization projects such as BHL have provided lots of high-resolution scans of papers from the lithography era, and e.g. for material lost in World War 2 there is often no other way but to resort to historical publications. There should also be an expert comment/discussion feature to resolve the most disputable cases (e.g. when characters are obscured by damage, or for ontogeny-related issues especially in nonavian theropods).
A further step would be to allow for inclusion of never-figured material. This is probably not pressing for stem dinosaurs, but for crown dinos there is a considerable amount of highly interesting material that has never been figured and has only ever been described in relative/qualitative terms (by comparison to better-documented taxa, as in a differential diagnosis). Such descriptions are often enough quite useful for cladistic analysis, if only they were methodically collated: e.g. if a humerus is merely described as having a "stronger curvature than [comparison taxon]", this can already permit a sufficiently precise scoring (if the comparison taxon has a strongly curved humerus already).
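The humerus example can be written down as a small rule. The following Python sketch assumes an ordered character with states 0..max_state, and assumes that a taxon described as "stronger" than the comparison taxon falls into the same state or a higher one. The function name and the state coding are hypothetical.

```python
def compatible_states(reference_state, relation, max_state):
    """Return the states of an ordered character (0..max_state) compatible
    with a description given relative to a comparison taxon.

    reference_state: the state scored for the comparison taxon.
    relation: "stronger" or "weaker", as in a differential diagnosis.
    """
    if not 0 <= reference_state <= max_state:
        raise ValueError("reference_state outside 0..max_state")
    if relation == "stronger":
        # Same bin as the comparison taxon, or any higher one.
        return set(range(reference_state, max_state + 1))
    if relation == "weaker":
        # Same bin as the comparison taxon, or any lower one.
        return set(range(0, reference_state + 1))
    raise ValueError(f"unknown relation: {relation!r}")


# Comparison taxon already in the highest state: a single state remains.
print(compatible_states(2, "stronger", 2))
# Comparison taxon in the lowest state: the description is uninformative.
print(compatible_states(0, "stronger", 2))
```

When the comparison taxon already has the most strongly curved state, only one state remains and the taxon can be scored. Otherwise the result is a set of states, which could be entered as an ambiguity rather than a guess.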
This would ideally yield a maximally comprehensive taxon set as an open-access database, wherein all pertinent characters ever proposed are collated, and even low-hypodigm taxa are scored in a way that is defensible and reproducible. The latter are less useful for studies across a wide set of taxa, but for more restricted analyses (e.g. specific analyses of less-comprehensive clades, or of anatomical/evolutionary "modules") or for dedicated hypothesis-testing (e.g. by testing sensitivity to inclusion/exclusion of taxa with small hypodigms), they are often crucial. And given the state of analytical theory (I do not expect major conceptual breakthroughs here, and the current methodological research focus is on handling "big data" sets such as whole genomes anyway), and seeing the results of the current molecular supermatrices in Aves, I do not expect that the morphological data, in their present state - with little cross-study validation, and quality control being essentially ad-hoc and at each author's individual discretion - are capable of addressing questions like Ornithoscelida or the initial radiation of Neoaves in a reproducible fashion (QED). Such a database would also facilitate analyses whose taxon sets are tailored to represent a particular timespan of rapid radiation - putting roughly coeval more ancestral taxa into a framework held up by more advanced members of their putative clades. (Conventionally, outgroup choice is more important, but for rapid radiations near the base of a clade, there should probably be more focus on which more-advanced ingroup taxa to include.)
(And technically, if one wants to do an analysis that truly supposes as little as possible in advance, one would have to treat each set of articulated material - such as every individual "Archaeopteryx sensu lato" fossil - as an independent OTU. At least in "species" of disputed circumscription, a specimen-based rather than taxon-based approach for such a database is certainly warranted. Sure, little harm is done by considering all the iguanodontids from the Sainte-Barbe Clays at Bernissart to represent a single species/population/OTU - but the same cannot be said for the Daiting/Langenaltheim/Eichstätt Archies, or for the Gargantuavis hypodigm if all "referred to" material is included, etc.)
That being said: perhaps the first step could be a literature index collating the sources of published illustrations - possibly using a primarily specimen-based format right from the start, and treating taxonomic assignment as secondary, so that a specimen may have more than one taxonomic assignment (the least-inclusive clade to which it can be assigned with high confidence, plus various competing higher-level taxa - e.g. Brontornis material being covered by Neognathae, Galloanseres, and Cariamiformes).
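A specimen-first index of this kind could be sketched roughly as follows in Python. All class and field names, the specimen ID, and the citation string are hypothetical placeholders; only the Brontornis example with its three candidate clades comes from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Illustration:
    citation: str  # placeholder text, not a real reference
    figure: str    # e.g. plate or figure number in that source
    kind: str      # "photo" or "drawing"


@dataclass
class SpecimenRecord:
    specimen_id: str
    # Least-inclusive clade assignable with high confidence.
    least_inclusive_clade: str
    # Competing assignments proposed in the literature.
    competing_assignments: List[str] = field(default_factory=list)
    illustrations: List[Illustration] = field(default_factory=list)

    def assigned_to(self, clade: str) -> bool:
        """True if the specimen is listed under this clade in any capacity."""
        return (clade == self.least_inclusive_clade
                or clade in self.competing_assignments)


def specimens_for(records, clade):
    """Return the IDs of all specimens indexed under the given clade."""
    return [r.specimen_id for r in records if r.assigned_to(clade)]


# Hypothetical entry modelled on the Brontornis example above.
records = [
    SpecimenRecord(
        specimen_id="EXAMPLE-0001",
        least_inclusive_clade="Neognathae",
        competing_assignments=["Galloanseres", "Cariamiformes"],
        illustrations=[Illustration("(placeholder citation)", "pl. 1", "drawing")],
    )
]
print(specimens_for(records, "Cariamiformes"))
```

Because taxonomic assignment is only an attribute of the specimen record, the same specimen turns up under each clade it has been referred to, and a later reassignment does not require moving its illustrations or scorings.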
Mickey has already done incredible work in this regard, assembling a vast literature index (see e.g.
https://www.theropoddatabase.com/Ornithothoraces.htm). But the workload for comprehensive coverage is immense, and different skillsets are required to evaluate the literature, define characters, score material, and finally assemble all this into a usable database. Thus, a crowd-sourced expert-vetted approach seems to be the way to go. I started a literature survey for (mostly) Telluraves in that regard - specifying published descriptions, published illustrations, and published scorings for each named taxon as well as for unassigned material - but due to the aforementioned problems never progressed beyond that, and I would love to feed these data into a more comprehensive project. (I am no database expert, but I *think* it would be wise to keep the "descriptive/discussion", "literature/references", "image-file" and "scorings" databases distinct and have them interface only via the output - a Web page that may ultimately even be used to pre-generate Nexus datablocks for a selected set of OTUs.)
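As a rough idea of what pre-generating a Nexus datablock for selected OTUs might involve, here is a minimal Python sketch. It assumes the scorings are available as a simple mapping of OTU name to {character number: state}; the function name, the OTU names and the scores are hypothetical. Unscored characters are written as "?", and polymorphic or ambiguous scorings are not handled.

```python
def nexus_data_block(scorings, otus, n_chars):
    """Build a NEXUS DATA block for the selected OTUs.

    scorings: {otu_name: {character_number (1-based): state string}}
    otus: names of the OTUs to include, in output order
    n_chars: total number of characters in the matrix
    """
    symbols = sorted({s for otu in otus for s in scorings[otu].values()})
    symbol_str = "".join(symbols)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(otus)} NCHAR={n_chars};",
        f'  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="{symbol_str}";',
        "  MATRIX",
    ]
    for otu in otus:
        # Unscored characters become the missing-data symbol.
        row = "".join(scorings[otu].get(i, "?") for i in range(1, n_chars + 1))
        # Replace spaces so the name stays a single NEXUS token.
        lines.append(f"    {otu.replace(' ', '_')} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical scorings; OTU C is stored but not selected for output.
demo = {
    "OTU A": {1: "0", 2: "1"},
    "OTU B": {1: "1", 3: "0"},
    "OTU C": {1: "0", 2: "0", 3: "0"},
}
block = nexus_data_block(demo, ["OTU A", "OTU B"], 3)
print(block)
```

Keeping the scorings in their own store and generating the matrix only at output time means the same data can serve any OTU selection, which fits the idea of keeping the databases distinct and interfacing them only via the output.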
[*] I often found it necessary to consider specimens to even understand how authors actually *defined* their characters - what constitutes a "large" tubercle, for example, can in retrospect not usually be easily understood without taking an analysis' scorings and comparing them to actual specimens, across the entire taxon set! This *should* not be necessary, but the absence of such a comprehensive database does make it necessary, and I suspect many of the issues Gregory mentions are caused by this subjectivity and the need to double-check (which practical constraints such as time pressure or lack of access to specimens often preclude, leading to inadvertent mis-scorings).
Best regards,
Eike Wulfmeyer