Hello, everyone. In response to the various replies to Karl's Twitter poll, which has really usefully drawn out some initial discussion (is the clarity of the result approximately 'certain', 'less-certain', or 'uncertain'? Twitter says 81.3%):
- 'Imperfect logic versus false precision': 'certainty' is not 'precision'. A real number between 0 and 1 expresses certainty as a measure of probability or confidence (perhaps a better term?), and is only as precise as the number of decimal places used.
- 'Uncertainty is multi-dimensional, and the degree is subjective. It is therefore discrete and not continuous!': It depends on how you define your certainty. If it is derived solely from a mathematical calculation of statistical probabilities, then it can indeed be continuous.
- 'B) more flexible & future-proof, but gives false sense of accuracy and precision. A) perhaps too simple and inflexible but a lot clearer. I've often gone for 0-4 or 5 integer scale for more range but even that invokes arguments on meaning!': yes, but that false sense is only in the mind of a naive researcher, and the invocation of arguments on meaning highlights the biggest problem with verbal attributes...
- 'I would vote: only certain OR uncertain - nothing in between will have useful inter-annotator agreement or inter-user meaning.': Indeed, where is the line drawn between 'less-certain' and 'uncertain'? But how then would you discriminate between a near-certainty and a least-worst-guess?
Furthermore:
- LPF offers the facility for multiple alternative geometries, but at present no quantitative means to discriminate between them, and no means at all to discriminate between more than 2 that are less than certain.
- Some researchers (economic historians, for example), undertaking analysis of geospatial patterns, underpin their findings with numerical expressions of statistical probability, which could be enhanced by less blunt measures of certainty within the spatial data (where such measures are calculable, of course).
- The preceding point highlights the potential usefulness of applying a 'certainty' attribute to other properties. For example (albeit unlikely), 'exactMatch' and 'closeMatch' links of the place-name 'London' to GeoNames or Wikidata might benefit from an indication of certainty that the record should be linked to the UK rather than Ontario.
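To illustrate the first of these points, here is a minimal Python sketch of how a numerical certainty could discriminate between more than two alternative geometries that are all less than certain. The numeric 'certainty' values, the sample coordinates, and the function name rank_geometries are assumptions made purely for illustration; none of them is part of LPF as it stands.

```python
from typing import Any


def rank_geometries(geometries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return alternative geometries ordered from most to least certain.

    Assumes (hypothetically) that each geometry carries a numeric
    'certainty' between 0 and 1; geometries without one sort last.
    """
    return sorted(geometries, key=lambda g: g.get("certainty", 0.0), reverse=True)


# Hypothetical alternatives for a single place; coordinates are illustrative only.
alternatives = [
    {"type": "Point", "coordinates": [-0.10, 51.50], "certainty": 0.30},
    {"type": "Point", "coordinates": [-0.12, 51.51], "certainty": 0.55},
    {"type": "Point", "coordinates": [-0.09, 51.52], "certainty": 0.15},
]

ranked = rank_geometries(alternatives)
best = ranked[0]  # the alternative with the highest stated certainty
```

With only the three verbal values available, at least two of these alternatives would have to share a label, and so could not be ordered against each other.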
I think this discussion reinforces the desirability not of enforcing simplicity, but of an optional root-level 'indexing' or 'citations' property for LPF datasets, in which a 'description' property should clarify the methodologies used for defining certainty, be they subjective, statistical, or a combination of both. Absent that, the subjectivity of a property's certainty should be implicit in the considered choice of either verbal or numerical values: both should be allowed as alternatives.
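As a rough sketch of what allowing both as alternatives might mean for software reading a dataset, the following Python function accepts either one of the three verbal values or a number between 0 and 1, and rejects anything else. The function name validate_certainty and the decision to keep the two kinds of value separate (rather than mapping words onto numbers) are my assumptions, not anything defined by LPF.

```python
from typing import Union

# The three verbal values discussed in this thread.
VERBAL_VALUES = {"certain", "less-certain", "uncertain"}


def validate_certainty(value: Union[str, int, float]) -> Union[str, float]:
    """Accept a verbal or a numerical certainty and return it unchanged in kind.

    Hypothetical helper: verbal values are returned as strings, numerical
    values as floats in the closed interval [0, 1].
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        raise ValueError("certainty must be a verbal value or a number, not a boolean")
    if isinstance(value, str):
        if value in VERBAL_VALUES:
            return value
        raise ValueError(f"unknown verbal certainty: {value!r}")
    if isinstance(value, (int, float)):
        number = float(value)
        if 0.0 <= number <= 1.0:
            return number
        raise ValueError(f"numerical certainty out of range 0-1: {value!r}")
    raise ValueError(f"unsupported certainty value: {value!r}")
```

Deliberately, no verbal value is converted to a number here: any such mapping would itself be a methodological choice of exactly the kind that a 'description' property ought to document.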
Best wishes,
Stephen