Precise definitions on data, measurement, quantity, etc.


Frank Farance

Jun 19, 2024, 11:48:05 PM
to ontolog-forum, Frank Farance, Gillman, Daniel - BLS
Folks-

This E-mail excerpts a paper that Dan Gillman <Gillman...@bls.gov> and I, Frank Farance <fr...@farance.com>, are publishing; we are providing the excerpt here because it is relevant to the current E-mail discussion.  Feel free to send comments/critique to us.

Dan and I did a fair amount of research and study over a variety of terms from 1998 through 2010.  We based our work on terminology principles from ISO TC 37 (Language and Terminology), ISO/JTC 1 (Information Technology) existing IT vocabularies (ISO/IEC 2382 multiple parts and other terminologies), and ISO TC 46 (Information and Documentation, including ISO 5127).  Also, I'm the Convenor of JTC 1/WG 15 (JTC 1 Vocabulary) where we are updating the approx 5000-term vocabulary, and terms like data and information and related terms (data processing, information system, etc.) form important foundational concept systems for IT work.  I also have a long history in programming language standards (and have written lots of code), so that influences my perspective as a software engineer and systems engineer.

This E-mail largely draws upon our research, which we call the Terminological Theory of Data (Farance/Gillman).  We have used these and similar examples in our papers and presentations.  Dan and I have different styles of writing: he likes the short rejoinder, and I like citing resources.  I've chosen a longer format here because, in our assessment, there appears to be some confusion and/or inconsistency in the E-mail discussion.  Thus, this is presented linearly rather than replying to individual E-mails.

We hope this presentation makes sense to you.

We are providing a brief history because it provides rationale and thinking that led to the results.  This history is expressed in 7 parts:
  • Exploring Definitions of Metadata, Data, and Information
  • Datatype Theory and the Notion of Equality
  • Observation, Measurement, and Quantity
  • Levels of Measurement
  • Metadata Factors as a Kind of Information
  • Data vs. Information
  • Different Kinds of Information

EXPLORING DEFINITIONS OF METADATA, DATA, AND INFORMATION

Because Dan and I were on the US and ISO/IEC committees named Metadata and we were (and still are) standardizing metadata,  we felt we should have a good understanding about this term "metadata" as we wanted to use it precisely.  Circa 1998, a common definition was to use the prefix "meta-" to mean "X about X", thus:
metadata: data about data    // poor definition
That is snappy and easily rolls off the tongue, but it is imprecise for a couple of reasons.  On terminology, the standards ISO 704 (Principles of Terminology) and ISO 1087 (essentially, terminology about terminology) were good resources.  The above definition has two flaws: the first use of "data" is too broad (not all data is metadata); and the second use of "data" is too narrow (metadata can be about non-data objects, like books, e.g., Dublin Core Metadata).  A key feature of this metadata is that it needs to be descriptive; non-descriptive data (e.g., results of calculations, random numbers) by themselves are not descriptive.  A descriptive relation needs to be present between the descriptive data (the metadata) and the target of that description (the object(s)).  The definition is reformulated as:
metadata: descriptive data about an object(s)        // good definition
with object being singular or plural, as when multiple objects have common metadata - well that is of particular interest.  This turned out to be a very good definition, and we've tested it, and it is standardized, too.

But what does "data" mean?  We explored the existing definition, circa 1993/1999, from ISO/IEC 2382.  In that definition, we discovered circularity and errors:
data: reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing   // ISO/IEC 2382-1:1999 - poor!!!

information (in information processing): knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning  // ISO/IEC 2382-1:1999 - poor!!!

knowledge: maintained, processed, and interpreted information  // ISO 5127 - poor!!!
With the circularity in most every collection of definitions, Dan and I felt that it might be easier to start with defining "data" first, and we're ignoring a definition of "knowledge" for this discussion.  This worked out well as we had a good terminological basis for this: in distinguishing concept and designation, we saw similarity between number (a concept) and numeral (a designation).  Here is the ISO 1087 definition with "signifier" replacing "sign" to avoid confusion:
designation: representation of a concept by a signifier which denotes it in a domain or subject // ISO 1087
There are several kinds of designations (term, symbol, appellation, name, etc.) but we don't need to worry about that for now.  Data as a designation is necessary, but not sufficient as we haven't described the delimiting characteristics that distinguish "data" from its superordinate concept of "designation".

DATATYPE THEORY AND THE NOTION OF EQUALITY

Here is a slight detour, but it is important for deeply understanding data.

A key insight was ISO/IEC 11404 General Purpose Datatypes (GPD).  This standard, originally called Language-Independent Datatypes (LID), along with ISO/IEC 13886 Language-Independent Procedure Calling (LIPC), was developed in the early 1990s as a strategic effort to address the complexity of cross-language calling interfaces among programming and database languages.  For example, if there are N languages, then there are approximately N*N interfaces that need to be defined and specified.  Programmers of Fortran wanted to call Pascal code, programmers of C wanted to call Fortran code, etc.  The hope, which was partially achieved, was to understand all the datatypes and the calling frameworks and make some common statements about data and calling techniques in these two standards.  Also in the 1990s, and updated since, there is a Language-Independent Arithmetic (LIA) ISO/IEC 10967 series, which described what arithmetic functions are supposed to do in computing systems.  In the end (1996), all three standards (LID now GPD, LIPC, and LIA-*) became reference documents.  In the case of 11404, it described datatypes across programming languages, remote procedure calls, codings (e.g., XML), etc.  To my knowledge, 11404 first espoused a theory of datatypes that had not been previously expressed elsewhere.

Although not every 11404 datatype has been implemented in every language, the 11404 datatype mapping is the formal framework.  Some 11404 datatypes, such as "integer" are implemented in various degrees of precision (bit lengths of 8, 16, 32, 64) and ranges (signed vs. unsigned integers) for convenience in programming languages.
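
As a rough illustration of how one conceptual "integer" datatype shows up as several bounded variants, the following Python sketch computes the value ranges for the bit lengths mentioned above.  The function name integer_range and the two's-complement convention for signed values are assumptions made for this sketch; they are not taken from 11404.

```python
from typing import Dict, Tuple


def integer_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return (lowest, highest) for a fixed-width integer.

    Assumption: signed values use the two's-complement convention.
    """
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# One bounded value space per (bit length, signedness) combination.
RANGES: Dict[Tuple[int, bool], Tuple[int, int]] = {
    (bits, signed): integer_range(bits, signed)
    for bits in (8, 16, 32, 64)
    for signed in (True, False)
}

for (bits, signed), (low, high) in RANGES.items():
    kind = "signed" if signed else "unsigned"
    print(f"{bits:2d}-bit {kind:8s}: {low} .. {high}")
```

Each entry is a bounded subset of the unbounded integer value space, which is one way to read "implemented in various degrees of precision and ranges".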

In particular, the datatype theory expressed in 11404 can be summarized with:
  • datatype: set of distinct values, characterized by properties of those values, and by operations on those values
  • value space: set of values for a given datatype
  • characterizing operations: collection of operations on, or yielding, values of the datatype that distinguish this datatype from other datatypes with identical value spaces
  • datatype properties - related to characteristics such as
    • equality: every datatype has the equality property
    • order: some value spaces are ordered, some value spaces (Rock, Paper, Scissors) are not
    • cardinality: the mathematical sense of the size of the value space - finite, denumerably infinite, non-denumerably infinite
    • boundedness: some value spaces have lower bounds, upper bounds, both, or neither
    • exactness: some value spaces can represent their values exactly (integers) and some are approximations (reals)
    • numeric: some value spaces, conceptually, are quantities
  • notion of equality - fundamental/axiomatic to data
This Notion of Equality is further defined, as excerpted from 11404:
In every value space there is a notion of equality, for which the following rules hold:
  • for any two instances (a, b) of values from the value space, either a is equal to b, denoted a=b, or a is not equal to b, denoted a≠b;
  • there is no pair of instances (a, b) of values from the value space such that both a=b and a≠b;
  • for every value a from the value space, a=a;
  • for any two instances (a, b) of values from the value space, a=b if and only if b=a;
  • for any three instances (a, b, c) of values from the value space, if a=b and b=c, then a=c.
On every datatype, the operation Equal is defined in terms of the equality property of the value space, by:
  • for any values a, b drawn from the value space, Equal(a,b) is true if a=b, and false otherwise
This Notion of Equality is axiomatic: data => a notion of equality; and if there is a notion of equality, it can be data.
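
For a finite value space, the rules above can be checked mechanically.  The sketch below is only an illustration: the function name is_notion_of_equality and the two sample relations are hypothetical.  The first two rules (either a=b or a≠b, never both) hold automatically for any function that always returns a single Boolean, so the check covers the remaining three.

```python
from itertools import product
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def is_notion_of_equality(values: Iterable[T],
                          equal: Callable[[T, T], bool]) -> bool:
    """Check the last three rules over a finite value space."""
    vs = list(values)
    # for every value a, a=a
    reflexive = all(equal(a, a) for a in vs)
    # a=b if and only if b=a
    symmetric = all(equal(a, b) == equal(b, a)
                    for a, b in product(vs, repeat=2))
    # if a=b and b=c, then a=c
    transitive = all(equal(a, c)
                     for a, b, c in product(vs, repeat=3)
                     if equal(a, b) and equal(b, c))
    return reflexive and symmetric and transitive


def same_code(a: str, b: str) -> bool:
    """Hypothetical relation: country codes compared without regard to case."""
    return a.upper() == b.upper()


def within_one(a: int, b: int) -> bool:
    """Hypothetical relation: "close enough" integers (not transitive)."""
    return abs(a - b) <= 1


print(is_notion_of_equality(["US", "us", "FR", "fr"], same_code))
print(is_notion_of_equality([1, 2, 3], within_one))
```

The second relation fails because 1 is within one of 2, and 2 is within one of 3, but 1 is not within one of 3.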

With this in mind, the Terminological Theory of Data (Farance/Gillman) defines data as:
datum (singular), datums (countable plural), data (uncountable plural): designation whose concept is a value

value, value concept: concept with a defined notion of equality to that concept
The importance of this Notion of Equality is expressed in the following example:
  • Given the concepts Red (visible light in the range of wavelengths 650 to 700 nanometers) and Yellow (visible light in the range of wavelengths 525 to 570 nanometers), it is undefined whether Red equals Yellow and, hence, these concepts by themselves are not values.
  • These concepts can become values once a Notion of Equality is defined - equality can be defined within the concepts' definition or defined external to the concepts' definition.
Thus ...
  • If the Notion of Equality is defined as "has visible electromagnetic radiation?", then one can ask whether Red equals Yellow (true) and whether Red equals Infrared (false, because infrared is not visible).
  • If the Notion of Equality is defined as "has the same range of wavelengths?", then none of Red, Yellow, Cornflower, or Blue are equal to each other.
  • If the Notion of Equality is defined as "has overlapping ranges of wavelengths?", then Red does not equal Yellow (their wavelengths don't overlap) but Cornflower equals Blue (Cornflower is a slice of the Blue portion of the visible spectrum).  However, this is NOT a Notion of Equality because "overlap" is not transitive, which is required for an equivalence relation: i.e., a=b and b=c implies a=c does not work for "overlap".
  • If the Notion of Equality is defined as "has same primary colors red, green, and blue" (of which there are 8 combinations), then Red does not equal Yellow (primary colors Red-none-none vs. Red-Green-none) but Cornflower equals Blue (Cornflower is a slice of the Blue portion of the visible spectrum, i.e. both have same primary colors none-none-Blue).
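
The color example can be sketched in Python as follows.  The wavelength ranges for Red and Yellow are the ones given above; the ranges for Blue, Cornflower, and Infrared, the bounds of the visible spectrum, and the extra color Teal are assumptions added only to make the sketch runnable.

```python
from dataclasses import dataclass

VISIBLE_NM = (380, 750)  # assumed bounds of the visible spectrum


@dataclass(frozen=True)
class Color:
    name: str
    low_nm: int
    high_nm: int


RED = Color("Red", 650, 700)                # range from the text
YELLOW = Color("Yellow", 525, 570)          # range from the text
BLUE = Color("Blue", 450, 495)              # assumed range
CORNFLOWER = Color("Cornflower", 460, 470)  # assumed slice of Blue
TEAL = Color("Teal", 490, 520)              # hypothetical, overlaps Blue only
INFRARED = Color("Infrared", 780, 1000)     # assumed, outside the visible range


def is_visible(c: Color) -> bool:
    return VISIBLE_NM[0] <= c.low_nm and c.high_nm <= VISIBLE_NM[1]


def equal_by_visibility(a: Color, b: Color) -> bool:
    """"Has visible electromagnetic radiation?" gives the same answer."""
    return is_visible(a) == is_visible(b)


def equal_by_range(a: Color, b: Color) -> bool:
    """"Has the same range of wavelengths?"."""
    return (a.low_nm, a.high_nm) == (b.low_nm, b.high_nm)


def overlaps(a: Color, b: Color) -> bool:
    """"Has overlapping ranges of wavelengths?" (not a Notion of Equality)."""
    return a.low_nm <= b.high_nm and b.low_nm <= a.high_nm


print(equal_by_visibility(RED, YELLOW), equal_by_visibility(RED, INFRARED))
print(equal_by_range(CORNFLOWER, BLUE))
# Overlap is not transitive: Cornflower~Blue and Blue~Teal, but not Cornflower~Teal.
print(overlaps(CORNFLOWER, BLUE), overlaps(BLUE, TEAL), overlaps(CORNFLOWER, TEAL))
```

The same concepts become different values, or fail to become values at all, depending on which relation is chosen.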

OBSERVATION, MEASUREMENT, AND QUANTITY

This portion is in response to a couple of E-mails in this thread.  In terminology, the following concepts are defined:
property: feature of an object
Note: A property is a kind of concept.

characteristic: abstraction of a property of an object or of a set of objects
Note 1: Characteristics are used for describing concepts.
Note 2: A characteristic is a kind of concept.
In many cases, a characteristic can be thought of as a determinable and a property is a determinant with respect to that characteristic.  The determinable-determinant framing doesn't imply much more, e.g., the following questions are NOT answered: Is it a measurement? Is it a quantity? Is it data?  Typically, a determinable-determinant pair is expressed as an attribute:
attribute: property paired with its characteristic
Note 1: A property can be a non-scalar value.
Note 2: A property can be expressed along with a system of reference, e.g., a unit of measure.
Example: For the characteristic weight (with respect to adult humans) the property for an individual human might be 80 kg (weight), which can be expressed in the attribute: "Weight 80 kg", "Weight: 80 kg", "Weight = 80 kg", etc.
Note 3: Many attributes can be framed themselves as characteristics, e.g., the attribute Color=Red (characteristic: color, property: Red), can itself be framed as the characteristic Color-Red-ness (or Red-ness) for which it might have property values { saturated, tint, white, shade, black }
Observations and measurements, in context, are typically expressed as attributes.
to observe: to notice
Note: As applied in normative documents, observations can have requirements for describing noticing, e.g., who, what, when, where, why, how, and other elements.
The following include excerpts from ISO/IEC Guide 99, "International Vocabulary of Metrology (VIM) — Basic and General Concepts and Associated Terms".
quantity: property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as a number and a reference
Note: A reference can be a measurement unit, a measurement procedure, a reference material, or a combination of such.

quantity dimension, dimension of a quantity, dimension: expression of the dependence of a quantity on the base quantities of a system of quantities as a product of powers of factors corresponding to the base quantities, omitting any numerical factor
Example: In the ISQ (International System of Quantities), the quantity dimension of force is denoted by dim F = L*M*T^(-2), where F is Force, L is Length, M is Mass, and T is Time, i.e., Force = Mass * Acceleration.

measurement: process of experimentally obtaining one or more quantity values that can reasonably be attributed to a quantity
Note 1: Measurement does not apply to nominal properties.
Note 2: Measurement implies comparison of quantities and includes counting of entities.
Note 3: Measurement presupposes a description of the quantity commensurate with the intended use of a measurement result, a measurement procedure, and a calibrated measuring system operating according to the specified measurement procedure, including the measurement conditions.

measurement unit, unit of measurement, unit: real scalar quantity, defined and adopted by convention, with which any other quantity of the same kind can be compared to express the ratio of the two quantities as a number
Note: Measurement units of quantities of the same quantity dimension may be designated by the same name and symbol even when the quantities are not of the same kind. For example, joule per kelvin and J/K are respectively the name and symbol of both a measurement unit of heat capacity and a measurement unit of entropy, which are generally not considered to be quantities of the same kind. However, in some cases special measurement unit names are restricted to be used with quantities of a specific kind only. For example, the measurement unit "second to the power minus one" (1/s) is called hertz (Hz) when used for frequencies and becquerel (Bq) when used for activities of radionuclides.

measurand: quantity intended to be measured
Note: The specification of a measurand requires knowledge of the kind of quantity, description of the state of the phenomenon, body, or substance carrying the quantity, including any relevant component, and the chemical entities involved.
The measurand-measurement framing is a kind of determinable-determinant relation.

Thus:
  • observations can occur
  • in some cases, normative documents (standards, specifications) prescribe the kinds of details to be noticed (in the observations)
  • typically, determinable-determinant framing of characteristic-property pairs are prescribed for observations
  • some observations with a defined Notion of Equality can be data
  • measurand-measurement framing is a kind of determinable-determinant
  • some of those observations can be measurements (those having a property value with a magnitude function)
  • all measurements have a defined Notion of Equality and, thus, have a nominal datatype, but there might be a functional datatype that is incomplete, e.g., a spectrometer can measure wavelengths, but the Notion of Equality is not yet established (see Color example above) and, thus, the datatype is incomplete.  Note: In practice, a nominal datatype (characterstring, octetstring, bitstring) is typically used to copy or construct a datum.
  • some of those measurements have a unit of measurement
  • some of those measurements are quantities with a dimension
  • some of those quantities use quantity dimensions within a system of quantities
  • some of those systems of quantities (e.g., SI units) are well established and standardized
In summary, the terms observation, measurement, and data cannot be used interchangeably.  Also, see ratio data below, i.e., quantity dimensions must use a unit of measurement that has a "true zero".

LEVELS OF MEASUREMENT

A common system used in statistics classifies these kinds of observations-measurements-data into several categories (which have their own 11404 mapping).  The Wikipedia article on "Level of Measurement" at "https://en.wikipedia.org/wiki/Level_of_measurement" explains this and other Level-of-Measurement systems.

This system (Stevens's Typology) involves categorizing data into four kinds:
  • nominal data: Classifying data that only has the 11404 Equal property, e.g., Rock-Paper-Scissors, ISO 2-letter country codes.  Thus, Equal() is the only 11404 property-based characterizing operation.
  • ordinal data: Data that has the 11404 Order and Equal properties, e.g., Cold-Cool-Warm-Hot.  Thus, Equal() and InOrder() are the only 11404 property-based characterizing operations.
  • interval data: Data that has the 11404 Order, Equal, and Numeric properties but only includes addition, subtraction, and scalar multiplication.  For example, temperatures in C or F are interval data because one can only subtract two values: 30 C (a coordinate point) minus 20 C (a coordinate point) results in 10 C (an interval).  One cannot add the coordinate points in this system.  One can multiply an interval, e.g., 2 * 10 C (interval) = 20 C (a new interval); and one can add an interval (20 C the interval) to a coordinate point (30 C) to produce a new coordinate point (50 C).  In the 11404 system "temperature_point" and "temperature_interval" would be two datatypes whose characterizing operations intermix the use of both.  Neither the Celsius nor the Fahrenheit scale has a "True Zero" (where zero actually means zero quantity of something), thus they are interval data.  Note: The terms "interval data" and "data interval" are different concepts.
  • ratio data: Data that has the 11404 Order, Equal, and Numeric properties and can include addition, subtraction, multiplication, and division.  This kind of data has a "True Zero", i.e., zero of this kind of quantity means there is none of this kind of quantity.
Two of these categories are defined in the VIM and referenced by some of the definitions (e.g., a measurement cannot be a nominal property):
nominal property: property of a phenomenon, body, or substance, where the property has no magnitude
Example 1: Sex of a human being.
Example 2: Colour of a paint sample.
Example 3: Colour of a spot test in chemistry.
Example 4: ISO two-letter country code.
Example 5: Sequence of amino acids in a polypeptide.
Note: A nominal property has a value, which can be expressed in words, by alphanumerical codes, or by other means.

ordinal quantity: quantity, defined by a conventional measurement procedure, for which a total ordering relation can be established, according to magnitude, with other quantities of the same kind, but for which no algebraic operations among those quantities exist
Example 1: Rockwell C hardness.
Example 2: Octane number for petroleum fuel.
Example 3: Earthquake strength on the Richter scale.
Example 4: Subjective level of abdominal pain on a scale from zero to five.
Note: Ordinal quantities can enter into empirical relations only and have neither measurement units nor quantity dimensions. Differences and ratios of ordinal quantities have no physical meaning.
These levels of measurement can be important because they can provide semantic constraints on the kinds of operations one can perform on data (e.g., can't add or multiply two temperatures on the C and F temperature scales) and they require more detail when defining datatypes that use this kind of data, e.g., intervals and coordinate points.
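
The coordinate-point/interval distinction for interval data can be sketched with two small Python classes.  The class names mirror the "temperature_point" and "temperature_interval" datatypes mentioned above, but the code itself is an assumption made for illustration and is not taken from 11404.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureInterval:
    degrees_c: float

    def __add__(self, other):
        # interval + interval -> interval
        if isinstance(other, TemperatureInterval):
            return TemperatureInterval(self.degrees_c + other.degrees_c)
        return NotImplemented

    def __rmul__(self, factor):
        # scalar * interval -> interval
        if isinstance(factor, (int, float)):
            return TemperatureInterval(factor * self.degrees_c)
        return NotImplemented


@dataclass(frozen=True)
class TemperaturePoint:
    degrees_c: float

    def __sub__(self, other):
        # point - point -> interval
        if isinstance(other, TemperaturePoint):
            return TemperatureInterval(self.degrees_c - other.degrees_c)
        return NotImplemented

    def __add__(self, other):
        # point + interval -> point; point + point is deliberately undefined
        if isinstance(other, TemperatureInterval):
            return TemperaturePoint(self.degrees_c + other.degrees_c)
        return NotImplemented


print(TemperaturePoint(30) - TemperaturePoint(20))    # an interval of 10 C
print(2 * TemperatureInterval(10))                    # an interval of 20 C
print(TemperaturePoint(30) + TemperatureInterval(20)) # the point 50 C
try:
    TemperaturePoint(30) + TemperaturePoint(20)
except TypeError as err:
    print("rejected:", err)
```

Adding two coordinate points raises TypeError because neither class defines that operation, which is the semantic constraint the level of measurement imposes.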

Stepping back to high school chemistry, physics, and math classes ... we all learned that "you can't multiply dollars times dollars (currency times currency)" and "you can't add seconds (time) to meters (length)" - even though calculators will let you do this.  The 11404 datatype theory provides a framework for why these kinds of calculations make no sense: there are no characterizing operations in the definitions that permit such operations with such operands.
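
One way to picture "no characterizing operation permits it" is a lookup table of the operations each datatype defines.  The sketch below is a hypothetical illustration: the table CHARACTERIZING_OPERATIONS, the datatype names, and the function apply are all invented here, and the table lists only a few operations for the sake of the example.

```python
import operator
from typing import Callable, Dict, NamedTuple, Tuple


class Value(NamedTuple):
    datatype: str     # e.g. "dollars", "seconds", "meters", "scalar"
    magnitude: float


# (operation, left datatype, right datatype) -> (function, result datatype)
CHARACTERIZING_OPERATIONS: Dict[
    Tuple[str, str, str], Tuple[Callable[[float, float], float], str]
] = {
    ("add", "dollars", "dollars"): (operator.add, "dollars"),
    ("add", "seconds", "seconds"): (operator.add, "seconds"),
    ("add", "meters", "meters"): (operator.add, "meters"),
    ("multiply", "scalar", "dollars"): (operator.mul, "dollars"),
}


def apply(op: str, left: Value, right: Value) -> Value:
    """Perform op only if the table defines it for these operand datatypes."""
    key = (op, left.datatype, right.datatype)
    if key not in CHARACTERIZING_OPERATIONS:
        raise TypeError(
            f"no characterizing operation {op}({left.datatype}, {right.datatype})"
        )
    func, result_type = CHARACTERIZING_OPERATIONS[key]
    return Value(result_type, func(left.magnitude, right.magnitude))


print(apply("add", Value("dollars", 5), Value("dollars", 3)))
print(apply("multiply", Value("scalar", 2), Value("dollars", 5)))
for bad in (("multiply", Value("dollars", 5), Value("dollars", 3)),
            ("add", Value("seconds", 3), Value("meters", 2))):
    try:
        apply(*bad)
    except TypeError as err:
        print("rejected:", err)
```

A pocket calculator works only on the magnitudes; the table is what carries the semantics that rule out dollars times dollars and seconds plus meters.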

METADATA FACTORS AS A KIND OF INFORMATION

With the above as a preface, let's look at some measurements and the kinds of metadata they might imply.  Let's say we use a thermometer to measure the temperature outside.  In data collection (whether physical, perceived, conceived, ... whatever), there is metadata associated with the collection.  Metadata can be embedded (HTML, XML), attached (a pointer, reference, URI, URL), adjacent (another column in the same row in SQL or Excel), or implied (described in a document or data exchange specification).  In this example, I've shown all the known metadata, but the scientific instruments might log only a portion of this (timestamp, location), some of it is known (instrument type), some of it is implied (accuracy and precision are written in the operating guides for the activity).
metadata factor: metadata, as an attribute, that is an element of more comprehensive metadata
Note: The relationship between the metadata factor and the more comprehensive metadata is a part-whole relation.
Here the number 17 is recorded as a datum, and nominally its concept is seventeen-ness (the number after 16).  But there is more than this mere concept in the data recording.  You can read the diagram below as specializing this concept of "17" where an arrow points away from its specialization (similar to UML).

 

Note: The above framing, as metadata, might be presented as columns in the same row of a table, embedded elsewhere, referenced elsewhere, or implied in documents and practice (described elsewhere).

In spoken words, the above might be expressed:
"there is the concept of 17 ...
and it's a special 17 in that it's a temperature,
and it's a temperature that's measured in degrees Celsius,
and it's a 17 that was measured on 2007-07-09 at 10:00 UTC at Roosevelt Island, New York City,
and it's a 17 that was measured with a precision of 0.1,
and it's a 17 that was measured with accuracy 0.2,
and it's a 17 that was measured using instrument XYZ,
and it's a 17 that was measured using an instantaneous technique"

Here, attributes-as-characteristics provide context around the 17 to give it additional meaning such as:
  • it's a temperature, e.g., can't add/subtract this with a length
  • it has accuracy of 0.2, which can guide the user/computer on which kinds of data it can be aggregated with (sum, average, etc.), e.g., mixing an accuracy 0.2 datum with a dataset whose elements all have accuracy 0.01 poisons the dataset aggregation
  • it was recorded at Roosevelt Island, New York City and there might be other datums of interest at that location
  • it was recorded using the instantaneous technique, which produces data that is incompatible with other measurement techniques
  • and so on
This collection of metadata associated with this datum can be framed as a concept system:
concept system: set of concepts structured in one or more related domains according to the concept relations among its concepts
In other words, surrounding this original, unadorned concept of 17 that was recorded in this measurement event,  exists this concept system giving that 17 additional meaning beyond the mere number 17.
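
The metadata factors around the recorded 17 can be sketched as fields of a record.  The field values below come from the spoken-word example above; the class name Reading, the field names, and the aggregation rule in compatible_for_aggregation are assumptions made for illustration (a real policy would be set by the project's data specification).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reading:
    value: float          # the value concept (seventeen-ness)
    quantity_kind: str    # metadata factors follow
    unit: str
    timestamp_utc: str
    location: str
    precision: float
    accuracy: float
    instrument: str
    technique: str


def compatible_for_aggregation(a: Reading, b: Reading) -> bool:
    """Hypothetical policy: only aggregate readings of the same kind, unit,
    technique, and accuracy."""
    return (a.quantity_kind == b.quantity_kind
            and a.unit == b.unit
            and a.technique == b.technique
            and a.accuracy == b.accuracy)


reading = Reading(
    value=17.0,
    quantity_kind="temperature",
    unit="degree Celsius",
    timestamp_utc="2007-07-09T10:00Z",
    location="Roosevelt Island, New York City",
    precision=0.1,
    accuracy=0.2,
    instrument="XYZ",
    technique="instantaneous",
)

same_setup = replace(reading, value=18.5, timestamp_utc="2007-07-09T11:00Z")
finer = replace(reading, value=17.3, accuracy=0.01)

print(compatible_for_aggregation(reading, same_setup))
print(compatible_for_aggregation(reading, finer))
```

Each field is a metadata factor in a part-whole relation to the full record, and a program can consult those factors before summing or averaging.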

In our work on the ISO/IEC 11179 Metadata Registry and related standards, it is this kind of framing that can help one better describe data, especially considering that these kinds of approaches can be automated at various points in data processing and information processing.

Summary: The metadata above provides information about this event.  Some of that information is reified as data (reify = turned into data).

Here is our definition of information, which is defined in the field of information science (and information technology) and not in information theory (a different kind of "information"):
<information science> information: given context of an object, such as a concept system, that gives it meaning or more meaning
Note 1: Defined context is concepts, relations, and concept systems around the object, but (in the case of data) not the signifier itself.
Note 2: Information is not data, but the defined context surrounding the datum.
Each datum already has meaning: its value (concept).  Additional meaning might be provided, such as:
  • revealing one or more concept systems that datums belong to (e.g., relations and relationships among data)
  • describing the circumstances of the act of designation (e.g., who-what-when-where-why-how the data was created or changed)
  • providing mappings to/from the designations, their signifiers, and/or their values (concepts)
  • other methods are possible for giving additional meaning
This "additional meaning" can exist without an actual datum (i.e., presence of a signifier), such as additional meaning from concepts and relations (concept systems) of related datums.

In the diagram above about the datum "17", the factoring reflects a kind of concept system for many measurements.  All the concepts (attributes), including the value concept of the number 17 but excluding the signifier (a string of two characters "1" and "7"), are the information for this datum "17".  This information can be discussed - and processed automatically - without a particular signifier, i.e., talking about information, not the datum.

DATA VS. INFORMATION

Dan and I felt that much of the current literature waffled on the distinction between data and information.  Surely it can be confusing and many people use them interchangeably, but there are distinctions.  An important distinction is that a datum is a kind of designation, so there must be a signifier present - a signifier is a kind of perceivable object.  In a discussion, if there are only concepts and relations then most likely one is talking about information.

To polish a point, if Accuracy = 0.2 is an attribute (pairing a property with its characteristic), and an attribute itself is a characteristic, and both characteristics and properties are kinds of concepts, then in the Accuracy attribute, the 0.2 (property) is a kind of concept, i.e., it refers to the number 0.2, not the numeral 0.2, and thus all this discussion involves concepts (conceivable objects) and no perceivable objects (e.g., the signifier of a datum).

Sometimes people express this as "levels" of information and data, e.g., we can talk about some data, and its information, but there is data at a lower level, like the magnetic flux changes on hard drives.  We have devised the following guidance to refine the difference between data and information:
It is possible to provide successive contexts, each revealing more information.  The result of each iteration (information) can itself be considered data for the next iteration of revelation (above), which produces "layers" of information and data.  The reverse process is possible, too: each layer of information is stripped of some context that produces data; then the data itself is treated as information and a second iteration of context is stripped from that information to produce data (below).  This guidance should not be interpreted to suggest that there is a rigid set of information and data layers, or that these layers are standardized, or that they are the same from project to project.  This kind of revelation/stripping should be seen from a local viewpoint.

Example: Datums can be recorded in a file with signifiers (binary or character codes) with associated information.  Meanwhile, at a lower level such as disk storage encoding, datums might be the positive/negative magnetic fields, and the information of record gaps and sector data might be the bits/octets of the above layer.

Because of the successive nature of revelation (revealing concept systems) or stripping (removing concept systems), it may appear that terms data and information can be used interchangeably, but this is incorrect.  Data is characterized by the signifiers, their associations with concepts (+ relations = concept systems), and their notions of equality.  Information is characterized by referencing the context(s) overlaid upon the data.

Both data and information might be present.  Of course, outside of data processing, such as welding, there might only be information in "The oxygen hose threads are usually right-handed, whereas acetylene and fuel gas hose threads are left-handed", which can be expressed as a concept system - see John Sowa's Conceptual Graphs as an example of that technique.

Likewise, because context can always be revealed/added or stripped, and concepts and relationships can be reified into data ...

It is impossible to say that something is purely data (but no corresponding information) or purely information (but no corresponding data).
When speaking about data and information, both can overlap.  It is appropriate to call "data" (or "datum" or "datums") anything that involves the recording of a perceivable object (the signifier of the datum).  For example, one can refer to "data" in:
  • datasets, databases
  • data in the computer's memory or storage
  • data traveling over computer networks
  • data on a computer screen
  • data on a presentation
These are all acceptable uses of "data".  Is there "data" in one's mind and thinking?  That is questionable: we'd need to establish a Notion of Equality, so it might be better to talk about "information" in one's mind.  But no worry, as you can always reify the information into data.

Regarding speaking about "information" (of information science), this is very broad and we talk about information regularly and mostly without talking about data, examples include:
  • My friend is getting information to sign up for the volunteer activity
  • Did you read the safety information before using the welding equipment?
  • The evidence showed phone call logs - can we glean any information from those records?
  • X: This file looks like garbage, everything is random data there is no information here.  Y: Maybe it is encrypted, with the password, you might find information.  [Note: Here data is probably meant with respect to a nominal datatype of characterstring, octetstring, or bitstring.]
  • My learning preference is kinesthetic, I learn hands-on, but my supervisor wants me to share this know-how with my co-worker in PowerPoint form - what information should I include?
The above definition of "information" is broad: it can apply outside of data and uses of data.

DIFFERENT KINDS OF INFORMATION

The above presentation speaks about data, metadata, and information (from an information science perspective), and these are practical and consistent insights in processing data and understanding data (data semantics).  Dan and I spent much time researching and understanding information, including: What Is the Difference between this Information and Shannon's Information (of Information Theory)?  The answer to this question is, possibly, similar to why ontology in information science is different than ontology in philosophy - they have overlapping concepts but different grounding.  For example the information science "information" discussed above is largely about concepts, relations, and information processing systems (which involve data processing systems).

The Shannon "information" is about solving communication problems, which was researched at Bell Labs.  These communication problems were broad: addressing noise, signalling, errors, etc. in telegraphs, voice transmissions over copper, radio transmissions, television, and other related areas.  In Shannon's 1948 paper "A Mathematical Theory of Communication", he says:
[...] In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. [...] These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

[...] If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley the most natural choice is the logarithmic function [...] we will in all cases use an essentially logarithmic measure.

[...] This observer notes the errors in the recovered message and transmits data to the receiving point over a "correction channel" to enable the receiver to correct the errors. [emphasis and breakpoints added]
Some have reported that Shannon believes that "information" has no meaning.  This is not true.  Shannon is aware that the information (a message) has meaning outside of the communication channel, but this is not the engineering problem Shannon was focused upon.  Shannon uses "information" in at least three senses.  The first sense of "information" concerns "one ... from a set of possible messages", which arises when a sending party (source) transmits a message into the communication channel and the receiving party (destination) receives that message.  The second sense of "information" concerns the communication channel for all possible messages, i.e., the engineering problem Shannon is addressing.  The third sense of "information" concerns the messages as a whole: if there are 8 possible messages to send, then we are concerned with 3 bits of information (3 = log2(8)).   This is similar to Hartley "Transmission of Information" (1928, in the Bell System Technical Journal), which uses a similar logarithmic measure.  Nyquist "Certain Factors Affecting Telegraph Speed" (1924, also from Bell Labs) used the word "intelligence" for the word "information":
"A formula will first be derived by means of which the speed of transmitting intelligence, using codes employing different numbers of current values, can be compared for a given line speed, i.e., rate of sending of signal elements. Using this formula, it will then be shown that if the line speed can be kept constant and the number of current values increased, the rate of transmission of intelligence can be materially increased."
Shannon also uses "data" in the same paper about information: data is distinguished from information, and Shannon uses "data" consistent with the way "data" is defined here.

Although there are three senses of "information" in Shannon 1948, they are used consistently in the paper: each refers to the number of bits in a message, or the number of bits/second in the communication of the message.  This is an entropy-based approach in that a random message of 17 octets requires 136 bits to transmit, whereas a message of 17 letter-Qs (depending upon the entropy of the source) might require only 1 bit.
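
A minimal Python sketch of the arithmetic above, assuming a memoryless source with known symbol probabilities.  The function names are hypothetical and are not taken from Shannon's paper; the 17-Q case is estimated from the message's own symbol frequencies, which is itself an assumption about the source.

```python
import math
from collections import Counter


def hartley_bits(num_messages: int) -> float:
    """Bits needed to select one of num_messages equally likely messages."""
    return math.log2(num_messages)


def entropy_bits_per_symbol(probabilities) -> float:
    """Shannon entropy H = -sum(p * log2(p)) of a memoryless source."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(message: bytes) -> float:
    """Entropy estimated from the symbol frequencies of a single message.

    Assumption: the message is representative of the source that emitted it.
    """
    counts = Counter(message)
    total = len(message)
    return entropy_bits_per_symbol(c / total for c in counts.values())


# 8 equally likely messages.
bits_for_8_messages = hartley_bits(8)

# 17 octets drawn uniformly at random: 8 bits per octet.
uniform_octet_source = [1 / 256] * 256
bits_for_17_random_octets = 17 * entropy_bits_per_symbol(uniform_octet_source)

# 17 letter-Qs from a source that (as far as we can tell) only emits Q.
bits_for_17_qs = 17 * empirical_entropy(b"Q" * 17)
```

Under these assumptions the three quantities come out as 3 bits, 136 bits, and 0 bits per-symbol content respectively.  A real transmission of the 17 Qs would still need a small amount of signalling, and the exact figure depends on how the source is modelled.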

Using System Architecture/Engineering Techniques to Situate Both Perspectives ...

In comparing these two approaches - information theory (entropy-based) vs. information science (concept system-based) - these can easily be explained and accepted by framing the roles of the entities involved and how one perspective is the flip side of the coin of the other.
  • Shannon worked for Bell Labs (of AT&T - American Telephone and Telegraph), which provided communication services (telegraph, telephone, television, etc.).  Its transmission media (channels) were the main element of its business (a primary design priority), and the endpoints (customers) were the secondary element: the concern was not the meaning of messages, but the capacity of the media and using it maximally.  That's how one might frame this from a system architecture perspective and layering.
  • For the information user, the main focus is the creating, processing, storing, retrieving, and presenting in a way that preserves meaning, while a secondary focus is communicating those messages.  From a system architecture perspective and layering, this is orthogonal and complements Shannon.
  • Or said differently, the pipe is concerned with the distinction among signifiers and channel capacity whereas the user is concerned with concepts and meaning.
The dual perspective doesn't invalidate either one: they are different but compatible perspectives, and they require different concepts and different definitions.

Thus, the information science perspective of "information" is complementary to the information theory perspective of "information", and both are necessary.

Happy to hear critique and comments.

Frank Farance
m: +1 917 751 2900


Alex Shkotin

Jun 20, 2024, 6:36:28 AM
to ontolo...@googlegroups.com, Frank Farance, Gillman, Daniel - BLS

Hi Frank and Dan!


Nice history and interesting ideas!

And where is TToD itself?

It is great that you have a theory, but it's hard to discuss it from citations alone.

For example, your two definitions seem strange to me:

"datum (singular), datums (countable plural), data (uncountable plural): designation whose concept is a value."

"value, value concept: concept with a defined notion of equality to that concept."

But maybe I just don't know your axioms for "designation" and "concept".

It is quite possible that "data" is a prime (aka primitive) term, one that has axioms but no definition.

Consider Hilbert's axioms as an example of a theory for such a rich domain as geometry.

Is there a chance to read the "Terminological Theory of Data" itself?

And, as the experts, could you please recommend the ISO standard for the definition of data?


It is absolutely great that you have a theory! I am looking forward to reading it.


My five pence: It is much simpler if we talk about the data of one or another formal language. Then we have some kinds of literals, constructors, and variables to hold them. I mean a particular, rigorous definition of data situated in a particular language. In this case we have a very special but regular formalization of structures and a theory about these structures. The point is that if we have a theory of data in, for example, Pascal, the people who use this language need no other theory.

Maybe we just have a lot of theories, not a Unified one?


Alex



Thu, Jun 20, 2024 at 06:48, Frank Farance <fr...@farance.com>:
--
All contributions to this forum are covered by an open-source license.
For information about the wiki, the license, and how to subscribe or
unsubscribe to the forum, see http://ontologforum.org/info/
---
You received this message because you are subscribed to the Google Groups "ontolog-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontolog-foru...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontolog-forum/b9d3d9a8-9224-49f7-b52a-1ddf99f27d3d%40farance.com.

Gillman, Daniel - BLS

Jun 20, 2024, 9:53:20 AM
to Alex Shkotin, ontolo...@googlegroups.com, Frank Farance

Alex,

 

To understand definitions, designations, and concepts from our perspective, start with ISO 704 – Principles and Methods of Terminology (link to 2000 edition of the standard - https://edisciplinas.usp.br/pluginfile.php/312607/mod_resource/content/1/ISO%20704.pdf).

 

Our work and approach were to fill in gaps that ISO/TC 37 decided not to include in ISO 704, and we built a framework for tying all the terminological principles together in a description of data.

 

As Frank described, to move from designations in general to those specific to datums, we need ideas from ISO/IEC 11404 (General Purpose Datatypes). This standard is published freely by ISO and can be found at https://standards.iso.org/ittf/PubliclyAvailableStandards/index.html. Scroll down to link to the standard.

 

We look forward to hearing from you.

 

Yours,

Dan

 

 

Dan Gillman

Information Scientist

Office of Survey Methods Research

US Bureau of Labor Statistics

Washington, DC 20212 USA

Work +1.202.691.7523

Cell    +1.410.624.9582

Email Gillman...@BLS.Gov

--------------------------------------------

“Whatever it is, I’m against it.

No matter what it is or who commenced it,

I’m against it!”

   ~Prof Quincy Adams Wagstaff

 

 

 

 

 


Susanne Vejdemo

Jun 20, 2024, 9:59:55 AM
to ontolo...@googlegroups.com
Hi Alex,
This is a great guide to writing definitions of classes (unsurprisingly:)). However, I have had problems finding recommendations for how to write definitions (descriptions) of _properties_ - the more detailed the better, so that a group of ontologists can share one understanding.

Do you, or anyone else, have any recommendations for that?

Best,
Susanne 


Gillman, Daniel - BLS

Jun 20, 2024, 10:34:50 AM
to Alex Shkotin, ontolo...@googlegroups.com, Frank Farance

Alex,

 

See also the Appendices in the document for a metadata standard I helped develop. This standard from the Data Documentation Initiative, DDI-CDI (Cross Domain Integration), subsumes our definition of a datum among many other innovations.

 

See https://bitbucket.org/ddi-alliance/ddi-cdi/src/master/source/high-level-documentation/DDI-CDI_Model_Specification.docx. Click “view raw”. The Appendices contain an earlier version of the TToD. We have since expanded it, but nothing there is superseded.

 

Yours,

Dan

 


Frank Farance

Jun 20, 2024, 12:37:43 PM
to ontolo...@googlegroups.com
Susanne-

In the link below, please find the ISO/IEC JTC 1 SD 20 (JTC 1 = Information Technology, SD 20 = Standing Document 20), which is the Best Practices Guide for Information Technology Vocabulary.  This was written circa 2012, and it is a very short summary of ISO 704 and ISO 1087 - essentially, a Terminology 101 for Standards Developers.  A key element of this document is the discussion of "Is Pluto A Planet?", which is, at its core, a terminological problem.  The document covers the basics of object, concept, property, characteristic, extension, intension, definition, extensional definition, intensional definition, principle of substitution, and iterative development of a definition.

https://drive.google.com/file/d/1TZXqAcUdGUtLOzVlX-phjJsoAbR9U93n/view?usp=sharing
 
I caution that the terminological principles develop good terminology and concept systems - from the perspective of terminology - but might be incomplete with respect to other kinds of definitions.  For example, let's take data definitions - which are widely used in data exchange specifications and standards.  Having a good terminological definition for a data element is not enough for a complete data definition.  Using the ISO/IEC 11179 (Metadata Registries) standard can help write better (more interoperable) data definitions, as it breaks the data definition into
  • Data Element (top-level description)
  • Data Element Concept (describes the measurand, e.g., measuring height of humans) - the particular measurands make use of terminological definitions, as measurand is a kind of determinable, which is a kind of characteristic, which is a kind of concept
  • Conceptual Domain (concepts that define each element of the value space) - the concept definitions make use of terminological definitions
  • Value Domain (describes the datatype and symbology for each of these permissible values)
As you can see, for data definitions, it's important to know terminological methods, but the terminology-only approach is only part of the picture.
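
A minimal Python sketch of the four-part breakdown above, using the "height of humans" example.  The class and field names are hypothetical simplifications chosen for illustration; they are not the normative ISO/IEC 11179 metamodel, and the range limits are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """What is measured, independent of representation."""
    object_class: str     # the kind of thing described, e.g. "Human"
    characteristic: str   # the measurand, e.g. "Height"


@dataclass(frozen=True)
class ConceptualDomain:
    """The concepts that give meaning to each element of the value space."""
    description: str


@dataclass(frozen=True)
class ValueDomain:
    """Datatype and symbology for the permissible values."""
    conceptual_domain: ConceptualDomain
    datatype: str
    unit_of_measure: str
    minimum: int
    maximum: int

    def permits(self, value: int) -> bool:
        """True if value is a permissible value in this domain."""
        return isinstance(value, int) and self.minimum <= value <= self.maximum


@dataclass(frozen=True)
class DataElement:
    """Top-level description: a concept paired with one representation."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain


lengths = ConceptualDomain("lengths of physical bodies")
height_cm = DataElement(
    name="Human height in centimetres",
    concept=DataElementConcept(object_class="Human", characteristic="Height"),
    value_domain=ValueDomain(
        conceptual_domain=lengths,
        datatype="integer",
        unit_of_measure="cm",
        minimum=0,      # illustrative bounds only
        maximum=300,
    ),
)
```

The terminological definitions live in the `DataElementConcept` and `ConceptualDomain` parts, while the `ValueDomain` carries what terminology alone does not provide: datatype, unit, and permissible values.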

For ontologies, per ISO/IEC 21838-1 Top-Level Ontologies (TLO), here are some terms for talking about ontologies:
entity, object: item that is perceivable or conceivable
Note: The terms "entity" and "object" are catch-all terms analogous to "something". In terminology circles "object" is commonly used in this way. In ontology circles, "entity" and "thing" are commonly used. See B.3.3. [SOURCE: ISO 1087-1:2000]

class: general entity
Note 1: In some ontology communities, all general entities are referred to as classes. In other ontology communities, a distinction is drawn between classes as the extensions of general entities (for example, as sets of instances) and the general entities themselves, sometimes referred to as "types", "kinds", or "universals". The expression "class or type" is used in this document in order to remain neutral regarding these different usages.

particular: individual entity
Note 1: In contrast to classes or types, particulars are not exemplified or instantiated by further entities.

relation: way in which entities are related
Note 1: Relations can hold between particulars (this leg is part of this lion); or between classes or types (mammal is a subclass of organism); or between particulars and classes or types (this lion is an instance of mammal). On some views, identity is treated as a relation connecting one entity to itself.
Note 2: On the difference between "relation" and "relational expression" see 3.6, Note 1 to entry.
Note 3: "Relation" is a primitive term. See 4.1.1, NOTE 1.

expression: word or group of words or corresponding symbols that can be used in making an assertion
Note 1: Expressions are divided into natural language expressions and expressions in a formal language.

relational expression: expression used to assert that a relation obtains
Example: "is a" (also known as "subtype" or "subclass"), "part of", "member of", "instantiates", "later than", "brother of", "temperature of".
Note 1: The term "relational expression" is introduced in order to remove any confusion that can arise if a person uses "relation" to refer to the real-world link or bond between entities (as in 3.4), while another person uses "relation" to refer to the linguistic representation of this real-world link or bond.
Note 2: In OWL 2, relational expressions are referred to as Properties. "Expression" is used to connote logical composition: a Class Name in OWL 2 is logically simple, a Class Expression is logically complex. In FOL, "n-ary predicate" is often used as a synonym of "relational expression".

term: expression that refers to some class or to some particular
Note 1: An ontology will typically contain a unique "preferred term" for the entities within its coverage domain. Preferred terms may then be supplemented with other terms recognized by the ontology as synonyms of the preferred terms.

definition: concise statement of the meaning of an expression
Note 1: For the purposes of this document, definitions can be of two sorts: (1) those formulated using a natural language such as English, supplemented where necessary by technical terms or codes used in some specialist domain; (2) those formulated using a computer-interpretable language such as OWL 2 or CL.

axiom: statement that is taken to be true, to serve as a premise for further reasoning
Note 1: Axioms may be formulated as natural language sentences or as formulae in a formal language. In the OWL community, "Axiom" is used to refer to statements that say what is true in the domain that are "basic" in the sense that they are not inferred from other statements.
Here is some of the 21838-1 TLO guidance on defining ontology terms:
A TLO shall include a textual artefact represented by a natural language document providing: (1) a list of domain-neutral terms and relational expressions, incorporating identification of primitive terms, and (2) definitions of the meanings of the terms and relational expressions listed. Natural-language definitions may incorporate semi-formal elements if these are needed for readability.

Note 1: In the case of primitive terms, definitions can take the form of elucidations of meaning supplemented by examples of use.

Example:  An example of a definition with semi-formal elements is:
transitivity =def. relation R is transitive if whenever a stands in R to b and b stands in R to c it follows that a stands in R to c.
Given the nature of a TLO, a portion of its terms and relational expressions will be so basic in their meaning that there will be no logically simpler, and thus more easily intelligible, expressions on the basis of which they can be defined in a non-circular way. Ontology terms and relational expressions for which this is the case are called "primitives", and they have definitions in the sense of 3.8, but these are circular or are mere paraphrases.

A TLO shall specify which of its terms and relational expressions are primitive in this sense. For all other terms and relational expressions in the TLO, definitions shall be provided which satisfy the conditions that:
a) they are non-circular;
b) they form a consistent set;
c) they are concise.
Note 2: Concise signifies that the definition contains no redundant elements (for example, lists of examples, explanations of usage, and so on).

These requirements apply both to the natural language definitions and also to the definitions provided in the OWL 2 and CL axiomatizations referenced in 4.2 and 4.3.

Non-circularity excludes not only immediate circularity (where the defined term or a term with equivalent meaning is used in the definition) but also mediated circularity (for example, where a term is used in the definition of a second term, which is itself used in the definition of the first term). To ensure non-circularity it is recommended that definitions are formulated as statements of singly necessary and jointly sufficient conditions for the correct application of the defined term.

Example: Triangle = def. closed figure that lies in a plane and consists of exactly three straight lines.

Consistency of the collection of natural language definitions is shown through the development of an axiomatization that is proven consistent, as described in 4.2 and 4.3.

Note 3: Consistency, non-circularity and conciseness of definitions are features that distinguish ontologies from traditional dictionaries and other lexical resources.
As you can see, terminological definitions are important but incomplete, as additional work is necessary for an ontology term.  In the standards world, one might say that these terminological principles of ISO 704 and ISO 1087 are applied (extended and amplified) in data definitions and ontology definitions.
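
The non-circularity condition quoted above (both immediate and mediated circularity) lends itself to a mechanical check.  The following is a minimal Python sketch, assuming each definition has already been reduced to the set of terms it uses; the function names and the sample glossary are hypothetical and are not part of the TLO standard.

```python
def reachable_terms(term: str, uses: dict[str, set[str]]) -> set[str]:
    """All terms used, directly or indirectly, in the definition of term."""
    seen: set[str] = set()
    stack = list(uses.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(uses.get(current, ()))
    return seen


def circular_terms(uses: dict[str, set[str]]) -> set[str]:
    """Terms whose definition depends, directly or indirectly, on themselves."""
    return {term for term in uses if term in reachable_terms(term, uses)}


# Hypothetical glossary: each term maps to the terms used in its definition.
glossary = {
    "triangle": {"closed figure", "plane", "straight line"},
    "closed figure": set(),
    "plane": set(),
    "straight line": set(),
    "parent": {"child"},   # mediated circularity: parent -> child -> parent
    "child": {"parent"},
}

flagged = circular_terms(glossary)
```

Here `flagged` contains "parent" and "child" but not "triangle".  Terms flagged this way would either need a new definition or would have to be declared primitive, as the guidance allows.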

Alex Shkotin

Jun 20, 2024, 12:39:57 PM
to ontolo...@googlegroups.com
Hi Susanne,

Let me reply briefly and quickly. I am not sure there is a general way to write definitions. It depends on the science or technology we are in.
So you have some particular term and need a definition. Which one?

Formally speaking, a class is a unary predicate and a property is just a binary one. An attribute is usually a unary function.
Usually we have an informal definition in the community. Or did you invent a new property?
There is an article by Barry Smith and Selja Seppälä about definitions in formal ontologies. I can find it tomorrow.
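
A minimal Python sketch of the formal reading above: a class as a unary predicate, a property as a binary predicate, and an attribute as a unary function.  The toy domain and all names are hypothetical illustrations.

```python
# Toy domain of individuals; purely illustrative.
LIONS = {"leo", "nala"}
PART_OF = {("leo_leg", "leo")}
AGE_IN_YEARS = {"leo": 7, "nala": 5}


def is_lion(x: str) -> bool:
    """Class as a unary predicate: true of exactly the members of the class."""
    return x in LIONS


def part_of(x: str, y: str) -> bool:
    """Property as a binary predicate: true of pairs that stand in the relation."""
    return (x, y) in PART_OF


def age_of(x: str) -> int:
    """Attribute as a unary function: maps an individual to a single value."""
    return AGE_IN_YEARS[x]
```

Seen this way, defining a property means stating the conditions under which a pair of individuals satisfies the binary predicate, just as defining a class means stating the conditions for one individual.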

Alex

Thu, Jun 20, 2024 at 16:59, Susanne Vejdemo <sus...@vejdemo.se>:

Susanne Vejdemo

Jun 20, 2024, 1:33:44 PM
to ontolo...@googlegroups.com
Frank,
Thank you for responding to my question about extant guides for writing good property definitions. I did not find a lot of that in the "terminology 101" link you shared, but thanks for reaching out. For now it looks as if guidance specifically on property definitions remains elusive, as one can also infer from Alex's (thank you!) answer.

I will probably end up writing something myself - if so, I'll share it with the community.

I apologize for clearly writing a question that confused you, since the rest of your email  looks as if you are giving me an unsolicited intro to basic terminology and ontology work(?) I'll try to be more precise next time I write to this list.

Happy midsummer everyone!
Susanne



--
Dr. Susanne Vejdemo



Burkett, William [USA]

Jun 20, 2024, 1:43:28 PM
to ontolo...@googlegroups.com

Fwiw, Susanne, my short heuristic (“rule of thumb”) for writing good definitions is:

 

Given a definition D for term T and an arbitrary object X, is D clear, complete, and unambiguous enough for any average reader of English to ascertain with a very high degree of accuracy and confidence whether or not X is a T?

 

I often recommend Hurley’s “Rules for Lexical Definitions” as well: https://www.scribd.com/document/633116844/Criteria-for-Lexical-Definitions

 

There is subjectivity involved in all of these, but I’m not sure if it’s possible to get away from that.

 

Bill Burkett

Chris Mungall

Jun 20, 2024, 2:33:01 PM
to ontolo...@googlegroups.com
I think this is the best guide for ontologies:

Guidelines for writing definitions in ontologies
Selja Seppälä, Alan Ruttenberg & Barry Smith

I wrote an informal summary of it here:


(Hosted wordpress now has too many spammy ads, apologies for this, am looking to migrate this to gh pages)

We have been experimenting with using LLMs to validate ontology definitions, incorporating the OBO and Seppälä guidelines as prompt:




Alex Shkotin

Jun 21, 2024, 3:50:03 AM
to ontolo...@googlegroups.com
Chris,

Thank you. This is what I mentioned to Susanne :-)
With great additions.

Alex

Thu, Jun 20, 2024 at 21:33, Chris Mungall <cjmu...@lbl.gov>:

Alex Shkotin

Jun 21, 2024, 5:55:39 AM
to ontolo...@googlegroups.com

Bill,


Your recommendation is paywalled. But I remember some formal requirements for writing definitions.

First of all, any definition begins with a "Let" sentence in which we describe the parameters of the definition.

For example "Let v be a vertex, e an edge."

Then we have one, possibly complex, sentence that begins by introducing the new term.

For example "v is the endpoint of e if and only if".

The connector "if and only if" is not the only possible one. There are others.

After the connector we have a Sentence which may be true or false depending on the parameter values.

For example "e is incident with v".

So, concentrating theoretical knowledge in the theory framework [1], we keep every definition in a separate block like this:


rus

Пусть v - вершина, e - ребро. v есть концевая вершина ребра e если и только если e инцидентно v.

eng

Let v be a vertex, e an edge. v is the endpoint of e if and only if e is incident with v.

yfl

declaration enp func(TV vertex edge) (v e) ≝ (e inc v).

This way each definition is available in several languages and is ready to be formalized.

There are some requirements for the Sentence:

- It must be falsifiable (false for some parameter values),

- It must be "truifiable" (true for some parameter values),

- It should be useful; a relaxed version is "interesting", and an even more relaxed one is "exciting" :-)
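
The endpoint definition above, and the first two requirements on its Sentence, can be sketched in Python.  This is a minimal illustration under the assumption that an edge is represented by the set of vertices it is incident with; the `Edge` class and function names are hypothetical and are not the yfl formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    ends: frozenset  # the vertices this edge is incident with


def is_incident(e: Edge, v: str) -> bool:
    """The Sentence after the connector: "e is incident with v"."""
    return v in e.ends


def is_endpoint(v: str, e: Edge) -> bool:
    """Let v be a vertex, e an edge.
    v is the endpoint of e if and only if e is incident with v."""
    return is_incident(e, v)


edge_ab = Edge(frozenset({"a", "b"}))

# The Sentence is true for some parameter values and false for others.
true_case = is_endpoint("a", edge_ab)
false_case = is_endpoint("c", edge_ab)
```

The "Let" sentence becomes the typed parameters, the new term becomes the function name, and the Sentence becomes the function body.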


Any definition is a unit of a theory. Any theory is about some usually moving and interacting entities.

In all processes with all entities and during all interaction on Earth's surface and plus/minus 10 km from it, nuclei are not changeable.

Nuclear reactors and equipment for nuclei synthesis are exceptions.

There is a nice authoritative diagram for nuclei we know about experimentally https://www.nndc.bnl.gov/ensdf/ 


Alex


[1] https://www.researchgate.net/publication/374265191_Theory_framework_-_knowledge_hub_message_1


Thu, Jun 20, 2024 at 20:43, 'Burkett, William [USA]' via ontolog-forum <ontolo...@googlegroups.com>:

Alex Shkotin

Jun 21, 2024, 7:10:00 AM
to Gillman, Daniel - BLS, ontolo...@googlegroups.com, Frank Farance

Dan,


Thanks. I got it. It will take time to study. And it's great: "The DDI-Cross Domain Integration (DDI - CDI) specification provides a model for working with a wide variety of research data across many scientific and policy domains."

It looks to me like a unification of terminology about scientific and technological data and data processing.


Very interesting!


Alex



Thu, Jun 20, 2024 at 17:34, Gillman, Daniel - BLS <Gillman...@bls.gov>:

Luan Fonseca Garcia

Jun 21, 2024, 9:30:19 AM
to ontolog-forum
Regarding definitions and LLMs, we have been experimenting on two fronts for classifying domain terms under top-level ontologies:
- Using ChatGPT prompts
- Using transformer-based LLMs and fine-tuning them with terms and informal definitions

Both approaches look promising, and we achieved good results, though they are still far from ideal.

Regards,
Luan

John F Sowa

Jun 21, 2024, 6:48:09 PM
to ontolo...@googlegroups.com, ontolog...@googlegroups.com, CG
Folks,

Many subscribers to Ontolog Forum, including me, are or have been working or collaborating with various standards organizations --  ISO, OMG, IEEE, etc.   And many others have important ideas, requirements, or suggestions for standards of various kinds.

But Ontolog Forum is not a standards organization, and any emails that anybody posts to Ontolog Forum will go no further than the Ontolog website.   Anybody is free to use those suggestions, but they will have no official standing or certification of any kind.

For theoretical issues about representations of any kind, first-order logic is fundamental.  Anything that is specified in FOL is guaranteed to be absolutely precise to the finest detail.  Furthermore, anything and everything implemented in  or on any digital device of any kind can be specified in FOL.

Therefore, I strongly recommend FOL as the foundation for specifying all computable representations of any kind.  There are, however, some kinds of information that may require extensions to modal or higher order logics for certain kinds of features.  Issues that go beyond FOL have been specified in logics, such as Common Logic and the IKL extensions to CL.

But in every case, logic is fundamental.   It's impossible to have a precise specification  of anything that cannot be translated to and from FOL or to some formally defined logic that includes FOL as a proper subset.

I know many of the Ontolog subscribers with whom I had been discussing these and related issues in meetings and email lists since I first began to work with standards organizations in the early 1990s.  I'm sure that they can add much more info about these matters.

Fundamental issue:  It's pointless to waste large amounts of human time and computer cycles on discussion of standards without considering whether and how any of this discussion could be developed into international standards by some official standards organization(s).  And FOL and/or some extensions to FOL should be the ultimate foundation for any of those standards.

John

PS:  It's OK to use subsets of FOL for some purposes, since any subset can be translated to FOL.   But FOL itself has a very clean and simple translation to and from natural languages with just seven common words:  and, or, not, if-then, some, every.  It is the ideal common representation for anything computable.  Any specification in FOL can be accompanied by an automatically generated translation to any desired natural language.
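
A minimal Python sketch of how the words listed in the PS map onto executable checks.  It only evaluates sentences over a small finite domain, which is an assumption made for illustration; it is not a general FOL reasoner, and the helper names and sample animals are hypothetical.

```python
def every(domain, predicate) -> bool:
    """Universal quantifier over a finite domain."""
    return all(predicate(x) for x in domain)


def some(domain, predicate) -> bool:
    """Existential quantifier over a finite domain."""
    return any(predicate(x) for x in domain)


def if_then(p: bool, q: bool) -> bool:
    """Material implication: "if p then q" is (not p) or q."""
    return (not p) or q


# Hypothetical finite domain: each individual with the classes it belongs to.
animals = {
    "leo": {"lion", "mammal"},
    "nala": {"lion", "mammal"},
    "tweety": {"bird"},
}


def is_a(x: str, kind: str) -> bool:
    return kind in animals[x]


# "Every lion is a mammal": for every x, if x is a lion then x is a mammal.
every_lion_is_a_mammal = every(
    animals, lambda x: if_then(is_a(x, "lion"), is_a(x, "mammal"))
)

# "Some animal is a bird and not a mammal."
some_bird_is_not_a_mammal = some(
    animals, lambda x: is_a(x, "bird") and not is_a(x, "mammal")
)
```

Each English sentence in the comments corresponds word for word to the nested calls beneath it, which is the kind of direct translation the PS describes.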

Frank Farance

Jun 22, 2024, 3:30:05 PM
to ontolo...@googlegroups.com
John-

I participate in national and international standards activities.  I encourage y'all to participate in the standards work, such as ISO/IEC JTC1 SC32 WG2 Metadata, which has developed the ISO/IEC 24707 Common Logic (John, I appreciate your participation in that effort as you were championing Conceptual Graphs), and then later on the ISO/IEC 21838-* series on Top-Level Ontologies.  Barry Smith and Michael Gruninger have been active participants.  I spent much time helping Barry Smith get his BFO work into standards form.  The following standards have been published:
  • ISO/IEC 21838-1:2021 Information technology — Top-level ontologies (TLO) — Part 1: Requirements
  • ISO/IEC 21838-2:2021 Information technology — Top-level ontologies (TLO) — Part 2: Basic Formal Ontology (BFO)
  • ISO/IEC 21838-3:2023 Information technology — Top-level ontologies (TLO) — Part 3: Descriptive ontology for linguistic and cognitive engineering (DOLCE)
  • ISO/IEC 21838-4:2023 Information technology — Top-level ontologies (TLO) — Part 4: TUpper
The following work is in progress:
  • ISO/IEC 21838-5 - Unified Foundational Ontology (UFO)
FYI, we spent much time arriving at a harmonized definition of "ontology" for the TLOs.  It was based upon decades of experience in discussion lists (like this one) going back to the 1990s.

We have had some presentations on Mid-Level Ontologies, which can be broad but also have domain-specific features.

The purpose of these structures is to create a re-usability framework, just like other library/re-use mechanisms in software engineering and systems engineering.

For those interested in participating, you'll need to find your "National Mirror Committee" (a committee in one nation that mirrors the international work for the purpose of developing national positions among their experts).  When joining your local committee, you will need to let them know that you are interested in participating as an expert in ISO/IEC JTC1 SC32 WG2 Metadata.  Your National Body is the entity that represents your nation, typically it is your national standards body, which might be ANSI (US), SCC (CA), BSI (UK), DIN (DE), AFNOR (FR), SA (AU), SAC (CN), JISC (JP), KATS (KR), or others.  Here is the full list: https://www.iso.org/about/members

There is no direct individual participation, only participation as an expert (or delegate) through your National Body.

As for this list, I like to share ideas and get feedback, just like I did this week.  Some of it goes into papers and some of it goes into standards work.  This is no different than in the 1980s, when those of us who were working on C and UNIX standards used USENET groups, such as "comp.lang.c" (C Programming Language) and "unix-wizards" (UNIX expertise), to share ideas and standards work among colleagues.

I've been involved in national and international standards work for over four decades.  Feel free to ask questions here (I'll do my best) or E-mail me directly (put "Standards" in the Subject: line).  Looking forward to seeing more of you in standards work and ALSO sharing your ideas here among colleagues.  Thank you for your interest.

David Leal

Jun 22, 2024, 5:24:14 PM
to ontolo...@googlegroups.com

Dear All,

"Measurement" and "quantity" are in the Subject.  The principal source of definitions for terms in this area is the VIM (International Vocabulary for Metrology) - https://www.bipm.org/en/committees/jc/jcgm/publications .

Sadly, I fear that around "quantity", the definitions in the VIM do not meet Bill's criterion for a good definition.  I have struggled with the distinction between "quantity" and "quantity_value", and maybe the distinction is epistemological:

  • A quantity is defined by reference to a particular phenomenon or a kind of phenomenon.  Inevitably there is some uncertainty about the identity of the phenomenon and its stability.
  • A quantity value is defined by reference to one or more kinds of phenomenon deemed by metrologists to have unique identities and to be stable - until an experiment shows that they are not.

Between 1960 and 1983, the wavelength of the orange-red emission line in the electromagnetic spectrum of the krypton-86 atom in vacuum was a quantity value - 1/1650763.73 metre.  Since 1983, it has been a quantity. :)
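
A quick arithmetic check of that figure, as a sketch (the variable names are only illustrative): the 1960 definition fixed the metre as 1 650 763.73 wavelengths of this line, so a single wavelength is the reciprocal.

```python
# Between 1960 and 1983 the metre was defined as 1 650 763.73 wavelengths
# of the krypton-86 orange-red line, so one wavelength is the reciprocal.
WAVELENGTHS_PER_METRE = 1_650_763.73

wavelength_m = 1 / WAVELENGTHS_PER_METRE
wavelength_nm = wavelength_m * 1e9  # convert metres to nanometres

print(f"{wavelength_nm:.2f} nm")  # about 605.78 nm
```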

Best regards,
David

-- 
CAESAR Systems Limited
mob: +44 77 0702 6926
registered address: 56 Micheldever Road, London, SE12 8LU, UK
company number: 02422371
VAT number: 548 0510 55

Ravi Sharma

Jun 22, 2024, 10:38:16 PM
to ontolo...@googlegroups.com
All,
As the subject of the email threads keeps changing: my email was related to the thread called: [External] Re: [ontolog-forum] top layer ontologies, domain layer, application layer and data models
These are great treatises, and a lot of this relates to my own life:
  • in IT, computer hardware and software, systems engineering, and Enterprise Architecture, and
  • only a bit, such as spectral and time definitions, in another life in physics and particle and space sciences;
  • specifically, one year was also spent on practical data at BLS, including IV&V and the CPI and PPI, which depend on how data are dealt with and how sampling and statistics can relate to the national economy, etc. (Kind Attn Dan at BLS)

Summary:
What I see is that, within their contexts and limits, only the above have attempted definitions of data in the past 100 years.

What John Sowa has been pointing us to is that in many of the above examples and standards you have been extensively using FOL and a bit of higher-order logic. That, by logic and math, is the beginning of metadata, data, types, etc.
Finally, how do you relate provenance, namely texts, or temperatures measured from balloons in the atmosphere versus what is measured using IR data from satellites?
What is not correct in:
"Simply, useful data (or the process of making data useful) is information, an aggregation of useful information is knowledge, and useful knowledge is wisdom!"?
We perhaps need to take these conversations toward meaningful convergence, at least for the summit.

Regards.




Thanks.
Ravi
(Dr. Ravi Sharma, Ph.D. USA)
NASA Apollo Achievement Award
Former Scientific Secretary ISRO HQ
Ontolog Board of Trustees
Particle and Space Physics
Senior Enterprise Architect
SAE Fuel Cell Standards Member



Alex Shkotin

Jun 24, 2024, 11:26:19 AM
to ontolo...@googlegroups.com

David,


The logic of physical quantities and their values is a broad topic. It can be discussed using the example of physical quantities in a statics problem. For example,

A weightless beam is held in a horizontal position by a hinged-fixed support at point A and a vertical rod BC.

At point D, a concentrated force F = 30 kN is applied to the beam at an angle of 50° down to the right.

Dimensions: AB=0.6m, BD=0.4m.

Calculate the reaction forces of the supports acting on the beam.

Here are several geometric and physical quantities that will be used to solve the problem.


My task is to formalize a solution. What do you think?
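
One possible numeric baseline against which a formalized solution could be checked, as a sketch under assumptions that the problem statement does not spell out: the points lie in the order A, B, D along the beam; the 50° angle is measured from the beam axis; x points right and y points up; and the vertical rod BC transmits only a vertical force to the beam at B. All variable names are illustrative.

```python
import math

F = 30.0                    # kN, applied at D
ANGLE = math.radians(50.0)  # assumed to be measured from the beam axis
AB, BD = 0.6, 0.4           # m
AD = AB + BD                # assumes the order A, B, D along the beam

# Components of F (down and to the right), with x right and y up.
Fx = F * math.cos(ANGLE)
Fy = -F * math.sin(ANGLE)

# Moments about A: R_B * AB + Fy * AD = 0
# (the vertical rod is assumed to give a purely vertical force at B).
R_B = -Fy * AD / AB

# Force balance on the beam gives the hinge reaction at A.
A_x = -Fx
A_y = -Fy - R_B

print(f"R_B = {R_B:.2f} kN, A_x = {A_x:.2f} kN, A_y = {A_y:.2f} kN")
```

Under these assumptions the rod reaction comes out at roughly 38.3 kN upward on the beam, and the hinge reaction at roughly 19.3 kN to the left and 15.3 kN downward.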


Alex




On Sun, 23 Jun 2024 at 00:24, David Leal <david...@caesarsystems.co.uk> wrote:

David Leal

Jun 24, 2024, 4:11:23 PM
to ontolo...@googlegroups.com

Dear Alex,

Your example is an idealisation, perhaps defined as part of an engineering design.  The length BD is deemed to be 0.4 metres.  In this idealisation, the quantity (the length of the path BD) and the quantity value (0.4 metres) are the same thing.  0.4 metres is the sole "true quantity value" for the quantity.

In a real world, however much we attempt to constrain the definition of "length BD", there is always uncertainty.  We can specify "length BD at 20 deg C, 1 bar atmospheric pressure, 10% humidity, etc.", but we have to assume that there is always something uncontrolled for, so that there are many "true quantity values". 

Best regards,
David

alex.shkotin

Jun 25, 2024, 11:24:42 AM
to ontolog-forum
Dear David,

It is important for me to emphasize that we are doing formalization to clarify the situation and the means used to formulate and solve the problem.

For now, I can write the following in advance.

Let _BD denote the part of the beam located between point B and the end D of the beam.

Let lng be a function that assigns, to a representative (in our case _BD) of an object that has a linear size, the corresponding value in the form of a decimal number equipped with a unit of measurement. The accuracy of the value is assumed to be ±0.05, as is usually accepted when the accuracy is not specified.

Let lng(_BD):=0.4m denote assigning the function lng the value 0.4m on _BD.
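
A minimal sketch of how lng might be represented in code. The class name `Length`, the use of a dictionary for lng, and the reading of ±0.05 as being in the same unit as the value are assumptions of this sketch, not part of the formalization itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Length:
    value: float            # decimal number
    unit: str               # unit of measurement, e.g. "m"
    accuracy: float = 0.05  # assumed default, taken to be in the same unit

    def bounds(self):
        """Return the (low, high) interval implied by the accuracy."""
        return (self.value - self.accuracy, self.value + self.accuracy)


# lng as a finite map from representatives (here just their names) to values.
lng = {}
lng["_BD"] = Length(0.4, "m")  # corresponds to lng(_BD):=0.4m

low, high = lng["_BD"].bounds()
print(lng["_BD"], low, high)
```

Keeping the accuracy next to the value means any later calculation can carry the interval along instead of treating 0.4 as exact.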

etc. later on.

Other topics


It is quite possible to list all the conditions under which a beam exists as a physical system; instead of ending with “etc.”, the list should be completed, because only one physical field is not mentioned: the electromagnetic one.


Fundamental to solving a problem is the ability not to “idealize” (there is a critical connotation here) but to abstract from the unimportant, to neglect it. Theoretical mechanics begins with the statement that a material point is a physical body whose dimensions can be neglected when solving a given problem. And it continues: for example, when calculating the movement of planets around the Sun ⚡


We apply theoretical knowledge to solve practical problems throughout our lives.

And now so much theoretical knowledge has been accumulated that it is time to concentrate it [0]. This is what formal ontologies, for example the OBO Foundry project, do in their own way.


What kind of knowledge do we keep in our formal ontologies? Theoretical, practical, and ultra-practical (when we keep a task description), keeping in mind that some reasoning machine derives a solution, maybe semi-automatically.


It's amazing but in TToD [1] the term "quantity" is absent!


Best regards,


Alex


[0] https://www.researchgate.net/publication/374265191_Theory_framework_-_knowledge_hub_message_1 

[1] https://bitbucket.org/ddi-alliance/ddi-cdi/src/master/source/high-level-documentation/DDI-CDI_Model_Specification.docx

On Monday, June 24, 2024 at 23:11:23 UTC+3, david.leal wrote:

Dan Brickley

Jun 26, 2024, 11:16:15 PM
to ontolog...@googlegroups.com, CG, ontolo...@googlegroups.com
On Fri, 21 Jun 2024 at 23:48, John F Sowa <so...@bestweb.net> wrote:
Folks,

Many subscribers to Ontolog Forum, including me, are or have been working or collaborating with various standards organizations --  ISO, OMG, IEEE, etc.   And many others have important ideas, requirements, or suggestions for standards of various kinds.

But Ontolog Forum is not a standards organization, and any emails that anybody posts to Ontolog Forum will go no further than the Ontolog website.   Anybody is free to use those suggestions, but they will have no official standing or certification of any kind.

The “go no further” bit seems to contradict the text that is automatically added to all distributed emails here, namely

“All contributions to this forum are covered by an open-source license.

For information about the wiki, the license, and how to subscribe or 
unsubscribe to the email, see http://ontologforum.org/info/ “


Unfortunately http://ontologforum.org/info/ is currently a 404 missing link, but in general an open-source license would typically provide for widespread copying, re-use, derivative works, etc. It looks like at some point in time the license was CC-BY-SA 4.0:

“All contributions to this forum by its members are made under an open content license, open publication license, open source or free software license. Unless otherwise specified, all Ontology Summit content shall be subject to the Creative Commons CC-BY-SA 4.0 License or its successors”

This seems well intentioned but complex in practice, and it reminds me of the early W3C years before patent disputes forced a more complex patent policy (since copyright rules don’t address the same issues as patent-related commitments).

My guess is that most participants here don’t notice that copyright message anyway…

Dan


For theoretical issues about representations of any kind, first-order logic is fundamental.  Anything that is specified in FOL is guaranteed to be absolutely precise to the finest detail.  Furthermore, anything and everything implemented in  or on any digital device of any kind can be specified in FOL.

Therefore, I strongly recommend FOL as the foundation for specifying all computable representations of any kind.  There are, however, some kinds of information that may require extensions to modal or higher order logics for certain kinds of features.  Issues that go beyond FOL have been specified in logics, such as Common Logic and the IKL extensions to CL.

But in every case, logic is fundamental.   It's impossible to have a precise specification  of anything that cannot be translated to and from FOL or to some formally defined logic that includes FOL as a proper subset.

I know many of the Ontolog subscribers with whom I had been discussing these and related issues in meetings and email lists since I first began to work with standards organizations in the early 1990s.  I'm sure that they can add much more info about these matters.

Fundamental issue:  It's pointless to waste large amounts of human time and computer cycles on discussion of standards without considering whether and how any of this discussion could be developed into international standards by some official standards organization(s).  And FOL and/or some extensions to FOL should be the ultimate foundation for any of those standards.

John

PS:  It's OK to use subsets of FOL for some purposes, since any subset can be translated to FOL.   But FOL itself has a very clean and simple translation to and from natural languages with just seven common words:  and, or, not, if-then, some, every.  It is the ideal common representation for anything computable.  Any specification in FOL can be accompanied by an automatically generated translation to any desired natural language.
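
As a rough illustration of the translation claim, here is a toy sketch. The formula classes and the `to_english` function are hypothetical names invented for this example, atomic sentences are assumed to be phrased in English already, and the output is stilted English rather than what a real generator would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    text: str  # atomic sentence, already phrased in English


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class IfThen:
    premise: object
    conclusion: object


@dataclass(frozen=True)
class Every:
    var: str
    body: object


@dataclass(frozen=True)
class Some:
    var: str
    body: object


def to_english(f):
    """Render a formula as stilted English built around
    and, or, not, if, then, some, every."""
    if isinstance(f, Atom):
        return f.text
    if isinstance(f, Not):
        return f"not ({to_english(f.body)})"
    if isinstance(f, And):
        return f"({to_english(f.left)} and {to_english(f.right)})"
    if isinstance(f, Or):
        return f"({to_english(f.left)} or {to_english(f.right)})"
    if isinstance(f, IfThen):
        return f"if {to_english(f.premise)} then {to_english(f.conclusion)}"
    if isinstance(f, Every):
        return f"for every {f.var}, {to_english(f.body)}"
    if isinstance(f, Some):
        return f"for some {f.var}, {to_english(f.body)}"
    raise TypeError(f"unknown formula: {f!r}")


formula = Every("x", IfThen(Atom("x is a beam"),
                            Some("y", Atom("y supports x"))))
print(to_english(formula))
# prints: for every x, if x is a beam then for some y, y supports x
```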


Ken Baclawski

Jun 26, 2024, 11:51:34 PM
to ontolo...@googlegroups.com, ontolog...@googlegroups.com, CG
Thank you for catching the missing link.  It should work now although I cannot guarantee that all browsers and mail clients will handle it correctly.  The problem is the trailing slash on the URL link.  The actual mailing list description does not have the trailing slash, and the name of the information page was always "info" with no extension.  I presume that the google mail server is responsible for "correcting" the URL in the description by adding a trailing slash.

Ken


alex.shkotin

Jun 27, 2024, 5:29:51 AM
to ontolog-forum

Frank,


It would be great if you find a way to keep our community of practice informed about UFO development.

Not unknown flying objects but this one https://www.iso.org/standard/89915.html.

We have discussed here from time to time that any philosophical system can be formalized. It would be interesting to know which one ISO prefers.

Moreover, today the question is what philosophical system is behind ENZ's Principia [1].


And as far as we know, JFS prefers C.S. Peirce's, for example.


Alex


[1] https://mally.stanford.edu/principia.pdf



On Saturday, June 22, 2024 at 22:30:05 UTC+3, Frank Farance wrote:

David Leal

Jun 28, 2024, 1:08:06 PM
to ontolo...@googlegroups.com

Dear Alex,

There is a map, defined via lambda calculus from a unit, between length and real number as described by Gruber and Olsen - https://tomgruber.org/writing/an-ontology-for-engineering-mathematics .  I wish this paper were read more widely.
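
The idea of such a map can be sketched as follows. This is only an illustration, not code or notation taken from the paper: lengths are stored internally in metres (an arbitrary choice), and `magnitude_in` is a hypothetical name for a function that, given a unit, returns a function from lengths to real numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Length:
    metres: float  # internal representation; the choice of metres is arbitrary


METRE = Length(1.0)
MILLIMETRE = Length(0.001)


def magnitude_in(unit):
    """Given a unit, return the map from lengths to real numbers."""
    return lambda length: length.metres / unit.metres


bd = Length(0.4)
in_metres = magnitude_in(METRE)
in_millimetres = magnitude_in(MILLIMETRE)

print(in_metres(bd), in_millimetres(bd))
```

The length itself carries no preferred number; a number appears only once a unit has been chosen, which is the point of defining the map from the unit.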

However, I think you are wrong not to regard the process as idealisation.  We start with some statements about the physical world, but there is missing information that we have to guess (use engineering judgement for), and assumptions about behaviour which enable the problem to be mathematically tractable.  Our engineering judgment may be that the material of the beam under load has linear elastic behaviour.  We may assume that the beam has "slender" deformation in which planes perpendicular to the axis remain so throughout.

In general there is a simulation process which includes:

  • a path from problem statement to a "model" which can be analysed;
  • a validation of the model by internal consistency and by comparison with tests.

The NAFEMS Simulation Data Management Working Group - https://www.nafems.org/community/working-groups/simulation-data-management/ - is working on a vocabulary to support the recording of simulation processes.

Best regards,
David
