Proposal for enhancing field (and range) definitions

Stephen McConnel

Feb 17, 2010, 12:36:52 PM
to lexiconinter...@googlegroups.com
Proposal for enhancing field (and range) definitions in LIFT

The current LIFT specification (0.13) underspecifies the mechanism for
nonstandard data, that is, data carried by <field> (or <trait>) elements in
the body of the LIFT file. It contains the following for field-defn and
field.

------------------------------------------------------------------------------
field-defn
A field definition gives information about a particular field type that may
be used by an application to add information not part of the LIFT standard.

Inheritance
extensible Allows adding traits and annotations to a field definition.
multitext Contains a multilingual description of this particular field.

Attributes
tag [Required, key] This key corresponds to the tag attribute found in all
fields for which this is the definition.

The current LIFT specification has the following for the use of field in the
body of the LIFT file:

field
A field is a generalised element to allow an application to store
information in a LIFT file that isn't explicitly described in the LIFT
standard. Fields are described as part of the header information so that
applications can give some descriptive meaning to the information they add
to a file.

Inheritance
multitext Stores the language and multiscript form of the information.

Attributes
type [Required, key] The identifying key that gives the field name of the
field. Applications may share data by agreeing on the tag to use.
dateCreated [Optional, datetime] Gives the creation date of the field.
dateModified [Optional, datetime] Gives the modification time of the field.

Content
trait [Optional, Multiple, flag] Gives additional information about the
field.
form [Optional, Multiple, span] The multilingual representation of the field
content for a particular writing system.
annotation [Optional, Multiple, annotation] Adds metainformation
describing the element.
------------------------------------------------------------------------------

I think the specifications for field-defn and field are so close that they
should be the same. They almost are already if you compare extensible with
the additional attributes and content of field, and both are realized in a
LIFT file by a <field> element. The field-defn does need additional metadata,
though, to give different applications (or different instantiations of the
same application) a reasonable chance to share nonstandard data. Thus, my
proposal has two parts: a simpler definition of field to match the bare-bones
definition of field-defn, and an expanded definition of field-defn to provide
specific guidance for implementers.

Here's my proposed redefinition of field and field-defn:

------------------------------------------------------------------------------
field
A field is a generalised element to allow an application to store
information in a LIFT file that isn't explicitly described in the LIFT
standard. Fields are described (by field elements!) as part of the header
information so that applications can give some descriptive meaning to the
information they add to a file.

Inheritance
extensible Allows adding traits and annotations to a field definition.
multitext Stores the language and multiscript form of the information.

Attributes
type [Required, key] The identifying key that gives the field name of the
field. Applications may share data by agreeing on the type to use.

field-defn
A field definition gives information about a particular field type that may
be used by an application to add information not part of the LIFT standard.
It follows the same schema as the field elements found in the body of the
LIFT file, but makes a very particular use of the available elements.

Inheritance
extensible Allows adding traits and annotations (and fields!) to a field
definition.
multitext Contains a multilingual description of this particular field.

Attributes
type [Required, key] This key corresponds to the type attribute found in all
fields for which this is the definition.

The dateCreated and dateModified attributes (which are part of extensible)
may appear without violating validation, but applications are free to ignore
them and to lose that information on writing an updated LIFT file.

The following standard <trait> elements are used to provide information to
applications:

* name=”owner” has as its value the name of the LIFT element in which it can
appear. If the LIFT element is a field, then the value takes the form of
“field:type” where type is the given key of the desired field element. (I
don't think that nested field elements happen in practice. If they do, the
outer field would have a different data-type than any of those listed
below.)

* name=”data-type” has as its value one of the following standard types, or
another value that the application uses. If it's not on the list of standard
types, then other applications will probably not be able to display the
data, but other instances of the same application will be able to read and
display it properly.
1. multitext – a multilingual text string
2. multiparagraph – a longer text field (in one writing system?) that can
have multiple paragraphs
3. picture – a link to an external image file
4. sound – a link to an external sound file

* name=”lang” has as its value a valid language tag such as “en”. This is
the only one of these trait elements that may be repeated, and then only for
the multitext type of data.

(Since these <trait> elements are specified in the standard, they do not
need a corresponding range definition in the LIFT file.)

In addition to these trait elements (all of which are essentially required),
the optional form element(s) inherited as part of multitext provide a
multilingual description of the field. An optional nested field element with
type=”label” provides a multilingual display label value for the field.
(This particular use of field does not require any specification in the LIFT
file because it is specified here in the standard.)
------------------------------------------------------------------------------
(This concludes the proposed redefinition of field-defn in the LIFT
standard.)
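
As an illustration only, here is a minimal Python sketch of how an
application might read a field definition written along the lines proposed
above. The <fields> wrapper, the "literal-meaning" type, and every value in
the fragment are assumptions made up for the example; none of them come from
the LIFT standard or from an existing application.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragment following the proposal. The <fields> wrapper,
# the "literal-meaning" type, and all values are illustrative assumptions.
HEADER = """
<fields>
  <field type="literal-meaning">
    <form lang="en"><text>Literal meaning of a compound or idiom.</text></form>
    <trait name="owner" value="entry"/>
    <trait name="data-type" value="multitext"/>
    <trait name="lang" value="en"/>
    <trait name="lang" value="fr"/>
    <field type="label">
      <form lang="en"><text>Literal meaning</text></form>
    </field>
  </field>
</fields>
"""


def read_field_defn(defn):
    """Collect the proposed metadata traits from one field definition."""
    traits = {}
    for trait in defn.findall("trait"):
        # "lang" may repeat, so every trait name maps to a list of values.
        traits.setdefault(trait.get("name"), []).append(trait.get("value"))
    # The nested field with type="label" carries the display label.
    label = defn.find("field[@type='label']/form/text")
    return {
        "type": defn.get("type"),
        "owner": traits.get("owner", [None])[0],
        "data_type": traits.get("data-type", [None])[0],
        "langs": traits.get("lang", []),
        "label": label.text if label is not None else None,
    }


defn = read_field_defn(ET.fromstring(HEADER).find("field"))
print(defn)
```

Mapping every trait name to a list keeps the repeated "lang" trait from
overwriting itself; the single-valued traits simply take the first entry.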

In addition to the types of fields listed above, applications may also have
standard and custom fields that refer to what FieldWorks calls possibility
lists and WeSay apparently calls option lists. These are handled by trait
elements in the body of the LIFT file, and by range elements in the header.
The range item is defined as follows in the current standard.

------------------------------------------------------------------------------
range
A range is a set of range-elements and is used to identify both the group of
range-elements but also to some extent their type.

Inheritance
extensible Allows a range of elements to have traits and annotations.

Attributes
id [Required, key] This is the identifying key for this particular
range-set and is used, for example in the range attribute of a flag. A
range id attribute is unique only among the set of ranges used in the
document.
guid [Optional, string] Allows a particular range to be uniquely identified,
particularly when referenced from a lexicon.
href [Optional, URL] This attribute may not be used within an external range
definition file. In a standard LIFT file, the href attribute may be
used to reference an external lift-ranges file that contains a
definition for this range. Any children to this range element in the
LIFT file override the values (by addition or replacement) in the
external range definition.

Content
description [Optional, multitext] Used to give a multilingual description
of the range.
range [Required, Multiple, range-element] This is the list of range-elements
that make up this range. This list is unordered.
label [Optional, Multiple, multitext] Gives a multilingual label to this
range-set for GUI purposes.
abbrev [Optional, Multiple, multitext] Gives an abbreviation for this
range-set in multiple languages, for GUI purposes.
------------------------------------------------------------------------------
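
The href override rule quoted above ("by addition or replacement") can be
read in more than one way. The following Python sketch shows one possible
reading, in which range-elements are matched by their id attribute. That
matching rule, the range contents, and the href value are all assumptions
for the sake of the example, not something the specification text states.

```python
import xml.etree.ElementTree as ET


def merge_range(external, local):
    """Merge range-elements by id; children of the local range win."""
    merged = {e.get("id"): e for e in external.findall("range-element")}
    for elem in local.findall("range-element"):
        merged[elem.get("id")] = elem  # replace a matching id, or add a new one
    return merged


def label_of(range_element, lang="en"):
    """Return the label text for one language, or None if it is missing."""
    text = range_element.find(f"label/form[@lang='{lang}']/text")
    return text.text if text is not None else None


# Hypothetical content of an external lift-ranges file.
external = ET.fromstring("""
<range id="dialect">
  <range-element id="north">
    <label><form lang="en"><text>North</text></form></label>
  </range-element>
  <range-element id="south">
    <label><form lang="en"><text>South</text></form></label>
  </range-element>
</range>
""")

# Hypothetical <range> in the LIFT file itself; the href is made up.
local = ET.fromstring("""
<range id="dialect" href="example.lift-ranges">
  <range-element id="north">
    <label><form lang="en"><text>Northern</text></form></label>
  </range-element>
  <range-element id="east">
    <label><form lang="en"><text>East</text></form></label>
  </range-element>
</range>
""")

merged = merge_range(external, local)
print(sorted(merged), label_of(merged["north"]))
```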

My proposal is to enhance this definition by adding the following meta data
specification.

------------------------------------------------------------------------------
Two specific traits in each range definition are useful for applications to
have available:

* name=”owner” has as its value the name of the LIFT element in which it can
appear. If the LIFT element is a field, then the value takes the form of
“field:type” where type is the given key of the field. (I don't think this
happens in practice.)

* name=”multiplicity” has as its value one of the following standard types.
1. zero-or-one – optional field that can occur only once
2. one – required field that can occur only once
3. zero-or-more – optional field that can occur multiple times
4. one-or-more – required field that can occur multiple times
------------------------------------------------------------------------------
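
To make the two range traits concrete, here is a small Python sketch under
stated assumptions: the range id, its elements, and the idea of checking
occurrence counts against the multiplicity value are illustrative only and
are not part of the current LIFT standard.

```python
import xml.etree.ElementTree as ET

# The four multiplicity values from the proposal, as (minimum, maximum)
# occurrence counts; None means "no upper bound".
MULTIPLICITY = {
    "zero-or-one": (0, 1),
    "one": (1, 1),
    "zero-or-more": (0, None),
    "one-or-more": (1, None),
}

# Hypothetical range definition; the id and all values are assumptions.
RANGE = """
<range id="dialect">
  <trait name="owner" value="entry"/>
  <trait name="multiplicity" value="zero-or-more"/>
  <range-element id="north"/>
  <range-element id="south"/>
</range>
"""


def range_traits(range_elem):
    """Return the traits of a range as a name -> value mapping."""
    return {t.get("name"): t.get("value") for t in range_elem.findall("trait")}


def count_allowed(multiplicity, count):
    """Check an occurrence count against one of the four proposed values."""
    low, high = MULTIPLICITY[multiplicity]
    return count >= low and (high is None or count <= high)


meta = range_traits(ET.fromstring(RANGE))
print(meta, count_allowed(meta["multiplicity"], 3))
```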

(A nicely formatted version of this proposal in either PDF or ODT (Open
Office) format is available upon request.)

Martin Hosken

Apr 26, 2010, 7:01:03 AM
to lexiconinter...@googlegroups.com
Dear Steve,

> Here's my proposed redefinition of field and field-defn:
> field
> Inheritance
> extensible Allows adding traits and annotations to a field definition.
> multitext Stores the language and multiscript form of the information.

This is your main change here, that fields become extensible and therefore can take their own fields. I see no problem with nesting fields if implementors are happy too.

> The following standard <trait> elements are used to provide information to
> applications:
>
> * name=”owner” has as its value the name of the LIFT element in which it can
> appear. If the LIFT element is a field, then the value takes the form of
> “field:type” where type is the given key of the desired field element. (I
> don't think that nested field elements happen in practice. If they do, the
> outer field would have a different data-type than any of those listed
> below.)

This is useful.

> * name=”data-type” has as its value one of the following standard types, or
> another value that the application uses. If it's not on the list of standard
> types, then other applications will probably not be able to display the
> data,
> but other instances of the same application will be able to read and display
> it properly.
> 1. multitext – a multilingual text string
> 2. multiparagraph – a longer text field (in one writing system?) that can
> have multiple paragraphs
> 3. picture – a link to an external image file
> 4. sound – a link to an external sound file

From what you say here I can only see two real types: text and URL. Whether a text is a string or a multiparagraph is entirely an implementation issue and can be analysed from the text contents (does it have any newlines in it?). In addition, there is no convention on how to represent multiple paragraphs in LIFT, since we have no paragraph markers, unless people use the Unicode characters for these.
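
A rough Python sketch of that content-based check follows. It assumes that a paragraph break is either a newline or U+2029 PARAGRAPH SEPARATOR; which characters count is a guess for illustration, since LIFT defines no convention here.

```python
# Assumed paragraph markers: a newline or U+2029 PARAGRAPH SEPARATOR.
PARAGRAPH_BREAKS = ("\n", "\u2029")


def looks_multiparagraph(text):
    """Guess from the content alone whether a text holds several paragraphs."""
    body = text.strip()  # a trailing line break is not a second paragraph
    return any(mark in body for mark in PARAGRAPH_BREAKS)


print(looks_multiparagraph("first\nsecond"))
```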

For a URL, you can analyse the mime type of the URL to tell you what the contents are. But even then I wonder how useful this information really is. Do you really want this? If so we should include all mime types really.
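
One way to do that analysis, sketched in Python with the standard mimetypes module, is to guess the MIME type from the file extension of the link. The media_kind helper and the example paths are hypothetical, and a guess based on the extension says nothing certain about what the file actually contains.

```python
import mimetypes


def media_kind(href):
    """Return the top-level MIME type guessed from a link's file extension."""
    mime, _encoding = mimetypes.guess_type(href)
    if mime is None:
        return "unknown"
    return mime.split("/", 1)[0]  # e.g. "image" or "audio"


print(media_kind("pictures/dog.png"), media_kind("audio/dog.wav"))
```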

> * name=”lang” has as its value a valid language tag such as “en”. This
> is the
> only one of these trait elements that may be repeated, and then only for the
> multitext type of data.

Why are you not using the @lang in the multitext itself for this? Isn't this redundant?

> (Since these <trait> elements are specified in the standard, they do not
> need a
> corresponding range definition in the LIFT file.)
>
> In addition to these trait elements (all of which are essentially required),
> the optional form element(s) inherited as part of multitext provides a
> multilingual description of the field.

No trait may be required. If we want required traits we should make them full first class elements. There is nothing above that says to me that these are even essentially required. They can help but they aren't essential to the structure of lift.

> An optional nested field element with
> type=”label” provides a multilingual display label value for the field.
> (This
> particular use of field does not require any specification in the LIFT file
> because it is specified here in the standard.)

By specifying in the standard a particular type of field, we are breaking our own rules. Hence it needing to be a first class element and so field-defn really is something different. I appreciate your desire to reduce the number of classes, but I think it is going a little too data driven at this point. We need to make sure that we don't mix optional and required information.

> Two specific traits in each range definition is useful for applications
> to have
> available:
>
> * name=”owner” has as its value the name of the LIFT element in which it can
> appear. If the LIFT element is a field, then the value takes the form of
> “field:type” where type is the given key of the field. (I don't think this
> happens in practice.)

As above, this can be helpful but it isn't required.

> * name=”multiplicity” has as its value one of the following standard types.
> 1. zero-or-one – optional field that can occur only once
> 2. one – required field that can occur only once
> 3. zero-or-more – optional field that can occur multiple times
> 4. one-or-more – required field that can occur multiple times

I have problems with this. No trait may be required. So that knocks out your "one" and "one-or-more". Currently there is nothing in the definition of trait to say whether two traits with the same name may be used. Bear in mind that a trait isn't a triple (range, range-element, value) or it is a binary triple with an inferred value of true. So on that basis, I suppose there could be more than one trait with the same name. Does it help to document this? I suppose we could have a range called multiplicity with two elements: single, multiple but since all traits are optional we couldn't do any verification using it. On that basis, do you still want it?

GB,
Martin
