ids & guids


Martin Hosken

Sep 30, 2009, 11:38:02 PM
to lexiconinter...@googlegroups.com
Dear All,

A discussion has arisen about the need for ids and for guids. This is an area of the standard that is a little rough around the edges, so please bear with me as I try to expand my understanding of it, which has grown in recent discussions.

First a few definitions:

An id is a file-unique string that is used when the item (entry or sense) is referenced from elsewhere in the file. Its only constraint is that it be unique within the file; in theory, it can change between instantiations of the file.

A guid is an optional string that is guaranteed not to change once it has been set (despite only having a SHOULD on that requirement in 0.14).

As originally conceived, LIFT was never really designed to be easily mergeable in the version-control sense. This need has grown, and I am hearing that people want the mergeability requirements to be made part of the core standard. Is there anyone who does not want to see that happen?

For mergeability, the stability requirement of a guid is essential. But why do we need both a guid and an id? How about we just raise the stability requirement of an id? Then we are done, aren't we?

One problem that has been raised is that ids are designed to be somewhat human-readable: they often consist of the lexical-unit followed by a guid (ugly, but effective). So when you create a new entry (say), you want to be able to identify it uniquely, so you give it a guid, but you don't want to give it an id until the lexical-unit is set.

There are a number of aspects to this:

1. If the object is empty, then it is the same as any other empty object. So deleting it and recreating it makes no difference to the file.
2. If you want to cross-refer to an empty element (or at least one without a lexical-unit), then perhaps we just accept the confusion and use a more opaque id.

So I would be happy to increase the stability requirement on id and deprecate guid. Or we can leave things as they are. There is nothing to say that an application can't add guids on import. And if we tighten the SHOULD to MUST in the guid spec, we fulfill the merging requirement. By making no change at this time but warning of future deprecation, we let applications start making ids stable and, for the moment, hold both, while switching from identifying objects by guid to identifying them by id.

Thoughts?

Yours,
Martin

cambell

Oct 1, 2009, 2:36:24 AM
to LexiconInterchangeFormat

> So I would be happy to increase the stability requirement on id and deprecate guid. Or we can leave things as they are. There is nothing to say that an application can't add guids on import. And if we tighten the SHOULD to MUST in the *guid* spec, we fulfill the merging requirement. By making no change at this time, but warning of future deprecation, applications can start to make ids stable and for the moment, hold both, but switch from identifying by guid and instead using id for identification.

Shouldn't that be "SHOULD to MUST in the id spec"?

C

cambell

Oct 1, 2009, 2:38:48 AM
to LexiconInterchangeFormat
Hi,

So the proposal is this:

1) guid be deprecated.
It's currently optional in the spec, but in reality WeSay doesn't work
without it.

2) id be tightened up such that:
- It is required
- It is immutable

Regards,
Cambell

John Hatton

Oct 1, 2009, 3:51:51 AM
to lexiconinter...@googlegroups.com
I'm concerned about deprecating guid.

Backing up... why do we have id the way it is today? Because there was a
desire on the part of some that ids be easily creatable from, say, a CC
table, so GUIDs were out, AND, we liked the idea of something vaguely human
readable. So WeSay and FLEx glued the lexical-unit and a guid together and
stuck that in ID. That way we knew it was unique, but still readable. The
problem is that the lexical-unit can change, and then the id is weird.
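To make the "lexical-unit + guid" style concrete, here is a minimal C# sketch. The "_" separator and the 36-character "D" guid format are assumptions, not necessarily the exact format WeSay or FLEx write. The point it illustrates is that only the guid suffix carries identity, so the readable prefix can go stale once the form is edited. (Guid.TryParse requires .NET 4 or later.)

```csharp
using System;

// Hypothetical helper illustrating "lexical-unit + guid" ids.
// The "_" separator and the "D" guid format are assumptions for illustration.
public static class LexemeGuidIds
{
    private const int GuidLength = 36; // length of Guid.ToString("D")

    public static string MakeId(string lexicalUnit, Guid guid)
    {
        return lexicalUnit + "_" + guid.ToString("D");
    }

    // Recovers the guid from the end of the id. The lexical-unit prefix is
    // decoration only and may no longer match the entry's current form.
    public static bool TryGetGuid(string id, out Guid guid)
    {
        guid = Guid.Empty;
        if (id == null || id.Length < GuidLength)
            return false;
        return Guid.TryParse(id.Substring(id.Length - GuidLength), out guid);
    }
}
```

If the user later respells "cat" as "kat", an id made from "cat" still begins with "cat_"; software can still find the entry through the suffix, but a human reader is misled.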

Yet today, we have several tools for doing the conversion, so I don't see
anyone doing this with something as weak as CC. And anything more powerful
can generate guids.

More history. How did we also get guid attributes by themselves in
addition to ids? Well, this is what machines and their programmers
really like, and since FLEx has guids for these things internally, it sure
simplifies life during import if one can read the guid and map it to the
corresponding FLEx object in the database.

Until a month or so ago, I'd been under the mistaken impression that id was
required, and guid optional. I was aghast to find that actually, both are
optional. Since learning this, I've grown tired of writing code that constantly
deals with both, either of which can be missing. Life is too short for this.
So I'd be happy for one of these two to be made mandatory.

Some thoughts:

If id were simply made a guid, then life would be simpler. It would work
well for FLEx, it would always be unique, and WeSay/FLEx would lose this
current problem of the id falling out of sync with the current version of
the lexical-unit once the user edits the form (for visual purposes only...
not really a data problem).

But then, I'm left with "sure was nice to be able to follow ids visually".
But maybe that's unrealistic of me. How often do I actually read these
things? Almost never, after 3 years of converting
to/reading/writing/xslt'ing LIFT. Does anyone have a contrary experience?
(One help would be to add a human-only hint on the referrer's end: <relation
ref="...gui..." hint="feline">.)

So, I will counter propose that we make GUID mandatory, and do away with id
altogether. Not as an obviously better approach, but as the simplest of
variously ok approaches. Maybe a compromise would be to call it "id", but
say it must contain a guid? Less clear that way...

Before I forget, we need to move towards unique ids on all complex
merge-able things that aren't otherwise machine-distinguishable. Without
this, for example, Example Sentences suffer during 3-way merging where both
parties edited something. Team collaboration on LIFT is now more than
just a future thing; we've got one such dictionary project underway.
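Why stable ids matter for 3-way merging can be sketched as a per-id merge. This C# sketch is only an illustration under simplifying assumptions: each example sentence has a stable id, its content is reduced to a single string, and `KeyedThreeWayMerge` is a hypothetical name. With ids, each edited copy is paired with its common ancestor by key rather than by position.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: three-way merge of example sentences keyed by a stable id.
// Each example is reduced to one string; null means "absent" (never added, or deleted).
public static class KeyedThreeWayMerge
{
    public static Dictionary<string, string> Merge(
        IDictionary<string, string> baseline,
        IDictionary<string, string> ours,
        IDictionary<string, string> theirs,
        List<string> conflicts)
    {
        var ids = new HashSet<string>(baseline.Keys);
        ids.UnionWith(ours.Keys);
        ids.UnionWith(theirs.Keys);

        var merged = new Dictionary<string, string>();
        foreach (string id in ids)
        {
            string b = Lookup(baseline, id);
            string o = Lookup(ours, id);
            string t = Lookup(theirs, id);

            string result;
            if (o == t) result = o;        // both sides agree (possibly both deleted)
            else if (o == b) result = t;   // only "theirs" changed this example
            else if (t == b) result = o;   // only "ours" changed this example
            else
            {
                result = o;                // both changed it differently: keep "ours"
                conflicts.Add(id);         // (possibly a deletion) and report the id
            }

            if (result != null)
                merged[id] = result;
        }
        return merged;
    }

    private static string Lookup(IDictionary<string, string> map, string id)
    {
        string value;
        return map.TryGetValue(id, out value) ? value : null;
    }
}
```

Without ids, two people's edits to "the second example" cannot be reliably matched once either of them inserts or reorders examples.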

JH

Martin Hosken

Oct 1, 2009, 4:55:10 AM
to lexiconinter...@googlegroups.com
Dear John,

> So, I will counter propose that we make GUID mandatory, and do away with id
> altogether. Not as an obviously better approach, but as the simplest of
> variously ok approaches. Maybe a compromise would be to call it "id", but
> say it must contain a guid? Less clear that way...

In effect, my proposal (as summarised by Cambell) would have the same effect. The only difference is that the standard doesn't say that it has to be a *G*UID. But I would be most surprised if people didn't use guids for them. Once they are set, it doesn't really matter what they are because they aren't going to change.

> Before I forget, we need to move towards unique ids on all complex
> merge-able things that aren't otherwise machine-distinguishable. Without
> this, for example, Example Sentences suffer during 3-way merging where both
> parties edited something. This team-collaboration of LIFT is now more than
> just a future thing, we've got one such dictionary project underway.

Can you give me a list of objects that you consider need such ids?

GB,
Martin

John Hatton

Oct 1, 2009, 6:35:11 PM
to lexiconinter...@googlegroups.com
> In effect, my proposal (as summarised by Cambell) would have the same
> effect. The only difference is that the standard doesn't say that it has to
> be a *G*UID. But I would be most surprised if people didn't use guids for
> them. Once they are set, it doesn't really matter what they are because they
> aren't going to change.

Right, but now imagine the job of the guy maintaining the FLEx importer
code. His model (which predates FLEx by many years) has real guids. So now
how does he map the incoming LIFT file to FLEx's internal model? The very
first time, no big deal, he just generates new objects and guids. But the
next day, he has to import the latest version of that lift file again. Now
what does he do? He can't just go through and rewrite all the ids that may
or may not be guids. Under the "any unique id" plan, I'll tell you what I
would do in his shoes. I'd introduce an attribute called "guid", write that
out to the lift, get other apps to round-trip it, and now we're back to
exactly the same situation/problem as we have today, with both an id and a
guid. Alternatively, I'd let my life get more complicated by keeping a
lookup table somewhere... but I'd be grumbling... "why do they put me
through this pain"?
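The "lookup table" alternative might look something like this minimal C# sketch. `IdToGuidMap` is a hypothetical name, and the sketch assumes .NET 4 or later for Guid.TryParse. Ids that already parse as guids are used directly; any other id string gets a new internal guid that must be remembered. A real importer would also have to persist this map between imports, which is exactly the extra bookkeeping being objected to here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical importer-side mapping from LIFT id strings to internal Guids.
// Must be saved and reloaded between imports for repeated imports to line up.
public sealed class IdToGuidMap
{
    private readonly Dictionary<string, Guid> _map =
        new Dictionary<string, Guid>(StringComparer.Ordinal);

    public Guid Resolve(string liftId)
    {
        Guid guid;
        if (_map.TryGetValue(liftId, out guid))
            return guid;                 // seen before: reuse the same internal guid
        if (!Guid.TryParse(liftId, out guid))
            guid = Guid.NewGuid();       // arbitrary unique string: invent a guid
        _map[liftId] = guid;
        return guid;
    }

    public int Count { get { return _map.Count; } }
}
```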


John Hatton
SIL PNG, Palaso, & SIL International Software Development
Google Talk chat: hattonjohn


cambell

Oct 2, 2009, 1:01:50 AM
to LexiconInterchangeFormat
Hi,

One id (be it id or guid) seems less error-prone (on the developer's part)
to use in tools.

Are there any tools that can't generate a guid?

The case would be say, producing a new lift file from existing data
and needing to express relations using the guid as a target. What
tools would find that tricky with a guid?

Martin Hosken

Oct 2, 2009, 1:58:57 AM
to lexiconinter...@googlegroups.com
Dear John,

> Right, but now imagine the job of the guy maintaining the FLEx importer
> code. His model (which predates FLEx by many years) has real guids. So now
> how does h map the incoming LIFT file to FLEx's internal model? The very
> first time, no big deal, he just generates new objects and guids. But the
> next day, he has to import the latest version of that lift file again. Now
> what does he do? He can't just go through and rewrite all the ids that may
> or may not be guids. Under the "any unique id" plan, I'll tell you what I
> would do in his shoes. I'd introduce an attribute called "guid", write that
> out to the lift, get other apps to round-trip it, and now we're back to
> exactly the same situation/problem as we have today, with both and id and a
> guid. Alternatively, I'd let my life get more complicated by keeping a
> lookup table somewhere... but I'd be grumbling... "why do they put me
> through this pain"?

Given that no change in this area will happen without a version bump, an importer can use the version to tell what id and guid mean. Are you also telling me that Flex will fall over if the unique id is not a MS formatted 128 bit GUID?

I think you are also not hearing that the new proposed id has to be unique and can't change and one of the easiest ways of doing that is to use some kind of guid. So in effect the id becomes a guid. I'm just not saying it has to be a strict 128 bit number as formatted and generated by MS libraries.

Of course, if we are all happy with the status quo, then we can leave it as it is. But I have a feeling that people will want to change this at the next version bump (when they have to deal with other changes anyway).

GB,
Martin

John Hatton

Oct 2, 2009, 2:32:47 AM
to lexiconinter...@googlegroups.com
> Are you also telling me that Flex will fall over if the unique id is not a
> MS formatted 128 bit GUID?

Yup. GUID is a type in .NET. They aren't just strings. So if I say

entry.Guid = Guid.Parse("somereallyreallyuniquestring");

then it's certainly not gonna be happy.

John Hatton

Nov 28, 2009, 10:32:52 PM
to lexiconinter...@googlegroups.com
Ok, it's now been almost two months since the last post of this discussion.
I now move that we accept the following change to LIFT:

Each Entry must have a guid.
References must be by guid.
Existing data will have to be migrated from the id+guid system to this pure
guid system.
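The migration this motion implies could be sketched as follows with System.Xml.Linq. This is an illustration, not an agreed procedure: it assumes relations point at entries or senses through their ref attribute, keeps any existing guid attributes, leaves the old id attributes in place, and `PureGuidMigration` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical migration sketch: ensure every entry/sense has a guid,
// then rewrite relation refs that use old ids so that they use guids instead.
public static class PureGuidMigration
{
    public static void Migrate(XDocument lift)
    {
        var idToGuid = new Dictionary<string, string>(StringComparer.Ordinal);

        // Materialize first, because we add attributes while walking the elements.
        List<XElement> targets = lift.Descendants()
            .Where(e => e.Name == "entry" || e.Name == "sense")
            .ToList();

        foreach (XElement element in targets)
        {
            XAttribute guidAttr = element.Attribute("guid");
            if (guidAttr == null)
            {
                guidAttr = new XAttribute("guid", Guid.NewGuid().ToString("D"));
                element.Add(guidAttr);
            }
            string oldId = (string)element.Attribute("id");
            if (oldId != null)
                idToGuid[oldId] = guidAttr.Value;
        }

        foreach (XAttribute refAttr in lift.Descendants("relation").Attributes("ref"))
        {
            string guid;
            if (idToGuid.TryGetValue(refAttr.Value, out guid))
                refAttr.Value = guid; // refs to ids not found in this file are left untouched
        }
    }
}
```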

John Hatton
SIL Papua New Guinea, Palaso, & SIL International Software Development
Chat Google Talk: hattonjohn Skype: hattonjohn Google Wave:
hatto...@googlewave.com


Ken Zook

Nov 30, 2009, 9:39:53 AM
to lexiconinter...@googlegroups.com
I certainly like the ability of a FLEx import to accept a unique string that is not a GUID. It lets me do things with CC and have them work. But the way FLEx currently works, when those non-GUID unique ids are imported, they become GUIDs inside FLEx, so, as John mentioned previously, if one were to import the same file again, one wouldn't be able to merge based on the ids. So I can see the problem with this for a standard that needs to allow merging. I certainly do not like having the headword be part of the id, for the reasons John mentioned.

I'm willing to give up the nicety of unique strings and say the id in a LIFT file must be a GUID.

Ken




Martin Hosken

Nov 30, 2009, 8:41:53 PM
to lexiconinter...@googlegroups.com
Dear John,

> Ok, it's now been almost two months since the last post of this discussion.
> I now move that we accept the following change to LIFT:

Yippee, a concrete proposal :) Now I can respond!

> Each Entry must have a guid.
> References must be by guid.
> Existing data will have to be migrated from the id+guid system to this pure
> guid system.

I would like to counter propose with something that I feel is functionally similar.

1. @id must be unique and cannot change between versions of a file.

If one thinks about this for a moment, that means that in effect it has to be some kind of guid since if two people add a record, they have to be sure that they have different ids. But the difference here is that my definition does not require that the field conform to some notional 128bit encoding standard.

2. @guid is deprecated.

Because we don't need it now that @id does the same thing.

3. Data will have to be locked, but should not need to be migrated.
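Under rule 1, a merged file must never end up with two elements sharing an @id. A minimal C# check, assuming ids live on entry and sense elements and using the hypothetical name `IdChecks`, might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical validation helper: reports any @id used by more than one entry/sense.
public static class IdChecks
{
    public static List<string> FindDuplicateIds(XDocument lift)
    {
        return lift.Descendants()
            .Where(e => e.Name == "entry" || e.Name == "sense")
            .Select(e => (string)e.Attribute("id"))   // null when there is no id
            .Where(id => id != null)
            .GroupBy(id => id, StringComparer.Ordinal)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Checking immutability would additionally need the previous version of the file to compare against; this sketch only covers uniqueness.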

Some discussion.

But if I have real GUIDs, I can use them internally in my program, I hear someone wail. How is that different from using a string internally? One still has to have a map of id (be it string or guid) to actual objects. One still has to store the id (be it string or guid) ready for output. I have yet to be sold on the idea that the id has to be a pure guid.

GB,
Martin

Cambell Prince

Nov 30, 2009, 9:51:07 PM
to lexiconinter...@googlegroups.com
Hi,

Martin Hosken wrote:
>> Each Entry must have a guid.
>> References must be by guid.
>> Existing data will have to be migrated from the id+guid system to this pure
>> guid system.
>>
>
> I would like to counter propose with something that I feel is functionally similar.
>
> 1. @id must be unique and cannot change between versions of a file.
>
> If one thinks about this for a moment, that means that in effect it has to be some kind of guid since if two people add a record, they have to be sure that they have different ids. But the difference here is that my definition does not require that the field conform to some notional 128bit encoding standard.
>
> 2. @guid is deprecated.
>
> Because we don't need it now that @id does the same thing.
>
>

Whether the attribute is id or guid doesn't worry me. The key thing is
that there is only one.

> 3. Data will have to be locked, but should not need to be migrated.
>
> Some discussion.
>
> But if I have real GUIDs, I can use them internally in my program, I hear someone wail. What difference is it from using a string internally. One still has to have a map of id (be it string or guid) to actual objects. One still has to store the id (be it string or guid) ready for output. I am yet to be sold on the idea that the id has to be a pure guid.
>
In principle that's fine. The problem is that some very large
and very real software uses a GUID class that has a very narrow view
of what a GUID is. So, while I see the point I think specifying a GUID
is a practical solution. Currently there's very little software out
there that doesn't use GUID.

Regards,
Cambell

John Hatton

Nov 30, 2009, 11:39:58 PM
to lexiconinter...@googlegroups.com
> But if I have real GUIDs, I can use them internally in my program, I hear
> someone wail. What difference is it from using a string internally. One
> still has to have a map of id (be it string or guid) to actual objects. One
> still has to store the id (be it string or guid) ready for output. I am yet
> to be sold on the idea that the id has to be a pure guid.

As I have previously explained on this list, saying that a process must be
able to round-trip arbitrary id strings means that FieldWorks would need to be
rewritten to use string data structures, just to support this standard, or
would need to maintain some form of mapping. As before, I find it
unreasonable to request this of them.




Martin Hosken

Dec 1, 2009, 2:46:58 AM
to lexiconinter...@googlegroups.com
Dear John,

> As I have previously explained on this list, saying that a process must be
> able to round-tip arbitrary id strings means that FieldWorks would need be
> rewritten to use string data structures, just to support this standard, or
> would need to maintain some form of mapping. As before, I find it
> unreasonable request this of them.

So what you are saying is that instead of the id being an immutable string, you want it to be an immutable 128-bit number that can be rendered in any form that other people recognise as a UUID (e.g. with or without hyphens).
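The "128-bit number with variable form" reading can be illustrated with System.Guid, whose ToString format specifiers "D", "N", "B" and "P" render the same value with hyphens, bare, in braces, or in parentheses. This is a sketch (`GuidForms` is a hypothetical name, and Guid.TryParse needs .NET 4 or later): comparing parsed values is "keying off a number", whereas comparing raw strings is "keying off a string".

```csharp
using System;

// Sketch: one 128-bit value, several textual spellings that System.Guid accepts.
public static class GuidForms
{
    public static string[] Render(Guid guid)
    {
        // "D" = hyphens, "N" = bare digits, "B" = braces, "P" = parentheses.
        return new[] { guid.ToString("D"), guid.ToString("N"), guid.ToString("B"), guid.ToString("P") };
    }

    // Keying off a number: two ids match if they parse to the same 128-bit value,
    // regardless of hyphens, braces, or letter case. Non-guid strings never match.
    public static bool SameGuid(string a, string b)
    {
        Guid ga, gb;
        return Guid.TryParse(a, out ga) && Guid.TryParse(b, out gb) && ga == gb;
    }
}
```

Under this rule "0f8fad5b-d9cb-469f-a165-70867728950e" and "0F8FAD5BD9CB469FA16570867728950E" denote the same object, while an arbitrary unique string such as "somereallyreallyuniquestring" cannot be an id at all.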

I would like to hear from the Flex team on this. If we switch to saying this is a 128-bit number which can have variable form, and then, say, Flex switches to using SQL server, which doesn't have an atomic GUID type, then they are going to want to use a string as a key rather than a number.

So which is it to be? Do you want to key off a number or a string?

GB,
Martin

Cambell Prince

Dec 1, 2009, 2:59:34 AM
to lexiconinter...@googlegroups.com
Hi,
I think that would be MySql Server. At any rate, in code many of us
currently use the .Net GUID class to model the guid attribute. This is
compatible with the python UUID class.
> So which is it to be? Do you want to key off a number or a string?
>
> GB,
> Martin
>

Stephen McConnel

Dec 1, 2009, 11:03:34 AM
to lexiconinter...@googlegroups.com
Inside FieldWorks, GUIDs are the native unique identifiers for all objects.  This won't change in any foreseeable future.
As for the representation of GUIDs in native FieldWorks XML, we're thinking about using something like a base 64 encoding  (possibly with a leading letter to make it a valid XML ID) rather than the standard hexadecimal string with embedded dashes.  This doesn't necessarily "look" like a GUID to the casual reader of the XML, but saves some space wherever the GUID occurs in the XML.  Our current persistent data storage represents all objects as XML, whether in a single XML file or as a single table in a database, using the GUID as a key to the XML fragment representing the object.  (We haven't extensively used a database storage mechanism yet in our rewriting/refactoring work, but certainly haven't ruled it out.  So I can't say whether the GUID would be represented natively as a 128-bit number in the table, or as a string.)
Having said all that, the most important characteristic of these ids in LIFT is that they are immutable and unique, and that all implementations have an easy way to generate them and map them internally.  GUIDs fill all of these characteristics nicely, and, as far as I know, both of the major implementations of LIFT use GUIDs internally for object identifiers.  (Thus, the mapping in those programs is the identity map -- nothing to implement, which is certainly the easiest implementation!)  Trying to base the id off the lexeme form (or any other string data) is very problematic, because the spelling may change after the entry is created, and because homographs do exist, and homograph numbering may change.  My claim is that an arbitrary id must be generated, so why not use a commonly implemented and well defined standard like GUIDs?  How they're represented in the file can be specified to minimize space (like we're considering for FieldWorks), or can be specified to use the standard hexadecimal representation with internal dashes.  If being valid XML IDs is important, a leading character for the id must also be specified, since GUIDs (in either hexadecimal or base64 form) can start with a number, and XML ids cannot start with a number.
--
Steve McConnel

Martin Hosken

Dec 1, 2009, 8:18:37 PM
to lexiconinter...@googlegroups.com
Dear Steve,

Just in case people are misunderstanding my unclear language: when I use the term GUID, I mean a specific type that is a 128-bit number, rather than a way to get a string that can be ensured to be unique. For a unique string, I use the term 'unique string'. Given that using a GUID as part of unique string generation is probably the best way to get such a 'unique string', the concepts are linked but not synonymous. For example, program1 could get a unique string by storing a GUID + a lexeme, where the GUID is in traditional form. Program2 could get a unique string by storing half a GUID (taking a greater risk of overlap), base64 (MIME) encoded. Program3 could use a SHA-256 hash of some data and store that uuencoded, because its authors think that the probability of overlap with 128 bits is too high. None of those are of type GUID, but they are all unique strings.
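The three programs' strategies can be sketched in C# as below. The method names are hypothetical, base64 stands in for uuencoding in the third case, and the "_" separator is an assumption. One caveat the sketch makes visible: a hash-based id is only unique if the hashed data is, since identical data yields an identical id.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustrations of three "unique string" strategies; none is of type Guid.
public static class UniqueStrings
{
    // program1: a GUID in traditional form plus a lexeme.
    public static string GuidPlusLexeme(string lexeme)
    {
        return lexeme + "_" + Guid.NewGuid().ToString("D");
    }

    // program2: half a GUID (8 of its 16 bytes), base64 encoded; greater risk of overlap.
    public static string HalfGuidBase64()
    {
        byte[] bytes = Guid.NewGuid().ToByteArray();
        return Convert.ToBase64String(bytes, 0, 8);
    }

    // program3: a SHA-256 hash of some data (base64 here instead of uuencoding).
    // Identical input data yields an identical id, so the data itself must be unique.
    public static string Sha256Base64(string data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(data)));
        }
    }
}
```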

> Inside FieldWorks, GUIDs are the native unique identifiers for all
> objects. This won't change in any foreseeable future.
> As for the representation of GUIDs in native FieldWorks XML, we're
> thinking about using something like a base 64 encoding (possibly with a
> leading letter to make it a valid XML ID) rather than the standard
> hexadecimal string with embedded dashes.

I agree that the friendliest way to pass GUIDs around in LIFT would be base64 (MIME) encoded. Since the ids aren't XML IDs per se, there is no need for a leading character.

> certainly haven't ruled it out. So I can't say whether the GUID would
> be represented natively as a 128-bit number in the table, or as a string.)
> Having said all that, the most important characteristic of these ids in
> LIFT is that they are immutable and unique, and that all implementations
> have an easy way to generate them and map them internally.

You missed the all-important word before the comma in the last sentence: are they immutable and unique *strings*, or 128-bit *GUIDs*?

> implementation!) Trying to base the id off the lexeme form (or any
> other string data) is very problematic, because the spelling may change
> after the entry is created, and because homographs do exist, and
> homograph numbering may change.

Agreed. Although currently many use lexeme + guid.

> My claim is that an arbitrary id must
> be generated, so why not use a commonly implemented and well defined
> standard like GUIDs? How they're represented in the file can be
> specified to minimize space (like we're considering for FieldWorks), or
> can be specified to use the standard hexadecimal representation with
> internal dashes. If being valid XML IDs is important, a leading
> character for the id must also be specified, since GUIDs (in either
> hexadecimal or base64 form) can start with a number, and XML ids cannot
> start with a number.

Nobody is arguing about the best way to get a unique id. The question is whether an application must be able to transform that id, using a standard algorithm, into an internal 128-bit representation, or whether it is sufficient always to use the string form for keying.

So which does FLEX need?

GB,
Martin

Stephen McConnel

Feb 17, 2010, 5:14:50 PM
to lexiconinter...@googlegroups.com
This thread seems to have died in early December without coming to a
resolution, which may be partially my fault. Since then, I had a face
to face discussion with Martin Hosken when he was in Dallas recently.
After some more thought, and after talking it over with some colleagues
on the FieldWorks team, this is the position (and proposal) that I would
like to make regarding id and guid attributes for LIFT elements.

The guid attributes should disappear, and the id attribute values should
be based purely on the object's guid value. Later, I'll provide a pair
of C# methods that would allow those id values to shrink to either 22 or
23 characters while fully representing the 128-bit guid and also being
totally valid XML ID values. (The normal hexadecimal representation of
a guid with embedded hyphens takes 36 characters, plus possibly a 37th
character if the first hexadecimal character is not a letter.)
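For comparison, the 36/37-character hexadecimal form can be made XML-ID-safe as in this C# sketch. Prefixing "I" only when the first character is a digit is an assumption chosen to match the "possibly a 37th character" count; the letter "I" echoes the FieldWorks habit of prefixing ids, and `HexXmlIds` is a hypothetical name. Because "I" is not a hex digit, stripping it on the way back in is unambiguous.

```csharp
using System;

// Hypothetical helper: standard hex guid made into a valid XML ID by
// prefixing "I" only when the first character is a digit.
public static class HexXmlIds
{
    public static string ToXmlId(Guid guid)
    {
        string hex = guid.ToString("D"); // 36 chars: 32 hex digits + 4 hyphens
        return char.IsDigit(hex[0]) ? "I" + hex : hex;
    }

    public static Guid FromXmlId(string id)
    {
        // "I" is not a hex digit, so a leading "I" can only be the prefix.
        return Guid.Parse(id.StartsWith("I", StringComparison.Ordinal) ? id.Substring(1) : id);
    }
}
```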

Recognizing the desire behind Martin's proposal (and the current
behavior in practice although not specified), I further propose that
relation elements (those that provide actual links between entry and
sense elements) have an additional (optional) attribute to supplement
the required ref attribute. This attribute, which could be named "name"
or "target" or possibly have a target-specific name like "form" or
"gloss", would provide the most relevant human readable value from the
target object. For entries, this would be the lexical-unit form in the
(primary) vernacular language. For senses, this would be the gloss in
the (primary) analysis language (or possibly the definition if the gloss
is empty). This additional attribute would not be used by the program
since the ref attribute provides the formal link, but would allow a
human scanning the file to have at least some understanding of what the
target is, which is all that the alternative type of id could provide
anyway. The advantage of this approach is twofold: programs don't need
to remember anything for the id values but the guids that they are
already using, and the human readable values displayed in the relation
elements can change when the actual values do change. Admittedly, those
changes aren't necessarily frequent, but even some lexeme forms change
over time as different decisions are made for spelling choices, or what
are primary forms as opposed to allomorphs.
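Both ways of carrying the human-readable hint discussed in this thread, an extra attribute or an XML comment, can be sketched with System.Xml.Linq. The attribute name "hint" is an assumption (the thread also floats "name", "target", "form" and "gloss"), and `RelationHints` is a hypothetical name. In both cases software follows only the ref attribute.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helpers: a relation whose ref carries the guid, plus a human-only hint.
public static class RelationHints
{
    // Hint as an (assumed) informative attribute that programs ignore.
    public static XElement WithHintAttribute(string type, Guid target, string hint)
    {
        return new XElement("relation",
            new XAttribute("type", type),
            new XAttribute("ref", target.ToString("D")),
            new XAttribute("hint", hint));
    }

    // Same link, but the hint lives in an XML comment.
    // XML forbids "--" inside comments, so the hint text must not contain it.
    public static XElement WithHintComment(string type, Guid target, string hint)
    {
        return new XElement("relation",
            new XAttribute("type", type),
            new XAttribute("ref", target.ToString("D")),
            new XComment(" " + hint + " "));
    }
}
```

A comment can hold freer text (for example both the form and a gloss), while an attribute is easier for tools such as XSLT to pick out if anyone later wants to display it.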

As promised earlier, here are a pair of C# methods that convert guids
into the shortest (UTF-8) strings that both represent all the bits in
the guid and form valid XML ID values. If the latter requirement is
dropped, then the methods could be simplified in obvious ways.

    using System;

    public static class LiftIds   // class name is illustrative only
    {
        public static string ToLiftId(Guid guid)
        {
            // Guid.ToByteArray() stores the first three fields little-endian
            // (like Python's uuid.bytes_le), so other implementations must use
            // the same byte order to decode these ids.
            string id = Convert.ToBase64String(guid.ToByteArray());
            // -, ., and _ are valid ID chars, but only _ of those can start an ID
            id = id.Replace('+', '-');
            id = id.Replace('/', '.');
            id = id.Replace("=", "");   // eliminate useless padding for now
            if (Char.IsDigit(id[0]) || id[0] == '-' || id[0] == '.')
                id = '_' + id;          // make it a valid XML ID value
            return id;
        }

        public static Guid FromLiftId(string id)
        {
            if (id.StartsWith("_"))
                id = id.Substring(1);   // strip leading nonsense character
            id = id.Replace('-', '+');  // convert back to native Base64 value
            id = id.Replace('.', '/');
            byte[] data = Convert.FromBase64String(id + "=="); // length must be multiple of 4
            return new Guid(data);
        }
    }

John Hatton

Feb 17, 2010, 6:56:35 PM
to lexiconinter...@googlegroups.com
Steve,
Thanks for writing this up. Question: are you proposing that both guids and
compressed guids may be used, or just the latter? If the latter, can you
help me understand the benefit of going from 37 to 22 characters? If it
were, to, say, 8 characters, I'd get it.

John Hatton
SIL Papua New Guinea, Palaso, & SIL International Software Development

Martin Hosken

Feb 18, 2010, 12:11:37 PM
to lexiconinter...@googlegroups.com
Dear Steve,

> The guid attributes should disappear, and the id attribute values should
> be based purely on the object's guid value. Later, I'll provide a pair
> of C# methods that would allow those id values to shrink to either 22 or
> 23 characters while fully representing the 128-bit guid and also being
> totally valid XML ID values. (The normal hexadecimal representation of
> a guid with embedded hyphens takes 36 characters, plus possibly a 37th
> character if the first hexadecimal character is not a letter.)

What you are saying, I think, and from our face to face discussions, is that you want an id to be some kind of representation of a 128 bit number. Thus arbitrary unique strings are not appropriate.


>
> Recognizing the desire behind Martin's proposal (and the current
> behavior in practice although not specified), I further propose that
> relation elements (those that provide actual links between entry and
> sense elements) have an additional (optional) attribute to supplement
> the required ref attribute. This attribute, which could be named "name"
> or "target" or possibly have a target-specific name like "form" or
> "gloss", would provide the most relevant human readable value from the
> target object.

Or is a comment sufficient for this? If we want an informative attribute here, I'm happy with that too. What do people think?

GB,
Martin

Stephen McConnel

Feb 19, 2010, 12:12:53 PM
to lexiconinter...@googlegroups.com
The proposal for representation of guids is independent of the
proposal to use guid based ids. In LIFT, the number of elements that
have a specific id is small enough that the space savings may not
justify going from 36 (or 37 for valid XML ID) to 22/23 characters in
the id fields, especially since it generally affects only the file
storage. (We've lived for years with using the standard guid
representation (prefixed with a letter, usually "I") for ids in the XML
files.) Now that we're using XML as our basic storage representation,
and expressing not only reference links but ownership links with a guid,
and since every model object has an id, using a shorter representation
in the XML is attractive for the space saving in memory. (The XML for
objects lives in memory until the object is actually accessed after the
project is loaded.) I don't really care how the guids are represented,
but I do want an agreement on how that is done.

If we do agree on guid based ids, and on using the normal guid
hexadecimal-with-hyphens representation, could we agree on whether the
alphabetic characters are uppercase or lowercase? It makes no
difference for interpreting the string, but keeping the case consistent
facilitates searching when viewing the LIFT file in a text editor.
--
Steve McConnel

Stephen McConnel

Feb 19, 2010, 12:18:14 PM
to lexiconinter...@googlegroups.com

A comment would be sufficient. Doing it that way would facilitate
giving more specific information, so it might be even better.
--
Steve McConnel

John Hatton

Feb 19, 2010, 10:02:57 PM
to lexiconinter...@googlegroups.com
> If we do agree on guid based ids, and on using the normal guid
> hexadecimal-with-hyphens representation, could we agree on whether the
> alphabetic characters are uppercase or lowercase?

Ok, for raw guids, my brief research suggests lowercase.
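This matches RFC 4122, which says the hexadecimal digits are output as lowercase and accepted in either case on input. A minimal C# normalizer along those lines (`GuidCase` is a hypothetical name; Guid.Parse needs .NET 4 or later) could rewrite any accepted spelling into the lowercase hyphenated form:

```csharp
using System;

// Hypothetical normalizer: rewrite any accepted guid spelling as lowercase
// hexadecimal with hyphens, so ids are consistent for text searching.
public static class GuidCase
{
    public static string Normalize(string guidText)
    {
        // Parsing is case-insensitive; ToString("D") emits lowercase hex with hyphens.
        return Guid.Parse(guidText).ToString("D");
    }
}
```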
