General Comments

Paul Jason Clegg

Jan 31, 1992, 2:44:36 PM

Okay, I don't have much time here today, so I'm going to just wrap up in a big
bundle my opinions on the last 11 articles I read.

As for the entries... Good stuff, I like 'em. To Loren Haarsma in particular,
entries on radio shows are fine, I think, as long as they're real.

Now, for the point about not worrying about how we organize them all, I DO
have a problem with what I've been hearing. Sure, most of the problem lies
with the readers to be used, but storage is another thing also. Should we
store 100 entries in one big file? In chronological order? Alphabetical
order? Or should we just keep each entry in its own text file, so that it's
even more generic? Just writing this, I might be able to see the following
possibility: On the FTP sites, have a subdirectory for each year, then have
a subdirectory for the specific date of each update. Inside this subdir,
you can find 100 articles, in separate files. Does that sound good?

However, even if we decide to forgo the organization thing, there should
DEFINITELY be editors between initial posting and final archiving. The
article on Geneva, I think, is good proof of that (No offense to, Geraldo,
was it? It's just that it really needs an editor's touch-up.)

I'll re-post my copy of the "quasi-official" Format guide. Comment on it NOW,
or forever hold your peace.

I REALLY don't like the idea of putting fake articles with real articles.
However, as long as they're not stored together, I don't oppose them using the
same style/format. The question is: Is it worth having a boolean fake/real
header in each article? If you add fake articles to your home database,
do you think that's worth the extra space it's going to take up in other
people's databases? Of course, there are ways out of that as well.

Someone mentioned keywords. This, I think, is sufficiently covered by the %i
header, which stands for index, which is what came of the "permutation"
file idea I came up with last December, was it? (At least I think I thought of
it, although it was definitely inspired by all... :)

Oh well, see you all later... (We still have to work on management... :)

...Paul

Paul Jason Clegg

Jan 31, 1992, 2:46:34 PM

Writing Guide for Entries
For Publication in
The HitchHiker's Guide to the Known Galaxy

Entries in the HHGTTKG have a very specific format to allow for
easy parsing by third-party readers. Although we will accept entries that
do not conform with the following standard (standard as of 12-12-91), we
(the editors of the Guide) ask that potential field researchers and towel
owners try to put their entries in as close to true form as possible, making
it a lot easier on our editors.

Sample Format:

%t Title
%s summary
%a author
%d date (in yyyymmdd format)
%x xrefs (unlimited number)
%i index (unlimited number)
%e Entry beginning
This is a sample entry.
%e Entry end.

%t Title: Should not wrap around an 80-column screen, if possible. This
is the subject title of the entry. Names should be put in
last, first middle fashion.

%s Summary: Should not wrap around an 80-column screen, if possible. This
is just a short one-liner about the entry, without going into
too much detail.

%a Author: Your name, in straight first-middle-last fashion.

%d Date: The date you wrote the article. The fashion is yyyymmdd, so
December 25, 1991 would be written 19911225. Single-digit
numbers should be filled out with zeros. July 4, 1991 would
be 19910704, NOT 199174!

%x XRefs: Cross references. Your entry can contain any number of logical
cross references. Each should be on a separate line starting
with the %x marker. These lines should contain the names of
ACTUAL ENTRY TITLES.

%i Indexes: These are names by which the Title might also be known.
Your entry may contain any number of these, so long as each is
on a separate line marked by the %i marker. For entries on
people, indexes that should definitely be included are their
full names in first-middle-last format, their first-last names,
their last-first names, and so on.

%e Entry Markers: These indicate the beginning and end of the entry.
These markers should occupy a single line by themselves.
Everything after the first %e is the entry, and the entry is ended by
another %e on a separate line.
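A minimal sketch of the "easy parsing by third-party readers" that the header format is meant to allow. The function name `parse_entry` and the dictionary layout are assumptions made for illustration, not part of the guide. The sketch treats any line starting with %e as a marker, so it accepts both a bare %e and the labelled markers shown in the sample.

```python
import re
from datetime import datetime

HEADER = re.compile(r"^%([tsadxi])\s+(.*)$")

def parse_entry(text):
    """Parse one entry in the %t/%s/%a/%d/%x/%i/%e format into a dict."""
    entry = {"t": None, "s": None, "a": None, "d": None,
             "x": [], "i": [], "body": None}
    body_lines = None  # stays None until the opening %e line is seen
    for line in text.splitlines():
        if body_lines is None:
            if line.startswith("%e"):
                body_lines = []          # opening marker: body starts here
                continue
            m = HEADER.match(line)
            if m:
                key, value = m.group(1), m.group(2).strip()
                if key in ("x", "i"):    # repeatable headers
                    entry[key].append(value)
                else:
                    entry[key] = value
        elif line.startswith("%e"):      # closing marker
            entry["body"] = "\n".join(body_lines)
            break
        else:
            body_lines.append(line)
    if entry["d"] is not None:
        # Be strict: require exactly 8 digits, then check the calendar date.
        if not re.fullmatch(r"\d{8}", entry["d"]):
            raise ValueError("date must be yyyymmdd: " + entry["d"])
        datetime.strptime(entry["d"], "%Y%m%d")
    return entry
```

A date written as 199174 is rejected by the 8-digit check, which matches the zero-filling rule above.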

ENTRY FORMATTING:

For readability, space considerations, and technical reasons, the
editors of the Guide ask that your entries be written in a specific format.

Tabulation: The beginning of every paragraph should start with a 5 space
tab, and not just a tab character.

Paragraph spacing: There should be an extra blank line between paragraphs,
but NOT after the final paragraph, which should be followed by the %e.

Lists: If, for some reason, you find a list necessary for your entry, it
should be spaced in from the left 10 characters, and begin with a
number, followed by a ) . Thus, the 3rd entry in a list would begin:

          3) This is the third part of a list.

If, for some reason, your list goes to 1000, do NOT add commas in the
number. Also, there should be a blank line just before the first
list entry, and just after the last entry, as seen in the example
above.

Underlining: Things that should be underlined (like book names) should be
preceded by an underscore "_" and then ended with an underscore "_",
but there should NOT be underscores between words. Thus, the name
of The HitchHiker's Guide to the Galaxy would be written as _The
HitchHiker's Guide to the Galaxy_, and NOT _The_Hitch.. etc...

Hyphenation: Words should NOT be hyphenated at the end of a line. If it
doesn't fit, just move the whole thing to the next line.
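The paragraph, underlining, and hyphenation rules above can be sketched with Python's standard `textwrap` module. The names `format_body` and `underline` and the default width of 79 columns are assumptions chosen for illustration; the guide itself only asks that lines not wrap on an 80-column screen.

```python
import textwrap

def format_body(paragraphs, width=79):
    """Lay out paragraphs per the guide: 5-space first-line indent, one blank
    line between paragraphs, none after the last, no words split at line ends."""
    wrapper = textwrap.TextWrapper(
        width=width,
        initial_indent=" " * 5,   # five literal spaces, not a tab character
        break_long_words=False,   # never split a word to make it fit
        break_on_hyphens=False,   # never break a line at a hyphen
    )
    # Collapse internal whitespace first so old line breaks do not leak through.
    return "\n\n".join(wrapper.fill(" ".join(p.split())) for p in paragraphs)

def underline(title):
    """Mark a title for underlining: one underscore at each end only."""
    return "_" + title + "_"
```

The closing %e line would then follow the returned text directly, since no blank line is emitted after the final paragraph.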

Final Notes:
That about covers it for now. Note that this is Alpha version 1x10 to
the negative googolplexplex. (Yes, that's a real number). Until later,
Don't forget your towel!

...Paul

Richard Betel

Feb 1, 1992, 2:01:25 AM

In article <=6!sr...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
>
>Now, for the point about not worrying about how we organize them all, I DO
>have a problem with what I've been hearing. Sure, most of the problem lies
>with the readers to be used, but storage is another thing also. Should we
>store 100 entries in one big file? In chronological order? Alphabetical
>order?

This is totally reader implementation dependent. If your
program is faster with 100 files, that's fine. Mine will be with a
single file. Chronological vs Alphabetical is meaningless. Once again,
it depends on your implementation. It may in fact be neither. (Mine
makes no guarantees either way.)

> Or should we just keep each entry in its own text file, so that it's
>even more generic? Just writing this, I might be able to see the following
>possibility: On the FTP sites, have a subdirectory for each year, then have
>a subdirectory for the specific date of each update. Inside this subdir,
>you can find 100 articles, in separate files. Does that sound good?
>

Actually, no. Generally, by writing your own index system and
looking stuff up in a single file, you get better throughput than by
relying on the operating system's directory search. The directory
search is a linear search. The OS cannot make assumptions about
ordering, and hence it is very slow. (For those who think Unix is
otherwise, you'll be interested to know that ls sorts its output
before displaying it).
Like I said, these issues are up to the software implementor,
not the general public. Even if you set up guidelines, I am going to
break them, as I do believe that I can do better. That may sound
conceited, but as long as you discuss "alphabetical" vs "chronological",
while I am talking "Hashed" vs "Btree", I have no reason to listen to
your ideas...
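A minimal sketch of the "own index system over a single file" idea, using a Python dict as the hashed index. This is not Richard's program, which is not shown in the thread; the function names and the choice to key the index on the %t title are assumptions.

```python
import io

def build_index(f):
    """Scan a binary file once and map each %t title to its byte offset."""
    index = {}
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        if line.startswith(b"%t "):
            index[line[3:].strip().decode("ascii")] = pos
    return index

def read_entry(f, offset):
    """Seek to an entry and read through its closing (second) %e line."""
    f.seek(offset)
    lines, markers = [], 0
    for line in f:
        lines.append(line)
        if line.startswith(b"%e"):
            markers += 1
            if markers == 2:
                break
    return b"".join(lines).decode("ascii")

# Usage with an in-memory stand-in for the single archive file:
archive = io.BytesIO(b"%t Alpha\n%e\nfirst\n%e\n%t Beta\n%e\nsecond\n%e\n")
index = build_index(archive)
beta = read_entry(archive, index["Beta"])
```

After the one-time scan, a lookup is a dictionary access plus a single seek, with no directory search involved.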

>However, even if we decide to forgo the organization thing, there should
>DEFINITELY be editors between initial posting and final archiving. The
>article on Geneva, I think, is good proof of that (No offense to, Geraldo,
>was it? It's just that it really needs an editor's touch-up.)

I agree on this completely. The editor's job should be to
correct spelling and grammatical errors, and ensure that the article
follows the accepted distribution format.

>
>I'll re-post my copy of the "quasi-official" Format guide. Comment on it NOW,
>or forever hold your peace.
>

All seems perfect, with 2 exceptions. I think that the
factuality index is a good idea. Filters based on it are easily
implemented, so people not interested don't need to worry about them.
Next, I would really like to see the editor assign each
article a unique ID number. Perhaps %m. An 8-digit hex number should be
more than ample, and it fits in a longint. This will allow you in the
future to say "delete article $AFEB1000. It is obsolete." as opposed
to "delete the article 'Famous hacks' by Richard Betel, written Aug.
4, 1992. " Should you ever wish to have a distribution format that
permits automatic update of the entire database, this will be an
asset.
By the way, I really like the concept of an automatic update.
I imagine the editor-in-chief accumulating articles, and once a month
he would run a program that would generate some files that would
provide all the commands necessary to bring anyone else's database up
to date. These files would be in ASCII, and would be capable of
specifying the deletion of old articles, addition of %i's and %x's for
existing articles, and the addition of new articles. I intend to
attempt to design such a protocol, perhaps this weekend (depends on
how my studies in Combinatorics and Data Communications go).
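A rough sketch of what applying such an update might look like. No protocol is defined anywhere in the thread, so the command names (`delete`, `add`, `add-index`, `add-xref`) and the in-memory tuples are purely hypothetical; the ASCII syntax of the update files is left open.

```python
def apply_updates(database, commands):
    """Apply update commands to a dict keyed by 8-digit hex article ID.

    The command vocabulary here is hypothetical, invented for illustration.
    """
    for cmd in commands:
        op, article_id = cmd[0], cmd[1]
        if op == "delete":
            database.pop(article_id, None)      # deleting twice is harmless
        elif op == "add":
            database[article_id] = cmd[2]       # cmd[2] is the new article
        elif op in ("add-index", "add-xref"):
            field = "i" if op == "add-index" else "x"
            database[article_id][field].append(cmd[2])
        else:
            raise ValueError("unknown command: " + op)
    return database

db = {"AFEB1000": {"t": "Famous hacks", "i": [], "x": []}}
apply_updates(db, [
    ("add", "00000001", {"t": "Towel", "i": [], "x": []}),
    ("add-index", "00000001", "Towels"),
    ("delete", "AFEB1000"),
])
```

Keying every command on the unique ID is what makes the "delete article $AFEB1000" style of update possible.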

>I REALLY don't like the idea of putting fake articles with real articles. How-
>ever, as long as they're not stored together, I don't oppose them using the
>same style/format. The question is: Is it worth having a boolean fake/real
>header in each article? If you add fake articles to your home database,
>do you think that's worth the extra space it's going to take up in other
>people's databases? Of course, there's ways out of that as well.

I really don't think it is a major problem. First of all,
there will be a few articles from the book that some people will want
in the system, just to make it seem authentic (eg: flying). Also, as I
said, if the reader systems are properly designed, they should be
capable of recognizing fake articles and never even putting them in
the database. It therefore does not need to take up space on other
people's machines.
If you don't want a %f type switch, but accept my idea for the
unique ID, then we could say "all articles with ID greater than
$80000000 are considered fake". That way, we could appoint a second
editor, who will handle only the fictitious articles. He can then
assign unique IDs without coordinating with you. You are free to do
your work, he does his, and all software can distinguish between the
real and fake.
This way, we can identify the fake articles not only by their
ID, but also by who provided the update files. You could then ignore
fakes by killing any updates originating from the
fakes-editor-in-chief's mail address.
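The threshold rule can be sketched as follows. The helper names are assumptions; the `$` prefix is taken from the way IDs are written in this post. Reading "greater than" literally, $80000000 itself counts as real.

```python
FAKE_THRESHOLD = 0x80000000

def parse_id(text):
    """Parse an ID written like '$AFEB1000' into an integer."""
    value = int(text.lstrip("$"), 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("ID does not fit in 32 bits: " + text)
    return value

def is_fake(article_id):
    """True for IDs strictly above the threshold, as the rule is worded."""
    return article_id > FAKE_THRESHOLD

def keep_real(article_ids):
    """Filter a batch of IDs down to those a 'real only' reader would store."""
    return [a for a in article_ids if not is_fake(parse_id(a))]
```

Because the split is by range, each editor can hand out numbers from his own half without consulting the other.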

>
>Someone mentioned keywords. This, I think is sufficiently covered by the %i
>header, which stands for index, which is what came of the "permutation"
>file idea I came up with last December was it? (At least I think I thought of
>it, although it was definitely inspired by all... :)

Yes. I didn't know about %i. They more than fill the need I
was trying to fulfill with the keywords idea. Good thing you decided
to post the distribution format.

To all those interested, I will post a new version of my 2
programs no later than next Friday. They will handle the complete and
correct distribution format, though I may still do nothing with the
Xrefs, as I still haven't decided how to use them. Has no one tried
them, or are they so uninteresting that no one has any comments?
(BTW, yes, that was a plug! :-)

--- MTO hbetel@watserv1
Reality as you never expected to know it!

Inge Harkestad

Feb 1, 1992, 11:58:08 AM

Paul writes:

> I'll re-post my copy of the "quasi-official" Format guide. Comment on it NOW,
> or forever hold your peace.

and then:

> For readability, space considerations, and technical reasons, the
> editors of the Guide ask that your entries be written in a specific format.
>
> Tabulation: The beginning of every paragraph should start with a 5 space
> tab, and not just a tab character.
>
> Paragraph spacing: There should be an extra blank line between paragraphs,
> but NOT after the final paragraph, which should be followed by the %e.

One question: Why should every paragraph start with the 5 space tab?

My opinion is that there should be no preceding spaces at the beginning
of a paragraph. First of all: 5 "useless" spaces in each paragraph
will increase the file sizes. Also, most people don't use tabs at the
beginning of paragraphs when submitting articles to this newsgroup, so
why should they use them when submitting articles to the Guide? Most
people would probably think it a bother.

It will be very simple for a reader to add 5 spaces at the beginning of
each paragraph (The paragraphs will be identified by the preceding blank
line).
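A minimal sketch of what such a reader would do, assuming paragraphs are separated by blank lines exactly as described. The function name is an assumption. Note that the sketch only prefixes the first line of each paragraph; it does not re-wrap the text.

```python
def indent_paragraphs(body, indent=5):
    """Prefix the first line of each blank-line-separated paragraph with spaces."""
    out = []
    at_paragraph_start = True      # the very first line opens a paragraph
    for line in body.split("\n"):
        if line.strip() == "":
            at_paragraph_start = True
            out.append(line)
        elif at_paragraph_start:
            out.append(" " * indent + line)
            at_paragraph_start = False
        else:
            out.append(line)
    return "\n".join(out)
```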

--

// // // // //
// ___ ___ ___ //__// ___ ___ // ___ ___ _/_ ___ ___//
// // // // // //_// // // ___// // //_// //_// //__ // ___// // //
// // // //_// //__ // // // // // // || //__ __// //_ // // //_// //
//
__// ihar...@lise.unit.no | Norwegian Institute of Technology (NTH)
--- o ---
--- `` I never really found myself until I lost my mind '' ---

Paul Jason Clegg

Feb 1, 1992, 1:57:12 PM

In article <1992Feb1.0...@watserv1.waterloo.edu> hbe...@watserv1.waterloo.edu (Richard Betel) writes:
>In article <=6!sr...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
>>
>>Now, for the point about not worrying about how we organize them all, I DO
>>have a problem with what I've been hearing. Sure, most of the problem lies
>>with the readers to be used, but storage is another thing also. Should we
>>store 100 entries in one big file? In chronological order? Alphabetical
>>order?
> This is totally reader implementation dependent. If your
>program is faster with 100 files, that's fine. Mine will be with a

>> Or should we just keep each entry in its own text file, so that it's
>>even more generic? Just writing this, I might be able to see the following
>>possibility: On the FTP sites, have a subdirectory for each year, then have
>>a subdirectory for the specific date of each update. Inside this subdir,
>>you can find 100 articles, in separate files. Does that sound good?
>>
> Actually, no. Generally, by writing your own index system and
>looking stuff up in a single file, you get better throughput than by

Look, you're not reading what I wrote... I said I don't care how you want to
do it on your home system, but (emphasis:) FOR THE PURPOSE OF STORAGE ON FTP
SITES, we need to think of the best way to make the entries available to all.
It's generally easier to combine files than split files, so wouldn't it be
better to at least STORE FOR PUBLIC ACCESS the files in single entry-chunks?
Then, if someone's come up with a reader that works better with single files,
there's no problem with trying to separate them all.

> All seems perfect, with 2 exceptions. I think that the
>factuality index is a good idea. Filters based on it are easily
>implemented, so people not interested don't need to worry about them.
> Next, I would really like to see the editor assign each
>article a unique ID number. Perhaps %m. An 8-digit hex number should be
>more than ample, and it fits in a longint. This will allow you in the
>future to say "delete article $AFEB1000. It is obsolete." as opposed
>to "delete the article 'Famous hacks' by Richard Betel, written Aug.
>4, 1992. " Should you ever wish to have a distribution format that
>permits automatic update of the entire database, this will be an
>asset.

The only problem I have here is this: We want the Guide to be open ended, not
stuck with a closed number of entries. I can agree to having the "factuality
index", but it should be a true/false or "fact/not fact" switch, not a true
index, but a marker. Even with this involved, I think it would be best to
make sure that on the FTP end, the files are kept separate, so that serious
Hikers don't have to waste time with xfers, or deletions, etc...

>unique ID, then we could say "all articles with ID greater than
>$80000000 are considered fake". That way, we could appoint a second
>editor, who will handle only the fictitious articles. He can then
>assign unique IDs without coordinating with you. You are free to do
>your work, he does his, and all software can distinguish between the
>real and fake.

This can be done quite easily if the two are kept separate.

I'd like to think we're making progress here? Are we?

...Paul

Paul Jason Clegg

Feb 1, 1992, 2:03:31 PM

In article <1992Feb1.1...@ugle.unit.no> ihar...@Lise.Unit.NO (Inge Harkestad) writes:

>Paul writes:
> > For readability, space considerations, and technical reasons, the
> > editors of the Guide ask that your entries be written in a specific format.
> >
> > Tabulation: The beginning of every paragraph should start with a 5 space
> > tab, and not just a tab character.
> >
> > Paragraph spacing: There should be an extra blank line between paragraphs,
> > but NOT after the final paragraph, which should be followed by the %e.
>
>One question: Why should every paragraph start with the 5 space tab?
>
>My opinion is that there should be no preceding spaces at the beginning
>of a paragraph. First of all: 5 "useless" spaces in each paragraph
>will increase the file sizes. Also, most people don't use tabs at the
>beginning of paragraphs when submitting articles to this newsgroup, so
>why should they use them when submitting articles to the Guide? Most
>people would probably think it a bother.
>
>It will be very simple for a reader to add 5 spaces at the beginning of
>each paragraph (The paragraphs will be identified by the preceding blank
>line).

Interesting point. My reason for putting it in there is for readability.
There will be people who will be reading entries without software, and it's
a lot easier on home-cooked programs to just dump the lines of the entry to
the screen than to worry about major formatting problems. It's generally
accepted good writing style to indent paragraphs, five spaces being about
average (some even indent ten). A reader could just add 5 spaces to the
beginning of a paragraph, because it's decked out with blank lines, but then
what are you going to do about word wrap? And, I think the reason no-one
indents on the net is because they're probably just quickly using a text
editor for their replies, and it's informal, so they're not worried about
style. I don't want to see an informal Guide, and I don't think anyone else
does, either.

...Paul

Richard Betel

Feb 1, 1992, 4:59:29 PM

In article <34#s=7...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
>In article <1992Feb1.0...@watserv1.waterloo.edu> hbe...@watserv1.waterloo.edu (Richard Betel) writes:
>> Actually, no. Generally, by writing your own index system and
>>looking stuff up in a single file, you get better throughput than by
>
>Look, you're not reading what I wrote... I said I don't care how you want to
>do it on your home system, but (emphasis:) FOR THE PURPOSE OF STORAGE ON FTP
>SITES, we need to think of the best way to make the entries available to all.
>It's generally easier to combine files than split files, so wouldn't it be
>better to at least STORE FOR PUBLIC ACCESS the files in single entry-chunks?
>Then, if someone's come up with a reader that works better with single files,
>there's no problem with trying to separate them all.
>

You're right. I wasn't really reading what you were saying.
OK, FTP sites. Yes, separating factual and fictional articles in two
directories is a good idea. However, articles in separate files is a
bad idea. Articles won't tend to be very long, maybe four pages. That
will probably require about 4 and a half 1k clusters. That means that
every article will waste 20% of the space it occupies. Then, by
putting each article into a separate file you also use a lot of space
for directory data, plus many inodes. In short, your HHGTTKG will
occupy a lot of space real fast. Also, any distribution format should
not be dependent on HOW you received it. It should be possible to
download it from an FTP server, receive it by mail, or by USENET.
If you post a single article for every new article submitted
to the guide, you'll get damned tired, people will get annoyed at
having to handle each article individually, and I am sure someone will
complain about the network bandwidth used. Packaging into a single
file is much better. Articles can be handled in batches, posted in a
single piece, and archived as a single file that will waste a minimum
of the space it occupies.
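The cluster arithmetic can be worked through with a small sketch. The function name and the 4608-byte article size (four and a half 1k clusters, taken at face value) are assumptions. Under that reading the slack comes to about 10% of the allocated space; a figure near 20% corresponds to an article only just over 4k, so both numbers are best treated as rough.

```python
import math

def slack(size_bytes, cluster_bytes=1024):
    """Return (allocated_bytes, wasted_bytes, wasted_fraction) for one file."""
    clusters = math.ceil(size_bytes / cluster_bytes)   # whole clusters only
    allocated = clusters * cluster_bytes
    wasted = allocated - size_bytes
    fraction = (wasted / allocated) if allocated else 0.0
    return allocated, wasted, fraction

one_article = slack(4608)            # a single 4.5k article in its own file
hundred_packed = slack(100 * 4608)   # 100 such articles packed into one file
```

Here one article on its own occupies 5120 bytes and wastes 512 of them, while 100 articles packed together fill 450 clusters exactly; in general a packed file wastes at most one partial cluster in total instead of one per article.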

>> All seems perfect, with 2 exceptions. I think that the
>>factuality index is a good idea. Filters based on it are easily
>>implemented, so people not interested don't need to worry about them.
>> Next, I would really like to see the editor assign each
>>article a unique ID number. Perhaps %m. An 8-digit hex number should be
>>more than ample, and it fits in a longint. This will allow you in the
>>future to say "delete article $AFEB1000. It is obsolete." as opposed
>>to "delete the article 'Famous hacks' by Richard Betel, written Aug.
>>4, 1992. " Should you ever wish to have a distribution format that
>>permits automatic update of the entire database, this will be an
>>asset.
>
>The only problem I have here is this: We want the Guide to be open ended, not
>stuck with a closed number of entries. I can agree to having the "factuality
>index", but it should be a true/false or "fact/not fact" switch, not a true
>index, but a marker. Even with this involved, I think it would be best to
>make sure that on the FTP end, the files are kept separate, so that serious
>Hikers don't have to waste time with xfers, or deletions, etc...

OK. A boolean factuality marker seems just as good as an index,
especially since the index's value will be really subjective. And
of course, it is real easy to split an archive up based on a single
boolean. (With an index we'd need 10 files or something.)
So let's go for the factuality switch...
As for leaving the HHGTTKG open-ended, we could take 2
attitudes:
1) It is not likely that we'd use up 4.295*10^9 articles.
Especially when we can recycle the IDs of deleted articles.
2) We can use a string as the ID. E.g.: the first article's ID
would be 1, then 2, then ...9, then a, then ...f. Then 10, then 11, etc.
If we store it as a string instead of a long, it is open ended. An 8
character ID string gives us at least 4 billion articles, so we can
choose a fairly small length of string and assume a maximum.
Either way, we now have the problem of numbering. Obviously,
the editor will assign the unique ID number, but if we have 2 editors,
one for fact, one for fiction, they MUST coordinate, as the articles
must be capable of co-existing.
I suggest that we assign number spaces (assume a fairly large
number, like 6 digit) and then when those are filled, assign more. Or
we could agree that all factual articles are even-numbered, and all
fictional articles are odd-numbered.
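A small sketch of the open-ended string ID and the even/odd convention. The function names are assumptions. The sequence follows ordinary hexadecimal counting, so the ID after f is 10, and 8 hex characters cover 16^8 = 4,294,967,296 values, the 4.295*10^9 figure mentioned above.

```python
def next_id(current):
    """Return the hex string that follows `current`, with no fixed width."""
    return format(int(current, 16) + 1, "x")

def is_fictional(article_id):
    """Odd-numbered IDs are fictional under the even/odd proposal."""
    return int(article_id, 16) % 2 == 1

# Walk the first few IDs starting from 1.
ids = ["1"]
for _ in range(16):
    ids.append(next_id(ids[-1]))
```

Because the ID is stored as a string, `next_id("ffffffff")` simply grows to nine characters instead of overflowing.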

>
>>unique ID, then we could say "all articles with ID greater than
>>$80000000 are considered fake". That way, we could appoint a second
>>editor, who will handle only the fictitious articles. He can then
>>assign unique IDs without coordinating with you. You are free to do
>>your work, he does his, and all software can distinguish between the
>>real and fake.
>
>This can be done quite easily if the two are kept separate.
>

Not if they co-exist in the same database. IF you accept
fictional articles at all, you have to accept them in the same
database. Otherwise, why bother acknowledging them at all? We might as
well have two factions, each claiming that their instance of the guide
is the better, or true or whatever. From there, it is easy to see
everything just collapsing. United we stand...

>I'd like to think we're making progress here? Are we?
>

Yes, I think we are. Different, isn't it.

Paul Jason Clegg

Feb 2, 1992, 2:51:16 PM

In article <1992Feb1.2...@watserv1.waterloo.edu> hbe...@watserv1.waterloo.edu (Richard Betel) writes:
>In article <34#s=7...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
>
> You're right. I wasn't really reading what you were saying.
>OK, FTP sites. Yes, separating factual and fictional articles in two
>directories is a good idea. However, articles in separate files is a
>bad idea. Articles won't tend to be very long, maybe four pages. That
>will probably require about 4 and a half 1k clusters. That means that
>every article will waste 20% of the space it occupies. Then, by
>putting each article into a separate file you also use a lot of space
>for directory data, plus many inodes. In short, your HHGTTKG will
>occupy a lot of space real fast. Also, any distribution format should
>not be dependent on HOW you received it. It should be possible to
>download it from an FTP server, receive it by mail, or by USENET.

Okay, I can see your point. If, for some reason, someone needs separate files
for separate entries, let them do it. It's easily done in a programming
language; at the time I was thinking solely in terms of OS stuff like cat or
copy. Okay, so we'll do the 100-file chunk idea?

> OK. A boolean factuality marker seems just as good as an index,
>especially since the index's value will be really subjective. And
>of course, it is real easy to split an archive up based on a single
>boolean. (With an index we'd need 10 files or something.)
> So let's go for the factuality switch...

Okay, %f will be the header, followed by a FAKE or REAL? I.e.:

%f FAKE

> 1) It is not likely that we'd use up 4.295*10^9 articles.
>Especially when we can recycle the IDs of deleted articles.

Recycling IDs is a task I'd rather not have to even have nightmares about.

> 2) We can use a string as the ID. E.g.: the first article's ID
>would be 1, then 2, then ...9, then a, then ...f. Then 10, then 11, etc.
>If we store it as a string instead of a long, it is open ended. An 8
>character ID string gives us at least 4 billion articles, so we can
>choose a fairly small length of string and assume a maximum.
> Either way, we now have the problem of numbering. Obviously,
>the editor will assign the unique ID number, but if we have 2 editors,
>one for fact, one for fiction, they MUST coordinate, as the articles
>must be capable of co-existing.
> I suggest that we assign number spaces (assume a fairly large
>number, like 6 digit) and then when those are filled, assign more. Or
>we could agree that all factual articles are even-numbered, and all
>fictional articles are odd-numbered.

I don't follow your idea of "number spaces"? Why not just leave it open
ended? It's not hard to convert a string of numbers (even hex) into a true
number... Heck, we could use the entire alphabet, and get a base 36 system!
I like the odd-number idea... I think fake articles should be ODD... :)
The only problem I see here is that this means that there can be only ONE
editor to assign numbers... More management problems again... ("Yes, sir,
I need your signature in triplicate over here, and I need you to stamp a new
number on this entry... " :)

However, if we assign "odd numbers" to fake entries, then why bother with a
factuality index? Just look at the last digit of the number and see if it's
a 1, 3, 5, 7, or 9 (if we use base 10 system), and you'll find out if it's
fake or not. So we need a number header, but NOT a factuality index--it's
self-contained within the number... Neat, huh? :)
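A sketch of the "look at the last digit" test for a base 36 ID. The helper names are assumptions. The shortcut works in any even base (10, 16, 36), because every digit position except the last contributes an even amount; in an odd base it would not hold.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n):
    """Write a non-negative integer using the digits 0-9 and a-z."""
    if n < 0:
        raise ValueError("IDs are non-negative")
    if n == 0:
        return "0"
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out

def is_fake_id(article_id):
    """Fake entries are odd; only the last base 36 digit needs checking."""
    return DIGITS.index(article_id[-1].lower()) % 2 == 1
```

Going the other way needs no helper, since Python's built-in `int("z", 36)` already converts a base 36 string to a number.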

...Paul (Juices flowin' today...)

Richard Levitte

Feb 3, 1992, 6:06:37 AM

>>>>> On 1 Feb 92 19:03:31 GMT, cle...@aix.rpi.edu (Paul Jason Clegg) said:

Paul> Interesting point. My reason for putting it in there is for readability.

Ahmmmm.... According to most typesetting styles (that I know of), the following
rules are used for paragraphs:

- have a blank line between paragraphs and NO indentation at the
beginning

XOR

- have NO blank line between paragraphs and indent at the beginning!

The post I am following up here is a perfect example of the first variant...

--
!+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++!
! Richard Levitte, System manager ! tel: int+46-8-790 64 23 !
! Royal Institute of Technology ! fax: int+46-8-791 76 54 !
! Department of Teletransmition Theory ! Internet: ric...@ttt.kth.se !
! S-100 44 Stockholm, Sweden ! !
!---------------------------------------------------------------------------!

Inge Harkestad

Feb 3, 1992, 11:35:08 AM

In article <-6_s...@rpi.edu>, cle...@aix.rpi.edu (Paul Jason Clegg) writes:
> Okay, so we'll do the 100-file chunk idea?

I think we should. Separate files for each article are not a good idea.

> I like the odd-number idea... I think fake articles should be ODD... :)
> The only problem I see here is that this means that there can be only ONE
> editor to assign numbers... More management problems again... ("Yes, sir,
> I need your signature in triplicate over here, and I need you to stamp a new
> number on this entry... " :)

I don't think this is a big problem. If all files are to be stored at the
same place, it will be easy to let the program that'll insert (or append)
the new articles into the files assign the numbers as well. Someone has
to be responsible for running this program at regular intervals, and no
one else should be able to start it (due to safety and the synchronization
problem).

If the files are to be scattered around then each editor would have to
contact one "Number Assignment Editor" and request numbers for, say
"14 fake articles and 28 real ones". The N.A.Ed. then would grant the
requesting editor 42 numbers (of which 14 are odd and 28 are even) to
be divided among the 42 articles.

> However, if we assign "odd numbers" to fake entries, then why bother with a
> factuality index? Just look at the last digit of the number and see if it's
> a 1, 3, 5, 7, or 9 (if we use base 10 system), and you'll find out if it's
> fake or not. So we need a number header, but NOT a factuality index--it's
> self-contained within the number... Neat, huh? :)

Neat! I like it!

One point about using a program to assign numbers: The program needs to
know whether the article is fake or real, so I suggest that the author
uses the %f (FAKE/REAL) switch and that the program replaces it with an
odd/even number.
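A sketch of such a number-assigning program, under stated assumptions: the class name, the starting numbers, and the use of %m as the number header (Richard's "Perhaps %m" from earlier in the thread) are all illustrative choices, not an agreed format.

```python
class NumberAssigner:
    """Hands out IDs: even for REAL articles, odd for FAKE ones."""

    def __init__(self, next_even=2, next_odd=1):
        self.next_even = next_even
        self.next_odd = next_odd

    def assign(self, kind):
        """Return the next unused number of the right parity."""
        if kind == "REAL":
            n, self.next_even = self.next_even, self.next_even + 2
        elif kind == "FAKE":
            n, self.next_odd = self.next_odd, self.next_odd + 2
        else:
            raise ValueError("expected FAKE or REAL, got " + repr(kind))
        return n

    def stamp(self, entry_text):
        """Replace the author's '%f FAKE' or '%f REAL' header with '%m <number>'."""
        out, in_header = [], True
        for line in entry_text.split("\n"):
            if line.startswith("%e"):
                in_header = False        # never touch lines inside the body
            elif in_header and line.startswith("%f "):
                line = "%m " + str(self.assign(line[3:].strip()))
            out.append(line)
        return "\n".join(out)
```

Granting a batch such as "14 fake articles and 28 real ones" is then 14 calls to `assign("FAKE")` and 28 to `assign("REAL")`, all made by the one person allowed to run the program.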

Tobias Oetiker

Feb 3, 1992, 2:01:13 AM

Assigning hex numbers to uniquify articles ... How about some
number derived from the current date/time ... the chance of having two equal
is remote if we take seconds.

The advantage would be that there could be several editors ...

tobi.
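A minimal sketch of a time-derived ID, assuming seconds since the Unix epoch as the clock and an 8-digit hex rendering; the post names neither, so both are assumptions, as is the function name.

```python
import time

def timestamp_id(now=None):
    """Return a time in whole seconds as an 8-digit uppercase hex string."""
    seconds = int(time.time() if now is None else now)
    # Mask to 32 bits so the result always fits in eight hex digits.
    return format(seconds & 0xFFFFFFFF, "08X")
```

Two editors who stamp an article within the same second would still collide, and the parity of such an ID says nothing about fake versus real, so this trades the coordination problem for a small collision risk.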

Inge Harkestad

Feb 5, 1992, 6:09:51 AM

In article <1992Feb4.0...@watserv1.waterloo.edu>,
hbe...@watserv1.waterloo.edu (Richard Betel) writes:
> >Recycling IDs is a task I'd rather not have to even have nightmares about.
>
> The problem is that not everyone will be completely up
> to date on installing the latest articles. So if you recycle an ID too
> soon, it may cause some trouble.
> So let's make a general rule: IDs are not recycled. ( I am
> assuming that we are hashing out an ID policy here.)

I'm supporting you 100% there.

> >I don't follow your idea of "number spaces"? Why not just leave it open
> >ended?

> OK. Suppose [...f.fwd...]
> In other words, the two editors are each assigned a pool of IDs
> they can use. When you use up a pool, you get another.
> I think in terms of pools 'cause if the IDs are open-ended,
> you can't assign half of infinity to one editor, and half to the
> other. So you take a finite piece, and work with it. When you use it
> up, you choose another finite piece. Capiche?

The problem with the pool assignment idea is the administrative costs
of requesting/assigning pools. Open ended IDs can be implemented with
no central administration like this:

Any editor has a unique number (1,2,3,...) which is included in the
article ID. Editors whose number is odd deal with 'fake' articles and
editors with even numbers deal with 'real' articles. The format of
the IDs may thus be: (Editor ID)-(Article ID)

To clarify, here is an example: A real article called 'Karate' that is
sent to editor 22 gets the ID '22-438' if it is the 438th article that
this editor receives.

Any editor that replaces another will of course get the old editor's
ID, and new editors simply get the next available number. If there
are for instance 54 editors (27 editors for 'real' files and 27 for
'fake' files) and editor 22 (who is in charge of "real K's") resigns,
his successor also uses the 22 ID, but if there is a need for an extra
editor dealing with "real K's", his ID will be 56 (not 55!)
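
The scheme above can be sketched in a few lines of Python. The function names are invented for illustration; the ID format, the odd/even rule for editors, and the "56, not 55" example come from the post.

```python
def article_id(editor_id, sequence):
    """Build an ID of the form (Editor ID)-(Article ID), e.g. '22-438'."""
    return f"{editor_id}-{sequence}"


def is_fake(article_id_str):
    """Odd editor numbers handle fake articles, even ones handle real articles."""
    editor_id = int(article_id_str.split("-", 1)[0])
    return editor_id % 2 == 1


def next_editor_id(existing_ids, fake):
    """Return the lowest number above all existing editor IDs with the right parity."""
    candidate = max(existing_ids, default=0) + 1
    wanted = 1 if fake else 0
    if candidate % 2 != wanted:
        candidate += 1  # skip one number so the parity matches the article type
    return candidate
```

With editors 1 through 54 in place, `next_editor_id(range(1, 55), fake=False)` gives 56, because 55 is odd and therefore reserved for a 'fake' editor.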

JPM...@psuvm.psu.edu

Feb 4, 1992, 1:47:47 PM

Wouldn't a block-style with a LF/CR be easier to read on more different
machines? That gets my vote. 5 space tabs are both wasteful of space and
difficult to read.

..................................................................
| "There is no great genius         X  John Meredith             |
|  without some degree of madness." X  Penn State                |
|      --Aristotle                  X  <JPM...@psuvm.psu.edu>    |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Paul Jason Clegg

Feb 4, 1992, 4:14:13 PM

In article <1992Feb4.0...@watserv1.waterloo.edu> hbe...@watserv1.waterloo.edu (Richard Betel) writes:

>In article <-6_s...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
> So let's make a general rule: IDs are not recycled. (I am
>assuming that we are hashing out an ID policy here.)

Check.

>>I like the odd-number idea... I think fake articles should be ODD... :)
>>The only problem I see here is that this means that there can be only ONE
>>editor to assign numbers... More management problems again... ("Yes, sir,
>>I need your signature in triplicate over here, and I need you to stamp a new
>>number on this entry... " :)

> That is why I suggested numberspaces. The same concept can be
>used to divide the numbering problem among 12 editors.
> The actual numbering won't be too bad. just keep a little file
>with a single number in it, and write a program that will spit out the
>number in the file, then add one to the file.
> The base 16 system is much nicer. Easier to work with...
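
The "little file with a single number in it" quoted above might look like this in Python. This is only a sketch: the file name and the starting value of 1 are assumptions, and there is no locking, so it suits one editor running it by hand rather than several at once.

```python
from pathlib import Path


def next_number(counter_file):
    """Return the number stored in the file, then store that number plus one."""
    path = Path(counter_file)
    # Assumption: a missing file means no numbers have been handed out yet.
    current = int(path.read_text().strip()) if path.exists() else 1
    path.write_text(f"{current + 1}\n")
    return current
```

For the base 16 form, `format(next_number("counter.txt"), "X")` turns the result into upper-case hex.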

Let's see. I'm really hesitant about the number spaces thing. Really, the
odd-numbered idea is better, because it's compact, and simple to use. I
fully understand your idea of number spaces now, thanx for the insight. How-
ever, the number space idea, I don't think should be the main organizational
tool. How about this: FAKE articles are given odd numbers, real articles
are given even numbers. Thusly, there will be two main editors, one for
fake articles, and one for real articles. Now, below these two editors
can be any number of "lesser editors", to whom "number spaces" can be given,
although the number spaces still need to conform to the odd/even standard.
So, if there's only one fake article editor, he/she keeps track alone,
and there's no problem. If there are, say, 4 editors for the real articles,
the main editor will give the other three the even numbers from 252-500,
502-750, and 752-1000, respectively (retaining the first 125 even numbers,
2-250, for him/herself). Problem solved. You like? :)
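
The split described here can be checked with a short Python sketch. The function name and the requirement that the blocks come out equal in size are assumptions made for the example.

```python
def even_ranges(first, last, editors):
    """Split the even numbers in first..last into equal contiguous blocks."""
    evens = range(first + first % 2, last + 1, 2)  # start at the first even number
    size, extra = divmod(len(evens), editors)
    if extra:
        raise ValueError("even numbers do not divide equally among the editors")
    # Each block is reported as (lowest even number, highest even number).
    return [(evens[i * size], evens[(i + 1) * size - 1]) for i in range(editors)]
```

`even_ranges(1, 1000, 4)` returns `[(2, 250), (252, 500), (502, 750), (752, 1000)]`: 125 even numbers per editor, with the first block kept by the main editor.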

This SHOULDN'T mean, however, that the articles get numbers and are sent off
before a "superior editor" gets a look at it; this has yet to be resolved,
as it comes under "management"...

...Paul

Tobias Oetiker

Feb 5, 1992, 2:02:00 AM

Take my vote for block style (not 5 spaces).

tobi
oet...@iis.ethz.ch
tobias oetiker

Richard Betel

Feb 4, 1992, 1:17:08 AM

In article <-6_s...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
>Okay, %f will be the header, followed by a FAKE or REAL? ie:
>
>%f FAKE
>

Sounds good.

>> 1) It is not likely that we'd use up 4.295*10^9 articles.
>>Specially when we can recycle the IDs of deleted articles.
>
>Recycling IDs is a task I'd rather not have to even have nightmares about.
>

Actually, I think that it is a problem too, but for different
reasons:
Keeping a list of once used, but now outdated IDs is
easy. Recycling just becomes a problem of adding to the list at the
right moment.

The problem is that not everyone will be completely up
to date on installing the latest articles. So if you recycle an ID too
soon, it may cause some trouble.

So let's make a general rule: IDs are not recycled. (I am
assuming that we are hashing out an ID policy here.)

>I don't follow your idea of "number spaces"? Why not just leave it open
>ended?
OK. Suppose we are using IDs, and we have both joke and real
articles. I assume that since you are NOT interested in joke articles,
you don't want to be responsible for their editing. So I assume that
you appoint someone else to handle them.
Now, as long as your policies on submission formats are the
same, the two of you should be able to work COMPLETELY independently,
EXCEPT that you have to make sure that you don't use the same ID
numbers. So you make an agreement before you start. For now, you say
that when a new REAL article is added to the HHG, its number will be
in the range 1..100 (example numbers) and a new JOKE article will be
in the range 101..200. These are "number spaces". Now, REAL articles
are popular, and A LOT are written. You use up all the IDs in your
numberspace. So you say "I am now going to use the numberspace
201..500."

In other words, the two editors are each assigned a pool of IDs
they can use. When you use up a pool, you get another.
I think in terms of pools 'cause if the IDs are open-ended,
you can't assign half of infinity to one editor, and half to the
other. So you take a finite piece, and work with it. When you use it
up, you choose another finite piece. Capiche?
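
A small Python sketch of the pool idea, using the example numbers from above. The class and method names are invented; the only rule taken from the post is that pools are finite, never overlap, and a fresh one is claimed when the old one runs out.

```python
class PoolAllocator:
    """Hand out non-overlapping, finite pools of ID numbers."""

    def __init__(self, first_id=1):
        self.next_start = first_id

    def claim(self, size):
        """Reserve the next `size` unused IDs and return them as a range."""
        pool = range(self.next_start, self.next_start + size)
        self.next_start += size  # later pools begin where this one ends
        return pool
```

Claiming 100, 100, and then 300 IDs yields 1..100 for REAL, 101..200 for JOKE, and 201..500 for the second REAL pool. Somebody still has to own the allocator, which is the one piece of central bookkeeping this approach needs.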

>It's not hard to convert a string of numbers (even hex) into a true
>number... Heck, we could use the entire alphabet, and get a base 36 system!
>I like the odd-number idea... I think fake articles should be ODD... :)
>The only problem I see here is that this means that there can be only ONE
>editor to assign numbers... More management problems again... ("Yes, sir,
>I need your signature in triplicate over here, and I need you to stamp a new
>number on this entry... " :)
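
For the base 36 remark, a short Python sketch of the conversion in both directions. The digit order (0-9, then A-Z) is an assumption; it matches what Python's built-in `int(text, 36)` expects.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # 36 symbols: 0-9 then A-Z


def to_base36(n):
    """Convert a non-negative integer to a base 36 string."""
    if n < 0:
        raise ValueError("only non-negative numbers are supported")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))  # digits were collected lowest first


def from_base36(text):
    """Convert a base 36 string back to an integer."""
    return int(text, 36)
```

The 4.295*10^9 figure quoted earlier is 16**8 = 4294967296, the number of IDs in eight hex digits. Because 16 and 36 are both even, the odd/even test still works by looking at the value of the last digit only.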

`Grave' Dave Gymer

Feb 5, 1992, 7:44:03 AM

In article <4_-...@rpi.edu> cle...@aix.rpi.edu (Paul Jason Clegg) writes:
>Okay, then let's vote... Which is more preferred? Blank lines/no indent, or
>tabulation/no blank lines? The former is probably better... :)

Paul Jason Clegg

Feb 3, 1992, 7:48:16 PM

In article <LEVITTE.92...@elixir.lne.kth.se> lev...@lne.kth.se (Richard Levitte) writes:
>>>>>> On 1 Feb 92 19:03:31 GMT, cle...@aix.rpi.edu (Paul Jason Clegg) said:
>
>Paul> Interesting point. My reason for putting it in there is for readability.
>Ahmmmm.... According to most typesetting styles (that I know of), the following
>rules are used for paragraphs:
> - have a blank line between paragraphs and NO indentation at the
> beginning
> XOR
> - have NO blank line between paragraphs and indent at the beginning!
>The post I here follow up is a perfect example of the first variant...

Okay, then let's vote... Which is more preferred? Blank lines/no indent, or
tabulation/no blank lines? The former is probably better... :)

...Paul

Paul Jason Clegg

Feb 5, 1992, 12:35:30 PM

In article <1992Feb5.1...@ugle.unit.no> ihar...@Lise.Unit.NO (Inge Harkestad) writes:
>The problem with the pool assignment idea is the administrative costs
>of requesting/assigning pools. Open ended IDs can be implemented with
>no central administration like this:
>
>Any editor has a unique number (1,2,3,...) which is included in the
>article ID. Editors whose number is odd deal with 'fake' articles and
>editors with even numbers deal with 'real' articles. The format of
>the IDs may thus be: (Editor ID)-(Article ID)
>
>To clarify, here is an example: A real article called 'Karate' that is
>sent to editor 22 gets the ID '22-438' if it is the 438th article that
>this editor receives.
>
>Any editor that replaces another will of course get the old editor's
>ID, and new editors simply get the next available number. If there
>are for instance 54 editors (27 editors for 'real' files and 27 for
>'fake' files) and editor 22 (who is in charge of "real K's") resigns,
>his successor also uses the 22 ID, but if there is a need for an extra
>editor dealing with "real K's", his ID will be 56 (not 55!)
>

That sounds very plausible. You don't even have to worry about re-using
an editor's ID, if you want, so you can have a record of who edited what,
even if they leave (or die, or get sucked into a large Hoover). There
must be a "separator" between the two, like your simple dash, but then
that's fine... :)

...Paul ("Why Didn't I Think of That?")

Your Imaginary Friend

Feb 5, 1992, 10:48:20 PM

I would suggest that each article be filed separately by alphabet, not by
date. In the long run, the date is going to be totally irrelevant to what
people want to read. We'll know what is old and new, because we'll have the
old and new articles there.
It will also prevent subject duplication--which will happen if we don't have
the files organized by subject.

Alan Terlep "I've got strawberry milkshake running
Oakland University, Rochester, MI out of my ears at unpredictable
atte...@vela.acs.oakland.edu intervals."
--Jeremy Crowley