RFC: Better formatting for long descriptions

Andreas Tille

Mar 20, 2009, 9:50:05 AM

Hi,

I tried to find clear advice on how to reasonably format lists inside long
descriptions of packages. The only thing I know is that lines with two
leading spaces are displayed verbatim. This leaves a lot of freedom to
simulate, for instance, itemize lists. I'd like to give some examples from
packages whose names start with 'a', stopping at the first packages
starting with 'b'. If you are bored by these examples, continue reading below the
------ line.

Package: a2ps
- various encodings (all the Latins and others),
- various fonts (automatic font down loading),
- various medias,
^^ (two spaces)

Package: acerhk-source
* controlling LEDs (Mail, Wireless)
* enable/disable wireless hardware
^^^ (three spaces)

Package: acidlab
...
o Alert management by providing constructs to logically group alerts
to create incidents (alert groups), deleting the handled alerts or
false positives, exporting to email for collaboration, or archiving of
alerts to transfer them between alert databases.
.
o Chart and statistic generation based on time, sensor, signature, protocol,
IP address, TCP/UDP ports
.
ACID has the ability to analyze a wide variety of events which are
post-processed into its database. Tools exist for the following formats:
.
o using Snort (www.snort.org)
- Snort alerts
- tcpdump binary logs
o using logsnorter (www.snort.org/downloads/logsnorter-0.2.tar.gz)
- Cisco PIX
- ipchains
--> attempt to emulate a two-level itemize list
==> The upper part has only one space which is most probably not intended.

Package: addresses-goodies-for-gnustep
adgnumailconverter
A tool that will merge your GNUMail address book into the Addresses
database.
adserver
A stand-alone Addresses network server.
adtool
A command-line tool for address database manipulation.
--> attempt to simulate a description list

Package: airport-utils
For the original Apple AirPort and the Lucent RG-1000 base stations only:
- airport-config: base station configurator
- airport-linkmon: wireless link monitor, gives information on the wireless
link quality between the base station and the associated hosts
.
For the Apple AirPort Extreme base stations only:
- airport2-config: base station configurator
- airport2-portinspector: port maps monitor
- airport2-ipinspector: WAN interface monitoring utility
.
For all:
- airport-modem: modem control utility, displays modem state, starts/stops
modem connections, displays the approximate connection time (Extreme only)
- airport-hostmon: wireless hosts monitor, lists wireless hosts connected
to the base station (see airport2-portinspector for the Snow)
--> sometimes two, sometimes three spaces; broken indentation for continued lines

Package: alsa-utils
o amixer: command line mixer
o alsamixer: curses mixer
--> third type of marker (we had '*' and '-')

Package: altermime
* Insert disclaimers
* Insert arbitrary X-headers
^^^^ four spaces

Package: amanda-client
Features:
^^ useless verbatim line
* will back up multiple machines in parallel to a holding disk, blasting
finished dumps one by one to tape as fast as we can write files to
tape. For example, a ~2 Gb 8mm tape on a ~240K/s interface to a host
with a large holding disk can be filled by Amanda in under 4 hours.
* built on top of standard backup software: Unix dump/restore, and
later GNU Tar and others.
^^^ three spaces

Package: amaya
- eXtensible HyperText Markup Language (XHTML)
- Scalable Vector Graphics (SVG)
- Math Markup Language (MathML)
- Cascading Style Sheets (CSS)
^^^^^^^^ a lot of spaces

Package: amrita
* The template for amrita is a pure html/xhtml document without
special tags like <?...?> or <% .. %>
* The template can be written by designers using almost any html
editor.
^^^ continued line with three spaces instead of four, which would look nicer

Package: amsynth
* two analogue-style audio oscillators, featuring:
o sine wave
o saw/triangle wave with adjustable shape
o square/pulse wave with adjustable pulsewidth
o noise generation
o "random" wave (noise with sample & hold)
o oscillator sync
o of course, detune and range control
* mixer section with ring modulation
--> another attempt to simulate two-level itemizing

Package: aoetools
* aoecfg - manipulate AoE configuration strings
* aoe-discover - trigger discovery of ATA over Ethernet devices
* aoe-flush - flush the down devices out of the aoe driver
--> a description list simulated as itemize + formatting

Package: aolserver4-doc
(1) The AOLserver Administrator's Guide covers the setup options
and security issues relating to running the server.
(2) The AOLserver Tcl Developer's Guide covers the Tcl API which
can be used to add features to your web pages (similar in
some respects to PHP or Microsoft's ASP)
^^ enumerate list with more spaces than needed and numbers in ()

Package: apel
poe.el emulation module mainly for basic functions and special
forms/macros of latest emacsen
poem.el basic functions to write portable MULE programs
pces.el portable character encoding scheme (coding-system) features
--> another kind of description list

Package: apg
* Built-in ANSI X9.17 RNG (Random Number Generator)(CAST/SHA1)
* Built-in password quality checking system (now it has support for Bloom
filter for faster access)
* Two Password Generation Algorithms:
1. Pronounceable Password Generation Algorithm (according to NIST
FIPS 181)
2. Random Character Password Generation Algorithm with 35
configurable modes of operation
--> itemize list with an enumerate list in the second level (looks OK to me)

Package: balazar
...
.
Plot:
More than a thousand years ago, the three Gods that have created the
world became too powerful for the poor mortals. Then the Elves forged
three magical scepters to control the Gods, and the Gods were
imprisoned in the magical crystal of Arkanae (during Arkanae I).
.
Though the secret of the Elven blacksmiths has not been lost as time
goes on, monsters and powers are coming back. New scepters have been
reforged, giving birth to new Gods. But who can find the scepters and
imprison them in the Arkanae, or free them for ever by dropping the
scepters in the Abyss ? Who can judge the Gods ?
^^^ if this should be a description list I see no motivation for it

Package: bbmail
* All the colors an gradients can be changed.
* Support for multiple mail boxes and provides a menu showing
all of them (and their unread/total mail count)
^^^^^ five spaces

.... and many more - I spare you the other funny formattings

---------------------------------------------------------------------

I think we should try to adopt somewhat stricter formatting rules
for our long descriptions. The rationale behind this is that, with
better standardized formatting, tools which display descriptions on web
pages might be enhanced to use <li>, <ol> and <dl> tags, which would
make for better reading.

I do not propose drastic changes, but a start on "best practices" seems
reasonable, and perhaps some lintian warnings might help to remind
developers to move to a standard. My proposal would be:

1. Itemize lists: (<li>)
------------------------

First line: "^ * \w+"
Continued line: "^ \w+"
First line of second level: "^ - \w+"
Cont. line of second level: "^ \w+"

Example:

  * first line of text blabla
    which is continued here. There is a second level itemizing
    - second level item blabla
      which is continued here
    - another second level item
  * back to first level
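
As a rough illustration of how a web tool might consume the itemize rule above, here is a minimal Python sketch. It is only a sketch under stated assumptions: the function name `itemize_to_html` is hypothetical, and because the exact number of leading spaces is still open in this proposal, the code decides the nesting level from the marker character ('*' or '-') rather than from the indentation width.

```python
import html
import re

# Assumption: '*' marks a first-level item and '-' a second-level item;
# the exact number of leading spaces is deliberately not checked here.
ITEM = re.compile(r"^\s+([*-]) (.*)$")


def itemize_to_html(lines):
    """Render a two-level itemize list as nested <ul>/<li> markup."""
    items = []     # each item: {"text": [fragments], "sub": [[fragments], ...]}
    target = None  # fragment list that a continuation line extends
    for line in lines:
        match = ITEM.match(line)
        if match and match.group(1) == "*":
            item = {"text": [match.group(2).strip()], "sub": []}
            items.append(item)
            target = item["text"]
        elif match and items:
            sub = [match.group(2).strip()]
            items[-1]["sub"].append(sub)
            target = sub
        elif target is not None and line.strip():
            # Any other non-empty line continues the most recent item.
            target.append(line.strip())

    out = ["<ul>"]
    for item in items:
        out.append("<li>" + html.escape(" ".join(item["text"])))
        if item["sub"]:
            out.append("<ul>")
            out.extend("<li>" + html.escape(" ".join(sub)) + "</li>"
                       for sub in item["sub"])
            out.append("</ul>")
        out.append("</li>")
    out.append("</ul>")
    return "\n".join(out)
```

Keying on the marker keeps the sketch usable whichever indentation widths are finally agreed on; a stricter version would also verify the widths.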

2. Enumerate lists: (<ol>)
--------------------------

First line: "^ ?1. \w+"
Continued line: "^ ? \w+"
If there are more than 9 items: "^ 10. \w+"
(that's the reason for the optional space - perhaps we should strictly
use three spaces for items < 10)

Example:

   1. first line of text blabla
      which is continued here
   2. second line
      ...
  10. tenth line
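
The enumerate rule could be handled in the same spirit. The following Python sketch is an assumption-laden illustration, not an existing tool: `enumerate_to_html` is a hypothetical name, and the pattern accepts any amount of leading whitespace so that both the one-digit and the two-digit form match.

```python
import html
import re

# Assumption: leading whitespace, a number, a dot and one space start an item.
NUMBERED = re.compile(r"^\s+(\d+)\. (.*)$")


def enumerate_to_html(lines):
    """Render an enumerate list as an <ol>, keeping the original numbers."""
    items = []  # list of (number, [text fragments])
    for line in lines:
        match = NUMBERED.match(line)
        if match:
            items.append((int(match.group(1)), [match.group(2).strip()]))
        elif items and line.strip():
            # Non-numbered, non-empty lines continue the previous item.
            items[-1][1].append(line.strip())

    out = ["<ol>"]
    for number, parts in items:
        text = html.escape(" ".join(parts))
        out.append('<li value="%d">%s</li>' % (number, text))
    out.append("</ol>")
    return "\n".join(out)
```

Emitting the `value` attribute preserves the numbering even when items are skipped, as in the example with "..." above.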

3. Description lists: (<dl>)
----------------------------

First description: "^ \w+: +\w+"
Continued line: "^ +\w+"

Example:

  field1:    description of field1 which is
             continued here
  field17:   starting the description always in the same column might
             look nicer
  field4711: so we might add extra space, but the separator ':'
             between field name (<dt>) and description (<dd>) should
             be mandatory.
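
A description list following this rule could be turned into <dl> markup roughly as shown in this Python sketch. The name `description_list_to_html` and the set of characters allowed in a field name (word characters plus '.', '+' and '-') are assumptions chosen for illustration; a continuation line that itself looks like "word: text" would be misread as a new term, which is one reason the mandatory ':' separator needs a precise definition.

```python
import html
import re

# Assumption: a term is one token followed by ':' and at least one space.
TERM = re.compile(r"^\s+([\w.+-]+): +(\S.*)$")


def description_list_to_html(lines):
    """Render 'term: description' lines as <dl>/<dt>/<dd> markup."""
    entries = []  # list of (term, [description fragments])
    for line in lines:
        match = TERM.match(line)
        if match:
            entries.append((match.group(1), [match.group(2).strip()]))
        elif entries and line.strip():
            # Anything else continues the previous description.
            entries[-1][1].append(line.strip())

    out = ["<dl>"]
    for term, parts in entries:
        out.append("<dt>%s</dt>" % html.escape(term))
        out.append("<dd>%s</dd>" % html.escape(" ".join(parts)))
    out.append("</dl>")
    return "\n".join(out)
```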

This suggestion is far from complete and should be enhanced. I have even
heard suggestions to use some markup we might know from wikis. I'm fine with
any suggestion which has two features:

1. Defines some kind of standard which can be parsed automatically.
2. Does not break any existing tool
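
To show what "can be parsed automatically" might mean for a lintian-style reminder, here is a small Python sketch. It is not lintian code (lintian itself is written in Perl); the function name `check_list_markers` and the choice of accepted markers ('*', '-' and 'N.') are assumptions taken from the proposal above.

```python
import re

# Markers seen in the examples above: '*', '-', 'o', '+', '(1)', '1.', '1)'.
LIST_LINE = re.compile(r"^\s+(\*|-|o|\+|\(\d+\)|\d+\.|\d+\)) +\S")
# Hypothetical rule set: only '*', '-' and 'N.' are considered standard.
ACCEPTED = re.compile(r"^(\*|-|\d+\.)$")


def check_list_markers(description_lines):
    """Return (line number, marker) pairs for non-standard list markers."""
    warnings = []
    for number, line in enumerate(description_lines, start=1):
        match = LIST_LINE.match(line)
        if match and not ACCEPTED.match(match.group(1)):
            warnings.append((number, match.group(1)))
    return warnings
```

Such a check only reads the description text, so it would not affect any existing tool that displays it.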

Kind regards

Andreas.

--
http://fam-tille.de

martin f krafft

Mar 20, 2009, 10:10:05 AM

also sprach Andreas Tille <til...@rki.de> [2009.03.20.1445 +0100]:

> I tried to find clear advice on how to reasonably format lists inside long
> descriptions of packages. The only thing I know is that lines with two
> leading spaces are displayed verbatim. This leaves a lot of freedom to
> simulate, for instance, itemize lists. I'd like to give some examples from
> packages whose names start with 'a', stopping at the first packages
> starting with 'b'. If you are bored by these examples, continue reading below the
> ------ line.

What we really should do, instead of clinging to the NIH-behaviour,
reinventing the wheel, and polishing it over and over again is ditch
the pseudo-RFC822 format we have and use Yaml instead.

http://www.yaml.org/start.html
http://yaml.org/spec/1.2/

--
.''`. martin f. krafft <madduck@d.o> Related projects:
: :' : proud Debian developer http://debiansystem.info
`. `'` http://people.debian.org/~madduck http://vcs-pkg.org
`- Debian - when you have better things to do than fixing systems

"den stil verbessern, das heißt den gedanken verbessern."
- friedrich nietzsche

Andreas Tille

Mar 20, 2009, 10:50:12 AM

On Fri, 20 Mar 2009, martin f krafft wrote:

> What we really should do, instead of clinging to the NIH-behaviour,
> reinventing the wheel, and polishing it over and over again is ditch
> the pseudo-RFC822 format we have and use Yaml instead.
>
> http://www.yaml.org/start.html
> http://yaml.org/spec/1.2/

And most probably somebody else will revive the "switch to XML" suggestion.
I know the pros and cons for different formats but I want a solution *now*
and that's the reason why I wrote:

> 2. Does not break any existing tool

Michael Banck

Mar 20, 2009, 12:40:10 PM

On Fri, Mar 20, 2009 at 02:45:09PM +0100, Andreas Tille wrote:
> 1. Itemize lists: (<li>)
> ------------------------
>

> 2. Enumerate lists: (<ol>)
> --------------------------
>

> 3. Description lists: (<dl>)
> ----------------------------
>

> This suggestion is far from complete and should be enhanced.

Well, not sure this should be over-engineered; I guess itemize lists
already cover most of the cases (most enumerations could probably be
changed to itemizations I guess).

So a +1 from me.

Michael

Julien Cristau

Mar 20, 2009, 3:10:07 PM

On Fri, 2009-03-20 at 19:03 +0000, Neil Williams wrote:

> On Fri, 20 Mar 2009 14:45:09 +0100 (CET)
> Andreas Tille <til...@rki.de> wrote:
>
> > I tried to find clear advice on how to reasonably format lists inside long
> > descriptions of packages. The only thing I know is that lines with two
> > leading spaces are displayed verbatim.
>

> Packages.gz is already 26Mb - I'd like to find ways to shorten the
> package descriptions, not lengthen it. :-(

Yeah, I'm sure being consistent about whether we use 2 or 3 spaces for
indented lists in descriptions is going to make that file a lot harder
to compress.

Cheers,
Julien

Neil Williams

Mar 20, 2009, 3:10:12 PM

On Fri, 20 Mar 2009 14:45:09 +0100 (CET)
Andreas Tille <til...@rki.de> wrote:

> I tried to find clear advice on how to reasonably format lists inside long
> descriptions of packages. The only thing I know is that lines with two
> leading spaces are displayed verbatim.

Packages.gz is already 26Mb - I'd like to find ways to shorten the
package descriptions, not lengthen it. :-(

> This leaves a lot of freedom to
> simulate, for instance, itemize lists. I'd like to give some examples from
> packages whose names start with 'a', stopping at the first packages
> starting with 'b'. If you are bored by these examples, continue reading below the
> ------ line.

> ---------------------------------------------------------------------
>
> I think we should try to adopt somewhat stricter formatting rules
> for our long descriptions.

Maybe starting with a way to provide extra long descriptions by some
means *other* than Packages.gz - which in turn means maintainers
deciding which bits of the long description *really* need to be visible
before download and which can wait until the user has decided to
download the package.

Can the long description be trimmed to only such data necessary to
identify the package compared to similar packages? We have debtags for
lots of other facets of a package description, maybe it is time that
the long description itself is trimmed so that it does not repeat any
information already encoded as debtags?

> The rationale behind this is that, with
> better standardized formatting, tools which display descriptions on web
> pages might be enhanced to use <li>, <ol> and <dl> tags, which would
> make for better reading.

Oh no, please don't let Packages.gz get to 40Mb or 50Mb or more. There
has to be a limit somewhere.

What about a way of having a really long, detailed, nicely formatted
description on packages.debian.org but a much shorter, more basic
version in the Packages.gz file?

> This suggestion is far from complete and should be enhanced.

I think the entire suggestion should be redirected away from the
Packages.gz file.

--

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/

Emilio Pozuelo Monfort

Mar 20, 2009, 3:20:09 PM

Neil Williams wrote:
> On Fri, 20 Mar 2009 14:45:09 +0100 (CET)
> Andreas Tille <til...@rki.de> wrote:
>
>> I tried to find clear advice on how to reasonably format lists inside long
>> descriptions of packages. The only thing I know is that lines with two
>> leading spaces are displayed verbatim.
>
> Packages.gz is already 26Mb - I'd like to find ways to shorten the
> package descriptions, not lengthen it. :-(

AFAICS he's not talking about lengthening the descriptions at all, but about
standardizing the way lists are formatted in long descriptions. That is, formalizing
whether we should be using 2 or 3 spaces, dashes or plus signs for items in the
lists...

Cheers,
Emilio

Neil Williams

Mar 20, 2009, 3:30:17 PM

On Fri, 20 Mar 2009 20:08:43 +0100
Julien Cristau <jcri...@debian.org> wrote:

> On Fri, 2009-03-20 at 19:03 +0000, Neil Williams wrote:
> > On Fri, 20 Mar 2009 14:45:09 +0100 (CET)
> > Andreas Tille <til...@rki.de> wrote:
> >
> > > I tried to find clear advice on how to reasonably format lists inside long
> > > descriptions of packages. The only thing I know is that lines with two
> > > leading spaces are displayed verbatim.
> >
> > Packages.gz is already 26Mb - I'd like to find ways to shorten the
> > package descriptions, not lengthen it. :-(
>
> Yeah, I'm sure being consistent about whether we use 2 or 3 spaces for
> indented lists in descriptions is going to make that file a lot harder
> to compress.

I'd like to get the longest descriptions out of Packages.gz completely,
so encouraging their retention is not ideal. It's not about whether 2
or 3 spaces should be used, it's about whether such detailed content
deserves to be in Packages.gz in the first place.

If there is going to be discussion on standardising on some form of
indentation, it's worth considering whether there isn't a better way of
providing the data itself to achieve other benefits. Indents would need
changes in all affected packages - it might be easier to provide a
different means that also reduces the size of the Packages.gz file
at the same time so that packages only need to be changed once.

My comment for this RFC is, therefore, that better formatting for long
descriptions should include a review of whether the long description
deserves to be that long in the first place, whether the long
description merely duplicates data already available via debtags and
whether the long description should be trimmed for the package in
question *as well as* standardising the formatting of what remains.

Better can be construed to mean more - I merely want maintainers to
consider whether better actually means less.

Michael Banck

Mar 20, 2009, 6:40:10 PM

On Fri, Mar 20, 2009 at 07:20:43PM +0000, Neil Williams wrote:
> I'd like to get the longest descriptions out of Packages.gz completely,
> so encouraging their retention is not ideal. It's not about whether 2
> or 3 spaces should be used, it's about whether such detailed content
> deserves to be in Packages.gz in the first place.

Then I wonder why you hijacked this thread and did not rather start a
new one?

Michael

Filipus Klutiero

Mar 20, 2009, 7:20:10 PM

>
> On Fri, 20 Mar 2009 14:45:09 +0100 (CET)
> Andreas Tille <til...@rki.de> wrote:
>
> > I tried to find clear advice on how to reasonably format lists inside long
> > descriptions of packages. The only thing I know is that lines with two
> > leading spaces are displayed verbatim.
>
> Packages.gz is already 26Mb - I'd like to find ways to shorten the
> package descriptions, not lengthen it. :-(
>

Current squeeze main Packages.gz is 7 MB:
http://ftp.ca.debian.org/debian/dists/squeeze/main/binary-i386/

> Can the long description be trimmed to only such data necessary to
> identify the package compared to similar packages? We have debtags for
> lots of other facets of a package description, maybe it is time that
> the long description itself is trimmed so that it does not repeat any
> information already encoded as debtags?
>

debtags is not yet at a stage where this should be done (for one thing,
Synaptic, for "example", does not support debtags). Even if it would be
possible, I doubt this would help much.

> > The rationale behind this is that, with
> > better standardized formatting, tools which display descriptions on web
> > pages might be enhanced to use <li>, <ol> and <dl> tags, which would
> > make for better reading.
>
> Oh no, please don't let Packages.gz get to 40Mb or 50Mb or more. There
> has to be a limit somewhere.
>

I don't understand the proposal as something affecting Packages's size
significantly.

> What about a way of having a really long, detailed, nicely formatted
> description on packages.debian.org but a much shorter, more basic
> version in the Packages.gz file?
>

The extended description needs to be available to APT, not only via
packages.d.o. I seem to remember that Mandrake Linux (or some other
RPM-based distribution) used two Packages-like files, a fat one about 5
times our Packages and a slim one about a fifth of Debian's Packages. I
remember finding the slim index cool, but now that there's
Packages.diff, I think that developing Mandrake-like Packages files and
seeing the results in, perhaps, 2 years, would not benefit much to the
kind of hardware Debian will run on by then.

Filipus Klutiero

Mar 20, 2009, 7:50:08 PM

> On Fri, 20 Mar 2009, martin f krafft wrote:
>
> > What we really should do, instead of clinging to the NIH-behaviour,
> > reinventing the wheel, and polishing it over and over again is ditch
> > the pseudo-RFC822 format we have and use Yaml instead.
> >
> > http://www.yaml.org/start.html
> > http://yaml.org/spec/1.2/
>
> And most probably somebody else will revive the "switch to XML" suggestion.
> I know the pros and cons for different formats but I want a solution *now*
> and that's the reason why I wrote:
>
> > 2. Does not break any existing tool

I tend to agree with Martin. Do you have a particular reason that makes this
change urgent? At worst, a format for extended descriptions could be
usable by Debian 7.

I noticed, while checking whether packages.debian.org rendered the current
descriptions decently, that acidlab's description is rendered pretty
badly, but AFAICS that's just a packages.d.o bug. FWIW, I had never
noticed such an issue.

> Kind regards
>
> Andreas.

Paul Wise

Mar 20, 2009, 11:30:18 PM

On Sat, Mar 21, 2009 at 8:15 AM, Filipus Klutiero <che...@gmail.com> wrote:

> The extended description needs to be available to APT, not only via
> packages.d.o.

I agree with Neil Williams' comment in the other thread about removing
long descriptions from the Packages files. I think the obvious place
to put them is in dists/unstable/main/i18n/Translation-en (or C) like
the descriptions from DDTP.

--
bye,
pabs

http://wiki.debian.org/PaulWise

Neil Williams

Mar 21, 2009, 3:50:07 AM

On Fri, 20 Mar 2009 19:15:00 -0400
Filipus Klutiero <che...@gmail.com> wrote:

> > On Fri, 20 Mar 2009 14:45:09 +0100 (CET)
> > Andreas Tille <til...@rki.de> wrote:
> >
> > > I tried to find clear advice on how to reasonably format lists inside long
> > > descriptions of packages. The only thing I know is that lines with two
> > > leading spaces are displayed verbatim.
> >
> > Packages.gz is already 26Mb - I'd like to find ways to shorten the
> > package descriptions, not lengthen it. :-(
> >
> Current squeeze main Packages.gz is 7 MB:
> http://ftp.ca.debian.org/debian/dists/squeeze/main/binary-i386/

Bah, my fault - 26Mb uncompressed. I was looking at /var/lib/apt/lists/
Sorry.

> > Can the long description be trimmed to only such data necessary to
> > identify the package compared to similar packages? We have debtags for
> > lots of other facets of a package description, maybe it is time that
> > the long description itself is trimmed so that it does not repeat any
> > information already encoded as debtags?

> debtags is not yet at a stage where this should be done (for one thing,
> Synaptic, for "example", does not support debtags). Even if it would be
> possible, I doubt this would help much.

Any reduction, replicated across 13,000 packages (or even just the
ones from that 13,000 that have verbose long descriptions currently), is
only going to help reduce the size of the file.

> > What about a way of having a really long, detailed, nicely formatted
> > description on packages.debian.org but a much shorter, more basic
> > version in the Packages.gz file?
> >
> The extended description needs to be available to APT

Only for use by apt-cache search, the rest of apt doesn't care about it. apt
understands debtags, why duplicate that information? (Frontends can be
adapted or just rely on apt-cache search underneath.)

>, not only via
> packages.d.o. I seem to remember that Mandrake Linux (or some other
> RPM-based distribution) used two Packages-like files, a fat one about 5
> times our Packages and a slim one about a fifth of Debian's Packages. I
> remember finding the slim index cool, but now that there's
> Packages.diff, I think that developing Mandrake-like Packages files and
> seeing the results in, perhaps, 2 years, would not benefit much to the
> kind of hardware Debian will run on by then.

Debian is not exclusively for power-hungry servers and mega-powerful
workstations, Debian also runs on very small hardware and not
necessarily old stuff either. It is a mistake to think that Debian
should require more and more powerful hardware for the basic system.

Yes, there is software in Debian that needs a powerful machine, there
is also a LOT of software in Debian specifically designed for low
resource machines where the benefits of a <1Mb Packages.gz file are
appreciable.

Neil Williams

Mar 21, 2009, 4:00:11 AM

On Fri, 20 Mar 2009 23:32:51 +0100
Michael Banck <mba...@debian.org> wrote:

> On Fri, Mar 20, 2009 at 07:20:43PM +0000, Neil Williams wrote:
> > I'd like to get the longest descriptions out of Packages.gz completely,
> > so encouraging their retention is not ideal. It's not about whether 2
> > or 3 spaces should be used, it's about whether such detailed content
> > deserves to be in Packages.gz in the first place.
>
> Then I wonder why you hijacked this thread and did not rather start a
> new one?

If large numbers of package descriptions are to change collectively,
it's best to make that one change with two aims rather than two separate
changes. Less work for everyone involved.

Just looking for a bit of consideration for those situations where the
Packages file is already too large.

Neil Williams

Mar 21, 2009, 4:10:15 AM

On Sat, 21 Mar 2009 12:28:36 +0900
Paul Wise <pa...@debian.org> wrote:

> On Sat, Mar 21, 2009 at 8:15 AM, Filipus Klutiero <che...@gmail.com> wrote:
>
> > The extended description needs to be available to APT, not only via
> > packages.d.o.
>
> I agree with Neil Williams' comment in the other thread about removing
> long descriptions from the Packages files. I think the obvious place
> to put them is in dists/unstable/main/i18n/Translation-en (or C) like
> the descriptions from DDTP.

Now that's a good idea - thanks Paul. That way, the long descriptions
can be moved aside without needing changes by lots of maintainers and
other formatting changes like the original thread can proceed
independently.

It's another instance of duplication - why retain the long description
in the Packages file while a translated version also exists from DDTP?
Probably better for the description to be removed from the Packages
file completely and the DDTP one contains the translated version and
English ones for those with missing or outdated translations. That way,
apt spends less time parsing the (smaller) Packages file when doing
ordinary stuff like package installation and only needs to look at the
DDTP information when specifically called as 'apt-cache search'.

CC:'ing debian-i18n to see if there are problems with this approach.

Paul Wise

Mar 21, 2009, 4:20:09 AM

On Sat, Mar 21, 2009 at 4:58 PM, Neil Williams <code...@debian.org> wrote:

> It's another instance of duplication - why retain the long description
> in the Packages file while a translated version also exists from DDTP?
> Probably better for the description to be removed from the Packages
> file completely and the DDTP one contains the translated version and
> English ones for those with missing or outdated translations. That way,
> apt spends less time parsing the (smaller) Packages file when doing
> ordinary stuff like package installation and only needs to look at the
> DDTP information when specifically called as 'apt-cache search'.

One issue is that many people will have disabled downloading
translations so they'll need to change their configuration from none
to en:

APT::Acquire::Translation "none";

Since en will now be a "Translation", perhaps a different config item
is more appropriate:

APT::Acquire::Description "en";

Andreas Tille

Mar 21, 2009, 5:50:08 PM

On Fri, 20 Mar 2009, Neil Williams wrote:

> Packages.gz is already 26Mb - I'd like to find ways to shorten the
> package descriptions, not lengthen it. :-(

Please read again. Chances are good that packages files might
become shorter.

>> The rationale behind this is that, with
>> better standardized formatting, tools which display descriptions on web
>> pages might be enhanced to use <li>, <ol> and <dl> tags, which would
>> make for better reading.
>
> Oh no, please don't let Packages.gz get to 40Mb or 50Mb or more. There
> has to be a limit somewhere.

You should definitely read again - how would removing or adding a few spaces
and using defined characters instead of random ones have such an
effect?

Andreas Tille

Mar 21, 2009, 6:00:15 PM

On Fri, 20 Mar 2009, Neil Williams wrote:

> My comment for this RFC is, therefore, that better formatting for long
> descriptions should include a review of whether the long description
> deserves to be that long in the first place, whether the long
> description merely duplicates data already available via debtags and
> whether the long description should be trimmed for the package in
> question *as well as* standardising the formatting of what remains.

I agree that some descriptions are definitely too long. I wonder who
would really read some of these descriptions to the end. Bad examples can be
viewed here:

http://debian-med.alioth.debian.org/tasks/typesetting.html

Kind regards

Andreas.

--
http://fam-tille.de

Andreas Tille

Mar 21, 2009, 6:10:11 PM

On Fri, 20 Mar 2009, Filipus Klutiero wrote:

>> > 2. Does not break any existing tool
>>
> I tend to agree with Martin. Do you have a particular reason that makes this
> change urgent?

Just to give the suggestion a small chance. I'm not against a "better"
format but I have read enough suggestions that ended in nothing. BTW,
getting the descriptions in some standard shape might make an automatic
transition to a "better" format easier.

Kind regards

Andreas.

--
http://fam-tille.de

Christian Perrier

Mar 21, 2009, 6:20:12 PM

Quoting Andreas Tille (til...@rki.de):

> Package: a2ps
> - various encodings (all the Latins and others),
> - various fonts (automatic font down loading),
> - various medias,
> ^^ (two spaces)
>
> Package: acerhk-source
> * controlling LEDs (Mail, Wireless)
> * enable/disable wireless hardware
> ^^^ (three spaces)

.../...

Please note that debian-l10n-english suggests using the enumeration
style you mention for a2ps, when we're reviewing package
descriptions...

Of course, that triggers rewrites, but these are generally coupled with
many more very good suggestions for improvement (the team features an
artist of the English language, and that's not /me... which is obvious
to everybody).

Michael Bramer

Mar 21, 2009, 8:10:11 PM

Paul Wise wrote:

> On Sat, Mar 21, 2009 at 4:58 PM, Neil Williams <code...@debian.org> wrote:
>
>> It's another instance of duplication - why retain the long description
>> in the Packages file while a translated version also exists from DDTP?
>> Probably better for the description to be removed from the Packages
>> file completely and the DDTP one contains the translated version and
>> English ones for those with missing or outdated translations. That way,
>> apt spends less time parsing the (smaller) Packages file when doing
>> ordinary stuff like package installation and only needs to look at the
>> DDTP information when specifically called as 'apt-cache search'.
>
> One issue is that many people will have disabled downloading
> translations so they'll need to change their configuration from none
> to en:
>
> APT::Acquire::Translation "none";
>
> Since en will now be a "Translation", perhaps a different config item
> is more appropriate:
>
> APT::Acquire::Description "en";

This will not work:

apt uses an md5sum of the short and long description (from the Packages
file) to find the right 'translation'. If you remove the long
description from the Packages file, apt can't do this task...

If we want to remove the long description from the Packages file, we must
change apt in some way and use some other rule to select the right
description (a new 'Description-md5sum' field or the version number).
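
To illustrate the lookup described above, here is a minimal Python sketch. It is an illustration only: the exact bytes apt feeds into the checksum and the dictionary layout are assumptions, and `description_md5` and `find_translation` are hypothetical helper names, not apt functions.

```python
import hashlib


def description_md5(description):
    """MD5 hex digest of a description (assumed: UTF-8, one trailing newline)."""
    data = (description.rstrip("\n") + "\n").encode("utf-8")
    return hashlib.md5(data).hexdigest()


def find_translation(package, english_description, translations):
    """Look up a translation keyed by (package, checksum of the English text)."""
    key = (package, description_md5(english_description))
    return translations.get(key)
```

The sketch shows the dependency: without the English text in the Packages file there is nothing to hash, so the checksum (or another key such as the version number) would have to be shipped as a field of its own.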

Regards,
Grisu

Filipus Klutiero

Mar 21, 2009, 8:20:14 PM

Neil Williams wrote:
> On Fri, 20 Mar 2009 19:15:00 -0400
> Filipus Klutiero <che...@gmail.com> wrote:
>

> [...]

>
> > > What about a way of having a really long, detailed, nicely formatted
> > > description on packages.debian.org but a much shorter, more basic
> > > version in the Packages.gz file?
> > >
> > The extended description needs to be available to APT
>
> Only for use by apt-cache search, the rest of apt doesn't care about it. apt
> understands debtags, why duplicate that information? (Frontends can be
> adapted or just rely on apt-cache search underneath.)
>

I don't understand what you mean. Where would apt-cache get the extended
description from? Again, debtags is not mature enough yet to shrink
descriptions.

> >, not only via
> > packages.d.o. I seem to remember that Mandrake Linux (or some other
> > RPM-based distribution) used two Packages-like files, a fat one about 5
> > times our Packages and a slim one about a fifth of Debian's Packages. I
> > remember finding the slim index cool, but now that there's
> > Packages.diff, I think that developing Mandrake-like Packages files and
> > seeing the results in, perhaps, 2 years, would not benefit much to the
> > kind of hardware Debian will run on by then.
>
> Debian is not exclusively for power-hungry servers and mega-powerful
> workstations, Debian also runs on very small hardware and not
> necessarily old stuff either. It is a mistake to think that Debian
> should require more and more powerful hardware for the basic system.
>

Actually, I was only saying that I thought such a reduction of the
hardware requirements would not help much.

> Yes, there is software in Debian that needs a powerful machine, there
> is also a LOT of software in Debian specifically designed for low
> resource machines where the benefits of a <1Mb Packages.gz file are
> appreciable.

I agree, after reading Paul's comment, that if we get a Translations-en
file via DDTP, removing the extended description from Packages would be
less work, and thus more interesting.

I tested the gain with
awk '$0 !~ /^(Description| )/'
and the result loses over 40% of its compressed size (7350583 bytes down
to 4224356):
-rw-r--r-- 1 chealer chealer 4224356 mar 21 20:12 nodesc.tar.gz
-rw-r--r-- 1 chealer chealer 7350583 mar 21 15:56 debian.savoirfairelinux.net_debian_dists_testing_main_binary-i386_Packages.tar.gz
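
For anyone who wants to repeat the measurement without awk, the same filter can be sketched in Python. This is an illustration under assumptions: the function names are made up, the sample stanza uses an invented version number, and gzip on a tiny sample says nothing about the real ratio. Like the awk one-liner, it also drops continuation lines of any other multi-line field, since those start with a blank too.

```python
import gzip
import re

# Same test as the awk pattern: lines starting with "Description" or a blank.
DESC_LINE = re.compile(r"^(Description| )")

def strip_descriptions(packages_text: str) -> str:
    """Drop Description fields and all continuation lines from Packages text."""
    kept = [
        line
        for line in packages_text.splitlines(keepends=True)
        if not DESC_LINE.match(line)
    ]
    return "".join(kept)

def gz_size(text: str) -> int:
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))

# Made-up stanza; the version number is hypothetical.
sample = (
    "Package: a2ps\n"
    "Version: 1.0\n"
    "Description: GNU a2ps - 'Anything to PostScript' converter and pretty-printer\n"
    " GNU a2ps converts files into PostScript for printing or viewing.\n"
    "\n"
)
stripped = strip_descriptions(sample)
print(gz_size(sample), gz_size(stripped))
```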

Andreas Tille

Mar 22, 2009, 3:00:30 AM

On Sat, 21 Mar 2009, Christian Perrier wrote:

> Please note that debian-l10n-english suggests using the enumeration
> style you mention for a2ps, when we're reviewing package
> descriptions...

BTW, since you have answered in this thread: shouldn't we make the suggested
enhancements part of the Smith project?

Andreas Tille

Mar 22, 2009, 3:00:38 AM

On Sun, 22 Mar 2009, Michael Bramer wrote:

> If we want to remove the long description from the Packages file, we must
> change apt in some way and use some other rule to select the right
> description (a new 'Description-md5sum' or the Version-Nr)

I'd call the Version-Nr. a sensible choice. ;-)

Kind regards

Andreas.
--
http://fam-tille.de

Christian Perrier

Mar 22, 2009, 3:30:15 AM

Quoting Andreas Tille (til...@rki.de):
> On Sat, 21 Mar 2009, Christian Perrier wrote:
>
>> Please note that debian-l10n-english suggests using the enumeration
>> style you mention for a2ps, when we're reviewing package
>> descriptions...
>
> BTW, since you have answered in this thread: shouldn't we make the suggested
> enhancements part of the Smith project?

Certainly. I currently refrain from reading -devel (it seems
like we are at the stage of the release cycle where flame wars and
complicated discussions increase.....and I try to save my own time for
productive work) but I would appreciate a summary in case things and
ideas converge (good luck with this..:-))

Another thing we encourage in Smith is the use of good boilerplate in
package descriptions for multi-binary packages....The point is to have
a repetitive part common to all packages of a given source package,
that is, the description of the general use of the "framework", and 1 or
2 specific paragraphs for each binary package saying things like "This
package provides the development files for <foo>", etc.

A good example of this is the recent review of the "nut" templates....that
was one of the most complicated reviews we did (mostly because this is
one of the few where the maintainer gave advice...:-))

That review starts at
http://lists.debian.org/debian-l10n-english/2009/03/msg00025.html

...and turned into #520591.... I suggest that interested parties
look at debian/control for nut before and after the review..:-)


Lionel Elie Mamane

Mar 22, 2009, 4:00:14 AM

On Sat, Mar 21, 2009 at 10:52:10PM +0100, Andreas Tille wrote:

> I agree that some descriptions are definitely too long. I wonder who
> should really read some descriptions to the end. Bad examples can
> be viewed here:

> http://debian-med.alioth.debian.org/tasks/typesetting.html

The very long lengths seem to come mostly from lists of CTAN packages
in a Debian package; I find these useful, as I can "apt-cache search
CTAN_package" to find it in Debian.

--
Lionel

Andreas Tille

Mar 22, 2009, 5:00:19 AM

On Sun, 22 Mar 2009, Lionel Elie Mamane wrote:

>> http://debian-med.alioth.debian.org/tasks/typesetting.html
>
> The very long lengths seem to come mostly from lists of CTAN packages
> in a Debian package; I find these useful, as I can "apt-cache search
> CTAN_package" to find it in Debian.

Yes, I'm sure there are reasons for just putting everything into
the description of a package - but as this thread shows there are
also reasons against - and I wonder how many users are bored by
overlong descriptions compared to those who grep apt-cache
output.

Kind regards

Andreas.

--
http://fam-tille.de

Ben Finney

Mar 22, 2009, 7:30:10 AM

Lionel Elie Mamane <lio...@mamane.lu> writes:

> The very long lengths seem to come mostly from lists of CTAN
> packages in a Debian package; I find these useful, as I can
> "apt-cache search CTAN_package" to find it in Debian.

For that purpose, it would seem ‘apt-file’ can do the job better,
obviating the need for that listing to bloat the Packages file. Or am
I missing something?

--
\ “I bought a dog the other day. I named him Stay. It's fun to |
`\ call him. ‘Come here, Stay! Come here, Stay!’ He went insane. |
_o__) Now he just ignores me and keeps typing.” —Steven Wright |
Ben Finney

Raphael Geissert

Mar 22, 2009, 5:30:09 PM

Neil Williams wrote:
>
> If large numbers of package descriptions are to change collectively,
> it's best to make that one change with two aims rather than two separate
> changes. Less work for everyone involved.

But Andreas' RFC affects the source packages; yours only affects the
infrastructure that builds and uses Packages.

IOW: maintainers need to do something to go ahead with Andreas' proposal
but need to do nothing to see package descriptions go away from Packages.

>
> Just looking for a bit of consideration for those situations where the
> Packages file is already too large.
>

Cheers,
Raphael Geissert

Michael Banck

Mar 22, 2009, 8:10:10 PM

On Sat, Mar 21, 2009 at 11:13:54PM +0100, Christian Perrier wrote:
> Quoting Andreas Tille (til...@rki.de):
>
> > Package: a2ps
> > - various encodings (all the Latins and others),
> > - various fonts (automatic font down loading),
> > - various medias,
> > ^^ (two spaces)

> Please note that debian-l10n-english suggests using the enumeration
> style you mention for a2ps, when we're reviewing package
> descriptions...

What's the rationale? So far, I was under the impression that " * "
was the most used enumeration style in long descriptions.

Michael

Christian Perrier

Mar 23, 2009, 4:00:26 AM

Quoting Michael Banck (mba...@debian.org):

> > Please note that debian-l10n-english suggests using the enumeration
> > style you mention for a2ps, when we're reviewing package
> > descriptions...
>
> What's the rationale? So far, I was under the impression that " * "

A not very strong one, I'm afraid..:-)

IIRC, we once found some reference indicating a tendency for dashed
enumerations to be an accepted "standard" but I can't quote this.

Another reason is the fact that we're using this in French
translations....which is a bad reason..:-)

Another is that we had to choose something and, based on purely
personal impressions, we were thinking that dashed enumerations were
the majority (nobody really verified).

I think that this has never really been the only change we proposed.
Most of the time, there are several other
changes...particularly when enumerations are involved because, in such
cases:

- they're often too long (enumerating each and every feature of the
software)
- they have formatting issues (punctuation, often)
- they have consistency issues (mixing verb sentences and noun
sentences for instance)


Andreas Tille

Mar 23, 2009, 5:10:12 AM

On Mon, 23 Mar 2009, Christian Perrier wrote:

>>
>> What's the rationale? So far, I was under the impression that " * "
>
> A not very strong one, I'm afraid..:-)
>
> IIRC, we once found some reference indicating a tendency for dashed
> enumerations to be an accepted "standard" but I can't quote this.

Could you please clarify whether you mean *enumeration* (in the sense
of LaTeX's enumerate environment or HTML's <ol>) or would you rather
mean *itemize* (in the sense of LaTeX's itemize environment or HTML's
<ul>)? IMHO these are things which should be handled differently.
I don't care whether a ' *' or a ' -' is finally used - it just
should be used in the same way for all descriptions.

> - they're often too long (enumerating each and every feature of the
> software)
> - they have formatting issues (punctuation, often)
> - they have consistency issues (mixing verb sentences and noun
> sentences for instance)

I completely agree that this should be fixed as well - but it is hard
to code such tests in a lintian check or something like this.

Kind regards

Andreas.

--
http://fam-tille.de

Michael Banck

Mar 23, 2009, 6:10:16 AM

On Mon, Mar 23, 2009 at 07:24:45AM +0100, Christian Perrier wrote:
> Quoting Michael Banck (mba...@debian.org):
>
> > > Please note that debian-l10n-english suggests using the enumeration
> > > style you mention for a2ps, when we're reviewing package
> > > descriptions...
> >
> > What's the rationale? So far, I was under the impression that " * "
>
>
> A not very strong one, I'm afraid..:-)
>
> IIRC, we once found some reference indicating a tendency for dashed
> enumerations to be an accepted "standard" but I can't quote this.
>
> Another reason is the fact that we're using this in French
> translations....which is a bad reason..:-)
>
> Another is that we had to choose something and, based on purely
> personal impressions, we were thinking that dashed enumerations were
> the majority (nobody really verified).

Well, ok; but your initial post to this thread made it sound like some
semi-or-mostly official description review process, so having to change
all my long descriptions to " - " (after all, standardizing on one
format is the point of this thread) does not fill me with pure joy. So
if I have to do that, I'd prefer having a reason like "80% of the
packages do it like that" or "this is the preferred form of itemization
in English according to ...", or something. The above reasons do not
look very convincing to me.

So it would be great if some numbers could be brought up first (maybe
Andreas has a rough overview now, because he looked at the different
kinds of itemizations).

Again, I don't think enumerations are used that much (and if they are, a
lot of them are really itemizations I guess), but standardizing on
itemizations strikes me as useful. Not just for packages.d.o HTML
output, but also for apt-cache show consistency etc.

Andreas Tille

Mar 23, 2009, 8:30:10 AM

On Mon, 23 Mar 2009, Michael Banck wrote:

> So it would be great if some numbers could be brought up first (maybe
> Andreas has a rough overview now, because he looked at the different
> kinds of itemizations).

Well, I had not, but you can get a rough count with

for tag in '\*' '-' '+' 'o' ; do
    echo "Tag $tag was used $(grep -c "^ $tag " /var/lib/dpkg/available) times"
done

Tag \* was used 5647 times
Tag - was used 2710 times
Tag + was used 85 times
Tag o was used 282 times

which only counts those with proper spacing - but as a rough estimate
'*' definitely wins.
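
The same count can be sketched in Python for any Packages-style text. This is a rough illustration, not a replacement for the loop above: it assumes that an item line is one or more blanks, a marker character and a blank (which is looser than the grep pattern), and the sample text is made up.

```python
import re

# Assumption: an item line is one or more blanks, a marker, then a blank.
ITEM = re.compile(r"^ +([*+o-]) ")

def count_markers(text: str) -> dict:
    """Count list-item lines per marker character."""
    counts = {"*": 0, "-": 0, "+": 0, "o": 0}
    for line in text.splitlines():
        match = ITEM.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up stanza fragment, not taken from a real package.
sample = (
    "Description: example package\n"
    " Features:\n"
    "  * first item\n"
    "  * second item\n"
    "  - third item\n"
    "   o fourth item\n"
    " ordinary text - with a dash\n"
)
counts = count_markers(sample)
for marker, n in counts.items():
    print(f"Tag {marker} was used {n} times")
```

To run it on real data one would pass the contents of /var/lib/dpkg/available instead of the sample.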

> Again, I don't think enumerations are used that much (and if they are, a
> lot of them are really itemizations I guess)

Simply recommending "there is no real need for enumerations - let's use
itemize in any case" might be a valid point as well. But IMHO we need
descriptions (in the sense of LaTeX's description environment or HTML's <dl>).

Kind regards

Andreas.

--
http://fam-tille.de

Stefano Zacchiroli

Mar 23, 2009, 9:40:15 AM

On Fri, Mar 20, 2009 at 02:45:09PM +0100, Andreas Tille wrote:
> I do not propose drastic changes but a start for "Best practices"
> might be reasonable and perhaps some lintian warnings might help to
> remind developers to move to some standard.

Laudable initiative, thanks for raising the issue. The current
handling of "lists" is dumb at best.

I agree with Martin that we should avoid the NIH syndrome, but
that does not necessarily mean that we should switch control
files entirely to a new format. It just means that we should think big.

In particular, I observe that we (IIRC) already have pseudo-parsing
code which is used at least by packages.d.o to render as proper HTML
lists the pseudo-lists which come from long descriptions. That makes it
evident, at least to me, that long descriptions need some kind of
formatting for most of their use cases (packages.d.o is one, the
interface of a GUI package manager is another one).

In that respect, resisting the NIH syndrome just means choosing an
already existing text-based markup language and adopting its
conventions. For instance, we can just say that long description lists
have to be formatted as Markdown lists (modulo some extra bits needed
to not violate 822 parsing). That would be synergistic with a possible
future switch to Markdown for the whole markup of long
descriptions. Note that I don't care in particular about Markdown; it
can also be reStructuredText for all I care.

But please check that your convention matches such a markup language
and please say so explicitly in your proposal. That would also
implement something of a principle of least surprise for people coming
from those languages.

Thanks!
Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..| . |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime


Andreas Tille

Mar 23, 2009, 10:40:12 AM

On Mon, 23 Mar 2009, Stefano Zacchiroli wrote:

> In particular, I observe that we (IIRC) already have pseudo-parsing
> code which is used at least by packages.d.o to render as proper HTML
> lists the pseudo-lists which come from long descriptions.

Not that I know of. IMHO it is just set verbatim (<pre>), judging from
the a2ps example which was mentioned here:

-------------------------------------------------------------------------
<h2>GNU a2ps - 'Anything to PostScript' converter and pretty-printer</h2>

GNU a2ps converts files into PostScript for printing or viewing. It uses a
nice default format, usually two pages on each physical page, borders
surrounding pages, headers with useful information (page number, printing
date, file name or supplied header), line numbering, symbol substitution
as well as pretty printing for a wide range of programming languages.

Historically, a2ps started as a text to PostScript converter, but thanks
to powerful delegations it is able to let you use it for any kind of files,
ie it can also digest manual pages, dvi files, texinfo, ....

Among the other most noticeable features of a2ps are:
<pre>

- various encodings (all the Latins and others),
- various fonts (automatic font down loading),
- various medias,

- various printer interfaces,
- various output styles,
- various programming languages,
- various helping applications,
- and various spoken languages.
</pre>
------------------------------------------------------------------------

> But please check that your convention matches such a markup language
> and please say explicitly so in your proposal.

This is definitely intended, but I'm not an expert in those markup
languages. That's why I said:

1. Defines some kind of standard which can be parsed automatically.

2. Does not break any existing tool

If there is an existing markup language which meets these requirements I'd
definitely vote for it.
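
To make the first requirement concrete, here is a small Python sketch of what "can be parsed automatically" could mean for the a2ps example: item lines become a proper HTML list instead of a <pre> block. It is only an illustration and not what packages.d.o does; it assumes items start with blanks followed by '-' or '*' and a blank, it does not handle items wrapped over several lines, and the names ITEM and render are made up.

```python
import html
import re

# Assumption: an item is blanks, then '-' or '*', then blanks, then the text.
ITEM = re.compile(r"^ +[-*] +(.*)$")

def render(lines):
    """Render description lines as HTML, turning item runs into <ul> lists."""
    out = []
    in_list = False
    for line in lines:
        match = ITEM.match(line)
        if match:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(match.group(1))}</li>")
            continue
        if in_list:
            out.append("</ul>")
            in_list = False
        if line.strip():
            out.append(f"<p>{html.escape(line.strip())}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)

a2ps_lines = [
    "Among the other most noticeable features of a2ps are:",
    " - various encodings (all the Latins and others),",
    " - various medias,",
]
rendered = render(a2ps_lines)
print(rendered)
```

Plain text tools such as apt-cache show would keep displaying the unchanged description, so the second requirement is not touched by a renderer like this.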

Christian Perrier

Mar 23, 2009, 2:10:15 PM

Quoting Andreas Tille (til...@rki.de):

> Could you please clarify whether you mean *enumeration* (in the sense

I meant itemization, actually, so more "<ul>" than "<ol>". There are
certainly very few cases where ordered lists are really useful in
packages' description.

Sorry for the approximative English, here..


Michael Banck

Mar 23, 2009, 2:40:14 PM

On Mon, Mar 23, 2009 at 02:32:17PM +0100, Stefano Zacchiroli wrote:
> In that respect, resisting the NIH syndrome just means choosing an
> already existing text-based markup language and adopting its
> conventions. For instance, we can just say that long description lists
> have to be formatted as Markdown lists (modulo some extra bits needed
> to not violate 822 parsing). That would be synergistic with a possible
> future switch to Markdown for the whole markup of long
> descriptions. Note that I don't care in particular about Markdown; it
> can also be reStructuredText for all I care.

Uh, what are you saying here? That we should use " * " to prepend
items in itemized lists, so that it can be converted to HTML lists by
packages.debian.org et al.? If not, what else?

Michael

Daniel Burrows

Mar 23, 2009, 7:30:12 PM

I don't have the energy to push this any more, but I should probably
at least refer to my previous attempt to standardize bulleted lists:

http://lists.debian.org/debian-devel/2005/12/msg00531.html

You might find it useful, or not. At least it more or less documents
current practice in aptitude (I think there have been some tweaks since
then; if anyone cares I could go research what they are and dig them up).

Daniel

Stefano Zacchiroli

Mar 24, 2009, 5:30:20 AM

On Mon, Mar 23, 2009 at 07:18:07PM +0100, Michael Banck wrote:
> Uh, what are you saying here? That we should use " * " to prepend
> items in itemized lists, so that it can be converted to HTML lists by
> packages.debian.org et al.? If not, what else?

Yes.

More generally, I believe we can benefit in the long run from some
simple text-based markup that supports the basic emphasis conventions we
are used to in emails; Markdown is just an example of such a language.

Having to choose a syntax for itemized lists, it would be wise to
choose one which is forward compatible with such a language.


Goswin von Brederlow

Mar 30, 2009, 7:10:11 AM

Andreas Tille <til...@rki.de> writes:

> On Sun, 22 Mar 2009, Michael Bramer wrote:
>
>> If we want to remove the long description from the Packages file, we
>> must change apt in some way and use some other rule to select the
>> right description (a new 'Description-md5sum' or the Version-Nr)
>
> I'd call the Version-Nr. a sensible choice. ;-)
>
> Kind regards
>
> Andreas.

I think the idea of using the Description-md5sum is that in most cases
the md5sum remains identical for many versions. If you use the
package's actual version then every upload will need a new translation
entry or some fuzziness to accept an older version's translation.

MfG
Goswin

Michael Bramer

Mar 30, 2009, 5:00:17 PM

Goswin von Brederlow schrieb:

> Andreas Tille <til...@rki.de> writes:
>
>> On Sun, 22 Mar 2009, Michael Bramer wrote:
>>
>>> If we want to remove the long description from the Packages file, we
>>> must change apt in some way and use some other rule to select the
>>> right description (a new 'Description-md5sum' or the Version-Nr)
>> I'd call the Version-Nr. a sensible choice. ;-)
>>
>> Kind regards
>>
>> Andreas.
>
> I think the idea of using the Description-md5sum is that in most cases
> the md5sum remains identical for many versions. If you use the
> package's actual version then every upload will need a new translation
> entry or some fuzziness to accept an older version's translation.

ACK

Gruss
Grisu

Andreas Tille

Mar 30, 2009, 5:20:22 PM

On Mon, 30 Mar 2009, Michael Bramer wrote:

> Goswin von Brederlow schrieb:

>> I think the idea of using the Description-md5sum is that in most cases
>> the md5sum remains identical for many versions. If you use the
>> package's actual version then every upload will need a new translation
>> entry or some fuzziness to accept an older version's translation.

I understood the point of having md5sums in translation files. My
suggestion was an *additional* field which keeps the package version.
In case there are different versions of a package in one dist (maybe
because an arch is lagging behind), either the md5sums differ
and you store different translations anyway, or the descriptions are
equal, in which case you use the highest available version number.

Kind regards

Andreas.
--
http://fam-tille.de

Goswin von Brederlow

Mar 31, 2009, 6:30:25 PM

Andreas Tille <til...@rki.de> writes:

> On Mon, 30 Mar 2009, Michael Bramer wrote:
>
>> Goswin von Brederlow schrieb:
>>> I think the idea of using the Description-md5sum is that in most cases
>>> the md5sum remains identical for many versions. If you use the
>>> package's actual version then every upload will need a new translation
>>> entry or some fuzziness to accept an older version's translation.
>
> I understood the point of having md5sums in translation files. My
> suggestion was an *additional* field which keeps the package version.
> In case there are different versions of a package in one dist (maybe
> because an arch is lagging behind), either the md5sums differ
> and you store different translations anyway, or the descriptions are
> equal, in which case you use the highest available version number.
>
> Kind regards
>
> Andreas.

Can't you have multiple descriptions for the same package with
different md5sums in the translation file?

MfG
Goswin

Andreas Tille

Apr 1, 2009, 7:10:13 AM

On Wed, 1 Apr 2009, Goswin von Brederlow wrote:

> Can't you have multiple descriptions for the same package with
> different md5sums in the translation file?

Yes, there are such cases. If an arch is unable to catch up (for whatever
reason) and the description has changed between the versions, you end
up with different description translations (belonging to different package
versions). This does not happen in stable (by definition), is very rare
in testing and happens approximately 50 times in unstable (I can give
real numbers if you are interested, and the numbers heavily depend on how
well translators catch up).

Kind regards

Andreas.

--
http://fam-tille.de

Michael Bramer

Apr 1, 2009, 9:20:16 AM

Goswin von Brederlow schrieb:

> Andreas Tille <til...@rki.de> writes:
>
>> On Mon, 30 Mar 2009, Michael Bramer wrote:
>>
>>> Goswin von Brederlow schrieb:
>>>> I think the idea of using the Description-md5sum is that in most cases
>>>> the md5sum remains identical for many versions. If you use the
>>>> package's actual version then every upload will need a new translation
>>>> entry or some fuzziness to accept an older version's translation.
>> I understood the point of having md5sums in translation files. My
>> suggestion was an *additional* field which keeps the package version.
>> In case there are different versions of a package in one dist (maybe
>> because an arch is lagging behind), either the md5sums differ
>> and you store different translations anyway, or the descriptions are
>> equal, in which case you use the highest available version number.
>
> Can't you have multiple descriptions for the same package with
> different md5sums in the translation file?

We _have_ multiple descriptions for the same package with different
md5sums in the translation file for sid.

Gruss
Grisu

Goswin von Brederlow

Apr 1, 2009, 12:20:10 PM

Michael Bramer <gr...@deb-support.de> writes:

> Goswin von Brederlow schrieb:
>> Andreas Tille <til...@rki.de> writes:
>>
>>> On Mon, 30 Mar 2009, Michael Bramer wrote:
>>>
>>>> Goswin von Brederlow schrieb:
>>>>> I think the idea of using the Description-md5sum is that in most cases
>>>>> the md5sum remains identical for many versions. If you use the
>>>>> package's actual version then every upload will need a new translation
>>>>> entry or some fuzziness to accept an older version's translation.
>>> I understood the point of having md5sums in translation files. My
>>> suggestion was an *additional* field which keeps the package version.
>>> In case there are different versions of a package in one dist (maybe
>>> because an arch is lagging behind), either the md5sums differ
>>> and you store different translations anyway, or the descriptions are
>>> equal, in which case you use the highest available version number.
>>
>> Can't you have multiple descriptions for the same package with
>> different md5sums in the translation file?
>
> We _have_ multiple descriptions for the same package with different
> md5sums in the translation file for sid.
>
> Gruss
> Grisu

Then the version number will not be needed when an arch lags
behind. The translation for the old md5sum can just be kept.

MfG
Goswin

Andreas Tille

Apr 1, 2009, 3:50:08 PM

On Wed, 1 Apr 2009, Goswin von Brederlow wrote:

> Then the version number will not be needed when an arch lags
> behind. The translation for the old md5sum can just be kept.

Well, this thread was misused to discuss several issues.
Would you mind reading my original posting on why version numbers
in Translation files make sense, and would you please base your
argument on this posting? Perhaps I'm just wrong, but version
numbers are really handy in this case and I see an extra benefit
in making these files somewhat human readable (in the sense that
I doubt you are able to calculate md5sums manually to find out
the matching description).

Kind regards

Andreas.

--
http://fam-tille.de

Martijn van Oosterhout

Apr 5, 2009, 9:40:13 AM

On Wed, Apr 1, 2009 at 9:47 PM, Andreas Tille <til...@rki.de> wrote:
> On Wed, 1 Apr 2009, Goswin von Brederlow wrote:
>
>> Then the version number will not be needed when an arch lags
>> behind. The translation for the old md5sum can just be kept.
>
> Well, this thread was misused to discuss several issues. Would you mind
> reading my original posting on why version numbers
> in Translation files make sense, and would you please base your argument on
> this posting? Perhaps I'm just wrong, but version
> numbers are really handy in this case and I see an extra benefit
> in making these files somewhat human readable (in the sense that
> I doubt you are able to calculate md5sums manually to find out
> the matching description).

While I'm not against the idea of version numbers (though it would
have to be a list since a single translation may apply to dozens of
versions) it's not that hard to identify the description you want.
What I often did was simply open up the description file to find the
description I wanted to test, cut and paste it into another console
running md5sum and that would be the md5 I needed to look for.

Have a nice day,
--
Martijn van Oosterhout <kle...@gmail.com> http://svana.org/kleptog/

Andreas Tille

Apr 7, 2009, 4:40:19 AM

On Sun, 5 Apr 2009, Martijn van Oosterhout wrote:

> While I'm not against the idea of version numbers (though it would
> have to be a list since a single translation may apply to dozens of
> versions)

This might be discussed.

> it's not that hard to identify the description you want.
> What I often did was simply open up the description file to find the
> description I wanted to test, cut and paste it into another console
> running md5sum and that would be the md5 I needed to look for.

Well, I did not say that it is actually hard, and in UDD you can get this
easily by

SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
FROM packages WHERE ...

but the actual method you are proposing might not be very reliable because
of the importance of whitespace (like the exact newlines etc). So comparing
version numbers is faster in any case and *easily* doable for humans.
Even if you have the right md5 sum as you mentioned above, comparing it
is harder than comparing a short version string. While human readability
is not my main concern, I care more about being able to directly compare
the Translations and Packages tables with the available information rather
than taking the detour over MD5 sums.
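
The whitespace problem is easy to demonstrate. The following Python sketch mirrors the SQL expression above (short description, newline, long description, newline) and shows that a copy which merely lost its trailing newline already gives a different digest. The sample strings and the helper name md5_hex are made up for the illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """MD5 hex digest of a text, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Made-up short and long description, stored separately as in the query above.
short = "example tool"
long_description = " This is the long description."

# Mirrors: description || E'\n' || long_description || E'\n'
full = short + "\n" + long_description + "\n"

# What easily happens with cut and paste: the trailing newline is lost.
pasted = full.rstrip("\n")

rejoined_matches = md5_hex(short + "\n" + long_description + "\n") == md5_hex(full)
pasted_differs = md5_hex(pasted) != md5_hex(full)
print(rejoined_matches, pasted_differs)
```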

Kind regards

Andreas.

--
http://fam-tille.de

Guillem Jover

Apr 8, 2009, 12:20:11 AM

Hi!

On Mon, 2009-03-23 at 16:23:12 -0700, Daniel Burrows wrote:
> I don't have the energy to push this any more, but I should probably
> at least refer to my previous attempt to standardize bulleted lists:
>
> http://lists.debian.org/debian-devel/2005/12/msg00531.html
>
> You might find it useful, or not. At least it more or less documents
> current practice in aptitude (I think there have been some tweaks since
> then; if anyone cares I could go research what they are and dig them up).

There's been a wiki page trying to track this, including packages
whose formatting was proving problematic:

<http://wiki.debian.org/Aptitude::Parse-Description-Bullets=true>

regards,
guillem

Andreas Tille

Apr 8, 2009, 2:50:17 AM

On Wed, 8 Apr 2009, Guillem Jover wrote:

> There's been a wiki page trying to track this, including packages
> whose formatting was proving problematic:
>
> <http://wiki.debian.org/Aptitude::Parse-Description-Bullets=true>

Great. The most important information from this page for me is that
there are actually other tools (not just the one I intended to write for
Blends) which would profit from a more standardized formatting
of descriptions. IMHO this justifies filing bug reports against packages
that try to implement a list but fail to use the form:

has_list |= ( line =~ /^\s+-/ ) # a line starts with " -"
has_list |= ( line =~ /^\s+\+/ ) # " +"
has_list |= ( line =~ /^\s+\*/ ) # " *"
has_list |= ( line =~ /^\s+o\s+/ ) # " o "

BTW, why are you checking for \s after the itemizing symbol only after
'o'? IMHO it should always follow each itemizing symbol. I also see
no good chances to detect multi level lists and thus I would like to
come back to more strict rules regarding the itemizing symbol and the
spacing. In contrast to the comment in the end the check also allows

" -"

and I would rather like to force

/^ - / or /^ \+ /

(yes, not checking for any whitespace but really the character ' ' = blank).
IMHO this would increase the reliability of detecting a list, and if there
are tools like aptitude which actually make use of it, it should be
worth the effort.
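
The difference between the loose checks quoted above and a strict rule can be sketched in Python. The loose patterns are copied from the script; the strict pattern (exactly one blank, one of '-', '+', '*', one blank) is only an assumption about how such a rule could look, and the function names are made up.

```python
import re

# The four loose checks from the script quoted above.
LOOSE = [re.compile(p) for p in (r"^\s+-", r"^\s+\+", r"^\s+\*", r"^\s+o\s+")]

# Assumed strict rule: one literal blank, a marker, one literal blank.
STRICT = re.compile(r"^ [-+*] ")

def loose_has_list(line: str) -> bool:
    """True if any of the loose patterns matches the line."""
    return any(p.search(line) for p in LOOSE)

def strict_is_item(line: str) -> bool:
    """True only for blank, marker, blank at the start of the line."""
    return STRICT.match(line) is not None

for line in (" - item", " --> arrow", "    - deep", "\t-item"):
    print(repr(line), loose_has_list(line), strict_is_item(line))
```

A line like " --> arrow" is not a list item, yet the loose check accepts it; this is the kind of false positive a strict rule avoids.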

Out of interest: what programming language is the script above written in?

Kind regards

Andreas.

--
http://fam-tille.de

Martijn van Oosterhout

Apr 12, 2009, 9:30:26 AM

On Tue, Apr 7, 2009 at 9:49 AM, Andreas Tille <til...@rki.de> wrote:
> Well, I did not say that it is actually hard, and in UDD you can get this
> easily by
>
> SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
> FROM packages WHERE ...

Ok, I see why you're having trouble now; you're splitting up the
description in your DB and thus need to stick it back together. That
does indeed make the process a bit less reliable. The DDTP/DDTSS
treats the description as a single string, the exact string in the
Packages file (the Description field is a single entry in the file) so
we had no issues. By doing extra processing like splitting/stripping
parts of the string it's quite possible you're doing a non-invertible
conversion, which would make matching harder later.

It'd be nice if someone went over the version number stuff in
DDTP/DDTSS since by and large it was never used (user display only and
even then it wasn't accurate) and so probably there's plenty of work
there.

It might actually be easier to write a script which simply collected
Packages files from say snapshot.debian.org, calculated all the MD5
sums (you can extract the description field using a regex so it's easy
enough in Perl) and built a database of description MD5s and version
numbers. That would give a reliable mapping, far more reliable than
the DDTP/DDTSS is ever likely to do.
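
A rough Python sketch of that idea, under stated assumptions: the hashed
string is taken to be the Description field value exactly as it appears in
the Packages file (continuation lines keep their leading blank) plus one
trailing newline, which mirrors the SQL expression quoted above.
description_md5 is a hypothetical helper, not part of any existing tool.

```python
import hashlib
import re

# The Description field: its first line plus every continuation line
# (continuation lines start with a blank).
DESCRIPTION_RE = re.compile(r"^Description: (.*(?:\n .*)*)", re.MULTILINE)


def description_md5(stanza):
    """Return the MD5 hex digest of the Description field of one stanza.

    Assumption: the digest covers the field value as written in the
    Packages file, followed by a single trailing newline.
    """
    match = DESCRIPTION_RE.search(stanza)
    if match is None:
        return None
    text = match.group(1) + "\n"
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```

Running this over each stanza of every collected Packages file and storing
(package, version, digest) would give the mapping described above.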

Keep in mind that all dpkg frontends that show descriptions work only on
the basis of the complete description string; I'm not sure anyone is
likely to switch to using versions.

Have a nice day,
--
Martijn van Oosterhout <kle...@gmail.com> http://svana.org/kleptog/

Andreas Tille

Apr 12, 2009, 6:10:10 PM

On Sun, 12 Apr 2009, Martijn van Oosterhout wrote:

>> SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
>> FROM packages WHERE ...
>
> Ok, I see why you're having trouble now; you're splitting up the
> description in your DB and thus need to stick it back together.

That's the format other tables in UDD are using. But it is not really
the worst part of the problem - as you see, it can be joined again
perfectly. It is just the md5 sum calculation which slows things down,
and the calculation of the version number is not reliable in all
cases - which I regard as a problem.

> That does indeed make the process a bit less reliable.

I don't think that it is the split which causes the problem. I was
able to reproduce the correct description the way I described above.

> The DDTP/DDTSS
> treats the description as a single string, the exact string in the
> Packages file (the Description field is a single entry in the file) so
> we had no issues. By doing extra processing like splitting/stripping
> parts of the string it's quite possible you're doing a not invertible
> conversion, which would make matching later harder.

In what way? This is done in UDD with all descriptions and has never
caused a problem.

> It might actually be easier to write a script which simply collected
> Packages files from say snapshot.debian.org, calculated all the MD5
> sums (you can extract the description field using a regex so it's easy
> enough in Perl) and built a database of description MD5s and version
> numbers. That would give a reliable mapping, far more reliable than
> the DDTP/DDTSS is ever likely to do.

Can you elaborate a bit more why you regard it as not reliable to
add a version number to DDTP Translation files?

Kind regards

Andreas.

--
http://fam-tille.de

Guillem Jover

Apr 16, 2009, 12:00:10 AM

Hi!

On Mon, 2009-03-23 at 13:26:36 +0100, Andreas Tille wrote:
> On Mon, 23 Mar 2009, Michael Banck wrote:
> > So it would be great if some numbers could be brought up first (maybe
> > Andreas has a rough overview now, because he looked at the different
> > kinds of itemizations).
>
> Well, I had not but you can get it somehow by
>
> for tag in "\*" "-" "+" "o" ; do
> echo "Tag $tag was used `grep "^ $tag " /var/lib/dpkg/available | wc -l` times"
> done
>
> Tag \* was used 5647 times
> Tag - was used 2710 times
> Tag + was used 85 times
> Tag o was used 282 times
>
> which only counts those who have proper spacing - but for a rough estimation
> '*' wins definitely.

Even if we'd have to fix all the entries with wrong spacing anyway to
reach correctness, I was curious to see numbers for all spacing variants
for a wider representation of the characters used:

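,-- count-bullet-chars.sh --
#!/bin/sh
# Count list-item lines per bullet character in the sid Packages lists.
lists=/var/lib/apt/lists/*_sid_main_*_Packages
# Total number of lines starting with any of the bullet characters.
total=`grep "^ *[-+\*o] " $lists | wc -l`

for tag in "\*" "-" "+" "o"; do
    # Lines using this bullet character, whatever the leading spacing.
    items=`grep "^ *$tag " $lists | wc -l`
    percent=`echo "scale=4; $items / $total * 100" | bc`
    echo "Tag $tag was used $items times ($percent%)"
done
`--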

Tag \* was used 9277 times (68.0900%)
Tag - was used 3837 times (28.1600%)
Tag + was used 120 times (.8800%)
Tag o was used 390 times (2.8600%)
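
For a breakdown by spacing variant as well as by character, a small Python
sketch along the following lines could be used. It is only an illustration:
count_bullets is a hypothetical name, and requiring at least one leading
blank is an assumption.

```python
import re
from collections import Counter

# One or more leading blanks, a bullet symbol, then a blank.
BULLET_RE = re.compile(r"^( +)([-+*o]) ")


def count_bullets(lines):
    """Count list-item lines per (indent width, bullet symbol)."""
    counts = Counter()
    for line in lines:
        match = BULLET_RE.match(line)
        if match:
            counts[(len(match.group(1)), match.group(2))] += 1
    return counts
```

Feeding it the lines of a Packages file yields one counter entry per
combination of indent width and symbol, which shows how common each
spacing variant actually is.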

Regardless of the numbers though (which have lately moved slightly in
favour of '-' due to the recommendations from the Smith reviewing
project), I've always found the asterisk the obvious character to use
for bulleted lists, as it's the one that most resembles a bullet, and
it's the one we use in changelog entries and similar.

regards,
guillem

Andreas Tille

Apr 16, 2009, 2:10:16 AM

On Thu, 16 Apr 2009, Guillem Jover wrote:

> ,-- count-bullet-chars.sh --
> #!/bin/sh
> lists=/var/lib/apt/lists/*_sid_main_*_Packages
> total=`grep "^ *[-+\*o] " $lists | wc -l`
> for tag in "\*" "-" "+" "o"; do
> items=`grep "^ *$tag " $lists | wc -l`
> percent=`echo "scale=4; $items / $total * 100" | bc`
> echo "Tag $tag was used $items times ($percent%)"
> done
> `--
>
> Tag \* was used 9277 times (68.0900%)
> Tag - was used 3837 times (28.1600%)
> Tag + was used 120 times (.8800%)
> Tag o was used 390 times (2.8600%)
>
>
> Regardless of the numbers though (which have moved lately slightly in
> favour of '-' due to the recommendations from the Smith reviewing
> project),

I have not found any recommendation regarding this at the SRP Wiki page [1].
I vaguely remember that this Smith project was initially driven by a French
guy who might try to push a French habit into the English world. ;-)
Do you have a link to those recommendations, which perhaps should be fixed
in the first place? IMHO the Smith Review Project would be the first place
where we could start some kind of standardisation of this issue - it seems
there is no "stronger" place to move this suggestion to.

> I've always found the asterisk the obvious character to use
> for bulleted lists, as it's the one that most resembles a bullet, and
> it's the one we use in changelog entries and similar.

I perfectly agree here. Even if I tend towards an "I do not care about the
actual character we use as long as it is a defined one" opinion, the statistics
above clearly show a preference, and we should turn this preference into a
recommendation and ask people to stick to it.

So could we settle on the following agreement:

' * ' for first order lists and
' - ' for second order lists.

I would like to push this to SRP *and* "6.2. Best practices for debian/control"
of developers reference. This would finally allow us to file wishlist bug
reports against packages which do not follow this recommendation.

Kind regards

Andreas.

[1] http://wiki.debian.org/I18n/SmithReviewProject

--
http://fam-tille.de

Christian Perrier

Apr 16, 2009, 2:50:18 AM

Andreas Tille wrote:

> I have not found any recommendation regarding this at the SRP Wiki page
> [1].
> I vaguely remember that this Smith project was initially driven by a French
> guy who might try to push a French habit into the English world. ;-)

Of course. Because, contrary to the English-speaking world, we *do*
have written rules for such cases. From the "Lexique des règles
typographiques en usage à l'Imprimerie Nationale" (which is the
reference for all typographic conventions for the French language, and
the reference book of all French TeXnicians):

Les énumérations

- elles sont introduites par un deux-points ;
- les énumérations de premier rang sont introduites par un tiret et
se terminent par un point-virgule, sauf la dernière par un point final ;
- les énumérations de second rang sont introduites par un tiret
décalé et se terminent par une virgule.

Which (badly) translates to:

Itemizations:
- they're introduced by a colon;
- first degree itemizations are preceded by a dash and end with a
semi-colon, except the last one, which ends with a full stop;
- second degree itemizations are preceded by an indented dash and end
with a comma.

I have never been able to find any such solid reference for English.
There is probably something in the Chicago Manual of Style, that's
generally accepted as the Right Reference for en_US.

Maybe more input from our experts on debian-l10n-english?

Manoj Srivastava

Apr 16, 2009, 3:00:26 AM

On Wed, Apr 15 2009, Guillem Jover wrote:

> Tag \* was used 9277 times (68.0900%)
> Tag - was used 3837 times (28.1600%)
> Tag + was used 120 times (.8800%)
> Tag o was used 390 times (2.8600%)
>
> Regardless of the numbers though (which have moved lately slightly in
> favour of '-' due to the recommendations from the Smith reviewing
> project), I've always found the asterisk the obvious character to use
> for bulleted lists, as it's the one that most resembles a bullet, and
> it's the one we use in changelog entries and similar.

The primary goal of the description is to convey to the user why
they should install the package. The maintainer can use an unsorted
list to help convey the information; and any means that make it clear
to the user that they are looking at a list is good enough.

Anything beyond that seems like striving for a foolish
consistency; and the basic assumption being made (which does
not, in my opinion, hold) is that a rigid monotonic conformity is
aesthetically pleasing. I think a variety in the symbols used for
bullets is better, in that it breaks the monotony.

Do we really have nothing better to do than to impose
bureaucratic rules on what characters to use as bullet symbols in long
descriptions even if the user can tell that the character is a bullet?

manoj

--
Slowly and surely the unix crept up on the Nintendo user ...
Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Lars Wirzenius

Apr 16, 2009, 3:10:25 AM

On Thu, 2009-04-16 at 08:42 +0200, Christian Perrier wrote:
> I have never been able to find any such solid reference for English.
> There is probably something in the Chicago Manual of Style, that's
> generally accepted as the Right Reference for en_US.
>
> Maybe more input from our experts on debian-l10n-english?

I'm not an expert, but I have the 14th edition of the CMS. It says both
bullets and dashes are acceptable (8.77, page 314, for reference).

(I am not expressing an opinion for or against the normalization of long
description markup.)

Andreas Tille

Apr 16, 2009, 3:20:10 AM

On Thu, 16 Apr 2009, Manoj Srivastava wrote:

> Do we really have nothing better to do than to impose
> bureaucratic rules on what characters to use as bullet symbols in long
> descriptions even if the user can tell that the character is a bullet?

The user can tell, but scripts can't do so reliably. Long descriptions are
used in several places, and some of these could render a better layout.
A good layout is pleasing for users. So it is not stupid bureaucracy
but a way of making our descriptions more readable (for instance on
packages.d.o and other places).

Kind regards

Andreas.

--
http://fam-tille.de

Manoj Srivastava

Apr 16, 2009, 3:20:12 AM

On Thu, Apr 16 2009, Christian Perrier wrote:

> Andreas Tille wrote:
>
>> I have not found any recommendation regarding this at the SRP Wiki page
>> [1].
>> I vaguely remember that this Smith project was initially driven by a French
>> guy who might try to push a French habit into the English world. ;-)
>
> Of course. Because, contrary to the world of English language, we *do*
> have written rules for such cases. From the "Lexique des règles
> typographiques en usage à l'Imprimerie Nationale" (which is the
> reference for all typographic conventions for the French language....the
> reference book of all French TeXnicians) :

Perhaps such rigidity is the reason the Lingua Franca around the
world has come to be English? :-)

I really do not think that needless consistency is something we
should pursue. Indeed, I'll go and scramble the bullet symbols I use for
unsorted lists to get away from the mind-numbingly boring consistency,
and provide some variety for my readers.

manoj
--
A well-known friend is a treasure.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Manoj Srivastava

Apr 16, 2009, 3:50:11 AM

On Thu, Apr 16 2009, Andreas Tille wrote:

> On Thu, 16 Apr 2009, Manoj Srivastava wrote:
>
>> Do we really have nothing better to do than to impose
>> bureaucratic rules on what characters to use as bullet symbols in long
>> descriptions even if the user can tell that the character is a bullet?
>
> The user can tell, but scripts can't reliably.

Any script should be able to take the top 4 symbols currently
used, and be able to detect them. I think *, +, - and o cover most
packages, and the scripts in question can be readily expanded. All
kinds of markup languages already do something similar. (markdown,
Emacs org-mode, mediawiki, etc)

> Long descriptions are used in several places and some of these could
> render a better layout.

Functionally, just rendering the description as written would
suffice; the rest is aesthetics.

> A good layout is pleasing for users. So it

Pleasing is in the eye of the beholder, no?

> is not stupid bureaucracy but making our descriptions better readable
> (for instance on packages.d.o and other places).

I find the descriptions on packages.d.o just fine right now.

Having said that, I would not be averse to specifying that leading
white space and *, +, and - would be acceptable as bullet marks (I
thought specifying which mark at which level was overspecification).

manoj
--
A man convinced against his will is of the same opinion still. --
Butler

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Andreas Tille

Apr 16, 2009, 4:20:23 AM

On Thu, 16 Apr 2009, Manoj Srivastava wrote:

> Any script should be able to take the top 4 symbols currently
> used, and be able to detect them. I think *, +, - and o cover most
> packages, and the scripts in question can be readily expanded. All
> kinds of markup languages already do something similar. (markdown,
> Emacs org-mode, mediawiki, etc)

Perhaps you missed the point that it is not only the character used
but also the broken spacing which prevents scripts from detecting the
levels of an itemized list.

Yes, we have itemized lists with more than one level in our descriptions
(see my initial posting). Detecting these would need either a defined
character or a defined spacing (IMHO an 'and' would be better than
a non-exclusive 'or' here).

> I find the descriptions on packages.d.o just fine right now.

IMHO the fact that a specific person is happy with the layout is no argument
that everybody else is. If a text has a certain logical structure, it should
be supported by the means a given output format offers. HTML can express a
list, and so it should if we want to express lists.

> Having said that, I would not be averse to specifying that leading
> white space and *, +, and - would be acceptable as bullet marks (I
> thought specifying which mark at which level was overspecification).

So you would be in favour of specifying only the amount of white space
to define a level? If this is accepted as a rough consensus, it
at least helps tools to detect what they need to detect.
Even if my aesthetic sense goes beyond this, I can accept it. But
you also specified three characters (*, +, and -), so do you want to
restrict the acceptable set yourself (for instance, not accept 'o')?

Kind regards

Andreas.

--
http://fam-tille.de

Michael Banck

Apr 16, 2009, 4:30:22 AM

On Thu, Apr 16, 2009 at 02:34:52AM -0500, Manoj Srivastava wrote:
> Having said that, I would not be averse to specifying that leading
> white space and *, +, and - would be acceptable as bullet marks (I
> thought specifying which mark at which level was overspecification).

Why don't we say binaries are fine in /usr/bin, /usr/local/bin and /opt
while we are at it, to provide some refreshing alternatives to our
users?

Michael

Manoj Srivastava

Apr 16, 2009, 5:20:14 AM

On Thu, Apr 16 2009, Andreas Tille wrote:

> On Thu, 16 Apr 2009, Manoj Srivastava wrote:
>
>> Any script should be able to take the top 4 symbols currently
>> used, and be able to detect them. I think *, +, - and o cover most
>> packages, and the scripts in question can be readily expanded. All
>> kinds of markup languages already do something similar. (markdown,
>> Emacs org-mode, mediawiki, etc)
>
> Perhaps you missed the point that it is not only the very character
> which is used but also the broken spacing which prevents scripts from
> detecting levels of itemizing list.
>
> Yes, we have more than one level itemizings in our descriptions (see
> my initial posting). Detecting these would need either a defined
> character or a defined spacing (IMHO an 'and' would be better than
> a non-exclusive 'or' here).

Umm. I am not sure that follows. I am also not convinced we need
to invent our own rules. Text::Markdown or Text::MultiMarkdown could
help. And they do not seem to have issues with recognizing
indentation/different characters as denoting levels of lists.

>
>> I find the descriptions on packages.d.o just fine right now.
>
> IMHO it is no argument that a specific person is happy with the layout
> everybody else is.

Just as it is no argument that, because someone thinks something is ugly,
everyone thinks so too.

> If a text has a certain logic it should to be
> supported by the means a certain output style has. HTML can express a
> list and so it should if we want to express lists.

And we do not need to specify any more rigid rules than
established systems like markdown do in order to achieve that. Indeed,
we can just pipe the description through markdown, and use the html

>
>> Having said that, I would not be averse to specifying that leading
>> white space and *, +, and - would be acceptable as bullet marks (I
>> thought specifying which mark at which level was overspecification).
>
> So you would be in favour of specifying only the amount of white space
> to define a level?

You do not have to specify the level. Just that the indentation
be sufficient for the user or markdown to be able to differentiate what
level the item is at.

> If this might be accepted as a rough consensus it is at least helpful
> to enable tools detecting what they need to detect. Even if my
> esthetical feeling goes beyond this I can accept this. But you also
> specified three characters (*, +, and -) so do you want to restrict
> the acceptable set yourself (for instance not accept 'o')?

I suggest we follow a convention and tool set already in place,
with multiple language bindings, if you must insist on adding rules to
the long description.

There are alternatives (Text::Textile comes to mind), but
Markdown has better language support, so long description parsers might
have an easier time.

I suggest, for readability, to use a subset of markdown; the
link and image tags are not that human readable.

manoj

http://en.wikipedia.org/wiki/Markdown
http://markdown.infogami.com/
http://daringfireball.net/projects/markdown/syntax

--
Man's horizons are bounded by his vision.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Ben Finney

Apr 16, 2009, 5:50:12 AM

(following up on IRC discussion)

Manoj Srivastava <sriv...@debian.org> writes:

> I suggest we follow a convention and tool set already in place,
> with multiple language bindings, if you must insist on adding rules to
> the long description.
>
> There are alternatives (Text::Textile comes to mind), but
> Markdown has better language support, so long description parsers might
> have an easier time.
>
> I suggest, for readability, to use a subset of markdown; the
> link and image tags are not that human readable.

reStructuredText <URL:http://docutils.sourceforge.net/rst.html> (reST)
is, I argue, a superior choice to Markdown for our existing format.

Markdown explicitly assumes the writer is going to punt to HTML for
anything not covered by Markdown, which severely limits its future
flexibility in contexts where we don't want to put HTML in the source.

reST, on the other hand, makes no such assumptions about enclosing
context; it was initially designed for documentation in program source
code, which is much closer to our needs for text in a control field.

It also helps that the simple bullet lists that are the most common case
are perfectly valid in reST too.

--
\ “Never express yourself more clearly than you are able to |
`\ think.” —Niels Bohr |
_o__) |
Ben Finney

Andreas Tille

Apr 16, 2009, 6:00:24 AM

On Thu, 16 Apr 2009, Manoj Srivastava wrote:

>> my initial posting. Detecting these would need either a defined
>> character or a defined spacing (IMHO an 'and' would be better than
>> a non-exclusive 'or' here).
>
> Umm. I am not sure that follows. I am also not convinced we need
> to invent our own rules.

I tried to suggest *any* rule which works. I'm not in favour of inventing
new rules. But the rules should be simple enough to not break any existing
tool.

> Text::Markdown or Text::MultiMarkdown could
> help. And they do not seem to have issues with recognizing
> indentation/different characters as denoting levels of lists.

If I interpret your first link [1] right, these are even *more* rules than
I suggested.

>>> I find the descriptions on packages.d.o just fine right now.
>>
>> IMHO it is no argument that a specific person is happy with the layout
>> everybody else is.
>
> Just like it is no argument that someone think something is ugly
> that means everyone thinks so too.
>
>> If a text has a certain logic it should to be
>> supported by the means a certain output style has. HTML can express a
>> list and so it should if we want to express lists.

Please do not split my paragraphs to blur my argument. Thanks.

> And we do not need to specify any more rigid rules than
> established systems like markdown do in order to achieve that. Indeed,
> we can just pipe the description through markdown, and use the html

Have you tested whether the current long descriptions render correctly
with this suggestion?

>> So you would be in favour of specifying only the amount of white space
>> to define a level?
>
> You do not have to specify the level. Just that the indentation
> be sufficient for the user or markdown to be able to differentiate what
> level the item is at.

I'm sorry - I do not know whether markdown is clever enough to render
the lists in all long descriptions. But as long as the hint "please
make sure that your long description renders with markdown" is not
written in any of our documents, I really doubt it. May I draw the
conclusion that you are also in favour of some rules but not really
happy with the rules I suggested? That's really fine with me. I just
want *any* rule which *works* and is written down somewhere, so that
we can file bug reports against packages which do not follow this rule.
I think I mentioned this in my postings in this thread.

> I suggest we follow a convention and tool set already in place,
> with multiple language bindings, if you must insist on adding rules to
> the long description.
>
> There are alternatives (Text::Textile comes to mind), but
> Markdown has better language support, so long description parsers might
> have an easier time.

I do not want any complicated tool to parse our long descriptions.
In principle they are really easy to parse. I want to have the
simplest possible rule set which enables us to reliably parse the
logical structure of our long descriptions. While you claim to be against
rules, you propose rules that are even harder to apply. At least for me,
your suggestions are confusing and just blur the issue.

> I suggest, for readability, to use a subset of markdown; the
> link and image tags are not that human readable.

Yes - that's perfectly fine. We are actually just using a subset of
markdown - a much simpler one than the one suggested, without features
like italics and strong, headings etc. And we do not really need those - we
should just keep it simple so as not to break any existing tool. If there
is a library which can reliably detect the logical structure of the current
long descriptions, probably nothing has to be changed. But I doubt there is
one, and I really wonder why anybody who is happy with the current rendering
is suggesting even more complex things.

Kind regards

Andreas.

[1] http://en.wikipedia.org/wiki/Markdown

--
http://fam-tille.de

Tzafrir Cohen

Apr 16, 2009, 6:00:21 AM

On Thu, Apr 16, 2009 at 04:01:20AM -0500, Manoj Srivastava wrote:

> Umm. I am not sure that follows. I am also not convinced we need
> to invent our own rules. Text::Markdown or Text::MultiMarkdown could
> help. And they do not seem to have issues with recognizing
> indentation/different characters as denoting levels of lists.

Character-level formatting of markdown as well?

Two examples:

* From abcmidi:

This package contains the programs `abc2midi' and `midi2abc', which

* From alltray:

KDE, XFCE 4*, Fluxbox* and WindowMaker*.
(*) No drag 'n drop support. Enable with "-nm" option.

Ben Finney

Apr 16, 2009, 6:20:12 AM

Ben Finney <ben+d...@benfinney.id.au> writes:

> (following up on IRC discussion)
>
> Manoj Srivastava <sriv...@debian.org> writes:
>
> > I suggest, for readability, to use a subset of markdown; the
> > link and image tags are not that human readable.
>
> reStructuredText <URL:http://docutils.sourceforge.net/rst.html> (reST)
> is, I argue, a superior choice to Markdown for our existing format.

Note that, like Manoj, I'm suggesting only a *subset*, not the full
specification.

--
\ “Like the creators of sitcoms or junk food or package tours, |
`\ Java's designers were consciously designing a product for |
_o__) people not as smart as them.” —Paul Graham |

Andreas Tille

Apr 16, 2009, 8:50:20 AM

On Thu, 16 Apr 2009, Ben Finney wrote:

> Note that, like Manoj, I'm suggesting only a *subset*, not the full
> specification.

Well, in this thread we had several suggestions ranging from a complete
change to a different format up to "not specified in detail" subsets of
other formats. IMHO this does not bring us forward a single step.
If we want to move forward we have to make sure that we will not be
forced to touch every single package, because such an attempt would be
bound to fail and every minute spent in discussion here would simply be
wasted. So if you suggest a subset of a specification, please state
clearly which subset and whether it works with currently existing
descriptions. I'd volunteer to set up a doodle poll with suggestions.

If you make a suggestion please answer the following questions:

A. Does the suggestion enable parsing logical structures like
two level itemize lists?
(This is what I want to approach and what is IMHO needed)
B. Does the suggestion allow the majority of descriptions to stay
untouched and the currently existing tools to keep working?
(This is important to gain any acceptance)

If either of the questions above is answered with "no", please mention
whether you are volunteering to do the work which is needed to
port the existing stuff to match your suggestion.

Currently I would feed the poll with 4 suggestions:

0. Keep anything as unstructured as it is.
Answer to A: no
Answer to B: yes

1. Use '*' for first order item lists, '-' for second order
item lists and use '  ' (exactly two spaces) before the
'*' and '    ' (exactly four spaces) before the '-'. After
'*' and '-' exactly one space should be used and continued
lines should start in the same column as the text starts
above.
Answer to A: yes
Answer to B: yes

2. Use '*' for first order item lists, '-' for second order
item lists. Spacing does not matter as long as continued
lines will start in the same column as the text above.
Answer to A: yes
Answer to B: yes

3. Use any character of ('*', '-', '+') to start a list and
mark the level of the list by strictly following spacing
rules: use '  ' (exactly two spaces) before the selected
character for starting a first order list and '    ' (exactly
four spaces) before the character for starting a second order
list. After the marker symbol exactly one space should be
used and continued lines should start in the same column as
the text starts above.
Answer to A: yes
Answer to B: yes
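
To show how little parsing suggestion 3 would need, here is a minimal Python
sketch. It is an illustration under assumptions: the blanks are counted on
the line exactly as it is passed in, continued lines are not handled, and
parse_item is a hypothetical name.

```python
import re

# Suggestion 3 as read here: two blanks before the marker mean level 1,
# four blanks mean level 2; exactly one blank follows the marker.
ITEM_RE = re.compile(r"^( {2}| {4})([*+-]) (\S.*)$")


def parse_item(line):
    """Return (level, marker, text) for a list item line, else None."""
    match = ITEM_RE.match(line)
    if match is None:
        return None
    level = len(match.group(1)) // 2
    return level, match.group(2), match.group(3)
```

Suggestion 1 would differ only in tying the marker to the level ('*' for
two blanks, '-' for four), which is one more comparison in the same function.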

If you want to make further suggestions just append them to this list.
I'll start a doodle poll next Monday. Depending on the outcome
of this poll I will submit a patch for "6.2. Best practices for
debian/control".

Does this sound reasonable?

Kind regards

Andreas.

--
http://fam-tille.de

Manoj Srivastava

Apr 16, 2009, 11:20:13 AM

On Thu, Apr 16 2009, Ben Finney wrote:

> (following up on IRC discussion)
>
> Manoj Srivastava <sriv...@debian.org> writes:
>
>> I suggest we follow a convention and tool set already in place,
>> with multiple language bindings, if you must insist on adding rules to
>> the long description.
>>
>> There are alternatives (Text::Textile comes to mind), but
>> Markdown has better language support, so long description parsers might
>> have an easier time.
>>
>> I suggest, for readability, to use a subset of markdown; the
>> link and image tags are not that human readable.
>
> reStructuredText <URL:http://docutils.sourceforge.net/rst.html> (reST)
> is, I argue, a superior choice to Markdown for our existing format.

I can live with restructured text. I would like to point out,
though, that the language support is more mature in markdown, and the
subset of features we care about are identical in markdown and rest.

> It also helps that the simple bullet lists that are the most common case
> are perfectly valid in reST too.

Right.

manoj

--
Patageometry, n.: The study of those mathematical properties that are
invariant under brain transplants.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Manoj Srivastava

Apr 16, 2009, 11:30:08 AM

On Thu, Apr 16 2009, Andreas Tille wrote:

> On Thu, 16 Apr 2009, Ben Finney wrote:
>
>> Note that, like Manoj, I'm suggesting only a *subset*, not the full
>> specification.
>
> Well, in this thread we had several suggestions reaching from complete
> change to different format up to "not in detail specified" subsets of
> other formats. IMHO this does not bring us forward a single step.
> If we want to move forward we have to make sure that we will not be
> forced to touch every single package because such an intend will be

This is exactly why I like markdown or restructured text, most
packages conform already.

> bound to fail and every minute spent in discussion here is simply
> wasted. So if you suggest a subset of a specification please state
> clearly which subset and whether it works with currently existing
> descriptions. I'd volunteer to set up a doodle poll with suggestions.

Voting is a piss poor means of making a technical decision.

At this point, I would say rules for lists, and bold/italics
should not be any more restrictive than markdown/ReST, and not impose
any more burdens on the description writer.

> If you make a suggestion please answer the following question:
>
> A. Does the suggestion enable parsing logical structures like
> two level itemize lists?
> (This is what I want to approach and what is IMHO needed)

Markdown and ReST, trivially.

> B. Does the suggestion enable keeping the majority of description
> untouched and enables keeping the currently existing tools?
> (This is important to gain any acceptance)

Yes, for both.

The one issue I have seen raised is that of using *italics* and
**bold** text; there are package descriptions where italics will
suddenly appear. Me, I like org mode, where we have /italics/, *bold*,
+strikethrough+, _underline_; but I doubt that org-mode will be popular
as an interpreter.

manoj
--
It is better to have loved and lost -- much better.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Manoj Srivastava

Apr 16, 2009, 12:00:17 PM

On Thu, Apr 16 2009, Andreas Tille wrote:

> On Thu, 16 Apr 2009, Manoj Srivastava wrote:
>
>>> my initial posting. Detecting these would need either a defined
>>> character or a defined spacing (IMHO an 'and' would be better than
>>> a non-exclusive 'or' here).
>>
>> Umm. I am not sure that follows. I am also not convinced we need
>> to invent our own rules.
>
> I tried to suggest *any* rule which works. I'm not in favour of inventing
> new rules. But the rules should be simple enough to not break any existing
> tool.

Which is good, since Markdown/ReST rules for lists will only
make the lists using o as the bullet out of whack.

>
>> Text::Markdown or Text::MultiMarkdown could
>> help. And they do not seem to have issues with recognizing
>> indentation/different characters as denoting levels of lists.
>
> If I interpret your first link [1] right these are even *more* rules than
> I suggested.

None of which are mandatory. All the package descriptions I read
in /var/lib/dpkg/available seem to pass, though a couple had italics
in strange places. This is not a fatal flaw.

>
>>>> I find the descriptions on packages.d.o just fine right now.
>>>
>>> IMHO it is no argument that a specific person is happy with the layout
>>> everybody else is.
>>
>> Just like it is no argument that someone thinks something is ugly
>> that means everyone thinks so too.
>>
>>> If a text has a certain logic it should be
>>> supported by the means a certain output style has. HTML can express a
>>> list and so it should if we want to express lists.
>
> Please do not split my paragraphs to blur my arguing. Thanks.

Heh. Ever heard of inline answers?

>> And we do not need to specify any more rigid rules than
>> established systems like markdown do in order to achieve that. Indeed,
>> we can just pipe the description through markdown, and use the html
>
> Have you tested this suggestion whether the current long descriptions will
> render correctly?

Yup.

>>> So you would be in favour of specifying only the amount of white space
>>> to define a level?
>>
>> You do not have to specify the level. Just that the indentation
>> be sufficient for the user or markdown to be able to differentiate what
>> level the item is at.
>
> I'm sorry - I do not know whether markdown is clever enough to
> render the lists in all long descriptions. But as long as the hint
> "please make sure that your long description renders with markdown" is
> not written in any of our documents I really doubt that. May I draw

Doubt is fine. Actually reading the package descriptions would
have been better.

>>>>> Tag \* was used 9277 times (68.0900%)
>>>>> Tag - was used 3837 times (28.1600%)
>>>>> Tag + was used 120 times (.8800%)

These work.

>>>>> Tag o was used 390 times (2.8600%)

These do not.
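
The tag counts quoted above could be reproduced along the following lines. This is only a minimal sketch: it assumes the long description lines are already available as plain strings, and `count_bullet_tags` and `BULLET_RE` are names invented here for illustration, not part of any existing Debian tool.

```python
import re
from collections import Counter

# Candidate bullet characters from the thread. 'o' is included because some
# descriptions use it, even though Markdown/ReST do not treat it as a bullet.
BULLET_RE = re.compile(r"^\s+([*+\-o])\s+\S")


def count_bullet_tags(lines):
    """Count which character introduces each list-like line."""
    counts = Counter()
    for line in lines:
        match = BULLET_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines in the style of a long description.
sample = [
    " This package provides:",
    "  * command-line editing",
    "  - http proxies",
    "  o Alert management",
    " .",
]
print(count_bullet_tags(sample))
```

Only the `*`, `+` and `-` entries of such a count correspond to lists that Markdown or ReST would recognise; the `o` entries are the ones that would need touching.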

Now, using *italic* had a few issues. There are 99 lines in the
available file where * is not used as a list item tag.

Of these 99 lines, in 27 places the *word* is used for emphasis,
meaning that in 72 places in the available file * is used as a
wildcard. But not all of these are an issue:

--8<---------------cut here---------------start------------->8---
__> echo ' bsd* and others.' | markdown
bsd* and others.
--8<---------------cut here---------------end--------------->8---

Of those 72 places, in only 24 descriptions did a second *
show up to anchor the other end of the mistaken emphasis.

> the conclusion that you are also in favour of some rules but not
> really happy with the rules I suggested? That's really fine for me.
> I just want *any* rule which *works* and is written down somewhere to
> enable us filing bug reports against packages which do not follow this
> rule. I think I mentioned this in my postings of this thread.

I suggest you try it out, before handwaving vague FUD
around. Even tnftp description works fine with either. There are very
few descriptions (about 24 or so) where we might have unwanted
emphasis. I think we can have that fixed.

>> I suggest we follow a convention and tool set already in place,
>> with multiple language bindings, if you must insist on adding rules to
>> the long description.
>>
>> There are alternatives (Text::Textile comes to mind), but
>> Markdown has better language support, so long description parsers might
>> have an easier time.
>
> I do not want any complicated tool to parse our long descriptions. In
> principle they are really easy to parse. I want to have the simplest
> possible rule set which enables us to reliably parse the logic of our
> long descriptions. While you claim to be against rules you propose
> rules that are even harder to apply. At least for me your suggestions are
> confusing and just blurring the issue.

I would simplify the rule, as opposed to having a trivial
library call in the tool. Indeed, reusing the libraries provided is
*less* work for the parser, than a NIH new parser.

>
>> I suggest, for readability, to use a subset of markdown; the
>> link and image tags are not that human readable.
>
> Yes - that's perfectly fine. We are just using a subset of markdown
> actually - a much simpler one than the suggested, without features
> like italics and strong, headings etc. And we do not really need it -
> we just should keep it simple to not break any existing tool. If
> there is a library which reliably can detect the logic of the current
> long descriptions probably nothing has to be changed. But I doubt
> there is one and I really wonder why anybody who is happy with the
> current rendering is suggesting even more complex things.

I think we need the emphasis almost as much as we need lists;
and people are already using *word* for emphasis in descriptions
(though not all that many).

manoj
--
Teutonic: Not enough gin.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Manoj Srivastava

Apr 16, 2009, 12:00:19 PM

On Thu, Apr 16 2009, Tzafrir Cohen wrote:

> On Thu, Apr 16, 2009 at 04:01:20AM -0500, Manoj Srivastava wrote:
>
>> Umm. I am not sure that follows. I am also not convinced we need
>> to invent our own rules. Text::Markdown or Text::MultiMarkdown could
>> help. And they do not seem to have issues with recognizing
>> indentation/different characters as denoting levels of lists.
>
> Character-level formatting of markdown as well?
>
> Two examples:
>
> * From abcmidi:
>
> This package contains the programs `abc2midi' and `midi2abc', which

Yup, this one is a problem.
This package contains the programs <code>abc2midi\' and</code>midi2abc\', which

So using ` as a quote seems to be an issue.
__> egrep '`' /var/lib/dpkg/available | wc -l
149
Less than 150 instances.

> * From alltray:
>
> KDE, XFCE 4*, Fluxbox* and WindowMaker*.
> (*) No drag 'n drop support. Enable with "-nm" option.

__> echo "KDE, XFCE 4*, Fluxbox* and WindowMaker*.
(*) No drag 'n drop support. Enable with "-nm" option." | markdown
KDE, XFCE 4*, Fluxbox* and WindowMaker*.
(*) No drag 'n drop support. Enable with -nm option.

Hmm. Looks fine to me.

manoj
--
"If Diet Coke did not exist it would have been necessary to invent it."
Karl Lehenbauer

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Manoj Srivastava

Apr 16, 2009, 12:10:13 PM

Hi,

Oh, markdown is only confused when you have `two' `words'
quoted like this; when there is only one such quote in the package, we
are fine.

This package contains the programs `abc2midi' which

So, less than 149 instances of the <code> tag where we want none.
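
The observation above (a single `quote' is harmless, two of them form a code span) can be turned into a rough check. The sketch below is a heuristic under stated assumptions: it treats a lone '.' line as a paragraph break, and it only counts backticks per paragraph rather than reproducing Markdown's real pairing rules. The function names are invented here.

```python
def paragraphs(description_lines):
    """Split long-description lines into paragraphs at lone '.' lines."""
    current = []
    for line in description_lines:
        if line.strip() == ".":
            if current:
                yield current
            current = []
        else:
            current.append(line)
    if current:
        yield current


def has_backtick_pair(paragraph):
    """Heuristic: two or more backticks in a paragraph may form a code span."""
    return sum(line.count("`") for line in paragraph) >= 2


one_quote = [" This package contains the programs `abc2midi' which"]
two_quotes = [" This package contains the programs `abc2midi' and `midi2abc', which"]

for lines in (one_quote, two_quotes):
    flagged = any(has_backtick_pair(p) for p in paragraphs(lines))
    print(flagged)
```

If the '.' separators are not converted to blank lines before the text reaches the markup processor, the whole description behaves as one paragraph, so a pair can also form across what the maintainer considers separate paragraphs.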

manoj
finding fewer problems in the descriptions than expected
--
"Slime is the agony of water." Jean-Paul Sartre

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Christian Perrier

Apr 16, 2009, 1:40:22 PM

Quoting Lars Wirzenius (l...@liw.fi):
> to, 2009-04-16 kello 08:42 +0200, Christian Perrier kirjoitti:
> > I have never been able to find any such solid reference for English.
> > There is probably something in the Chicago Manual of Style, that's
> > generally accepted as the Right Reference for en_US.
> >
> > Maybe more input from our experts on debian-l10n-english?
>
> I'm not an expert, but I have the 14th edition of the CMS. It says both
> bullets and dashes are acceptable (8.77, page 314, for reference).

Well, based on that discussion, these facts and the current practice,
I think that, in Smith reviews, we will, from now, recommend the use
of asterisks for 1st level items in item lists, in package
descriptions and debconf templates (these are the texts we review).

Please note that this is not *enforcing* things on maintainers. All
Smith reviews are suggestions made to maintainers and they are
associated to the whole discussion/review. When maintainers insist on
some practice (or even spelling|wording) we always follow their advice
in the end... even for maintainers who insist on using first person
sentences (hint hint).

The same will happen for item lists.

Manoj Srivastava

Apr 16, 2009, 2:10:08 PM

Hi,

I think we need to enumerate some goals for this proposed
change. Here is a start:

- Minimal disruption for current packages. The impact should be
  measured by numbers of packages impacted
  + Any specification of which of *, +, - to use as the first level item
    will impact more packages than not specifying it, by several
    hundred
  + The same is true for specifying the mark used for second level list
    items
  + Specifying an exact number of spaces will also hit current packages,
    and will be a source of errors in the future.
- Ability to recognize and render the following logical entities, in
  decreasing order of importance:
  + unordered lists
  + ordered lists
  + emphasis
  + strong emphasis
  + definition lists
  + hypertext links
  + underlines, and strike throughs
- Readability for people looking at non-enhanced renditions, i.e.,
  using less on the Packages file. Sticking to widely known
  conventions, using the same conventions that people are used to using
  in email, and Wikis, is a plus.
- Ease of use for description writers.
  Again, sticking with standards that people already know and use is
  better than making our own, more restrictive standards
- Not adding hugely to bloat for the Packages file
  This kinda excludes verbose markup like XML (which would have failed
  the readability test too)

At this point, I would say that Markdown/reStructuredText meets
most of the goals above, as long as we restrict the markup to the list
above:
* unordered lists
* ordered lists
* emphasis
* strong emphasis
* definition lists
* hypertext links
* underlines, and strike throughs

manoj
--
"If we can't fix it -- we'll fix it so nobody can." Gibbons

Giacomo Catenazzi

Apr 16, 2009, 3:30:10 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Manoj Srivastava wrote:
> - Ability to recognize and render the following logical entities, in
> decreasing order of importance:
> + unordered lists
> + ordered lists

really needed?

> + emphasis
> + strong emphasis
> + definition lists
> + hypertext links
> + underlines, and strike throughs

I don't think they are needed. Underlines is generally bad,
strike throughs are worse ;-)

Possibly also monospace, e.g. for commands, but I really prefer to have
as simple a language as possible.

> At this point, I would say that Markdown/reStructuredText meets
> most of the goals above, as long as we restrict the markup to the list
> above:

Could you provide us an example of reStructuredText for the basic constructs?

> * unordered lists
> * ordered lists
> * emphasis
> * strong emphasis
> * definition lists
> * hypertext links
> * underlines, and strike throughs

I also like creole (a standardized wiki language; moinmoin supports it), but it has no
definition lists, underline, or strike throughs.

So for creole:

* unordered lists                  \n * \n **
* ordered lists                    \n # \n ##
* emphasis                         //foo//
* strong emphasis                  **bar**
* definition lists                 missing; possibly \n **spam** is spam
* hypertext links                  normal url
* underlines, and strike throughs  missing, missing

ciao
cate
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknnhU8ACgkQ+ZNUJLHfmlfJigCfR/Jpn96l7FxHb9INlJlHkd+S
z+MAn2eM+rOOHN9n8LJTYXi/gT7cWuMa
=3a5+
-----END PGP SIGNATURE-----

Manoj Srivastava

Apr 16, 2009, 4:20:22 PM

On Thu, Apr 16 2009, Giacomo Catenazzi wrote:

> Manoj Srivastava wrote:
>> - Ability to recognize and render the following logical entities, in
>> decreasing order of importance:
>> + unordered lists
>> + ordered lists
>
> really needed?

I would think these are the guts of this proposal. Or else what
are we discussing here?

>
>> + emphasis
>> + strong emphasis
>> + definition lists
>> + hypertext links
>> + underlines, and strike throughs
>
> I don't think they are needed.

Why not? If rendering a description in a manner that makes it
easier to read is the goal, I fail to see why emphasis and strong
emphasis is a bad idea (think of text-to-speech mechanisms). This is
not just opinions we are discussing here, we should be looking at use
cases for marking up a textual description.

> Underlines is generally bad, strike throughs are worse ;-)

So you say. Don't use them, then. There are cases where either
one of these constructs has value; and you should not impose your
personal aesthetics on a general policy discussion.

> Possibly also monospace, e.g. for commands, but I really prefer to have
> as simple a language as possible.
>
>> At this point, I would say that Markdown/reStructuredText meets
>> most of the goals above, as long as we restrict the markup to the list
>> above:
>
> Could you provide us an example of reStructuredText for the basic constructs?

>> * unordered lists
>> * ordered lists
>> * emphasis
>> * strong emphasis
>> * definition lists
>> * hypertext links
>> * underlines, and strike throughs
>
> I like also creole (standardized wiki language, moinmoin support it),
> but no definition lists, underline, strike throughs.

What kind of language bindings are present for creole libraries?
markdown has a shell interpreter, has python, perl, ruby, C, c++, lisp,
and is widely supported and used by wikis et al.

> So for creole:
>
> * unordered lists \n * \n **

This fails the "Do not impact large numbers of packages" test,
since we have lots of packages using + and - for list items.

> * ordered lists \n # \n ##
> * emphasis //foo//

This also fails the test above -- lots of people are using
*emphasis*.

> * strong emphasis **bar**
> * definition lists missing ev. \n **spam** is spam

Hmm

> * hypertext links normal url
> * underlines, and strike throughs missing, missing

ok.

manoj

--
There's just something I don't like about Virginia; the state.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Stefano Zacchiroli

Apr 16, 2009, 5:00:13 PM

On Thu, Apr 16, 2009 at 12:50:12PM -0500, Manoj Srivastava wrote:
> I think we need to enumerate some goals for this proposed
> change. Here is a start:
>
> - Minimal disruption for current packages. The impact should be
> measured by numbers of packages impacted

<snip>

> At this point, I would say that Markdown/reStructuredText meets
> most of the goals above, as long as we restrict the markup to the
> list above:

I agree with the goals and thanks for "resetting" the discussion on
their grounds.

According to the goals you pointed out, it looks like Markdown
would be a more than suitable choice in terms of availability of
implementations, matching of "mail-like" markup (which is actually one
of the design goals of the language), and minimal disruption.

[ Markdown would also be my choice in terms of personal taste. Not
that it matters, but I mention it to make clear which is my
"church" in this respect :) ]

However, markdown would not be directly applicable to the content of
the long description field, as an RFC 822 parser would give it to you, due to
the '.'s used as paragraph separators. Sure, the pre-processing needed to
fix that would be trivial, but it is *some kind* of
pre-processing. One can then wonder how much pre-processing we would
allow before the markup processor without considering it
a "disruption" of current long descriptions.

I just felt like pointing that out, because it can put back into play
some other language which can be considered "non disrupting" by
allowing some extra pre-processing bits. ... nevertheless I completely
agree that something like Markdown + the minimal paragraph separator
pre-processing looks like a completely reasonable implementation
plan. Out of curiosity, would restructured text be immune to this
problem?
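
The paragraph separator pre-processing mentioned above might look roughly like this. It is a minimal sketch, assuming the input is the continuation lines of a Description field exactly as they appear in the control file (one leading space, a lone '.' as paragraph break); `to_markup_input` is a hypothetical helper name.

```python
def to_markup_input(long_description):
    """Turn the continuation lines of a Description field into plain text."""
    out = []
    for line in long_description.splitlines():
        # Continuation lines start with one space; drop exactly that one,
        # so deeper indentation (e.g. for list items) is preserved.
        if line.startswith(" "):
            line = line[1:]
        # A lone '.' marks a paragraph break in the control file format.
        if line.strip() == ".":
            line = ""
        out.append(line)
    return "\n".join(out)


print(to_markup_input(" First paragraph.\n .\n Second paragraph."))
```

The result is what a markup processor expects for paragraphs; it does nothing yet about lists that directly follow a text line.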

Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..| . |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime

Andreas Tille

Apr 17, 2009, 4:10:09 AM

On Thu, 16 Apr 2009, Manoj Srivastava wrote:

> Which is good, since Markdown/ReST rules for lists will only
> make the lists using o as the bullet out of whack.

Fine.

> None of which are mandatory. All the package descriptions I read
> in /var/lib/dpkg/available seem to pass, though a couple had italics
> in strange places. This is not a fatal flaw.

No - this perfectly fits my intention that some descriptions have to be fixed.
We just need guidelines for developers to follow.

>>>>> I find the descriptions on packages.d.o just fine right now.
>>>>
>>>> IMHO it is no argument that a specific person is happy with the layout
>>>> everybody else is.
>>>
>>> Just like it is no argument that someone thinks something is ugly
>>> that means everyone thinks so too.
>>>
>>>> If a text has a certain logic it should be
>>>> supported by the means a certain output style has. HTML can express a
>>>> list and so it should if we want to express lists.
>>
>> Please do not split my paragraphs to blur my arguing. Thanks.
>
> Heh. Ever heard of inline answers?

In most cases I manage to ignore this kind of question. Try reading my
mail again to find out a reasonable answer to your question yourself.

> I suggest you try it out, before handwaving vague FUD
> around. Even tnftp description works fine with either. There are very
> few descriptions (about 24 or so) where we might have unwanted
> emphasis. I think we can have that fixed.

But what exactly do I have to do to get the item lists marked?

$ grep-available -s Description -F Package airport-utils | markdown
Description: configuration and management utilities for Apple AirPort base stations
This package contains various utilities to manage the Apple AirPort base
stations.
.
Be aware that Apple released several versions of the AirPort base station;
the original AirPort ("Graphite") was a rebranded Lucent RG-1000 base
station, doing 802.11a/b. The AirPort Extreme ("Snow") is an Apple-built
802.11a/b/g base station.
.
For the original Apple AirPort and the Lucent RG-1000 base stations only:
- airport-config: base station configurator
- airport-linkmon: wireless link monitor, gives information on the wireless
link quality between the base station and the associated hosts
.
For the Apple AirPort Extreme base stations only:
- airport2-config: base station configurator
- airport2-portinspector: port maps monitor
- airport2-ipinspector: WAN interface monitoring utility
.
For all:
- airport-modem: modem control utility, displays modem state, starts/stops
modem connections, displays the approximate connection time (Extreme only)
- airport-hostmon: wireless hosts monitor, lists wireless hosts connected
to the base station (see airport2-portinspector for the Snow)

$ grep-available -s Description -F Package tnftp | markdown
Description: The enhanced ftp client
tnftp is what many users affectionately call the enhanced ftp
client in NetBSD (http://www.netbsd.org).
.
This package is a <code>port' of the NetBSD ftp client to other systems.
.
The enhancements over the standard ftp client in 4.4BSD include:
* command-line editing within ftp
* command-line fetching of URLS, including support for:
- http proxies (c.f: $http_proxy, $ftp_proxy)
- authentication
* context sensitive command and filename completion
* dynamic progress bar
* IPv6 support (from the WIDE project)
* modification time preservation
* paging of local and remote files, and of directory listings
(c.f:</code>lpage', <code>page',</code>pdir')
* passive mode support, with fallback to active mode
* <code>set option' override of ftp environment variables
* TIS Firewall Toolkit gate ftp proxy support (c.f:</code>gate')
* transfer-rate throttling (c.f: <code>-T',</code>rate')

> I would simplify the rule, as opposed to having a trivial
> library call in the tool. Indeed, reusing the libraries provided is
> *less* work for the parser, than a NIH new parser.

I'm really in favour of reusing a library (and I wonder whether I wrote
anything to the contrary). I just fail to see any effect when using
markdown except that the description is now enclosed in <p>...</p> and
some other markup appears which could be fixed. But the intended result
of getting list markup is not reached. Or did I miss something?

> I think we need the emphasis almost as much as we need lists;
> and people are already using *word* for emphasis in desciptions
> (though not all that many).

I'm not against implementing emphasis, which might also be an interesting
enhancement, and if only a small number of packages need to be
fixed these most probably need to be fixed in plain text anyway. So if
you enlighten me as to how the lists could work I'm perfectly happy.

Kind regards

Andreas.

--
http://fam-tille.de

Peter Pentchev

Apr 17, 2009, 9:10:09 AM

Just as a kind of clarification: Manoj, I think that Giacomo's comments
were only to the *last* item of the text he quoted, not to the whole
portion above it :) Thus, IMHO his first "really needed?" question
referred specifically to the "ordered lists" item, and the "I don't think
they are needed" referred specifically to the "underlines and
strike-throughs", not to the emphasis, strong emphasis, etc.

G'luck,
Peter

--
Peter Pentchev ro...@ringlet.net ro...@space.bg ro...@FreeBSD.org
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553
If this sentence didn't exist, somebody would have invented it.

Andreas Tille

Apr 18, 2009, 12:20:39 PM

On Sat, 18 Apr 2009, Vincent Danjean wrote:

> Remove the first space, remove the '.' that are alone on their line,

That's cheap.

> add a blank line before enumeration (this last point seems the more
> annoying to me: it can be difficult to automatically find where to
> insert a blank line).

Well - here is the crux which lets me wonder whether Manoj was
right in his posting[1] when he claimed:

> > If you make a suggestion please answer the following question:
> >
> > A. Does the suggestion enable parsing logical structures like
> > two level itemize lists?
> > (This is what I want to approach and what is IMHO needed)
>
> Markdown and ReST, trivially.
>
> > B. Does the suggestion enable keeping the majority of description
> > untouched and enables keeping the currently existing tools?
> > (This is important to gain any acceptance)
>
> Yes, for both.

It is neither trivial to detect the point where to add the needed
blank line nor would it be a solution to advise people always to
enclose lists in blank lines because people will tell you that
this will look ugly in the existing interfaces. So I would rather
tend to "No for both" and this is the crux here.

So while I perfectly agree with Manoj that voting on technical
decisions is a bad idea I come back to my initial suggestion because
my suggestions are technically equivalent but express a matter of
taste of the developers which might lead to better acceptance.

I would love if somebody could provide a proof that I'm wrong and
there is a reliable way to turn long descriptions into proper markdown
input to *really* be able to detect the lists. If not I think I
continue with my intention as described. [2]

Kind regards

Andreas.

[1] http://lists.debian.org/debian-devel/2009/04/msg00652.html
[2] http://lists.debian.org/debian-devel/2009/04/msg00643.html

Manoj Srivastava

Apr 18, 2009, 1:00:11 PM

On Sat, Apr 18 2009, Andreas Tille wrote:

> On Sat, 18 Apr 2009, Vincent Danjean wrote:
>
>> Remove the first space, remove the '.' that are alone on their line,
>
> That's cheap.
>
>> add a blank line before enumeration (this last point seems the more
>> annoying to me: it can be difficult to automatically find where to
>> insert a blank line).
>
> Well - here is the crux which lets me wonder whether Manoj was
> right in his posting[1] when he claimed:
>
>> > If you make a suggestion please answer the following question:
>> >
>> > A. Does the suggestion enable parsing logical structures like
>> > two level itemize lists?
>> > (This is what I want to approach and what is IMHO needed)
>>
>> Markdown and ReST, trivially.
>>
>> > B. Does the suggestion enable keeping the majority of description
>> > untouched and enables keeping the currently existing tools?
>> > (This is important to gain any acceptance)
>>
>> Yes, for both.
>
> It is neither trivial to detect the point where to add the needed
> blank line nor would it be a solution to advise people always to

Actually, it is pretty trivial. It is a second chapter
exercise in K&R; it is a first month exercise in computer science 101.

Here is an algorithm:
--8<---------------cut here---------------start------------->8---
we are not in a list
while reading each line, do
    remove leading space
    if the only non white space character on the line is a single .
        remove the dot
    if the line matches the regexp: '^\s+[\*\+\-]\s+'
        if we are not in a list
            emit blank line first
            record we are not in a list
    else
        if we are in a list
            record we are not in a list
    emit line
--8<---------------cut here---------------end--------------->8---
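
For readers who prefer running code, the pseudocode could be sketched in Python roughly as follows. This is not the poster's implementation: `prepare_for_markdown` is a name invented here, and the sketch assumes that the branch taken when a list item starts is meant to record that we are now *in* a list.

```python
import re

# Same pattern as in the pseudocode: indentation, a bullet, then whitespace.
LIST_ITEM_RE = re.compile(r"^\s+[*+\-]\s+")


def prepare_for_markdown(lines):
    """Insert a blank line before each run of list items."""
    in_list = False
    out = []
    for line in lines:
        # Remove the single leading space of a continuation line.
        if line.startswith(" "):
            line = line[1:]
        # A lone '.' is a paragraph separator; turn it into a blank line.
        if line.strip() == ".":
            line = ""
        if LIST_ITEM_RE.match(line):
            if not in_list:
                out.append("")
                # Assumption: the intent is to record that a list has started.
                in_list = True
        else:
            in_list = False
        out.append(line)
    return out


# Hypothetical description lines for illustration.
example = [" Features:", "  * editing", "  * completion", " .", " More text."]
print("\n".join(prepare_for_markdown(example)))
```

Like the pseudocode, this sketch treats a wrapped continuation line of a list item as leaving the list, so the next item gets its own blank line; a real tool would probably want to track indentation as well.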

People who can not convert this 13 line pseudocode into real
code should not be writing stuff to pretty print descriptions.

> enclose lists in blank lines because people will tell you that
> this will look ugly in the existing interfaces. So I would rather
> tend to "No for both" and this is the crux here.

Frankly, I think this is very wrong.

> So while I perfectly agree with Manoj that voting on technical
> decisions is a bad idea I come back to my initial suggestion because
> my suggestions are technically equivalent but express a matter of
> taste of the developers which might lead to better acceptance.
>
> I would love if somebody could provide a proof that I'm wrong and
> there is a reliable way to turn long descriptions into proper markdown
> input to *really* be able to detect the lists. If not I think I
> continue with my intention as described. [2]

Is the above algorithm proof enough for you? Or do I have to
write that into real code in your favourite programming language
before you can see it?

manoj
--
"The minority is always right." Henrik Ibsen 1828-1906

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Ben Finney

Apr 18, 2009, 1:10:15 PM

Peter Pentchev <ro...@ringlet.net> writes:

> Just as a kind of clarification: Manoj, I think that Giacomo's
> comments were only to the *last* item of the text he quoted, not to
> the whole portion above it :) Thus, IMHO his first "really needed?"
> question referred specifically to the "ordered lists" item, and the "I
> don't think they are needed" referred specifically to the "underlines
> and strike-throughs", not to the emphasis, strong emphasis, etc.

Traps for new players: One must remember to trim irrelevant quoted
material so it's clear what the context of one's responses is.

--
\ “You can't have everything; where would you put it?” —Steven |
`\ Wright |
_o__) |
Ben Finney

Vincent Danjean

Apr 18, 2009, 1:10:35 PM

Andreas Tille wrote:
> But what exactly do I have to do to get the item lists marked?

Remove the first space, remove the '.' that are alone on their line,
add a blank line before enumeration (this last point seems the more
annoying to me: it can be difficult to automatically find where to
insert a blank line).

> grep-available -s Description -F Package airport-utils | markdown

grep-aptavail -s Description -F Package airport-utils | sed -e 's/^ \(.$\)\?//' -e '/: *$/a\\
' | markdown

Description: configuration and management utilities for Apple AirPort base stations
This package contains various utilities to manage the Apple AirPort base

stations.

Be aware that Apple released several versions of the AirPort base station;

the original AirPort ("Graphite") was a rebranded Lucent RG-1000 base
station, doing 802.11a/b. The AirPort Extreme ("Snow") is an Apple-built

802.11a/b/g base station.

For the original Apple AirPort and the Lucent RG-1000 base stations only:

<ul>
<li>airport-config: base station configurator</li>
<li>airport-linkmon: wireless link monitor, gives information on the wireless
link quality between the base station and the associated hosts</li>
</ul>

For the Apple AirPort Extreme base stations only:

<ul>
<li>airport2-config: base station configurator</li>
<li>airport2-portinspector: port maps monitor</li>
<li>airport2-ipinspector: WAN interface monitoring utility</li>
</ul>

For all:

<ul>
<li>airport-modem: modem control utility, displays modem state, starts/stops

modem connections, displays the approximate connection time (Extreme only)

<ul>
<li>airport-hostmon: wireless hosts monitor, lists wireless hosts connected
to the base station (see airport2-portinspector for the Snow)</li>
</ul></li>
</ul>

Regards,
Vincent

--
Vincent Danjean GPG key ID 0x9D025E87 vdan...@debian.org
GPG key fingerprint: FC95 08A6 854D DB48 4B9A 8A94 0BF7 7867 9D02 5E87
Unofficial packages: http://moais.imag.fr/membres/vincent.danjean/deb.html
APT repo: deb http://perso.debian.org/~vdanjean/debian unstable main

Andreas Tille

Apr 18, 2009, 1:40:09 PM

On Sat, 18 Apr 2009, Manoj Srivastava wrote:

> Here is an algorithm:
> --8<---------------cut here---------------start------------->8---
> we are not in a list
> while reading each line, do
>     remove leading space
>     if the only non white space character on the line is a single .
>         remove the dot
>     if the line matches the regexp: '^\s+[\*\+\-]\s+'
>         if we are not in a list
>             emit blank line first
>             record we are not in a list
>     else
>         if we are in a list
>             record we are not in a list
>     emit line
> --8<---------------cut here---------------end--------------->8---
>
> People who can not convert this 13 line pseudocode into real
> code should not be writing stuff to pretty print descriptions.

Thanks for the trust in the programming skills of your fellow
developers. You obviously are able to write the code to detect
a list *without* using a library. Wasn't it you who told me we
should use a library to *avoid* inventing our own code? So if
you have this code which works perfectly on the input I have been
suggesting for two weeks, why do you want to add an additional library
on top of this? I feel a little bit bored by this discussion, which
is running in circles, starts to become personal without any
real reason (I hope I did not give any) and finally leads to nothing
(at least this is my impression).

>> enclose lists in blank lines because people will tell you that
>> this will look ugly in the existing interfaces. So I would rather
>> tend to "No for both" and this is the crux here.
>
> Frankly, I think this is very wrong.

The solution does not work without the code you wrote above. But you
need this code anyway to detect lists in the long descriptions and so
I wonder where the real profit of an additional library is.

> Is the above algorithm proof enough for you? Or do I have to
> write that into real code in your favourite programming language
> before you can see it?

I hope you would not code the bug in line no. 9.

What you basically tried to prove is that you are keen on teaching your
fellow developers programming. Your time would be much better spent if
you put the effort into finally reaching a consensus on how we
should change best practices for debian/control to enable the parsing
of lists. The suggestions I presented [1] do not conflict with markdown,
and what you finally use for the description parsing tools -
the algorithm above or a library on top of it - does not matter at all
if we agree on some simple standard.

It would be really helpful if you would return to the constructive way
of discussion I observed in former times instead of blurring the issue
with distracting discussions.

Kind regards

Andreas.

[1] http://lists.debian.org/debian-devel/2009/04/msg00643.html

--
http://fam-tille.de

Manoj Srivastava

Apr 18, 2009, 2:40:09 PM

On Sat, Apr 18 2009, Andreas Tille wrote:

> On Sat, 18 Apr 2009, Manoj Srivastava wrote:
>
>> Here is an algorithm:
>> --8<---------------cut here---------------start------------->8---
>> we are not in a list
>> while reading each line, do
>>     remove leading space
>>     if the only non white space character on the line is a single .
>>         remove the dot
>>     if the line matches the regexp: '^\s+[\*\+\-]\s+'
>>         if we are not in a list
>>             emit blank line first
>>         record we are not in a list

s/not//

>>     else
>>         if we are in a list
>>             record we are not in a list
>>     emit line
>> --8<---------------cut here---------------end--------------->8---
>>
>> People who cannot convert this 13 line pseudocode into real
>> code should not be writing stuff to pretty print descriptions.
>
> Thanks for the trust in the programming skills of your fellow
> developers. You are obviously able to write the code to detect
> a list *without* using a library. Wasn't it you who told me we
> should use a library to *avoid* inventing our own code? So if
> you have this code, which works perfectly on the input I have been
> suggesting for two weeks, why do you want to add an additional library
> on top of it? I feel a little bit bored by this discussion, which
> is running in circles, is starting to become personal without any
> real reason (I hope I did not give any) and finally leads to nothing
> (at least this is my impression).

Frankly, I have no idea where this trade is going.

With a 6 line pre-processor, you can feed the grep-dctrl
provided Description fields to Markdown. So, seems like we have come
somewhere -- we have had one investigation that leads one to believe
that there is a small fraction of packages using "o" as a bullet that
need to be changed, and apart from that fewer than 50 packages
are affected (if we want to specify markdown as the markup language for
descriptions -- and these are the ones where we have some unwanted
emphasis, a non-fatal result).

There is a mechanism to pre-process the description for
markdown (Perl implementation below). What more is needed for you to
think this is leading somewhere?

>>> enclose lists in blank lines because people will tell you that
>>> this will look ugly in the existing interfaces. So I would rather
>>> tend to "No for both" and this is the crux here.
>>
>> Frankly, I think this is very wrong.
>
> The solution does not work without the code you wrote above. But you
> need this code anyway to detect lists in the long descriptions and so
> I wonder where the real profit of an additional library is.

*Sigh*.

All I am doing with the code is inserting a line before the
lists. I am not generating html. I am also not handling the _other_
markup that markdown handles, that I presented as something that will
make the description more readable too. The markdown library does all
the heavy lifting for the html generation. If you think my little perl
snippet is the equivalent of what markdown does, you have not looked
at markdown.

I am not re-inventing the wheel when it comes to markup
languages.

We know we needed _some_ pre-processing because we have the
paragraphs separated by ' .', but the code is pretty minimal.

--8<---------------cut here---------------start------------->8---
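my $in=0;                          # nonzero while we are inside a list
while(<>) {
    chomp; s/^ //g; s/^\.\s*$//;   # un-escape: drop leading space, blank a lone "."
    if(/^\s+[\*\+\-]\s+/) { print "\n" unless $in++;}   # blank line before a list
    else { $in=0; }
    print "$_\n"
}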
--8<---------------cut here---------------end--------------->8---
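
For comparison, here is a minimal Python sketch of the same pre-processing
step. It is not code from this thread: the function name `preprocess` and
the sample input are made up for illustration. It mirrors the Perl above,
with line 9 of the pseudocode read as "record we are in a list", and it
shares the Perl's limitation that a continuation line ends the current list.

```python
import re

# An indented bullet: leading whitespace, then *, + or -, then whitespace.
BULLET = re.compile(r"^\s+[*+\-]\s+")

def preprocess(lines):
    """Un-escape a long description and put a blank line before each list."""
    out = []
    in_list = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" "):
            line = line[1:]            # drop the single escaping space
        if re.fullmatch(r"\.\s*", line):
            line = ""                  # a lone "." becomes a paragraph break
        if BULLET.match(line):
            if not in_list:
                out.append("")         # blank line before the first item
            in_list = True             # pseudocode line 9, with s/not//
        else:
            in_list = False
        out.append(line)
    return out

if __name__ == "__main__":
    sample = [" Some text.", "  * one", "  * two", " .", " More."]
    print("\n".join(preprocess(sample)))
```

The result is still plain text; turning it into HTML or anything else is
left to whatever Markdown implementation is run afterwards.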

manoj

ps: This can easily become a shell function.

__> grep-aptavail -s Description -P airport-utils | perl -e '
my $in=0;
while(<>) {
    chomp; s/^ //g; s/^\.\s*$//;
    if(/^\s+[\*\+\-]\s+/) { print "\n" unless $in++;}
    else { $in=0; }
    print "$_\n"
}' | markdown
Description: configuration and management utilities for Apple AirPort base stations
This package contains various utilities to manage the Apple AirPort base
stations.

Be aware that Apple released several versions of the AirPort base station;
the original AirPort ("Graphite") was a rebranded Lucent RG-1000 base
station, doing 802.11a/b. The AirPort Extreme ("Snow") is an Apple-built
802.11a/b/g base station.

For the original Apple AirPort and the Lucent RG-1000 base stations only:

<ul>
<li>airport-config: base station configurator</li>
<li>airport-linkmon: wireless link monitor, gives information on the wireless
link quality between the base station and the associated hosts</li>
</ul>

For the Apple AirPort Extreme base stations only:

<ul>
<li>airport2-config: base station configurator</li>
<li>airport2-portinspector: port maps monitor</li>
<li>airport2-ipinspector: WAN interface monitoring utility</li>
</ul>

For all:

<ul>
<li>airport-modem: modem control utility, displays modem state, starts/stops
modem connections, displays the approximate connection time (Extreme only)

<ul>
<li>airport-hostmon: wireless hosts monitor, lists wireless hosts connected
to the base station (see airport2-portinspector for the Snow)</li>
</ul></li>
</ul>

--
Never call a man a fool; borrow from him.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Andreas Tille

Apr 18, 2009, 4:30:13 PM

On Sat, 18 Apr 2009, Manoj Srivastava wrote:

> Frankly, I have no idea where this trade is going.

IMHO the problem is that you assume our suggestions contradict
each other - but they do not. I wanted to iron out suggestions on how
to format the input in a standardised way. What is done afterwards
is the choice of the people who are working with this input. I don't care
whether they choose markdown, restructured text or just take your
perl code and use <ul> / </ul> instead of the additional blank lines
and wrap the lines in lists in <li> / </li> tags if they need HTML
output. But this is NOT to be discussed HERE (even if it does no
harm). The point is that our input should ENABLE this, which needs
a better standardisation of long descriptions.

You are one step after this - and your input is welcome - but there
is no contradiction.

> With a 6 line pre-processor, you can feed the grep-dctrl
> provided Description fields to Markdown.

BTW, your pre-processor will need some additional lines when it comes to
second level lists (and yes, I'm sure this can easily be done - but
this is not, and never was, the point).
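
To make the remark about second level lists concrete, here is a small
Python sketch that renders one list paragraph as nested <ul> / <li>
markup, using indentation depth to decide the nesting. Nothing in it comes
from the thread: the name `list_to_html` and the nesting rule (a deeper
indent opens a sub-list) are assumptions chosen for illustration, and the
input is expected to be already un-escaped.

```python
import re
from html import escape

# Optional indentation, a bullet character, whitespace, then the item text.
ITEM = re.compile(r"^(\s*)[*+\-]\s+(.*)$")

def list_to_html(lines):
    """Render bullet lines as nested <ul>/<li>, nesting by indentation."""
    out = []
    stack = []                          # indentation of each open <ul>
    for line in lines:
        m = ITEM.match(line)
        if not m:                       # continuation line of the open item
            if stack and line.strip():
                out[-1] += " " + escape(line.strip())
            continue
        indent, text = len(m.group(1)), escape(m.group(2))
        while stack and indent < stack[-1]:
            out.append("</li></ul>")    # close the deeper lists first
            stack.pop()
        if stack and indent == stack[-1]:
            out.append("</li>")         # sibling: close the previous item
        else:
            out.append("<ul>")          # first item at a deeper level
            stack.append(indent)
        out.append("<li>" + text)
    out.extend("</li></ul>" for _ in stack)
    return "".join(out)

if __name__ == "__main__":
    print(list_to_html(["* a", "  * b", "  * c", "* d"]))
```

Because nesting depends only on relative indentation, input with
inconsistent spacing produces nesting the author never intended, which is
the kind of breakage consistent formatting rules would prevent.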

> So, seems like we have come
> somewhere -- we have had one investigation that leads one to believe
> that there are a small fraction of packages using "o" as a bullet that
> need to be changed, and apart from that fewer than 50 packages
> are affected

Great - let's iron out the advice on how to format long descriptions
in our docs to enable us to write lintian checks and file bug reports.
Manoj, we have really reached a point here!
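
As a rough idea of what such a check could look for, here is a Python
sketch. It is not lintian code and uses no lintian interface: the function
name `check_description`, the two rules and the messages are hypothetical.
It takes the raw description lines (still carrying the leading escape
space) and flags "o" used as a bullet and bullet lines that have only the
single escaping space in front of them, so that they would be word-wrapped
like ordinary text.

```python
import re

# Heuristic: indentation, a lone "o", spaces, then the item text.
O_BULLET = re.compile(r"^ +o +\S")
# A bullet right after the single escaping space (no extra indentation).
FLUSH_BULLET = re.compile(r"^ [*+\-] ")

def check_description(lines):
    """Return (line_number, message) pairs for suspicious list formatting."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if O_BULLET.match(line):
            problems.append((number, 'uses "o" as a bullet'))
        elif FLUSH_BULLET.match(line):
            problems.append((number, "list item has only one leading space"))
    return problems

if __name__ == "__main__":
    sample = [" Features:", "  o first", " * second", "  - third"]
    for number, message in check_description(sample):
        print(f"line {number}: {message}")
```

Both rules are heuristics and can misfire, for example on a wrapped
sentence whose line happens to start with the word "o".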

> (if we want to specify markdown as the markup language for
> descriptions -- and these are the one where we have some unwanted
> emphasis, a non-fatal result).

Please let's move this to a different discussion. People who are
responsible for packages.debian.org might be interested and adopt
your idea.

> There is a mechanism to pre-process the description for
> markdown (Perl implementation below). What more is needed for you to
> think this is leading somewhere?

Did I give the impression that I wanted more? Honestly, I'd be
interested to know from which part of my mails you draw the conclusion
that I need to improve my communication skills.

> All I am doing with the code is inserting a line before the
> lists. I am not generating html. I am also not handling the _other_
> markup that markdown handles, that I presented as something that will
> make the description more readable too. The markdown library does all
> the heavy lifting for the html generation. If you think my little perl
> snippet is the equivalent of what markdown does, you have not looked
> at markdown.

In the whole discussion I was talking about structuring the input
to ENABLE turning it into html (or whatever structured output you need).
You were discussing how to actually *do* the step I just wanted to
provide the precondition for. I was just asking why you need a
preprocessor for a library when you could reach a similar result
by tweaking the preprocessor a little bit. I just do not want to
force any programmer to use markdown (even though it admittedly has
advantages, as I also agreed). This was a *sidenote* because this
whole processing of the input is just not my point.

> I am not re-inventing the wheel when it comes to markup
> languages.

Same for me - or am I writing in delirium???

And your divergence from the original topic just blurs the issue -
would you mind rereading my initial mail [1]? Do you agree that
long descriptions need enhancement or not?

> We know we needed _some_ pre-processing because we have the
> paragraphs separated by ' .', but the code is pretty minimal.
>
> --8<---------------cut here---------------start------------->8---
> my $in=0;
> while(<>) {
>     chomp; s/^ //g; s/^\.\s*$//;
>     if(/^\s+[\*\+\-]\s+/) { print "\n" unless $in++;}
>     else { $in=0; }
>     print "$_\n"
> }
> --8<---------------cut here---------------end--------------->8---
>
> manoj
>
> ps: This can easily become a shell function.

Again: Please assume for the rest of this thread that I'm not stupid
and know how scripts can be used.

I wonder why you insist on providing broken examples. The last
list is formatted wrongly - the original description did not contain a
second-level list but your result does. The only thing you are doing
is proving my point that we need to enhance the input first - but this
can be done more elegantly and simply than you are doing.

I would regard it as really helpful if you would concentrate on
the help I asked for with the words "This suggestion is far from
complete and should be enhanced." in [1]. Once the input data is
well structured we could start another discussion what might be the
best way to process this data.

Kind regards

Andreas.

[1] http://lists.debian.org/debian-devel/2009/03/msg01165.html
--
http://fam-tille.de

Andreas Tille

Apr 20, 2009, 5:30:15 PM

Hi,

as promised in the overlongish thread [1] I would like to
sort out the details of how we should enhance the consistency and
parseability of our long descriptions in a poll. I agree that
it is not a good idea to solve technical issues in a poll.
But this is not about a technical issue. It is a fact that
we need a defined structure (technical issue 1) to be able
to parse the long descriptions (whatever library or self-invented
code will be used - technical issue 2). But the details of what
the structure should look like are more or less an aesthetic
question (because several tools print the long descriptions
in verbose mode) and so the question is about these aesthetics.
If you want to discuss the technical issues please read all mails
of the thread and continue discussing this (preferably with a
new subject).

Here is the URL of the poll:

http://doodle.com/2bp8rrh3i35sr4s7

Kind regards

Andreas.

[1] http://lists.debian.org/debian-devel/2009/03/msg01165.html

Manoj Srivastava

Apr 20, 2009, 7:10:07 PM

On Mon, Apr 20 2009, Andreas Tille wrote:

> Hi,
>
> as promised in the overlongish thread [1] I would like to
> sort out the details of how we should enhance the consistency and
> parseability of our long descriptions in a poll. I agree that
> it is not a good idea to solve technical issues in a poll.
> But this is not about a technical issue. It is a fact that
> we need a defined structure (technical issue 1) to be able
> to parse the long descriptions (whatever library or self-invented
> code will be used - technical issue 2). But the details of what
> the structure should look like are more or less an aesthetic
> question (because several tools print the long descriptions
> in verbose mode) and so the question is about these aesthetics.
> If you want to discuss the technical issues please read all mails
> of the thread and continue discussing this (preferably with a
> new subject).
>
> Here is the URL of the poll:
>
> http://doodle.com/2bp8rrh3i35sr4s7
>

Frankly, a poll about micromanaging marks for each level of
unordered list does seem to be technical. It is also an implementation
detail, and invents our own convention, and options 1 & 2 would cause
many more packages to be changed than would just adopting markdown or
ReST. The fact that we need more packages changed for options 1 & 2
makes them technically inferior.

Is there anyone other than yourself who is actually unhappy
about markdown/ReST?

And should we have similar silly polls (which I have no
intention of promoting by voting in them) for emphasis? for specifying
bold/italic text? For ordered lists? for a myriad of other useful
markup already familiar to people who know markdown and ReST?

Also, the fact that there are more output formats than html
available for markdown/ReST is another plus point; we might want other
output formats for Descriptions than plain ol' html.

manoj
--
Once the erosion of power begins, it has a momentum all its own.

Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Andreas Tille

Apr 21, 2009, 2:50:13 AM

On Mon, 20 Apr 2009, Manoj Srivastava wrote:

> Frankly, a poll about micromanaging marks for each level of
> unordered list does seem to be technical. It is also an implementation
> detail, and invents our own convention,

I disagree.

> and options 1 & 2 would cause
> many more packages to be changed than would just adopting markdown or
> ReST.

Please specify what you mean by "adopting markdown or ReST" more
precisely.

> The fact that we need more packages changed for options 1 & 2
> makes them technically inferior.

Best practices do not imply a *need* to change anything.

> Is there anyone other than yourself who is actually unhappy
> about markdown/ReST?

Please remind me at which point I was unhappy about markdown/ReST.
I do not really remember that I was. I just try to enable better
input for any postprocessing.

> And should we have similar silly polls (which I have no
> intention of promoting by voting in them) for emphasis? for specifying
> bold/italic text? For ordered lists? for a myriad of other useful
> markup already familiar to people who know markdown and ReST?

Manoj, please do not give me the feeling that my English is so bad
that I was unable to explain my point in my last mail[1]. If you would
confirm that you misunderstand me intentionally I would gain back
a small amount of trust in my English teacher.

> Also, the fact that there are more output formats than html
> available for markdown/ReST is another plus point; we might want other
> output formats for Descriptions than plain ol' html.

Hmmm, this paragraph confuses me even more. Going back, reading my
mails again, wondering why I spend so much time in explaining, ...

Kind regards

Andreas.

[1] http://lists.debian.org/debian-devel/2009/04/msg00713.html

--
http://fam-tille.de

Stefano Zacchiroli

Apr 21, 2009, 3:10:09 AM

On Mon, Apr 20, 2009 at 11:24:42PM +0200, Andreas Tille wrote:
> Here is the URL of the poll:
> http://doodle.com/2bp8rrh3i35sr4s7

Heya, thanks for the poll.

Nevertheless, I think I got a bit lost in the discussion.
Following it, I had the impression that there was a quasi-agreement on
Markdown. Hence, I'm wondering what is the exact purpose of your
poll. With Markdown, you have alternative markers for denoting
bullet list (which is reasonable and consistent with what we do in
email), so what is the point of choosing one?

More generally (and given that even you are unsure about what you
didn't like about Markdown :-)), can you two please explain why we
can't just say something like "long descriptions are paragraphs
separated by dots on single lines; each paragraph is formatted
according to markdown syntax"?


Andreas Tille

Apr 21, 2009, 3:40:06 AM

On Tue, 21 Apr 2009, Stefano Zacchiroli wrote:

> Nevertheless, I think I got a bit lost in the discussion.
> Following it, I had the impression that there was a quasi-agreement on
> Markdown. Hence, I'm wondering what is the exact purpose of your
> poll. With Markdown, you have alternative markers for denoting
> bullet list (which is reasonable and consistent with what we do in
> email), so what is the point of choosing one?

I think my whole point just got blurred. I never wanted to change
the format of long descriptions. I wanted to make it consistently
parseable. I consider it a good idea to use a formatting library,
and Manoj has mentioned that this is perfectly possible with the
current format provided you do some preprocessing, while it
was shown as well that some consistent formatting is needed
to do this reliably (see the links I gave in the poll).

> More generally (and given that even you are unsure about what you
> didn't like of Markdown :-)),

To say it explicitly: I like markdown, and even if this whole discussion
has no outcome for the descriptions, at least I have decided
to use it in my Blends tools.

> can you two please explain why we
> can't just say something like "long descriptions are paragraphs
> separated by dots on single lines; each paragraph is formatted
> according to markdown syntax".

I'm afraid that this leaves too much room for broken input, as
the airport-utils example at the end of [1] shows. Manoj tried
to prove that markdown works perfectly - but it does not, because
the indentation of the original input is just wrong. I want to
fix THIS.

Moreover I see no reason to bind anybody to a certain library
like markdown. My experience has shown that people will insist
on their very own way to do things. Do you think apt, aptitude,
synaptic etc. developers would be happy if you start filing bug
reports to make them use markdown? So my suggestion leaves
plenty of room for using markdown as well as even raw text
output - which would also look better with consistent formatting.

Kind regards

Andreas.

[1] http://lists.debian.org/debian-devel/2009/04/msg00713.html

--

Don Armstrong

Apr 21, 2009, 4:10:14 AM

On Tue, 21 Apr 2009, Andreas Tille wrote:
> I'm afraid that this leaves too much room for broken input, as the
> airport-utils example at the end of [1] shows. Manoj tried to prove
> that markdown works perfectly - but it does not, because the
> indentation of the original input is just wrong. I want to fix
> THIS.

So long as we have an implementation which works for the vast majority
of cases we can file bugs to make it work for the few cases where it
doesn't. (Or the output can just be slightly broken in those cases;
it's not like that's a huge problem.)

> Moreover I see no reason to bind anybody to a certain library like
> markdown.

It's perfectly ok to punt the specification of the format to an
external library, at least initially. If enough people don't want to
use the markdown libraries, they'll either code up patches to policy
to codify the equivalent of markdown formatting in policy or write
equivalent code to markdown.

It seems to me like the next step is to go ahead and make a few
patches to packages.debian.org to change to a markdown (or equivalent)
formatting of the long description with whatever pre-processing is
necessary, see how well it works, submit a patch to policy to codify,
and move on with filing bugs for those bits that don't work properly.

Don Armstrong

--
Sentenced to two years hard labor (for sodomy), Oscar Wilde stood
handcuffed in driving rain waiting for transport to prison. "If this
is the way Queen Victoria treats her prisoners," he remarked, "she
doesn't deserve to have any."

http://www.donarmstrong.com http://rzlab.ucr.edu

Vincent Danjean

Apr 21, 2009, 4:40:12 AM

Don Armstrong wrote:
> On Tue, 21 Apr 2009, Andreas Tille wrote:
>> Moreover I see no reason to bind anybody to a certain library like
>> markdown.
>
> It's perfectly ok to punt the specification of the format to an
> external library, at least initially. If enough people don't want to
> use the markdown libraries, they'll either code up patches to policy
> to codify the equivalent of markdown formatting in policy or write
> equivalent code to markdown.

As shown before in the other thread, markdown does not work with
the current long descriptions: it needs pre-processing to add some
blank lines before each list.
So, I see Andreas' proposal as a way to formalize what we want
to accept (which will allow most current long descriptions to be valid)
in order to be able to send them through a preprocessor to tools such
as markdown.

> It seems to me like the next step is to go ahead and make a few
> patches to packages.debian.org to change to a markdown (or equivalent)
> formatting of the long description with whatever pre-processing is
> necessary, see how well it works, submit a patch to policy to codify,
> and move on with filing bugs for those bits that don't work properly.

No current tool¹ works with current long descriptions.

If we add some preprocessor, I think we will hit the reason why, for
example, markdown requires this additional blank line. This means that
we will not support everything markdown supports. It is not a problem but
it means that the markdown specification cannot be used "as is".

Regards,
Vincent

¹: at least, I did not notice one in this discussion

--
Vincent Danjean GPG key ID 0x9D025E87 vdan...@debian.org
GPG key fingerprint: FC95 08A6 854D DB48 4B9A 8A94 0BF7 7867 9D02 5E87
Unofficial packages: http://moais.imag.fr/membres/vincent.danjean/deb.html
APT repo: deb http://perso.debian.org/~vdanjean/debian unstable main

Lars Wirzenius

Apr 21, 2009, 4:50:13 AM

ti, 2009-04-21 kello 10:37 +0200, Vincent Danjean kirjoitti:
> As shown before in the other thread, markdown does not work with
> the current long descriptions: it needs pre-processing to add some
> blank lines before each list.

That's true. Because the Packages and debian/control files are in
pseudo-RFC822 format, it needs to "escape" the long description a bit.
Otherwise an empty line between paragraphs in the description would
separate records in the Packages file, for example.

The two things that need to be done to un-escape are these: remove the
leading single space, and remove lines consisting of a single period.
This is really, really simple to do.

After that is done, the long description can be fed to something that
understands the markdown language. One such tool is the markdown command
line tool or library, but anything that interprets markdown-the-language
is fine.

(This really, really shouldn't need all the discussion it's had
already.)

Stefano Zacchiroli

Apr 21, 2009, 5:00:18 AM

On Tue, Apr 21, 2009 at 10:37:00AM +0200, Vincent Danjean wrote:
> As shown before in the other thread, markdown does not work with
> the current long descriptions: it needs pre-processing to add some
> blank lines before each list.

I've the impression that you didn't read my post, I might be wrong
though. Anyhow, what I said there is that *each single paragraph* can
be considered Markdown (and not the whole long description), using
single-dot lines as paragraph separators. With that convention, you
can use Markdown out of the box (on each paragraph) as long as each
list is on a single paragraph (which sounds like a reasonable
requirement to pose).
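
A minimal Python sketch of that convention, splitting a raw long
description into un-escaped paragraphs at the single-dot lines, could look
like the following. The function name `paragraphs` and the sample text are
made up for illustration; each returned string would then be handed
separately to whatever Markdown implementation is in use.

```python
def paragraphs(description):
    """Split a raw long description into un-escaped paragraphs."""
    paras, current = [], []
    for line in description.splitlines():
        if line.startswith(" "):
            line = line[1:]             # drop the single escaping space
        if line.strip() == ".":         # a lone "." separates paragraphs
            if current:
                paras.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        paras.append("\n".join(current))
    return paras

if __name__ == "__main__":
    raw = " First paragraph.\n .\n Features:\n .\n  * one\n  * two\n"
    for number, para in enumerate(paragraphs(raw), start=1):
        print(f"--- paragraph {number} ---")
        print(para)
```

In the sample the list sits in a paragraph of its own, which is the
requirement mentioned above.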

Anticipating a potential objection: nested lists do work without
needing "blank" lines to separate nesting levels; I've just tried that
out.

> If we add some preprocessor, I think we will hit the reason why, for
> example, markdown requires this additional blank line. This means that
> we will not support everything markdown supports. It is not a problem but
> it means that the markdown specification cannot be used "as is".

Note that the only pre-processor needed seems to be the one which
separates paragraphs using the single dot we already use. Such a
pre-processor is most likely already implemented by all tools
processing long descriptions.


Andreas Tille

Apr 21, 2009, 5:30:10 AM

On Tue, 21 Apr 2009, Stefano Zacchiroli wrote:

> Anticipating a potential objection: nested lists do work without
> needing "blank" lines to separate nesting levels; I've just tried that
> out.

... provided that lists are formatted properly in the first place (keyword:
broken spacing). That's why I would like to give advice on the
spacing directly.

Kind regards

Andreas.

--
http://fam-tille.de

Lars Wirzenius

Apr 21, 2009, 5:40:10 AM

ti, 2009-04-21 kello 11:27 +0200, Andreas Tille kirjoitti:
> On Tue, 21 Apr 2009, Stefano Zacchiroli wrote:
>
> > Anticipating a potential objection: nested lists do work without
> > needing "blank" lines to separate nesting levels; I've just tried that
> > out.
>
> ... provided that lists are formatted properly in the first place (keyword:
> broken spacing). That's why I would like to give advice on the
> spacing directly.

"Properly" here should mean "anything that the markdown language says is
OK". The markdown language is remarkably relaxed about indentation. It
can handle it fine if one list is indented by two spaces, and another by
three. There seems to be no need for Debian to impose stricter
definitions.

Or am I misunderstanding what you are saying, Andreas?

liw@dorfl$ cat foo.mdwn
This is a normal paragraph.

* this is top level item
  * this is second level item
  * this is another second level item
* this is again a top level item

This is another paragraph.

* this is top level item
   * this is second level item
   * this is another second level item
* this is again a top level item
liw@dorfl$ markdown foo.mdwn
This is a normal paragraph.

<ul>
<li>this is top level item
<ul>
<li>this is second level item</li>
<li>this is another second level item</li>
</ul></li>
<li>this is again a top level item</li>
</ul>

This is another paragraph.

<ul>
<li>this is top level item
<ul>
<li>this is second level item</li>
<li>this is another second level item</li>
</ul></li>
<li>this is again a top level item</li>
</ul>
liw@dorfl$