Possible contrib.humanize addition


harrym

Jan 5, 2010, 4:24:13 PM
to Django developers
I'm working on a template tag that determines whether to use 'a' or 'an'
in front of English words. My particular use case for this is in a
tumblelog app I'm developing - many different types of entry may be
added (link, html, quote, etc), and I'm linking to the 'Add a[n]
<type> entry' pages by iterating over the different types. Would this
be considered a useful addition to contrib.humanize?

The two main reasons against it I see are that firstly, it only works
for English words, so would be of little use to developers using
foreign languages, and secondly, it perhaps wouldn't be as widely used
as the other filters in there.

Many thanks,

Harry

Russell Keith-Magee

Jan 5, 2010, 10:45:23 PM
to django-d...@googlegroups.com

It sounds like a potentially interesting addition to contrib.humanize,
but you have hit both of the objections that I would raise.

The foreign language limitation is particularly important - if we're
going to introduce a tag like this, then it should be able to be used
for languages other than English. If you present some research to
demonstrate how this tag could/would work for non-English languages,
it would be a lot more compelling.

Yours,
Russ Magee %-)

harrym

Jan 5, 2010, 10:54:49 PM
to Django developers
Thanks for your reply - I'll look into how this would work with
other languages and get back to you if it looks feasible.

Regards,
Harry

Luke Plant

Jan 6, 2010, 7:59:15 AM
to django-d...@googlegroups.com
On Tuesday 05 January 2010 21:24:13 harrym wrote:
> I'm working a templatetag that determines whether to use 'a' or
> 'an' in front of English words. My particular use case for this is
> in a tumblelog app I'm developing - many different types of entry
> may be added (link, html, quote, etc), and I'm linking to the 'Add
> a[n] <type> entry' pages by iterating over the different types.
> Would this be considered a useful addition to contrib.humanize?

Hmm, can it handle the following?

an honest man
a history book
an historical book (debatable)

My gut instinct is that it's not possible to work this out
programmatically. When it comes to other languages, I imagine it's
going to be even harder (if it's possible to get harder than
'impossible'), because you have things like gender and case to worry
about, which certainly cannot be worked out by an algorithm.

To give some examples, in French, the choice is between 'un' and
'une', depending on whether the word is masculine or feminine. In
Greek, the choice is between εἱς, ἑνα, ἑνος, ἑνι, μια, μιαν,
μιας, μια, ἑν, ἑν, ἑνος, ἑνι, depending on whether the word is
masculine, feminine or neuter, and in nominative, accusative, genitive
or dative case. Although in many cases you would probably omit the
article altogether - the above words often mean "one" rather than "a".
(That's NT Koine Greek, it might be different/simpler/more complicated
in modern Greek).

I imagine there are plenty of languages where this gets even worse,
violating almost every assumption you don't even know you are making
(like whether the article comes before or after or in the middle, or
exists at all, etc. etc.)

To summarise: if I were you, I would give up now.

Luke

--
"Mediocrity: It takes a lot less time, and most people don't
realise until it's too late." (despair.com)

Luke Plant || http://lukeplant.me.uk/

sago

Jan 6, 2010, 9:17:49 AM
to Django developers
> Hmm, can it handle the following?
>
>  an honest man
>  a history book
>  an historical book (debatable)

It can't. The rules for the indefinite article around 'h' are complex
and depend on the etymology of the word used. To add complexity, the
lexicographic rules are often different to the rules for speech, and
UK rules differ from US rules (and possibly Oz too, but I don't
know).

> If you present some research to
> demonstrate how this tag could/would work for non-English languages,
> it would be a lot more compelling.

That's not going to work, in any meaningful sense. That peculiarity of
the article is highly English-specific. The generalization would
surely be something like

{% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %}
{{ form2 }} {{ word }}{% endif %}

where the regex is language and context dependent. There are various
regex replacement filters/tags out in the djangosphere. Could you use
one of them?
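
A minimal sketch of that generalization in plain Python, in case it
helps. The names ARTICLE_RULES and with_article are made up for
illustration, and the English pattern is deliberately naive (it only
looks at the first letter), so it gets 'honest' wrong, exactly as
discussed above:

```python
import re

# Hypothetical per-language rules: (regex, form used on a match, form otherwise).
# The English pattern is an assumption for illustration and only checks
# whether the word starts with a vowel letter.
ARTICLE_RULES = {
    "en": (re.compile(r"[aeiou]", re.IGNORECASE), "an", "a"),
}


def with_article(word, lang="en"):
    """Prefix word with form1 if the language's regex matches, else form2."""
    pattern, form1, form2 = ARTICLE_RULES[lang]
    # re.match only matches at the start of the string.
    form = form1 if pattern.match(word) else form2
    return f"{form} {word}"
```

Wrapping with_article in a template filter would be the obvious next
step, but the regex itself is where all the language-specific work
lives.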

> (That's NT Koine Greek, it might be different/simpler/more complicated
> in modern Greek).

What is it about Django and NT scholars - have you come across James
Tauber (of Pinax fame)?

Ian.

James Bennett

Jan 6, 2010, 9:37:58 AM
to django-d...@googlegroups.com
On Wed, Jan 6, 2010 at 8:17 AM, sago <idmill...@googlemail.com> wrote:
> What is it about Django and NT scholars - have you come across James
> Tauber (of Pinax fame?)

There are at least three Django committers who can list one or another
ancient Greek dialect among their studies. Not sure why that is, but
it does make for fun conversation over drinks.


--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

harrym

Jan 6, 2010, 9:49:12 AM
to Django developers
The code I've got so far works pretty well - I've tested it on some
medium-sized corpora and the only times the expected result
differed from the actual result were when the corpus was wrong. The
code works by first checking a few specific rules for numbers and
acronyms, then checking against a few exceptional cases (word
prefixes), then checking whether the word starts with a vowel. Most of
the rules came from some Perl code I found a while ago - I just ported
them over to Python.
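
A rough sketch of that order of checks, for anyone following along.
This is not the ported code: the function name, the prefix lists and
the number/acronym rules below are small illustrative assumptions, and
acronyms are assumed to be spelled out letter by letter:

```python
import re

# Illustrative exception prefixes only; real lists would be much longer.
AN_PREFIXES = ("hour", "honest", "honour", "honor", "heir")      # silent 'h'
A_PREFIXES = ("unit", "univers", "unique", "user", "usual",
              "euro", "one", "once")                             # consonant sound

# Letters whose spoken names start with a vowel sound ("eff", "aitch", ...).
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")


def indefinite_article(word):
    """Return 'a' or 'an' for a non-empty English word (heuristic only)."""
    # 1. Numbers: "an 8", "an 11", "an 18", otherwise "a" (a simplification).
    if word[0].isdigit():
        digits = re.match(r"\d+", word).group()
        return "an" if digits[0] == "8" or digits in ("11", "18") else "a"
    # 2. Acronyms: all-uppercase words, assumed to be read letter by letter.
    if len(word) > 1 and word.isupper():
        return "an" if word[0] in VOWEL_SOUND_LETTERS else "a"
    lower = word.lower()
    # 3. Exceptional prefixes.
    if lower.startswith(AN_PREFIXES):
        return "an"
    if lower.startswith(A_PREFIXES):
        return "a"
    # 4. Fall back to checking whether the word starts with a vowel letter.
    return "an" if lower[0] in "aeiou" else "a"
```

Acronyms that are pronounced as words would need their own exception
list under this scheme.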

But I agree that this would be far too difficult (or impossible) to
make multi-lingual, so it is perhaps not appropriate for inclusion in
Django.

Harry

Hanne Moa

Jan 6, 2010, 9:56:51 AM
to django-d...@googlegroups.com
2010/1/6 sago <idmill...@googlemail.com>:

>> If you present some research to
>> demonstrate how this tag could/would work for non-English languages,
>> it would be a lot more compelling.
>
> That's not going to work, in any meaningful sense. That peculiarity of
> the article is highly English-specific. The generalization would
> surely be something like
>
> {% if /some-regex/.matches(word) %}{{ form1 }} {{ word }}{% else %}
> {{ form2 }} {{ word }}{% endif %}

Disclaimer: I have a master's degree in Computational Linguistics. This
is a simplified account of "last year of bachelor"-stuff:

Human language cannot (mathematically proven) be modelled by a mere
regexp, as human language is not only context-free (needing a full
parser) but context-sensitive (needing parsers we don't really have
yet). Nice, yes?

It cannot go in humanize but it could go in localflavor for English.
It would need a stemmer and a replaceable wordlist, though, as which
words get "an" and which get "a" depends not only on country but also
on specific publishing styles - and all of this has a tendency to
change over time.
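
A small sketch of what a replaceable wordlist could look like. All
names here are hypothetical, the two style tables are tiny
illustrations rather than real style guides, and the stemmer is just a
pluggable callable that defaults to lowercasing:

```python
VOWELS = set("aeiou")

# Hypothetical, replaceable wordlists keyed by stem. Entries are examples only.
US_STYLE = {"herb": "an"}                           # silent 'h' in US usage
UK_FORMAL_STYLE = {"historical": "an", "hotel": "an"}


def choose_article(word, style=None, stem=str.lower):
    """Pick 'a' or 'an', letting a per-style wordlist override the default.

    `stem` stands in for a real stemmer; by default it only lowercases.
    """
    style = style or {}
    key = stem(word)
    if key in style:
        return style[key]
    # Default: naive first-letter check, used when the wordlist has no entry.
    return "an" if word[:1].lower() in VOWELS else "a"
```

Swapping the style table (or the stemmer) changes the answer for the
same word, e.g. choose_article("herb") versus
choose_article("herb", US_STYLE).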


HM

Chuck Harmston

Jan 6, 2010, 10:14:01 AM
to django-d...@googlegroups.com
More of an academic question, as it likely isn't a feasible solution for Django, but might Soundex solve this problem? As best I can tell, the rules for articles are, without exception, based on the pronunciation of the following word.

Of course, phonology can be regional, subjective, and unpredictable. "Wind" (the flow of gases) and "wind" (circular weaving) are identical to a template tag but have different vowel sounds. The "a" sound in "bag" is pronounced much differently in northern Minnesota (where it's bay-g) than in Baltimore.

This feels unsolvable.



SmileyChris

Jan 6, 2010, 3:23:03 PM
to Django developers

Here's a snippet I wrote a while back you may want to check out too:
www.djangosnippets.org/snippets/1519/
