-----------------------------------------
Short summary: [patch] Generating slug for words with accents
Full description: In my language (Czech) there are a lot of characters
with accents. When I type titles in admin forms, the autogenerated values
of the slug field are incorrect (for example: title="sršeň",
autogenerated slug="sre"; correct is "srsen"). So I wrote a little patch
to urlify.js, which first converts all accented characters to their ASCII
equivalents. For now, my code handles only Czech accents. I will be glad
if some of you add your own national characters.
Priority: normal
Component: Admin interface
Severity: normal
Version: SVN
Keywords: slug urlify
-----------------------------------------
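A minimal sketch of the idea in the description above (map each accented letter to its ASCII equivalent before URLify builds the slug), restricted to Czech. The function name replCzechAccents and the exact table are assumptions for illustration, not the code attached to the ticket.

```javascript
// Hypothetical Czech-only accent stripper. The table is an assumption:
// Czech accented letters (lower- and uppercase) and their ASCII equivalents.
// Both strings must have the same length, position by position.
var CZ_FROM = 'áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ';
var CZ_TO   = 'acdeeinorstuuyzACDEEINORSTUUYZ';

function replCzechAccents(s) {
    var out = '';
    for (var i = 0; i < s.length; i++) {
        var c = s.charAt(i);
        var x = CZ_FROM.indexOf(c);
        // Characters not in the table are passed through unchanged.
        out += (x === -1) ? c : CZ_TO.charAt(x);
    }
    return out;
}
```

With this step run first, "sršeň" becomes "srsen", so URLify no longer drops the accented letters.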
Trac error
Trac detected an internal error:
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 299, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.3/site-packages/trac/web/main.py", line 189, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.3/site-packages/trac/ticket/web_ui.py", line 104, in process_request
    self._do_create(req, db)
  File "/usr/lib/python2.3/site-packages/trac/ticket/web_ui.py", line 163, in _do_create
    self._validate_ticket(req, ticket)
  File "/usr/lib/python2.3/site-packages/trac/ticket/web_ui.py", line 47, in _validate_ticket
    for field, message in manipulator.validate_ticket(req, ticket):
  File "build/bdist.linux-i686/egg/tracspamfilter/adapters.py", line 40, in validate_ticket
  File "build/bdist.linux-i686/egg/tracspamfilter/api.py", line 74, in test
herror: (1, 'Unknown host')
(I was trying to submit the ticket from a FreeBSD 5.4 system with Firefox 1.5.0.1.)
Regards
Michal
Is it a hornet?
> autogenerated slug="sre"; correct is "srsen").
That's right. I've been experiencing the same thing.
> I will be glad if some of you add your own national characters.
I'm attaching a modified patch with Polish characters added.
--
Maciej Bliziński
http://automatthias.wordpress.com
Yes it is, my Slavic brother :)
>
>> autogenerated slug="sre"; correct is "srsen").
>
> That's right. I've been experiencing the same thing.
>
>> I will be glad if some of you add your own national characters.
>
> I'm attaching a modified patch with Polish characters added.
>
Thank you. I also added a few Slovak characters (Czech and Slovak were
brothers too, and they have similar alphabets).
>
>
I looked at the Latin Unicode article in Wikipedia:
http://en.wikipedia.org/wiki/Latin_Unicode
There are accented characters I had never seen before... The Vietnamese
alphabet, for instance, has glyphs that are Latin characters with
unusual accents, for example ã, or even with two accents: ặ
For most of the characters, it's pretty easy to remove the accents.
However, some characters are mysterious: should Ƨ be translated to S?
I don't know. So I just deleted them from the accent removal list.
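For comparison, modern JavaScript engines (ES2015 and later, long after this thread) can strip most such accents, including stacked Vietnamese ones, without a hand-written table: decompose to Unicode NFD and drop the combining marks. This is a swapped-in technique, not what the patch does, and the function name is hypothetical. Letters with a stroke or a special form, such as ł, đ, ø, ß or Ƨ, have no canonical decomposition and pass through unchanged, so a lookup table is still needed for them.

```javascript
// Alternative technique (not the patch's approach): canonical decomposition
// (NFD) splits "ặ" into "a" plus combining marks, which are then removed.
function stripCombiningMarks(s) {
    // U+0300..U+036F is the Combining Diacritical Marks block.
    return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}
```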
I'm including a patch with "from" and "to" constants extended with all
the characters I found on Wikipedia that seemed to be of any use. This
should cover all the Slavic countries except those which use the Cyrillic
alphabet.
One thing... some characters want to be translated into _two_ ASCII
characters, for example Æ to AE. This would require a different data
structure. In the present form, I just entered E. The same goes for ß,
which I replaced with a single S.
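One possible "different data structure" is an object mapping each character to a replacement string of any length, so Æ can become "AE" and ß "ss". A hedged sketch; the table contents and the name replAccentsMulti are assumptions.

```javascript
// Hypothetical one-to-many table: values may be longer than one character.
var REPLACEMENTS = {
    'Æ': 'AE', 'æ': 'ae',
    'Œ': 'OE', 'œ': 'oe',
    'ß': 'ss',
    'Ø': 'O',  'ø': 'o',
    'Ł': 'L',  'ł': 'l'
};

function replAccentsMulti(s) {
    var out = [];
    for (var i = 0; i < s.length; i++) {
        var c = s.charAt(i);
        // hasOwnProperty limits the lookup to the table's own keys.
        var known = Object.prototype.hasOwnProperty.call(REPLACEMENTS, c);
        out.push(known ? REPLACEMENTS[c] : c);
    }
    return out.join('');
}
```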
Regards,
Maciej
Nice work Maciej :)
When I wrote my first post, I typed: "I will be glad if some of
you add your own national characters."
Each nationality has its own specific characters and rules for them, so
I think that somebody from each of these countries should check your
version of the patch.
> I'm including a patch with "from" and "to" constants extended with all
> the characters I found on Wikipedia that seemed to be of any use. This
> should cover all the Slavic countries except those which use the Cyrillic
> alphabet.
>
> One thing... some characters want to be translated into _two_ ASCII
> characters, for example Æ to AE. This would require a different data
> structure. In the present form, I just entered E. The same goes for ß,
> which I replaced with a single S.
Maybe we could write a new function that translates one Unicode
character into the corresponding two ASCII characters? (Translating
accented characters would then be done in two steps: 1. replAccents,
2. the new function.)
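The two-step approach proposed above might look like this: a one-to-one table pass (the same idea as replAccents), followed by a second step for characters that need two ASCII letters. A sketch only; replOneToOne, replLigatures and replAllAccents are hypothetical names, and both tables are short illustrative excerpts, not the full lists from the patch.

```javascript
// Step 1: one-to-one table (excerpt; strings must have equal length).
var ONE_FROM = 'áčéšžñ';
var ONE_TO   = 'aceszn';

function replOneToOne(s) {
    var out = '';
    for (var i = 0; i < s.length; i++) {
        var x = ONE_FROM.indexOf(s.charAt(i));
        out += (x === -1) ? s.charAt(i) : ONE_TO.charAt(x);
    }
    return out;
}

// Step 2: characters that expand to two ASCII letters.
var TWO_CHAR = { 'Æ': 'AE', 'æ': 'ae', 'Œ': 'OE', 'œ': 'oe', 'ß': 'ss' };

function replLigatures(s) {
    // The callback receives each matched character and returns its expansion.
    return s.replace(/[ÆæŒœß]/g, function (c) { return TWO_CHAR[c]; });
}

function replAllAccents(s) {
    return replLigatures(replOneToOne(s));
}
```

Keeping the two steps separate lets the existing from/to strings stay as they are, at the cost of a second pass over the string.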
>
> Regards,
> Maciej
>
>
>
> ------------------------------------------------------------------------
>
> Index: django/contrib/admin/media/js/urlify.js
> ===================================================================
> --- django/contrib/admin/media/js/urlify.js (revision 3618)
> +++ django/contrib/admin/media/js/urlify.js (working copy)
> @@ -1,4 +1,43 @@
> +function replAccents(s)
> +{
> + // Replacement lists based on article in Wikipedia,
> + // http://en.wikipedia.org/wiki/Latin_Unicode
> + // from and to strings must have same number of characters
> + var from = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîï';
> + var to = 'AAAAAAECEEEEIIIIDNOOOOOOUUUUYSaaaaaaaceeeeiiii';
> + from += 'ñòóôõöøùúûüýÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģ';
> + to += 'noooooouuuuyyaaaaaaccccccccddddeeeeeeeeeegggggggg';
> + from += 'ĤĥĦħĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘř';
> + to += 'hhhhiiiiiiiiiijjkkkllllllllllnnnnnnnnnoooooooorrrrrr';
> + from += 'ŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƂƃƄƅƇƈƉƊƐƑƒƓƔ';
> + to += 'ssssssssttttttuuuuuuuuuuuuwwyyyzzzzzzfbbbbbccddeffgv';
> + from += 'ƖƗƘƙƚƝƞƟƠƤƦƫƬƭƮƯưƱƲƳƴƵƶǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩ';
> + to += 'likklnnoopettttuuuuyyzzaaiioouuuuuuuuuueaaaaeeggggkk';
> + from += 'ǪǫǬǭǰǴǵǷǸǹǺǻǼǽǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗȘșȚțȞȟȤȥȦȧȨȩ';
> + to += 'oooojggpnnaaeeooaaaaeeeeiiiioooorrrruuuusstthhzzaaee';
> + from += 'ȪȫȬȭȮȯȰȱȲȳḀḁḂḃḄḅḆḇḈḉḊḋḌḍḎḏḐḑḒḓḔḕḖḗḘḙḚḛḜḝḞḟḠḡḢḣḤḥḦḧḨḩḪḫ';
> + to += 'ooooooooyyaabbbbbbccddddddddddeeeeeeeeeeffgghhhhhhhhhh';
> + from += 'ḬḭḮḯḰḱḲḳḴḵḶḷḸḹḺḻḼḽḾḿṀṁṂṃṄṅṆṇṈṉṊṋṌṍṎṏṐṑṒṓṔṕṖṗṘṙṚṛṜṝṞṟ';
> + to += 'iiiikkkkkkllllllllmmmmmmnnnnnnnnoooooooopppprrrrrrrr';
> + from += 'ṠṡṢṣṤṥṦṧṨṩṪṫṬṭṮṯṰṱṲṳṴṵṶṷṸṹṺṻṼṽṾṿẀẁẂẃẄẅẆẇẈẉẊẋẌẍẎẏẐẑẒẓẔẕ';
> + to += 'ssssssssssttttttttuuuuuuuuuuvvvvwwwwwwwwwwxxxxxyzzzzzz';
> + from += 'ẖẗẘẙẚẛẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾếỀềỂểỄễỆệỈỉỊị';
> + to += 'htwyafaaaaaaaaaaaaaaaaaaaaaaaaeeeeeeeeeeeeeeeeiiii';
> + from += 'ỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợỤụỦủỨứỪừỬửỮữỰựỲỳỴỵỶỷỸỹ';
> + to += 'oooooooooooooooooooooooouuuuuuuuuuuuuuyyyyyyyy';
> +
> + for (var i = 0; i != s.length; i++) {
> + var x = from.indexOf(s[i]);
> + if (x != -1) {
> + var r = new RegExp(from[x], 'g');
> + s = s.replace(r, to[x]);
> + }
> + }
> + return s;
> +}
> +
> function URLify(s, num_chars) {
> + s = replAccents(s);
> // changes, e.g., "Petty theft" to "petty_theft"
> // remove all these words from the string before urlifying
> removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
>
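For each input character, the loop in the patch calls from.indexOf and, on a match, builds a new global RegExp. Below is a sketch of an alternative that keeps the same from/to string format but builds the lookup once. It also enforces the "same number of characters" rule from the patch's comment. makeReplacer and replAccentsFast are hypothetical names, and only an excerpt of the first from/to pair is used.

```javascript
// Build a replacer from parallel from/to strings (the patch's format).
function makeReplacer(from, to) {
    if (from.length !== to.length) {
        // The patch's comment requires equal lengths; fail loudly otherwise.
        throw new Error('from/to length mismatch: ' + from.length + ' vs ' + to.length);
    }
    var table = {};
    for (var i = 0; i < from.length; i++) {
        table[from.charAt(i)] = to.charAt(i);
    }
    // Single pass over the input; unknown characters are kept as-is.
    return function (s) {
        var out = '';
        for (var j = 0; j < s.length; j++) {
            var c = s.charAt(j);
            out += Object.prototype.hasOwnProperty.call(table, c) ? table[c] : c;
        }
        return out;
    };
}

// Excerpt of the patch's first from/to pair (enough for a demonstration).
var replAccentsFast = makeReplacer('ÀÁÂÃÄÅÇÈÉÊËàáâãäåçèéêë',
                                   'AAAAAACEEEEaaaaaaceeee');
```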
> I'm including a patch with "from" and "to" constants extended with all
> the characters I found on Wikipedia that seemed to be of any use. This
> should cover all the Slavic countries except those which use the Cyrillic
> alphabet.
Has this patch been committed to the SVN version of Django? In 0.95 I was
facing this issue with French accents.
Nicolas
Regards,
Aidas Bendoraitis [aka Archatas]
but «ü» in Spanish should be just «u» (as in pingüino -> pinguino).
--
John Lenton (jle...@gmail.com)
I think that's everything in Spanish ;)
I am wondering whether the slugify function should depend on the
chosen language or not. There seem to be few differences in how
accented letters are stripped in different languages, so maybe it
should stay the same for all of them. Whatever we decide, ß should
still be translated to ss, not S. What do the others think?
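If slugify did depend on the language, one option is a generic table shared by all languages plus small per-language overrides consulted first. A sketch under assumptions: the language codes, the names GENERIC, OVERRIDES and stripAccentsFor, and the German ä/ö/ü to ae/oe/ue rule are illustrative. ß maps to ss everywhere, and Spanish keeps the generic ü to u (pingüino to pinguino).

```javascript
// Hypothetical tables: a generic mapping used for every language, plus
// per-language overrides that take precedence. Entries are illustrative.
var GENERIC = { 'ä': 'a', 'ö': 'o', 'ü': 'u', 'é': 'e', 'ñ': 'n', 'ß': 'ss' };
var OVERRIDES = {
    de: { 'ä': 'ae', 'ö': 'oe', 'ü': 'ue' },
    es: {}  // Spanish uses the generic rules, e.g. ü -> u
};

function stripAccentsFor(s, lang) {
    var has = Object.prototype.hasOwnProperty;
    var extra = has.call(OVERRIDES, lang) ? OVERRIDES[lang] : {};
    var out = '';
    for (var i = 0; i < s.length; i++) {
        var c = s.charAt(i);
        if (has.call(extra, c)) {
            out += extra[c];    // language-specific rule wins
        } else if (has.call(GENERIC, c)) {
            out += GENERIC[c];  // fall back to the shared table
        } else {
            out += c;           // unknown characters pass through
        }
    }
    return out;
}
```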
And also, if we are already adding localizations to the slugify
function, shouldn't Greek, Russian, and other non-Latin alphabets also
be transliterated to the Latin charset?
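Transliterating a non-Latin alphabet could reuse the same lookup idea, except that some letters need several Latin characters. A deliberately partial Russian table using one common romanization, as a sketch; the table and the name transliterate are hypothetical, and a real table would need the whole alphabet (plus Greek, if included).

```javascript
// Hypothetical, deliberately partial Russian-to-Latin table. Keys are
// lowercase Cyrillic letters; multi-letter values are allowed.
var RU_TO_LATIN = {
    'а': 'a', 'в': 'v', 'е': 'e', 'и': 'i', 'к': 'k', 'м': 'm',
    'н': 'n', 'о': 'o', 'р': 'r', 'с': 's',
    'ж': 'zh', 'ч': 'ch', 'ш': 'sh', 'щ': 'shch'
};

function transliterate(s, table) {
    // Lowercase first (slugs are lowercased anyway), then map each
    // character; characters missing from the table are kept as they are.
    return s.toLowerCase().split('').map(function (c) {
        return Object.prototype.hasOwnProperty.call(table, c) ? table[c] : c;
    }).join('');
}
```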
Regards,
Aidas Bendoraitis [aka Archatas]