Re: [Django] #8391: slugify template filter poorly encodes non-English strings

Django

Jul 12, 2011, 9:32:14 AM

to django-...@googlegroups.com

#8391: slugify template filter poorly encodes non-English strings
-------------------------------------+-------------------------------------
Reporter: bjornkri | Owner: nobody
Type: Bug | Status: reopened
Milestone: | Component: Template system
Version: SVN | Severity: Normal
Resolution: | Keywords:
Triage Stage: Design decision needed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Changes (by mitar):

* cc: mmitar@… (added)
* ui_ux: => 0
* easy: => 0

Comment:

I have added a
[https://bitbucket.org/mitar/django-missing/src/ebd814ed834b/missing/templatetags/url_tags.py slugify2 function]
which first downcodes the string and then converts it to a slug. It behaves
exactly the same as its
[https://bitbucket.org/mitar/django-missing/src/ebd814ed834b/missing/static/missing/urlify2.js JavaScript counterpart],
so the same behavior is now available in both Python and JavaScript.

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:31>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Sep 15, 2011, 11:56:15 AM

to django-...@googlegroups.com

#8391: slugify template filter poorly encodes non-English strings
------------------------------------+---------------------------------
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------+---------------------------------
Changes (by ptone):

* stage: Design decision needed => Accepted

Comment:

See #16853 for a Turkish case.

There seem to have been no objections to the downcode-then-slugify
approach.

This seems ready for someone to take a shot at implementing that approach
in a patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:32>

Django

Sep 15, 2011, 12:33:57 PM

to django-...@googlegroups.com

#8391: slugify template filter poorly encodes non-English strings
------------------------------------+---------------------------------
Reporter: bjornkri | Owner: nobody
Type: Bug | Status: reopened
Milestone: | Component: Template system
Version: SVN | Severity: Normal
Resolution: | Keywords:
Triage Stage: Accepted | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------+---------------------------------

Comment (by mitar):

You can take the above slugify2 function.

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:33>

Django

Sep 15, 2011, 1:03:50 PM

to django-...@googlegroups.com

Comment (by yasar11732@…):

The slugify2 function above won't fix #16853.

{{{
# -*- coding: utf-8 -*-
import sys
import re

from django.utils import encoding

TURKISH_MAP = {
    u'ş': 's', u'Ş': 'S', u'ı': 'i', u'İ': 'I', u'ç': 'c', u'Ç': 'C',
    u'ü': 'u', u'Ü': 'U', u'ö': 'o', u'Ö': 'O', u'ğ': 'g', u'Ğ': 'G',
}

ALL_DOWNCODE_MAPS = [
    TURKISH_MAP,
]

class Downcoder(object):
    map = {}
    regex = None

    def __init__(self):
        self.map = {}
        chars = u''

        for lookup in ALL_DOWNCODE_MAPS:
            for c, l in lookup.items():
                self.map[c] = l
                chars += encoding.force_unicode(c)

        self.regex = re.compile(ur'[' + chars + ']|[^' + chars + ']+', re.U)

downcoder = Downcoder()

def downcode(value):
    downcoded = u''
    pieces = downcoder.regex.findall(value)

    if pieces:
        for p in pieces:
            mapped = downcoder.map.get(p)
            if mapped:
                downcoded += mapped
            else:
                downcoded += p
    else:
        downcoded = value

    return value

def slugify2(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
    """
    import unicodedata
    value = downcode(value)
    value = unicodedata.normalize('NFD', value).encode('ascii', 'ignore')
    value = unicode(re.sub('[^\w\s-]', '', value).strip().lower())
    return re.sub('[-\s]+', '-', value)

print(slugify2(u"Işık ılık süt iç"))

}}}

This prints "isk-lk-sut-ic", but the expected value is "isik-ilik-sut-ic".
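
The wrong output follows from two things in the snippet: `downcode` ends
with `return value`, so the downcoded string is thrown away, and the
dotless "ı" has no Unicode decomposition, so the NFD/ASCII step simply drops
it. As an illustration only, here is a minimal Python 3 sketch of the same
downcode-then-slugify idea with the mapped string actually returned. It is
not the code from django-missing: the per-character lookup replaces the
regex-based `Downcoder`, and the `mapping` parameter is an assumption made
for this example.

```python
import re
import unicodedata

# Same Turkish mapping as in the snippet above.
TURKISH_MAP = {
    'ş': 's', 'Ş': 'S', 'ı': 'i', 'İ': 'I', 'ç': 'c', 'Ç': 'C',
    'ü': 'u', 'Ü': 'U', 'ö': 'o', 'Ö': 'O', 'ğ': 'g', 'Ğ': 'G',
}

def downcode(value, mapping=TURKISH_MAP):
    """Replace every mapped character; leave all other characters as they are."""
    # `mapping` is a hypothetical parameter added for this sketch.
    return ''.join(mapping.get(ch, ch) for ch in value)

def slugify2(value):
    """Downcode first, then normalize, strip non-word characters, and hyphenate."""
    value = downcode(value)
    # Drop whatever still cannot be represented in ASCII after decomposition.
    value = unicodedata.normalize('NFD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)

print(slugify2("Işık ılık süt iç"))  # isik-ilik-sut-ic
```

Because "ı" is replaced by "i" before normalization, nothing is lost in the
ASCII step for this input.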

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:34>

Django

Sep 15, 2011, 3:59:40 PM

to django-...@googlegroups.com

Comment (by mitar):

Oops, that was a bug.
[https://bitbucket.org/mitar/django-missing/src/d17bac5f8a5a/missing/templatetags/url_tags.py Fixed version of slugify2].

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:35>

Django

Aug 20, 2014, 3:53:37 AM

to django-...@googlegroups.com

#8391: slugify template filter poorly encodes non-English strings
---------------------------------+------------------------------------
Reporter: bjornkri | Owner: nobody
Type: Bug | Status: new
Component: Template system | Version: master
---------------------------------+------------------------------------

Comment (by claudep):

Note that slugify2 is now here:
https://github.com/mitar/django-missing/blob/master/missing/templatetags/url_tags.py

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:37>

Django

Aug 31, 2014, 3:28:16 PM

to django-...@googlegroups.com

#8391: slugify template filter poorly encodes non-English strings
---------------------------------+------------------------------------
Reporter: bjornkri | Owner: nobody
Type: Bug | Status: closed
Component: Template system | Version: master
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------

Changes (by aaugustin):

* status: new => closed
* resolution: => wontfix

Comment:

There's obviously more than one way to achieve slugification, depending on
your tastes and constraints.

If we try to be smart, we'll get dozens and dozens of tickets from people
who want to be smarter -- see the urlize filter for an example.

Django's implementation has the advantage of being simple and relying only
on the stdlib. Pretty good solutions are available externally.

The drawbacks of implementing something more complicated outweigh the
advantages at this stage.
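
To make the trade-off concrete, here is a rough sketch of what a
stdlib-only slugifier looks like. It is an approximation written for this
thread, not a copy of Django's source, and the name `stdlib_slugify` is
made up for the example. Characters that decompose into an ASCII letter
plus a combining mark survive; characters with no decomposition, such as
the Turkish dotless "ı", are dropped.

```python
import re
import unicodedata

def stdlib_slugify(value):
    """Slugify using only unicodedata and re (hypothetical helper for illustration)."""
    # Decompose accented characters, then discard everything that is not ASCII.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)

print(stdlib_slugify("süt iç"))  # sut-ic
print(stdlib_slugify("ılık"))    # lk
```

Handling the second case needs a per-language mapping table, which is the
kind of extra machinery the external solutions provide.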

--
Ticket URL: <https://code.djangoproject.com/ticket/8391#comment:38>

Django

Sep 25, 2014, 10:57:55 PM

to django-...@googlegroups.com

#8391: slugify template filter poorly encodes non-English strings
---------------------------------+------------------------------------
Reporter: bjornkri | Owner: nobody
Type: Bug | Status: closed
Component: Template system | Version: master
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------