Proposal: Allow stopwords in slugs generated by ModelAdmin.prepopulated_fields


Andy Chosak

Apr 8, 2020, 4:35:57 PM
to Django developers (Contributions to Django itself)
Automatic slug generation in ModelAdmin via prepopulated_fields uses a urlify.js file which, among other behaviors, removes certain stop words from the slug. For example, a string like "To be or not to be, that is the question" will generate a slug "be-or-not-be-question", not "to-be-or-not-to-be-that-is-the-question" as one might expect. I’d like to solicit feedback on the idea of removing this logic so that slugs can contain these words.

For reference, the current list is: a, an, as, at, before, but, by, for, from, is, in, into, like, of, off, on, onto, per, since, than, the, this, that, to, up, via, with.
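To make the difference concrete, here is a simplified sketch of both behaviours. It is not Django's actual urlify.js, which also transliterates characters, truncates to `num_chars` and so on. `simpleSlugify` and its `removeStopwords` flag are hypothetical names used only for illustration. The ASCII-only condition is modelled on the check used by urlify.js, where English stopwords are stripped only when the input contains no non-ASCII characters.

```javascript
// Stopword list quoted above (as removed by urlify.js before this proposal).
const STOPWORDS = [
  "a", "an", "as", "at", "before", "but", "by", "for", "from",
  "is", "in", "into", "like", "of", "off", "on", "onto", "per",
  "since", "than", "the", "this", "that", "to", "up", "via", "with",
];

// Hypothetical, simplified slugifier for illustration only. It is NOT
// Django's urlify.js (no transliteration, no num_chars truncation).
function simpleSlugify(s, removeStopwords) {
  // Only strip English stopwords when the input is entirely ASCII.
  const hasNonAscii = /[^\u0000-\u007f]/.test(s);
  if (removeStopwords && !hasNonAscii) {
    const r = new RegExp("\\b(" + STOPWORDS.join("|") + ")\\b", "gi");
    s = s.replace(r, "");
  }
  return s
    .replace(/[^-\w\s]/g, "") // drop punctuation such as commas
    .trim()
    .replace(/[-\s]+/g, "-")  // collapse runs of spaces/hyphens into one hyphen
    .toLowerCase();
}

const title = "To be or not to be, that is the question";
console.log(simpleSlugify(title, true));  // "be-or-not-be-question"
console.log(simpleSlugify(title, false)); // "to-be-or-not-to-be-that-is-the-question"
```

The ASCII-only heuristic also means a non-English title written without accents is treated as English: `simpleSlugify("Il a un chat", true)` drops the French "a" and yields `"il-un-chat"`.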

Django ticket #30538 mentions this behavior as part of a broader comparison between urlify.js and the Python slugify function. It was closed as wontfix for backwards-compatibility reasons. Per the triaging guidelines, I’m making this post to solicit feedback on the narrower question of stopword removal in the JS code only, not on any other differences in behavior between the two functions. There’s been quite a bit of discussion on generating slugs for non-English languages (for example #2282), and this post is not intended to reopen that discussion.

The current list of stopwords being removed seems to have been the same since at least 2005 (the earliest code I can find including this logic). Some of these words feel a little unexpected, for example “before” and “since”. After 15 years it seems reasonable to revisit the list and consider whether it still makes sense.

Was removal of these words introduced for SEO reasons? If so, is this still a recommended default behavior? In 2020, search engines like Google seem smart enough to interpret them properly. Here's an arbitrary page that discusses this and includes a much longer list of what might be considered stopwords. As another data point, the popular WordPress Yoast SEO plugin used to remove stopwords but stopped doing so a few years back.

Potentially outdated SEO concerns aside, does this behavior still align well with the needs and desires of Django users? Is this something this community would be open to revisiting? Thanks for your consideration.

(One minor point on language support: allowing these words would help to resolve at least some of the unequal treatment given to English over other languages, for example #12905. See also wagtail#4899, from which much of this post has been copied, for an example of how this logic impacts a Django-based CMS.)

Adam Johnson

Apr 9, 2020, 1:41:30 PM
to django-d...@googlegroups.com
I for one am quite surprised to learn the admin has this behaviour.

I'm extra surprised it assumes it's in English if only ASCII letters are used. This is quite a naïve assumption 😂 (See what I did in that sentence?)

> Was removal of these words introduced for SEO reasons?

Seems likely.

Personally, for the reasons you've presented, I think it would make sense to remove this behaviour. We can probably document how to wrap window.URLify to preserve the old behaviour.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/fb6c9596-951d-4102-91b5-b5fd9c8c6340%40googlegroups.com.


--
Adam

Andy Chosak

Apr 23, 2020, 4:21:32 PM
to Django developers (Contributions to Django itself)
Thanks, Adam, for your reply. I've opened a ticket at https://code.djangoproject.com/ticket/31511, which includes a link to a PR that makes this change.

Any advice on documenting how to wrap window.URLify?

Thanks,
Andy

Adam Johnson

Apr 24, 2020, 5:19:01 AM
to django-d...@googlegroups.com
I agree with Mariusz on the ticket/PR that my answer alone isn't enough impetus to make this change. Hopefully someone more involved in i18n can weigh in.

Although it changes the order of operations, I think this still achieves the same behaviour. This snippet can be run at the end of a page, after urlify.js has loaded, to wrap window.URLify.

(function () {
    const originalURLify = window.URLify;

    function URLify(s, num_chars, allowUnicode) {
        let result = originalURLify(s, num_chars, allowUnicode);

        const hasUnicodeChars = /[^\u0000-\u007f]/.test(s);
        // Remove English words only if the input consists entirely of
        // ASCII characters (assumed to be English).
        if (!hasUnicodeChars) {
            const removeList = [
                "a", "an", "as", "at", "before", "but", "by", "for", "from",
                "is", "in", "into", "like", "of", "off", "on", "onto", "per",
                "since", "than", "the", "this", "that", "to", "up", "via",
                "with"
            ];
            const r = new RegExp('\\b(' + removeList.join('|') + ')\\b', 'gi');
            result = result
                .replace(r, '')
                // Removing words from an already-hyphenated slug leaves empty
                // segments behind, so collapse repeated hyphens and trim the ends.
                .replace(/-{2,}/g, '-')
                .replace(/^-+|-+$/g, '');
        }
        return result;
    }

    window.URLify = URLify;
})();
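Because a wrapper like this runs on the finished slug, each removed word leaves an empty segment between hyphens (for example `-be-or-not--be----question`) unless the hyphens are cleaned up afterwards. A minimal alternative sketch of that post-processing step, written as a pure function so it can be tried outside a browser; `stripStopwordsFromSlug` is a hypothetical name, not part of Django, and it is meant to be called only in the ASCII-only branch.

```javascript
// Stopwords removed by urlify.js before this change (list quoted in the thread).
const LEGACY_STOPWORDS = new Set([
  "a", "an", "as", "at", "before", "but", "by", "for", "from",
  "is", "in", "into", "like", "of", "off", "on", "onto", "per",
  "since", "than", "the", "this", "that", "to", "up", "via", "with",
]);

// Hypothetical helper: drop legacy stopwords from an already-generated slug.
// Splitting on "-" and filtering also discards empty segments, so no doubled
// or leading/trailing hyphens are left behind.
function stripStopwordsFromSlug(slug) {
  return slug
    .split("-")
    .filter((part) => part !== "" && !LEGACY_STOPWORDS.has(part.toLowerCase()))
    .join("-");
}

console.log(stripStopwordsFromSlug("to-be-or-not-to-be-that-is-the-question"));
// "be-or-not-be-question"
```

Splitting on hyphens avoids a second regular expression. Like `\b` in the regex version, it treats underscores as part of a word, so a segment such as `to_be` is left alone.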



--
Adam

Scott Cranfill

May 15, 2020, 6:44:12 PM
to Django developers (Contributions to Django itself)
Does anyone else have an opinion on whether or not we should still be removing these stopwords?

> Hopefully someone more involved in i18n can weigh in.

I'm not sure there are any i18n concerns here. If anything, dropping stopword removal eliminates the source of the recurring issues raised about how this practice disadvantages users of other languages, for example by not removing equivalent words in their language.

Thanks for the suggested code, Adam. On the topic of deprecation in general: Andy and I weren't really sure how to approach that for a JavaScript-only change. We can't raise deprecation warnings in the Django console as we could for Python code, can we? I could see adding some more aggressive messaging, maybe even in the admin?

Tim Graham

May 15, 2020, 7:41:25 PM
to Django developers (Contributions to Django itself)
I'm in favor of the change. It seems to me that most slugs I see these days have stop words in them and they read better because of that. I don't think JavaScript warnings would be helpful. A release note is sufficient. Admin users still get a preview of the slug and can edit it if needed.

אורי

May 15, 2020, 10:05:47 PM
to Django developers (Contributions to Django itself)
I much prefer the slug "to-be-or-not-to-be-that-is-the-question" to "be-or-not-be-question" (which doesn't make sense).



Adam Johnson

May 16, 2020, 5:25:29 AM
to django-d...@googlegroups.com
There's a bit more support now, and there have been no opinions against it.

Because of this, I've reopened the older closed ticket #11157: https://code.djangoproject.com/ticket/11157. Andy/Scott, I hope you can retarget your PR as per my comment there. Thanks!

> Admin users still get a preview of the slug and can edit it if needed.

Agree, no need for deprecation warnings. This behaviour is in front of users with an easy override.



--
Adam

Scott Cranfill

May 20, 2020, 12:51:56 PM
to Django developers (Contributions to Django itself)
Thanks for the additional feedback, folks!

We have opened a fresh PR, rebased on the latest master and referencing #11157, at https://github.com/django/django/pull/12945

Best,
Scott



Scott Cranfill

May 27, 2020, 8:55:35 AM
to Django developers (Contributions to Django itself)
The PR was merged! Thanks everyone for your input and assistance.