[Django] #34325: PercentRank confusion

Django

Feb 9, 2023, 8:39:23 AM
to django-...@googlegroups.com
#34325: PercentRank confusion
-----------------------------------------+------------------------
Reporter: dvg | Owner: nobody
Type: Uncategorized | Status: new
Component: Documentation | Version: 4.1
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
The [https://docs.djangoproject.com/en/4.1/ref/models/database-
functions/#percentrank documentation for the PercentRank window function]
says:

Computes the '''''percentile rank''''' of the rows in the frame clause.
This computation is equivalent to evaluating:
{{{
(rank - 1) / (total rows - 1)
}}}

(my emphasis)

However, I'm not so sure
"[https://en.wikipedia.org/w/index.php?title=Percentile&oldid=1114275310
percentile] rank" is the correct term.

If you look up the (statistical) term "percentile rank" online, you'll
find various definitions,
[https://en.wikipedia.org/w/index.php?title=Percentile_rank&oldid=1136815121
ranging from]

{{{
(CF - 0.5 * F) / N
}}}

where CF—the cumulative frequency—is the count of all scores less than
or equal to the score of interest, F is the frequency for the score of
interest, and N is the number of scores in the distribution.

[https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Descriptive-
Statistics/Measures-of-Position/Percentiles-and-Percentile-Rank/index.html
to something like]

{{{
<number of values less than the score of interest> / <total number of
values in the data set>
}}}

However, none exactly matches the definition in the Django docs.
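To make the comparison concrete, here is a minimal Python sketch of the three formulas above, evaluated on a small sample that contains a tie. The function names and the sample data are made up for illustration, and the sketch assumes that "rank" in the Django formula is the SQL `rank()`, i.e. one plus the number of rows strictly below the current row.

```python
def docs_percent_rank(scores, x):
    """(rank - 1) / (total rows - 1), the formula quoted from the Django docs.

    Assumption: rank is the SQL rank(), so rank - 1 is the number of
    scores strictly less than x.
    """
    n = len(scores)
    if n < 2:
        return 0.0  # single-row case, avoids division by zero
    below = sum(1 for s in scores if s < x)
    return below / (n - 1)


def percentile_rank_midpoint(scores, x):
    """(CF - 0.5 * F) / N."""
    cf = sum(1 for s in scores if s <= x)  # cumulative frequency
    f = sum(1 for s in scores if s == x)   # frequency of x itself
    return (cf - 0.5 * f) / len(scores)


def percentile_rank_strict(scores, x):
    """<number of values less than x> / <total number of values>."""
    below = sum(1 for s in scores if s < x)
    return below / len(scores)


if __name__ == "__main__":
    sample = [10, 20, 20, 30, 40]  # hypothetical data with one tie
    for x in sorted(set(sample)):
        print(
            x,
            docs_percent_rank(sample, x),
            percentile_rank_midpoint(sample, x),
            percentile_rank_strict(sample, x),
        )
```

For the score 20 in this sample, the three functions give 0.25, 0.4 and 0.2 respectively, so the formulas really do disagree.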

Note also that the documentation for the `percent_rank` function in the
[https://www.sqlite.org/windowfunctions.html#built_in_window_functions
SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
PostgreSQL] database backends does '''not''' mention "percentile rank".
Instead, they use the term "relative rank."

To prevent confusion, wouldn't it be better to use the same terminology as
the database backends?
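For reference, the backend behaviour can be checked directly with Python's standard `sqlite3` module. This is only a sketch: the table name, column name and helper function are invented for the example, and it assumes the bundled SQLite is version 3.25 or newer, since older versions have no window functions.

```python
import sqlite3


def relative_ranks(values):
    """Return (value, rank, percent_rank) rows computed by SQLite itself."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE score (value INTEGER)")
        conn.executemany(
            "INSERT INTO score (value) VALUES (?)", [(v,) for v in values]
        )
        return conn.execute(
            """
            SELECT value,
                   rank() OVER (ORDER BY value) AS rnk,
                   percent_rank() OVER (ORDER BY value) AS pr
            FROM score
            ORDER BY value
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Window functions need SQLite >= 3.25.
    if sqlite3.sqlite_version_info >= (3, 25, 0):
        rows = relative_ranks([10, 20, 20, 30, 40])
        total = len(rows)
        for value, rnk, pr in rows:
            # Compare the backend result with (rank - 1) / (total rows - 1).
            print(value, rnk, pr, (rnk - 1) / (total - 1))
```

The last two printed columns should agree for every row, which is the "relative rank" described in the SQLite documentation.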

--
Ticket URL: <https://code.djangoproject.com/ticket/34325>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Feb 9, 2023, 9:27:15 AM
to django-...@googlegroups.com
#34325: PercentRank confusion
-------------------------------+--------------------------------------
Reporter: dvg | Owner: nobody
Type: Uncategorized | Status: new
Component: Documentation | Version: 4.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Description changed by dvg:

Old description:

> The [https://docs.djangoproject.com/en/4.1/ref/models/database-
> functions/#percentrank documentation for the PercentRank window function]
> says:
>
> Computes the '''''percentile rank''''' of the rows in the frame clause.
> This computation is equivalent to evaluating:
> {{{
> (rank - 1) / (total rows - 1)
> }}}
>
> (my emphasis)
>
> However, I'm not so sure
> "[https://en.wikipedia.org/w/index.php?title=Percentile&oldid=1114275310
> percentile] rank" is the correct term.
>
> If you look up the (statistical) term "percentile rank" online, you'll
> find various definitions,
> [https://en.wikipedia.org/w/index.php?title=Percentile_rank&oldid=1136815121
> ranging from]
>
> {{{
> (CF - 0.5 * F) / N
> }}}
>
> where CF—the cumulative frequency—is the count of all scores less than
> or equal to the score of interest, F is the frequency for the score of
> interest, and N is the number of scores in the distribution.
>
> [https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Descriptive-
> Statistics/Measures-of-Position/Percentiles-and-Percentile-Rank/index.html
> to something like]
>
> {{{
> <number of values less than the score of interest> / <total number of
> values in the data set>
> }}}
>
> However, none exactly matches the definition in the Django docs.
>
> Note also that the documentation for the `percent_rank` function in the
> [https://www.sqlite.org/windowfunctions.html#built_in_window_functions
> SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
> PostgreSQL] database backends does '''not''' mention "percentile rank".
> Instead, they use the term "relative rank."
>
> To prevent confusion, wouldn't it be better to use the same terminology
> as the database backends?

New description:

The [https://docs.djangoproject.com/en/4.1/ref/models/database-
functions/#percentrank documentation for the PercentRank window function]
says:

Computes the '''''percentile rank''''' of the rows in the frame clause.
This computation is equivalent to evaluating:
{{{
(rank - 1) / (total rows - 1)
}}}

(my emphasis)

However, I'm not so sure
"[https://en.wikipedia.org/w/index.php?title=Percentile&oldid=1114275310
percentile] rank" is the correct term.

If you look up the (statistical) term "percentile rank" online, you'll
find various definitions,
[https://en.wikipedia.org/w/index.php?title=Percentile_rank&oldid=1136815121
ranging from]

{{{
(CF - 0.5 * F) / N
}}}

where CF—the cumulative frequency—is the count of all scores less than
or equal to the score of interest, F is the frequency for the score of
interest, and N is the number of scores in the distribution.

[https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Descriptive-
Statistics/Measures-of-Position/Percentiles-and-Percentile-Rank/index.html
to something like]

{{{
<number of values less than the score of interest> / <total number of
values in the data set>
}}}

(equivalent to `(CF - F) / N`)

Both of these definitions are also used e.g. by
[https://github.com/scipy/scipy/blob/dde50595862a4f9cede24b5d1c86935c30f1f88a/scipy/stats/_stats_py.py#L2182
scipy].

The latter definition is similar to that in the Django docs, but still
subtly different in the denominator.
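The difference can be written out explicitly. The following is a short derivation sketch, assuming that "rank" is the SQL `rank()` (one plus the number of rows strictly below the current row, as the SQLite documentation describes it) and that N equals the total number of rows, with N > 1:

```latex
\begin{align*}
\text{rank} - 1 &= CF - F \\
\frac{\text{rank} - 1}{\text{total rows} - 1} &= \frac{CF - F}{N - 1}
\qquad\text{versus}\qquad \frac{CF - F}{N}
\end{align*}
```

Under those assumptions the numerators coincide, and the Django value is larger than the latter definition by a factor of N / (N - 1).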

Note also that the documentation for the `percent_rank` function in the
[https://www.sqlite.org/windowfunctions.html#built_in_window_functions
SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
PostgreSQL] database backends does '''not''' mention "percentile rank" at
all. Instead, they use the term '''"relative rank."'''

To prevent confusion, wouldn't it be better to use the same terminology as
the database backends?

--
Ticket URL: <https://code.djangoproject.com/ticket/34325#comment:1>

Django

Feb 9, 2023, 9:27:50 AM
to django-...@googlegroups.com

Old description:

New description:

(my emphasis)

Similar definitions are also used e.g. by
[https://github.com/scipy/scipy/blob/dde50595862a4f9cede24b5d1c86935c30f1f88a/scipy/stats/_stats_py.py#L2182
scipy].

The latter definition is similar to that in the Django docs, but still
subtly different in the denominator.

Note also that the documentation for the `percent_rank` function in the
[https://www.sqlite.org/windowfunctions.html#built_in_window_functions
SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
PostgreSQL] database backends does '''not''' mention "percentile rank" at
all. Instead, they use the term '''"relative rank."'''

To prevent confusion, wouldn't it be better to use the same terminology as
the database backends?

--
Ticket URL: <https://code.djangoproject.com/ticket/34325#comment:2>

Django

Feb 9, 2023, 9:34:39 AM
to django-...@googlegroups.com

Old description:

New description:

(my emphasis)

[https://github.com/scipy/scipy/blob/dde50595862a4f9cede24b5d1c86935c30f1f88a/scipy/stats/_stats_py.py#L2190
scipy].

The latter definition is similar to that in the Django docs, but still
subtly different in the denominator.

Note also that the documentation for the `percent_rank` function in the
[https://www.sqlite.org/windowfunctions.html#built_in_window_functions
SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
PostgreSQL] database backends does '''not''' mention "percentile rank" at
all. Instead, they use the term '''"relative rank."'''

To prevent confusion, wouldn't it be better to use the same terminology as
the database backends?

--
Ticket URL: <https://code.djangoproject.com/ticket/34325#comment:3>

Django

Feb 9, 2023, 9:35:43 AM
to django-...@googlegroups.com

Old description:

> [https://github.com/scipy/scipy/blob/dde50595862a4f9cede24b5d1c86935c30f1f88a/scipy/stats/_stats_py.py#L2190
> scipy].
>
> The latter definition is similar to that in the Django docs, but still
> subtly different in the denominator.
>
> Note also that the documentation for the `percent_rank` function in the
> [https://www.sqlite.org/windowfunctions.html#built_in_window_functions
> SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
> PostgreSQL] database backends does '''not''' mention "percentile rank" at
> all. Instead, they use the term '''"relative rank."'''
>
> To prevent confusion, wouldn't it be better to use the same terminology
> as the database backends?

New description:

(my emphasis)

Both definitions are also used e.g. by
[https://github.com/scipy/scipy/blob/dde50595862a4f9cede24b5d1c86935c30f1f88a/scipy/stats/_stats_py.py#L2190
scipy].

The latter definition is similar to that in the Django docs, but still
subtly different in the denominator.

Note also that the documentation for the `percent_rank` function in the
[https://www.sqlite.org/windowfunctions.html#built_in_window_functions
SQLite] and [https://www.postgresql.org/docs/15/functions-window.html
PostgreSQL] database backends does '''not''' mention "percentile rank" at
all. Instead, they use the term '''"relative rank."'''

To prevent confusion, wouldn't it be better to use the same terminology as
the database backends?

--
Ticket URL: <https://code.djangoproject.com/ticket/34325#comment:4>
