Quartile Decile Percentile Example Problems

Astryd Boschee

Aug 3, 2024, 4:28:19 PM

to concpodtises

Put simply, a percentile is the value below which a given percentage of the data falls. As you learned in previous sections, descriptive statistics uses two types of measurements: measures of central tendency and measures of variability.

Percentiles are one way of describing the variability within a data set. A percentile is an important measure because it can tell you more about where a value sits in a data set than the mean, mode, or median alone.

Suppose you scored 50 points. At first glance, 50 out of 100 points may seem like a disappointing grade; in many classes it would be a failing one. However, when you calculate the percentile, you find that you are at the 90th percentile. In other words, you scored as well as or better than 90% of the class.

In this case, 8 students scored below 50, which puts your score in the 9th position when the scores are ordered from lowest to highest. Next, divide that position, 9, by n, the sample size. In this case, n = 10, so 9/10 = 90%. This tells us that, although 50 out of 100 points can seem like a low score, you scored as well as or better than 90% of the people in your class.
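The position-over-n calculation above can be sketched in Python. The score list is a made-up data set (the original scores are not shown in this post), chosen only so that 8 of the 10 scores fall below 50; `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(scores, value):
    """Percentile rank as (1-based position of value in sorted order) / n * 100."""
    below = sum(1 for s in scores if s < value)  # scores strictly below value
    position = below + 1                         # position of value in the ordering
    return position * 100 / len(scores)


# Hypothetical class scores (an assumption, not the original data)
scores = [20, 25, 28, 30, 33, 35, 40, 41, 50, 60]
print(percentile_rank(scores, 50))  # 90.0
```

Other textbooks count only the scores strictly below the value, which would give 80% here, so it is worth checking which convention a course uses.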

You may also want to find the value that corresponds to a certain percentile. Taking our example above, suppose you want to find the 70th percentile, or the score below which 70% of students scored.

The index gives you the position of the observation at which the 70th percentile is located. If it has a decimal, round to the nearest whole number. Here, the index 7 means that the 7th observation in our data set is the score at the 70th percentile. Counting from the lowest to the highest score, we reach the 7th observed value: a score of 40, which is the 70th percentile for our data.
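A minimal sketch of this lookup, assuming the index is computed as (percentile / 100) × n, which matches the index of 7 for the 70th percentile with n = 10. The scores are hypothetical, picked so that the 7th ordered value is 40, and `value_at_percentile` is an illustrative name.

```python
import math


def value_at_percentile(scores, p):
    """Return the observation at index p/100 * n, rounded to the nearest whole number."""
    ordered = sorted(scores)
    index = p * len(ordered) / 100            # e.g. 70 * 10 / 100 = 7.0
    index = math.floor(index + 0.5)           # round half up to a whole position
    index = min(max(index, 1), len(ordered))  # keep the position within 1..n
    return ordered[index - 1]                 # positions are 1-based


# Hypothetical class scores (an assumption, not the original data)
scores = [20, 25, 28, 30, 33, 35, 40, 41, 50, 60]
print(value_at_percentile(scores, 70))  # 40
```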

Deciles are a form of percentile that split the data into groups of 10%, meaning every decile contains 10% of the data. To find the deciles, first order the data from least to greatest. Then divide the number of observations by 10. The result is the number of observed values within each decile.

Using our previous example, we divide our data into 10 groups, each containing 10% of the data. This can be visualized in the data above. Because our n is equal to 10, each decile contains only 1 score.
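As a rough illustration, the grouping can be written as slicing the ordered data into ten equal chunks. The sketch assumes n is a multiple of 10 and uses hypothetical scores; `decile_groups` is an illustrative name.

```python
def decile_groups(scores):
    """Split ordered data into 10 groups holding n / 10 observations each."""
    ordered = sorted(scores)
    size = len(ordered) // 10  # observations per decile
    if size == 0 or len(ordered) % 10 != 0:
        raise ValueError("this sketch assumes n is a positive multiple of 10")
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Hypothetical class scores (an assumption, not the original data)
scores = [20, 25, 28, 30, 33, 35, 40, 41, 50, 60]
for number, group in enumerate(decile_groups(scores), start=1):
    print(number, group)  # with n = 10, each decile holds a single score
```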

The second quartile is also known as the median, which, as we calculated earlier, is 34 points. Quartile 3 is the 75th percentile, which means that at 41 points, 25% of students scored above this number and 75% scored below it.
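One common way to get these values is to take the median of the whole data set and then the medians of its lower and upper halves. The post does not say which method was used, so treat the sketch below as one possible convention; the scores are hypothetical, chosen so that the median is 34 and the third quartile is 41.

```python
from statistics import median


def quartiles_by_halves(scores):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # excludes the middle value when n is odd
    upper = ordered[n - half:]
    return median(lower), median(ordered), median(upper)


# Hypothetical class scores (an assumption, not the original data)
scores = [20, 25, 28, 30, 33, 35, 40, 41, 50, 60]
print(quartiles_by_halves(scores))  # (28, 34.0, 41)
```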

I see a lot of questions and answers re order and sort. Is there anything that sorts vectors or data frames into groupings (like quartiles or deciles)? I have a "manual" solution, but there's likely a better solution that has been group-tested.

The first one has the side-effect of labeling the quartiles with the values, which I consider a "good thing", but if it were not "good for you", or the valid problems raised in the comments were a concern you could go with version 2. You can use labels= in cut, or you could add this line to your code:

Sorry for being a bit late to the party. I wanted to add my one liner using cut2 as I didn't know max/min for my data and wanted the groups to be identically large. I read about cut2 in an issue which was marked as duplicate (link below).

Take care with ntile() if your original values are clustered at some values. To create equally sized groups, it will allocate rows with the same original value into different groups. This may not be desirable.

I had a case where scores of individuals were clustered at certain values and it was important that individuals with the same original score were placed in the same group (e.g. allocating students to groups based on test score). ntile() allocated individuals with the same score to different groups (unfair in this case), but cut() with quantile() does not (but groups are only approximately equal in size).
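The contrast can be illustrated outside R with a small Python sketch. It does not reproduce `ntile()` or `cut()`; `rank_groups` and `cut_groups` are hypothetical stand-ins for "equal-size groups by rank" and "groups by quantile break points", and the scores are invented.

```python
from bisect import bisect_left
from statistics import quantiles


def rank_groups(values, k):
    """Equal-size groups by sorted position; tied values may be split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for position, i in enumerate(order):
        groups[i] = position * k // len(values) + 1
    return groups


def cut_groups(values, k):
    """Groups by quantile cut points; tied values always share a group."""
    cuts = quantiles(values, n=k)  # k - 1 cut points
    return [bisect_left(cuts, v) + 1 for v in values]


scores = [50, 50, 50, 50, 60, 70]  # invented scores clustered at 50
print(rank_groups(scores, 2))  # [1, 1, 1, 2, 2, 2]
print(cut_groups(scores, 2))   # [1, 1, 1, 1, 2, 2]
```

The rank-based version keeps the groups the same size but puts one of the students who scored 50 into group 2, while the cut-point version keeps all four together at the cost of unequal group sizes.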

I would like to propose a version that seems to be more robust, since I ran into a lot of problems using quantile() in the breaks option of cut() on my dataset. I am using the ntile function of dplyr, but it also works with ecdf as input.

This document provides information about interpreting measures of position through examples and learning tasks. It begins by welcoming students and setting objectives to recognize the connection between measures of position and their interpretations in distributions. Examples are given to interpret quartiles, deciles, and percentiles. Learning tasks then assess understanding of comparing heights and salaries based on their percentile, quartile, and decile positions. The document concludes by reinforcing learning through a quiz.

A few details to clarify. The individual with the lowest value of the variable, with the minimum value, is not bigger than anyone, so the lowest percentile, the percentile of the rock-bottom minimum, is the 0th percentile. If my score is in the 0th percentile, then I am not higher than anyone.
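Under this "percentage strictly below" convention, the calculation is a one-liner. A minimal sketch with invented scores; `percent_below` is a hypothetical name.

```python
def percent_below(scores, value):
    """Percentage of scores that are strictly lower than value."""
    return 100 * sum(1 for s in scores if s < value) / len(scores)


scores = [12, 15, 15, 18, 20]        # invented scores
print(percent_below(scores, 12))     # 0.0  -> the minimum is at the 0th percentile
print(percent_below(scores, 20))     # 80.0 -> the maximum is never at 100 here
```

With this convention the top score in a group of n people is higher than at most n - 1 of them, so it only approaches the 99th percentile when the group is large.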

When the number of folks in the group is that large, then for all intents and purposes, the median is the 50th percentile. If you are familiar with the idea of quartiles, then the first quartile is the 25th percentile and the third quartile is the 75th percentile, again, when the group sizes are truly huge.

1) We know that Sasha is near the top of the scoring distribution, so that would mean a score with a percentile close to the 99th percentile. Because of the scoring scale, the score is not going to be above 60, so the percentile is clearly bigger. Answer = B.

Hi Amisha! A score that is one standard deviation above the mean is better than 84% of the scores. That means the score is at the 84th percentile, which is higher than the 80th percentile. This problem depends on the 34-13.5-2 rule. You can read all about it on this post: -normal-distribution/
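The 84% figure can be checked against the standard normal distribution with Python's standard library. This assumes the scores are normally distributed, as the reply does.

```python
from statistics import NormalDist

# Fraction of a normal distribution lying below one standard deviation above the mean
share_below = NormalDist(mu=0, sigma=1).cdf(1.0)
print(round(share_below * 100, 1))  # 84.1
```

The 34-13.5-2 rule gives the same result by adding 50% (everything below the mean) to the 34% that lies between the mean and one standard deviation above it.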

Is there something like the 100th percentile? If it does exist, does it mean that the person who secured that percentile is the only one to have got the maximum score or can there be more than one to have gotten the max. score?

I am sorry to say this, but it does not.
I even hosted a worldwide test online (on edX opencourseware), and only 2 people took it. So the highest score got a percentile of only 50.
How can general knowledge of how a nationwide test is conducted influence my answer?
Should I expect such ambiguous questions on the GRE?

Dear Vamseedhar,
You are correct: the highest percentile is the 99th, so for most tests the highest grade is at the 99th percentile. This question is asking us to compare the numerical grade to the numerical percentile. For example, for the GRE, the number for the highest score (340) would always be higher than the number for the highest percentile, because 340 > 99. If a test went from, say, 1 to 12, then in all likelihood the number for the highest score (12) would be less than the number for the highest percentile, because 12 < 99. For a test that goes from 0 to 100, maybe Alice got a score of 100, which was at the 99th percentile, so (score) > (percentile), or maybe Alice's score was only 78 but it was still at the 99th percentile, so then (score) < (percentile). We are comparing one number to another number. Does this make sense?
Mike

Dear Mike,
Thank you for the quick reply, but your answer does not clarify my doubt. What I said was that in an exam like the GRE, someone at the 99th percentile gets a perfect score. In the question given, since the grade is scaled from 0 to 100, someone at the 99th percentile should get 100 if it is similar to the GRE. Maybe the question does not clearly mention whether it uses absolute grading or relative grading (like the GRE). In the case of relative grading, in the question given, the score of the student will be greater than the percentile, I think. I hope you can clarify this.

Yes, that definitely makes sense. This is the kind of little mistake I keep making! How frustrating. I got a 790 on the SAT Math back in the day and now I cannot break 155-160 on practice GREs (which has been verified by my Magoosh math score estimate), all because of this silly lack of attention.

You will use percentiles as you progress through your statistics and probability units, along with standard deviation (a measure of dispersion), determining outliers, using the normal distribution, quantiles and deciles, and more exploratory data analysis techniques.

For a large data set, crossing numbers off a list can be time-consuming and a bit confusing, particularly if the data spans over two or more lines when listed. You can use an alternative method to find the lower and upper quartiles.

In situations where data is grouped, this method can also be used to find the class intervals in which the lower and upper quartiles lie. This is particularly true when estimating the quartiles in a histogram.

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Common quantiles have special names, such as quartiles (four groups), deciles (ten groups), and percentiles (100 groups). The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.
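The "one fewer cut point than groups" relationship is easy to see with `statistics.quantiles` from Python's standard library (Python 3.8+), which returns the cut points for a chosen number of groups. The data below is an arbitrary illustration.

```python
from statistics import quantiles

data = list(range(1, 101))  # arbitrary illustrative sample: 1, 2, ..., 100

quartile_cuts = quantiles(data, n=4)      # 4 groups  -> 3 cut points
decile_cuts = quantiles(data, n=10)       # 10 groups -> 9 cut points
percentile_cuts = quantiles(data, n=100)  # 100 groups -> 99 cut points

print(len(quartile_cuts), len(decile_cuts), len(percentile_cuts))  # 3 9 99
```

The exact cut values depend on the interpolation method; `quantiles` defaults to `method="exclusive"` and also accepts `method="inclusive"`.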

As in the computation of, for example, standard deviation, the estimation of a quantile depends upon whether one is operating with a statistical population or with a sample drawn from it. For a population, of discrete values or for a continuous population density, the k-th q-quantile is the data value where the cumulative distribution function crosses k/q. That is, x is a k-th q-quantile for a variable X if