detano adored fyllbert


Berniece Leonhardt


Aug 3, 2024, 1:00:25 AM

to blomgoogzolu

In short, statistics is concerned with collecting, classifying, arranging and presenting numerical data. It allows us to interpret results drawn from the data and to forecast likely outcomes. Statistics deals with facts, observations and information in numerical form only. With the help of statistics, we can find measures of central tendency and the deviation of individual values from the center.

Below, the first two formulas find the smallest sample sizes required to achieve a fixed margin of error, using simple random sampling. The third formula allocates the sample across strata, based on a proportionate design. The fourth formula, Neyman allocation, uses stratified sampling to minimize variance, given a fixed sample size. And the last formula, optimum allocation, uses stratified sampling to minimize variance, given a fixed budget.
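A minimal sketch of the five calculations, assuming the standard textbook forms (the exact formulas the passage refers to are not shown here, so these may differ in detail). The function names are hypothetical, the two sample-size functions use the infinite-population approximation, and z = 1.96 is an assumed default for roughly 95% confidence.

```python
import math

def srs_sample_size_mean(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin (assumes a large population)."""
    return math.ceil((z * sigma / margin) ** 2)

def srs_sample_size_proportion(p, margin, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin (assumes a large population)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def proportionate_allocation(n, sizes):
    """n_h = n * N_h / N: each stratum gets its population share of the sample."""
    total = sum(sizes)
    return [n * size / total for size in sizes]

def neyman_allocation(n, sizes, sds):
    """n_h proportional to N_h * S_h, for a fixed total sample size n."""
    weights = [size * sd for size, sd in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

def optimum_allocation(budget, sizes, sds, costs):
    """n_h proportional to N_h * S_h / sqrt(c_h), scaled so the total cost equals the budget."""
    numerators = [size * sd / math.sqrt(c) for size, sd, c in zip(sizes, sds, costs)]
    denominator = sum(size * sd * math.sqrt(c) for size, sd, c in zip(sizes, sds, costs))
    return [budget * num / denominator for num in numerators]

# Hypothetical strata: population sizes, standard deviations, per-unit costs.
sizes, sds, costs = [500, 300, 200], [10, 20, 30], [1, 4, 9]
allocation = optimum_allocation(3500, sizes, sds, costs)
```

The allocations are returned as unrounded values; in practice they would be rounded to whole units, which can shift the total slightly.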


The variance is a measure of the spread or dispersion within a set of data. There are two types: the population variance, usually denoted by $\sigma^2$, and the sample variance, usually denoted by $s^2$.

The population variance is the variance of the population. To calculate the population variance, use the formula \[\sigma^2=\frac{1}{N}\sum\limits_{i=1}^{N} (x_i-\mu)^2\] where $N$ is the size of the population consisting of $x_1, x_2, \ldots, x_N$ and $\mu$ is the population mean.

Usually we only have a sample; the sample variance is the variance of this sample. Given a sample of data of size $n$ with sample mean $\bar{x}$, the sample variance is calculated using \[s^2=\frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2 \text{.}\]

Make sure you know when to make this distinction. To use the population variance you need all of the data, whereas the sample variance needs only a subset of it. For example, if we take ten words at random from this page to calculate the variance of their length, a sample variance would be needed. To find the population variance, the length of every word on the page would be needed.
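The distinction can be sketched in a few lines of Python. The ten word lengths below are made up for illustration, and the two helper functions are hypothetical names; the standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
import statistics

# Hypothetical lengths of ten words drawn at random from a page.
word_lengths = [4, 7, 2, 9, 5, 3, 6, 8, 5, 1]

def population_variance(data):
    """Divide the summed squared deviations by N (all of the data is available)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Divide by n - 1 (the data is a sample from a larger population)."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

s2 = sample_variance(word_lengths)          # appropriate for ten random words
sigma2 = population_variance(word_lengths)  # only if these ten words were the whole page
```

For the same data the sample variance is always the larger of the two, because it divides the same sum by $n-1$ rather than $n$.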

However, this calculation can take a lot of time, as it involves calculating the difference between each element of the sample space and the mean (which is equal to $\mathrm{E}[X]$ and abbreviated as $\mu$), squaring this difference, and then finding the expected value of this new set of squared differences.

If we expand the formula for the variance, we see
\begin{align}
\mathrm{Var}[X] &= \mathrm{E}[(X - \mathrm{E}[X])^2 ] \\
&= \mathrm{E}[X^2 - 2X \mathrm{E}[X] + (\mathrm{E}[X])^2] \\
&= \mathrm{E}[X^2] - 2 \mathrm{E}[X]\mathrm{E}[X] + (\mathrm{E}[X])^2 \\
&= \mathrm{E}[X^2] - 2 (\mathrm{E}[X])^2 + (\mathrm{E}[X])^2 \\
&= \mathrm{E}[X^2] - (\mathrm{E}[X])^2\text{.}
\end{align}

Given a discrete random variable $X$ over a sample space $S$, we can calculate the variance in one of the following ways:
\begin{align}
\mathrm{Var}[X] &= \sum\limits_{x\in S} \mathrm{P}[X=x](x - \mu)^2\text{,} \\
\mathrm{Var}[X] &= \sum\limits_{x\in S} \mathrm{P}[X=x]\cdot x^2 - \mu^2\text{.}
\end{align}
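As a quick check that the two discrete formulas agree, here is a small sketch for a fair six-sided die (an example chosen for illustration, not taken from the text). Exact fractions are used so the two results can be compared without rounding error.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die: P[X = x] = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = sum of x * P[X = x].
mu = sum(p * x for x, p in pmf.items())

# First formula: sum of P[X = x] * (x - mu)^2.
var_definition = sum(p * (x - mu) ** 2 for x, p in pmf.items())

# Second formula: sum of P[X = x] * x^2, minus mu^2.
var_shortcut = sum(p * x ** 2 for x, p in pmf.items()) - mu ** 2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```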

Given a continuous random variable $X$ over a sample space $S$ with probability density function $f(x)$, we can calculate the variance in one of the following ways:
\begin{align*}
\mathrm{Var}[X] &= \int\limits_{x\in S} f(x)\cdot (x - \mu)^2 \,\mathrm{d}x\text{,} \\
\mathrm{Var}[X] &= \int\limits_{x\in S} f(x)\cdot x^2 \,\mathrm{d}x - \mu^2\text{.}
\end{align*}
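The continuous formulas can be checked numerically. The sketch below assumes an illustrative density $f(x) = 2x$ on $[0, 1]$ (not from the text) and approximates the integrals with a simple midpoint rule; `integrate` is a hypothetical helper, not a library function. The exact variance for this density is $\frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def f(x):
    """Assumed example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x

a, b = 0.0, 1.0
mu = integrate(lambda x: x * f(x), a, b)  # E[X] = 2/3

# First formula: integral of f(x) * (x - mu)^2.
var_definition = integrate(lambda x: f(x) * (x - mu) ** 2, a, b)

# Second formula: integral of f(x) * x^2, minus mu^2.
var_shortcut = integrate(lambda x: f(x) * x ** 2, a, b) - mu ** 2
```

Both values agree with $1/18 \approx 0.0556$ to within the accuracy of the numerical integration.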

The standard deviation, often denoted by $\sigma$, is the positive square root of the variance. Data sets with a small standard deviation are tightly grouped around the mean, whereas a larger standard deviation indicates the data is more spread out.

The population standard deviation is the standard deviation of the entire population and often denoted by $\sigma$. It is given by the formula \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} (x_i - \mu)^2}\] where $N$ is the size of the population consisting of $x_1, x_2, \ldots, x_N$ and $\mu$ is the population mean.

Because we have the lengths of every song on the album, we calculate the population standard deviation. This is done using the formula \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} (x_i-\mu)^2} \text{.}\]

\begin{align}
(x_1-\mu)^2 &= (128-223.0769)^2 = (-95.0769)^2 = 9039.6169 \\
(x_2-\mu)^2 &= (219-223.0769)^2 = (-4.0769)^2 = 16.6211 \\
(x_3-\mu)^2 &= (316-223.0769)^2 = (92.9231)^2 = 8634.7025 \\
(x_4-\mu)^2 &= (189-223.0769)^2 = (-34.0769)^2 = 1161.2351 \\
(x_5-\mu)^2 &= (512-223.0769)^2 = (288.9231)^2 = 83476.5577 \\
(x_6-\mu)^2 &= (98-223.0769)^2 = (-125.0769)^2 = 15644.2309 \\
(x_7-\mu)^2 &= (155-223.0769)^2 = (-68.0769)^2 = 4634.4643 \\
(x_8-\mu)^2 &= (110-223.0769)^2 = (-113.0769)^2 = 12786.3853 \\
(x_9-\mu)^2 &= (468-223.0769)^2 = (244.9231)^2 = 59987.3249 \\
(x_{10}-\mu)^2 &= (177-223.0769)^2 = (-46.0769)^2 = 2123.0807 \\
(x_{11}-\mu)^2 &= (203-223.0769)^2 = (-20.0769)^2 = 403.0819 \\
(x_{12}-\mu)^2 &= (73-223.0769)^2 = (-150.0769)^2 = 22523.0759 \\
(x_{13}-\mu)^2 &= (252-223.0769)^2 = (28.9231)^2 = 836.5457
\end{align}
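The remaining steps (summing the squared deviations, dividing by $N$ and taking the square root) can be sketched in Python using the thirteen song lengths that appear in the table above. The variable names are illustrative, and the units of the lengths are not stated in the text.

```python
import math
import statistics

# The thirteen song lengths x_1, ..., x_13 from the worked example.
song_lengths = [128, 219, 316, 189, 512, 98, 155, 110, 468, 177, 203, 73, 252]

N = len(song_lengths)
mu = sum(song_lengths) / N  # population mean, about 223.0769

# Squared deviation of each song length from the mean.
squared_deviations = [(x - mu) ** 2 for x in song_lengths]

# Population standard deviation: square root of the mean squared deviation.
sigma = math.sqrt(sum(squared_deviations) / N)

print(round(mu, 4), round(sigma, 2))  # 223.0769 130.46
```

The standard library's `statistics.pstdev(song_lengths)` returns the same value and is a convenient cross-check.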

Long story short, I am struggling with a statistics exam at university. Although I know how to solve all types of exercises and the rationale behind each of them, I still can't pass this exam, because we are supposed to know ninety formulas by heart. I am not complaining about the volume of work that has to be put in, because I did all that had to be done. Still, when practicing, I always referred to a sheet that listed all the formulas I needed.

During the exam, though, we were not allowed to use any help. I find it extremely hard to memorize ninety formulas by heart. I literally don't know how to memorize them; it would take me years to learn them by heart. How can I do this?

In addition to @Buffy's suggestions: A lot of formulas in statistics are variations on the same theme. If you understand the relationship between the formulas, you can learn, for each "block of formulas", the main formula and its relationship with the others in that block.

It does not help you, but for the statistics exams I set, I allow a list of formulas. If the students need or want to use statistics later, I would prefer that they look the required formulas up rather than rely on a memory from a course they took years ago. If that is the way I want them to use statistics, then the exam should reflect that.

Those two facts make me suppose that it has something to do with the functions $f(x)=-x^2$ and $f(x)=e^{-x}$. Let us combine them into $f(x)=e^{-x^2}$. Of course, what we get is a kind of ideal "unit Gaussian function". The last thing we need to do is to shape it with some custom parameters in order to put it in the proper place:
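One way to write the shaping step, assuming the usual three-parameter form (the parameter names $a$, $b$ and $c$ are illustrative and not taken from the original answer), is \[f(x) = a\,e^{-\frac{(x-b)^2}{2c^2}}\text{,}\] where $a$ sets the height of the peak, $b$ shifts the peak along the $x$-axis and $c$ controls the width. Choosing $a = \frac{1}{\sigma\sqrt{2\pi}}$, $b = \mu$ and $c = \sigma$ gives the density of the normal distribution with mean $\mu$ and standard deviation $\sigma$.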

Another thing I usually do is to try to learn how the formula was derived from more fundamental theorems or formulas. When I forget the exact formula, I usually take some time to work out a very quick "proof" of it. For instance: _related_to_chi-squared_distribution

Unfortunately, this does not work well for formulas derived using very complex integrals or differential equations. Fortunately, there are not so many of them. They usually result in functions that are not elementary anyway and can be published only in the form of tables or value sheets.

Writing is widely reputed to aid in memorization and in my anecdotal case it certainly did: My studying consisted of the creation of dozens of pages of handwritten notes. Then condensing it into a second set of notes maybe 30% shorter. And then a third set, 30% shorter again. (Alongside more traditional attempts at study, understanding and memorization, of course.)

The reduction in length was partly an indication of how much I felt I'd already memorized, but the sensation of actually writing those notes by hand, multiple times, helped. By the time of the exams, I was able to call up key information by imagining all those times I had already written it.

If these formulas are simple enough that you can also work problems by hand, then definitely do that, too. And when I say work them by hand, I mean it literally. Start with the problem, write down the applicable formula(s), and work the problem with nothing more than a hand calculator to handle the mechanics of the arithmetic.

For memorising a set of formulae (or words in a language, names of people, or other simple facts), a good choice is spaced repetition. This is implemented by applications such as Anki, which you can use on a desktop computer or a smartphone; you write the facts you want to memorise on "cards", typically with two "sides" (formula name on one side, formula on the other). You review those cards periodically (e.g. see one side and try to remember the other), and after reviewing a card you can tell the app whether it was easy or difficult for you to recall. That feeds back into how frequently the app asks you to review that card.

This is very similar to Buffy's answer, but I've written it as a separate answer because Buffy specifically promotes the use of physical paper flashcards over digital ones. The upside of using an app is the spaced repetition: it keeps track of how well you have learned each card so far, and then gets you to review those cards at (in theory) the optimal times for increasing your memory retention.
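The feedback loop described above can be sketched with a deliberately simple scheduler. This is an assumption-laden toy, not Anki's actual scheduling algorithm: the `Card` class, the `review` and `due_cards` functions, and the "double the interval on easy recall, reset to one day otherwise" rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Card:
    front: str              # e.g. the formula name
    back: str               # e.g. the formula itself
    interval_days: int = 1  # current gap between reviews
    due_day: int = 0        # day number on which the card is next due

def review(card, today, recalled_easily):
    """Lengthen the gap after an easy recall; reset it to one day otherwise."""
    card.interval_days = card.interval_days * 2 if recalled_easily else 1
    card.due_day = today + card.interval_days
    return card

def due_cards(cards, today):
    """Return the cards whose next review falls on or before today."""
    return [card for card in cards if card.due_day <= today]

card = Card("Sample variance", "s^2 = sum((x_i - x_bar)^2) / (n - 1)")
review(card, today=0, recalled_easily=True)   # interval 2, due on day 2
review(card, today=2, recalled_easily=True)   # interval 4, due on day 6
review(card, today=6, recalled_easily=False)  # interval 1, due on day 7
```

Well-remembered cards drift toward long gaps, while a card that is forgotten comes back the next day, which is the behaviour the answer describes.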

I once took a quantum mechanics course where the professor thought it was really important that we memorize all the relevant formulas. The only way I managed to get through it was to, right before the test, sit and copy the formulas over and over again on a piece of paper for 15-30 minutes. Then as soon as I was given the test, I would write them down on my scratch paper. Good luck.

In stats, we talk about cookbooks (lots of how-to procedures to remember, without background theory) and spookbooks (lots of fundamental theory that tends to frighten rather than enlighten the students).