A population is a collection of measurements that you are interested in studying. For example, a population could consist of the incomes of the residents of the United States. Because analyzing every element of a population is often impractical, a sample of data may be chosen from a population and used as a substitute for the population.
The variance measures the average squared difference between the elements of a sample or a population and the mean. The formulas for computing the sample and population variances are slightly different.
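The difference between the two variance formulas is the divisor: the population variance divides the sum of squared deviations by N, while the sample variance divides by n - 1. The following is a minimal sketch of that difference; the `incomes` data are made up for illustration.

```python
import statistics

# Hypothetical incomes, in thousands of dollars (illustrative data only).
incomes = [42, 55, 38, 61, 49]

mean = sum(incomes) / len(incomes)
squared_deviations = [(x - mean) ** 2 for x in incomes]

# Population variance: divide by N.
population_variance = sum(squared_deviations) / len(incomes)

# Sample variance: divide by n - 1.
sample_variance = sum(squared_deviations) / (len(incomes) - 1)

# The standard library provides both as pvariance() and variance().
print(population_variance, statistics.pvariance(incomes))
print(sample_variance, statistics.variance(incomes))
```

For the same data, the sample variance is always the larger of the two, and the gap shrinks as the sample grows.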
For both covariance and correlation, a positive value indicates that two variables tend to move in the same direction, while a negative value indicates that they tend to move in opposite directions. A value of zero indicates that the two variables have no linear relationship; this does not by itself guarantee that they are independent of each other.
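To make the sign interpretation concrete, here is a small sketch that computes the sample covariance and the correlation by hand. The function names and the data are assumptions chosen for illustration. The last case is a caveat worth noting: a variable can be completely determined by another and still show a correlation of zero, because correlation only picks up linear association.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

x = [-2, -1, 0, 1, 2]
same_direction = [2 * v + 1 for v in x]   # rises as x rises
opposite_direction = [-3 * v for v in x]  # falls as x rises
squared = [v ** 2 for v in x]             # depends on x, but not linearly

print(correlation(x, same_direction))      # positive
print(correlation(x, opposite_direction))  # negative
print(correlation(x, squared))             # zero, despite the dependence
```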
A random variable is used to assign numerical values to all the possible outcomes of a random experiment. A probability distribution assigns probabilities to these numerical values. The two basic types of probability distributions are discrete and continuous.
The binomial distribution is based on a random process in which a series of independent trials takes place. On each trial, only two outcomes are possible (success or failure), and the probability of success, p, is the same on every trial. The probability of x successful outcomes during a fixed number of trials, n, is: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
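As a minimal sketch, the standard binomial formula can be evaluated with `math.comb` from Python's standard library. The function name `binomial_pmf` and the coin-flip numbers are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example: probability of exactly 3 heads in 10 flips of a fair coin.
prob_three_heads = binomial_pmf(3, 10, 0.5)
print(prob_three_heads)  # 120 / 1024 = 0.1171875

# The probabilities over all possible counts sum to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
print(total)
```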
The Poisson distribution is used to measure the probability that a specified number of events will occur during a given interval of time. If λ is the average number of events per interval, the probability of x events is: $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$
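The standard Poisson formula needs only the average number of events per interval, written here as `lam`. The sketch below assumes a hypothetical help desk that receives an average of 2 calls per hour.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events in an interval when the
    average number of events per interval is lam."""
    return lam ** x * exp(-lam) / factorial(x)

# Hypothetical example: an average of 2 calls per hour.
prob_no_calls = poisson_pmf(0, 2.0)    # equals e^-2
prob_three_calls = poisson_pmf(3, 2.0)

# Probability of at most 3 calls in the next hour.
prob_at_most_three = sum(poisson_pmf(x, 2.0) for x in range(4))
print(prob_no_calls, prob_three_calls, prob_at_most_three)
```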
The uniform distribution is used to describe a situation in which all possible outcomes of a random experiment are equally likely to occur. For a uniform random variable X that is defined over the interval (a, b), the probability density function is: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$
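Because the continuous uniform density is constant at 1 / (b - a) over the interval, the probability of landing in any sub-interval is simply its length divided by b - a. A minimal sketch, with function names and numbers chosen for illustration:

```python
def uniform_pdf(x, a, b):
    """Density of a uniform random variable on the interval from a to b."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_probability(low, high, a, b):
    """P(low <= X <= high) for X uniform on the interval from a to b."""
    low, high = max(low, a), min(high, b)  # clip to the interval
    return max(high - low, 0) / (b - a)

# Hypothetical example: a wait time uniform between 0 and 10 minutes.
density = uniform_pdf(4, 0, 10)
prob_between_2_and_5 = uniform_probability(2, 5, 0, 10)
print(density, prob_between_2_and_5)
```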
The normal distribution is one of the most widely used distributions in applied disciplines such as economics, finance, psychology, and biology. The normal distribution has several key properties: it is bell-shaped, it is symmetric about its mean, and it is completely described by its mean and standard deviation.
Normal probabilities are used for sampling distributions, confidence intervals, hypothesis testing, regression analysis, and many other applications. Normal probabilities can be computed with tables, a Texas Instruments TI-84 calculator, Microsoft Excel, or a statistical programming language.
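As one example of the programming-language route, Python's standard library includes `statistics.NormalDist` (available since Python 3.8). The sketch below looks up a few probabilities; the test-score distribution with mean 100 and standard deviation 15 is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# P(Z <= 1.96)
p_below = standard_normal.cdf(1.96)

# P(-1 <= Z <= 1), the share within one standard deviation of the mean
p_within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Hypothetical scores with mean 100 and standard deviation 15: P(X > 130)
scores = NormalDist(mu=100, sigma=15)
p_above_130 = 1 - scores.cdf(130)

print(round(p_below, 4), round(p_within_one_sd, 4), round(p_above_130, 4))
```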
The chi-square distribution is a continuous distribution derived from the standard normal distribution. Unlike the standard normal distribution, it is positively skewed and only defined for positive values. The chi-square distribution can be used to test hypotheses about the population variance, for goodness of fit tests, and many other applications.
The F-distribution is another positively skewed continuous distribution. It is useful for testing hypotheses about the equality of two population variances and is also used in conjunction with analysis of variance (ANOVA) and regression analysis.
A sample statistic such as $\bar{X}$ (the sample mean) can be thought of as a random variable that has its own probability distribution, known as its sampling distribution. The sampling distribution of $\bar{X}$ is normal if the underlying population is normally distributed; according to the Central Limit Theorem, it is approximately normal for other populations when the sample size is large (a common rule of thumb is at least 30).
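A short simulation can illustrate the idea of a sampling distribution. The sketch below repeatedly draws samples of size 30 from a skewed population and records each sample mean. The exponential population (mean 1, standard deviation 1), the seed, and the number of samples are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30
NUM_SAMPLES = 5000

def draw_sample_mean():
    """Mean of one sample from an exponential population (mean 1, sd 1)."""
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return sum(sample) / SAMPLE_SIZE

sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

# The sample means cluster around the population mean ...
center = statistics.mean(sample_means)
# ... with a spread near sigma / sqrt(n), the standard error.
spread = statistics.stdev(sample_means)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5

print(center, spread, expected_spread)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.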
The null hypothesis is a statement that is assumed to be true unless there is strong contrary evidence. The alternative hypothesis is a statement that is accepted in place of the null hypothesis if there is sufficient evidence to reject the null hypothesis.
The level of significance is the probability of incorrectly rejecting the null hypothesis (this is referred to as a Type I Error). A test statistic is a numerical measure that is constructed to determine whether the null hypothesis should be rejected.
The critical value (or values) shows how extreme the test statistic must be in order to reject the null hypothesis. The decision as to whether or not to reject the null hypothesis is based on a comparison of the test statistic and the critical value (or values).
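The comparison of a test statistic with critical values can be sketched for one simple case: a two-tailed z-test of a population mean when the population standard deviation is known. The function name, the inputs, and the 5% level of significance are assumptions made for illustration; other tests use different statistics and distributions.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, hypothesized_mean, sigma, n, alpha=0.05):
    """Return (test statistic, critical value, reject?) for the null
    hypothesis that the population mean equals hypothesized_mean."""
    test_statistic = (sample_mean - hypothesized_mean) / (sigma / sqrt(n))
    # Critical values are +/- the z-score that leaves alpha / 2 in each tail.
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(test_statistic) > critical_value
    return test_statistic, critical_value, reject

# Hypothetical data: n = 36, sample mean 52, null hypothesis mean 50, sigma 6.
z, critical, reject_null = two_tailed_z_test(52.0, 50.0, 6.0, 36)
print(z, round(critical, 2), reject_null)
```

Here the test statistic of 2.0 lies beyond the critical value of about 1.96, so the null hypothesis is rejected at the 5% level.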
Regression analysis is a statistical methodology that helps you estimate the strength and direction of the relationship between two or more variables. Simple regression is based on the relationship between a dependent variable and a single independent variable, while multiple regression is based on the relationship between a dependent variable and two or more independent variables.
The results of regression analysis must be tested to ensure that they are valid. One of the most important tests is the t-test, a hypothesis test used to determine whether the slope coefficient equals zero. If this hypothesis cannot be rejected, the sample provides no evidence that the independent variable (X) explains the value of the dependent variable (Y).
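For simple regression, the t-statistic for the slope is the estimated slope divided by its standard error. The sketch below computes it from scratch using the usual least-squares formulas; the function name and the data are hypothetical. The resulting statistic would be compared with a critical value from the t-distribution with n - 2 degrees of freedom, which is not computed here because the standard library has no t-distribution.

```python
from math import sqrt

def slope_t_statistic(xs, ys):
    """Least-squares slope, intercept, and t-statistic for H0: slope = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the slope, based on the residual sum of squares.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    standard_error = sqrt(sse / (n - 2)) / sqrt(sxx)

    return slope, intercept, slope / standard_error

# Hypothetical data in which Y rises by about 2 for each unit of X.
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, t_stat = slope_t_statistic(x_values, y_values)
print(slope, intercept, t_stat)
```

A t-statistic far from zero, as in this example, is strong evidence against a zero slope.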
Alan Anderson, PhD is a teacher of finance, economics, statistics, and math at Fordham and Fairfield universities as well as at Manhattanville and Purchase colleges. Outside of the academic environment he has many years of experience working as an economist, risk manager, and fixed income analyst. Alan received his PhD in economics from Fordham University, and an M.S. in financial engineering from Polytechnic University.
Modern businesses rely on data professionals to enhance and support decision-making. The primary objective of this course is to acquire proficiency in statistical techniques and concepts so that informed decisions may be made throughout critical business processes. The major topics of the course include descriptive statistics, both graphic and numeric; probability and probability distributions, including (at least) the binomial and normal; sampling distributions for means and proportions; confidence interval estimation; hypothesis testing; and simple linear regression. Applications will be drawn from primary business disciplines such as accounting, economics, finance, information systems, management, and marketing.
To integrate data into your decision-making processes, you need a set of tools to transform raw data into a valuable asset. The primary tool set every data-driven decision maker needs is statistics. As the foundation of any data-driven decision, statistics helps you make sense of your data. This certificate program is designed to help you not only gain a strong working knowledge of statistical concepts but also apply them to your data to make better business decisions.
In order to uncover insights in data, it is important to draw conclusions about the population that is being studied using numerical measures. In this course, you will identify various numerical measures including percentiles, range, variance, and standard deviation. You will then see how to visualize and draw conclusions on quantitative or qualitative variables. This course uses tables and charts to compare combinations of variables, identify the means of finding relationships between variables, and teaches you to interpret results and make predictions between variables.
To use data from a sample group to make judgments about an entire population, you need probability, which this course explores as a step toward inferential statistics. You will identify the role of discrete variables, use them in determining probability, find the expected value, and define variance. Additionally, the normal distribution, often called the bell curve, is a practical model for many business measurements, including those used in financial decision making, process variations, and salaries. In this course you will examine the normal distribution and identify how to determine probabilities and percentiles from it.
It is often not feasible to capture parameters for an entire population, so it is necessary to gather sample statistics to estimate population parameters. In this course, you will walk through multiple methods of collecting samples and examine margin of error and confidence intervals, including how they are calculated. You will then explore another area of inferential statistics, hypothesis testing, which starts with a hypothesized value. One of the most important measures to calculate is the p-value, which helps gauge the significance of your findings. You will observe the role that p-values play in hypothesis testing and the way in which they are calculated.
An ever-present need in business is to compare two populations, such as sales of related products, different customer segments, or productivity of factory work shifts, to name a few. In this course, you will examine how to compare two population means. Just as there is a need to look at two populations, the same is true for larger groups. However, the process of comparing three or more population means is significantly different. You will investigate the comparison of multiple means, including the experiment designs to choose from and the three-step process to follow. Additionally, you will explore how hypothesis testing is used to make judgments about a population.
Many times, however, comparisons are needed on more than one variable, such as a survey given to two different audiences or a defect caused by different pieces of equipment. Lastly, in this course you will examine tests on two variables, having either two options or multiple options and identify the formulas used in these comparisons.
Forecasting can be found in every corner of the business world today. When done in tandem with accurate time series analysis, it enables sound prediction of future values. In this course, you will explore the use of time series analysis and the four components of time series data. Some time series require forecasting but have no discernible trend, for example in a stable product environment or over a very short timeframe. You will continue exploring forecasting by examining these stationary time series, where no substantial change is taking place, along with the situations in which they most often occur, and you will practice forecasting techniques on them. Lastly, you will move to data that is changing. A layer of complexity can be added to forecasting in the form of seasonality, where the time series being studied regularly changes with each season. This added element must be considered in any prediction of future periods.