Fwd: [MITU Network] A Complete Guide To Math And Statistics For Data Science

Sunil L. Bangare

Apr 25, 2020, 10:48:14 AM
to saebeit...@googlegroups.com, saebeit...@googlegroups.com, saebeit...@googlegroups.com, saebeit...@googlegroups.com, saebeit...@googlegroups.com, saebeit2016-17, saebeit2015-16



Thanks & Regards,
Mr. Sunil L. Bangare,
Ph.D. (CSE) Research Scholar, M.Tech (I.T.)
LMISTE, AMIE (CSE), ACM-CSTA, MIAENG

Assistant Professor,
Training & Placement IT-Dept Coordinator,
Social Media Cell & Industry-Institute Interaction Cell (College Coordinator) 
STES's Sinhgad Academy of Engineering, Kondhwa-Bk, Pune
(Accredited 'A' Grade by NAAC)
Mobile No: 9822239136
E-mail ID: sunil....@gmail.com ,slbang...@sinhgad.edu,
visit:https://www.researchgate.net/profile/Sunil_Bangare
http://in.linkedin.com/pub/prof-sunil-bangare/b/578/866,



---------- Forwarded message ---------
From: vijay chaudhari <vinuda.c...@gmail.com>
Date: Fri, Apr 24, 2020 at 6:42 PM
Subject: Re: [MITU Network] A Complete Guide To Math And Statistics For Data Science
To: <mitu-n...@googlegroups.com>


Very very nice info.. Much understandable.. Thnx a lot Sir.. 👍🙏

On Fri, Apr 24, 2020, 17:47 Tushar B Kute <tus...@tusharkute.com> wrote:

A Complete Guide To Math And Statistics For Data Science

Math And Statistics For Data Science:

As Josh Wills once said,

“Data Scientist is a person who is better at statistics than any programmer and better at programming than any statistician.”

Math and Statistics for Data Science are essential because these disciplines form the basic foundation of all the Machine Learning Algorithms. In fact, Mathematics is behind everything around us, from shapes, patterns and colors, to the count of petals in a flower. Mathematics is embedded in each and every aspect of our lives.

Although a good understanding of programming languages and Machine Learning algorithms, along with a data-driven approach, is necessary to become a Data Scientist, Data Science isn’t only about these fields. In this blog post, you will understand the importance of Math and Statistics for Data Science and how they can be used to build Machine Learning models.

To get in-depth knowledge on Data Science and the various Machine Learning Algorithms, you can enroll for live Data Science Certification Training by Edureka with 24/7 support and lifetime access.

Here’s a list of topics I’ll be covering in this Math and Statistics for Data Science blog:

    1. Introduction To Statistics
    2. Terminologies In Statistics
    3. Categories In Statistics
    4. Understanding Descriptive Analysis
    5. Descriptive Statistics In R
    6. Understanding Inferential Analysis
    7. Inferential Statistics In R

Introduction To Statistics

To become a successful Data Scientist you must know your basics. Math and Stats are the building blocks of Machine Learning algorithms. It is important to know the techniques behind various Machine Learning algorithms in order to know how and when to use them. Now the question arises, what exactly is Statistics?

Statistics is a Mathematical Science pertaining to data collection, analysis, interpretation and presentation.

Statistics - Math And Statistics For Data Science - Edureka

Statistics is used to process complex problems in the real world so that Data Scientists and Analysts can look for meaningful trends and changes in Data. In simple words, Statistics can be used to derive meaningful insights from data by performing mathematical computations on it.

Several Statistical functions, principles and algorithms are implemented to analyse raw data, build a Statistical Model and infer or predict the result.

Statistics Applications - Math And Statistics For Data Science - Edureka

The field of Statistics influences all domains of life: the stock market, life sciences, weather, retail, insurance and education, to name but a few.

Moving ahead, let’s discuss the basic terminologies in Statistics.

Terminologies In Statistics – Statistics For Data Science

One should be aware of a few key statistical terminologies while dealing with Statistics for Data Science. I’ve discussed these terminologies below:

  • A Population is the set of sources from which data has to be collected.
  • A Sample is a subset of the Population.
  • A Variable is any characteristic, number, or quantity that can be measured or counted. A variable may also be called a data item.
  • A statistical Parameter, or population parameter, is a quantity that indexes a family of probability distributions, for example the mean or median of a population.
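
To make these terms concrete, here is a small Python sketch. The height values and the choice of every second student as the sample are made up for illustration; they do not come from the original post.

```python
from statistics import mean

# Hypothetical population: heights (cm) of all 10 students in a class.
population = [150, 152, 155, 158, 160, 162, 165, 168, 170, 180]

# A sample is a subset of the population (here: every second student).
sample = population[::2]

# The variable being measured is "height"; each number is one data item.
# A parameter describes the population; the same quantity computed on a
# sample is only an estimate of that parameter.
population_mean = mean(population)
sample_mean = mean(sample)

print("sample:", sample)
print("population mean:", population_mean)
print("sample mean:", sample_mean)
```

The two means differ slightly, which is the usual situation: a sample only approximates the population it was drawn from.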

Before we move any further and discuss the categories of Statistics, let’s look at the types of analysis.

Types Of Analysis

An analysis of any event can be done in one of two ways:

Types Of Analysis - Math And Statistics For Data Science - Edureka

  1. Quantitative Analysis: Quantitative Analysis or the Statistical Analysis is the science of collecting and interpreting data with numbers and graphs to identify patterns and trends.
  2. Qualitative Analysis: Qualitative or Non-Statistical Analysis gives generic information and uses text, sound and other forms of media to do so.

For example, if I want to purchase a coffee from Starbucks, it is available in Short, Tall and Grande. This is an example of Qualitative Analysis. But if a store sells 70 regular coffees a week, it is Quantitative Analysis because we have a number representing the coffees sold per week.

Although the purpose of both these analyses is to provide results, Quantitative Analysis provides a clearer picture, which makes it crucial in analytics.

Categories In Statistics

There are two main categories in Statistics, namely:

  1. Descriptive Statistics
  2. Inferential Statistics

Descriptive Statistics

Descriptive Statistics uses the data to provide descriptions of the population, either through numerical calculations or graphs or tables.

Descriptive Statistics helps organize data and focuses on the characteristics of the data, summarizing them as parameters.

Descriptive Statistics - Math And Statistics For Data Science - Edureka

Suppose you want to study the average height of students in a classroom. In descriptive statistics, you would record the heights of all the students in the class and then find the maximum, minimum and average height of the class.
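
The classroom example can be sketched in a few lines of Python. The heights below are hypothetical values chosen only to illustrate the idea.

```python
from statistics import mean

# Hypothetical heights (cm) of every student in the class.
heights_cm = [150, 152, 155, 158, 160, 162, 165, 168, 170, 180]

# Descriptive statistics summarize the data we actually recorded.
summary = {
    "min": min(heights_cm),
    "max": max(heights_cm),
    "mean": mean(heights_cm),
}

print(summary)
```

Nothing is estimated here: every student was measured, so the summary describes the whole class directly.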

Descriptive Statistics Example - Math And Statistics For Data Science - Edureka

Inferential Statistics

Inferential Statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.

Inferential statistics generalizes a large data set and applies probability to arrive at a conclusion. It allows you to infer parameters of the population based on sample stats and build models on it.

Inferential Statistics - Math And Statistics For Data Science - Edureka

So, if we consider the same example of finding the average height of students in a class, in Inferential Statistics you take a sample set of the class, which is basically a few people from the entire class. You have already grouped the class into tall, average and short. In this method, you build a statistical model from the sample and extend it to the entire population of the class.
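
A rough Python sketch of this idea follows. It is an illustration under stated assumptions, not the method used in the post: the class of 60 students is simulated, the sample is a simple random sample rather than one drawn from the tall, average and short groups, and the interval uses the normal approximation.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical class: 60 simulated heights (cm), roughly 165 cm on average.
population = [random.gauss(165, 8) for _ in range(60)]

# Measure only a few students instead of the whole class.
sample = random.sample(population, 15)

# Use the sample to infer the class average.
sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(len(sample))

# Rough 95% interval: sample mean plus or minus 1.96 standard errors.
low = sample_mean - 1.96 * standard_error
high = sample_mean + 1.96 * standard_error

print(f"estimated class mean: {sample_mean:.1f} cm")
print(f"approximate 95% interval: {low:.1f} to {high:.1f} cm")
print(f"actual class mean: {mean(population):.1f} cm")
```

For a sample this small, an interval based on the t-distribution would be somewhat wider than the normal approximation used here.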

Inferential Statistics Example - Math And Statistics For Data Science - Edureka

Now let’s focus our attention on Descriptive Statistics and see how it can be used to solve analytical problems.

Understanding Descriptive Analysis

When we represent data in the form of graphs such as histograms or line plots, the data is summarized by some kind of central tendency. Measures of central tendency such as the mean and median, together with measures of the spread, are used for statistical analysis. To better understand Statistics, let’s discuss the different measures with the help of an example.

Cars DataSet - Math And Statistics For Data Science - Edureka

Here is a sample data set of cars containing the variables:

  1. Cars
  2. Miles per Gallon (mpg)
  3. Cylinder Type (cyl)
  4. Displacement (disp)
  5. Horse Power (hp)
  6. Rear Axle Ratio (drat)

Before we move any further, let’s define the main Measures of the Center or Measures of Central tendency.

Measures Of The Center

  1. Mean: The average of all the values in a sample.
  2. Median: The central value of the sample set when the values are sorted.
  3. Mode: The value that recurs most often in the sample set.

Using descriptive Analysis, you can analyse each of the variables in the sample data set for mean, standard deviation, minimum and maximum.

  • If we want to find out the mean or average horsepower of the cars among the population of cars, we will check and calculate the average of all values. In this case, we’ll take the sum of the Horse Power of each car, divided by the total number of cars:

Mean = (110+110+93+96+90+110+110+110)/8 = 103.625

  • If we want to find out the center value of mpg among the population of cars, we will arrange the mpg values in ascending or descending order and choose the middle value. In this case, we have 8 values, which is an even count. Hence we must take the average of the two middle values.

The mpg for 8 cars: 21,21,21.3,22.8,23,23,23,23
Median = (22.8+23 )/2 = 22.9

  • If we want to find out the most common type of cylinder among the population of cars, we check which value is repeated the most number of times. Here the cylinders come in two values, 4 and 6. If you look at the data set, you can see that the most recurring value is 6. Hence 6 is our Mode.
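
The three calculations above can be checked with Python's standard `statistics` module. The hp and mpg values are the ones listed in the post. The individual cyl values are not listed there, so the `cyl` list below is an assumed example in which 6 occurs most often.

```python
from statistics import mean, median, mode

# Values taken from the worked examples above.
hp = [110, 110, 93, 96, 90, 110, 110, 110]
mpg = [21, 21, 21.3, 22.8, 23, 23, 23, 23]

# ASSUMPTION: illustrative cylinder values, not the actual data set.
cyl = [6, 6, 4, 6, 4, 6, 6, 6]

hp_mean = mean(hp)        # sum of all values divided by their count
mpg_median = median(mpg)  # average of the two middle values (even count)
cyl_mode = mode(cyl)      # most frequently occurring value

print("mean hp:", hp_mean)
print("median mpg:", mpg_median)
print("mode cyl:", cyl_mode)
```

`median` sorts the values itself, so the input does not need to be ordered beforehand.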

Measures Of The Spread

Just like the measures of the center, we also have measures of the spread, which comprise the following:

  1. Range: It is the difference between the largest and smallest values in a data set, a measure of how spread apart the values are.
  2. Inter Quartile Range (IQR): It is a measure of variability based on dividing a data set into quartiles; it is the difference between the third and first quartiles.
  3. Variance: It describes how much a random variable differs from its expected value. It entails computing squares of deviations.
    1. Deviation is the difference between each element and the mean.
    2. Population Variance is the average of the squared deviations from the population mean (the sum divided by N).
    3. Sample Variance is the sum of the squared deviations from the sample mean divided by n - 1.
  4. Standard Deviation: It is the measure of the dispersion of a set of data from its mean, computed as the square root of the variance.
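
As a sketch, these measures can be computed for the horsepower values used earlier with Python's standard `statistics` module (Python 3.8 or later for `quantiles`). Quartiles can be defined in several ways, so the IQR from other tools may differ slightly from the one produced by the module's default method.

```python
from statistics import pstdev, pvariance, quantiles, stdev, variance

# Horsepower values from the mean example above.
hp = [110, 110, 93, 96, 90, 110, 110, 110]

# Range: largest value minus smallest value.
value_range = max(hp) - min(hp)

# IQR: third quartile minus first quartile.
q1, q2, q3 = quantiles(hp, n=4)
iqr = q3 - q1

# Population variance divides the squared deviations by n,
# sample variance divides them by n - 1.
pop_var = pvariance(hp)
samp_var = variance(hp)

# Standard deviation is the square root of the matching variance.
pop_sd = pstdev(hp)
samp_sd = stdev(hp)

print("range:", value_range)
print("IQR:", iqr)
print("population variance:", pop_var, "sample variance:", samp_var)
print("population sd:", pop_sd, "sample sd:", samp_sd)
```

The sample variance is always a little larger than the population variance for the same data, because it divides by n - 1 instead of n.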

--
Tushar B Kute,
Researcher, Computer Science,
MITU Skillologies, Pune
Website | Facebook | Blog | Articles | Travel | Hindi Blog |g+ | mail
Please don't print this e-mail unless you really need to.
Use Open Source and Be safe, Be secure.

--
- MITU Skillologies technical communication group [http://mitu.co.in]
---
You received this message because you are subscribed to the Google Groups "MITU Network" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mitu-network...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mitu-network/CAAxW26pv3qUk5XUKjrHA9tABkO8OXgAZXbCNbEYGKiLVn_1D6g%40mail.gmail.com.
