Question:
Note:
1. Need to write at least 2 paragraphs
2. Need to include the information from the textbook as the reference.
3. Need to include at least 1 peer reviewed article as the reference.
4. Please find the textbook and the related PowerPoint in the attachment.
Describing Data: Numerical Measures
Chapter 3
Copyright ©2021 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
In this chapter you will learn how to calculate measures of location and measures of dispersion. The purpose of a measure of location is to pinpoint the center of the data; these results are often referred to as averages. For example, the average US home changes ownership every 11.8 years, and the average American home has more TV sets than people: there are 2.73 TV sets and 2.55 people in the typical home. Dispersion refers to the variation or spread in the data. For example, average incomes for two different industries may appear to be the same, but when the variation in incomes is examined for each industry, you may find much more variation in one than in the other.
Learning Objectives
LO3-1 Compute and interpret the mean, the median, and the mode
LO3-2 Compute a weighted mean
LO3-3 Compute and interpret the geometric mean
LO3-4 Compute and interpret the range, variance, and standard deviation
LO3-5 Explain and apply Chebyshev’s theorem and the Empirical Rule
LO3-6 Compute the mean and standard deviation of grouped data
Measures of Location
A measure of location is a value used to describe the central tendency of a set of data
Common measures of location
Mean
Median
Mode
The arithmetic mean is the most widely reported measure of location
When calculating a mean, there are different symbols to use when working with a population and when working with a sample. While the arithmetic mean is the most widely used measure of location, there are other means you will learn to calculate, namely, the weighted mean and the geometric mean. Later in the chapter, you learn to calculate an estimate of the mean of grouped data. There are other measures of central tendency too, such as median and mode.
Population Mean
A measurable characteristic of a population is a parameter
PARAMETER A characteristic of a population.
The formula for the population mean is μ = Σx / N, where Σx is the sum of all the values and N is the number of values in the population. For raw data, the population mean is calculated by adding the values of the observations and dividing by the total number of observations. These are population values because we are including all the values under consideration. An example of a population mean is the mean closing price of Johnson & Johnson stock over the last 5 days, $139.05. The mean of a population is an example of a parameter.
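As a minimal sketch of the calculation described above, the following Python adds up every value in a population and divides by the count. The five prices are hypothetical illustrative values, not the Johnson & Johnson data from the slide, and the function name `population_mean` is an assumption made for this example.

```python
def population_mean(values):
    """Return the sum of all population values divided by N."""
    if not values:
        raise ValueError("population must contain at least one value")
    return sum(values) / len(values)


# Hypothetical closing prices for five days (illustrative only).
prices = [10.0, 12.0, 11.0, 13.0, 14.0]
mu = population_mean(prices)
print(mu)
```

The same arithmetic applies to a sample mean; only the symbols (x̄ and n instead of μ and N) and the interpretation change.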
Example: Population Mean
There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances between exits (in miles).
Why is this information a population?
What is the mean number of miles between exits?
Perhaps you are interested in finding the average distance between exits on Interstate 75 in Kentucky. This data set lists the distances between all the exits along the Kentucky portion of the interstate.
Example: Population Mean Continued
There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances between exits (in miles).
Why is this information a population?
This is a population because we are considering all of the exits in Kentucky.
What is the mean number of miles between exits?
We interpret the value 4.57 as the typical number of miles between exits. This value is a population parameter.
Sample Mean
A measurable characteristic of a sample is a statistic
STATISTIC A characteristic of a sample.
The formula for the sample mean is x̄ = Σx / n, where n is the number of sampled values. For raw data, the mean of a sample is the sum of all the sampled values divided by the total number of sampled values. To save time and money, we often select a sample from a population to estimate characteristics of the population. An example is taking a sample of 10 jars of Smucker’s orange marmalade, weighing them, and calculating the mean weight. We can then use that value to estimate the mean weight of all the jars in the population. The mean of a sample is a statistic.
Example: Sample Mean
We interpret 97.5 minutes as the typical number of minutes used last month. This value is a sample statistic.
Properties of the Arithmetic Mean
Interval or ratio scale of measurement is required
All the data values are used in the calculation
The mean is unique
The sum of the deviations from the mean equals zero
A weakness of the mean is that it is affected by extreme values.
Listed here are the major characteristics of the arithmetic mean. If one or two of the values in the data set are extremely large or small compared to the majority of the data, the mean may not be representative of the data set and therefore may not be the best “average” to use. In that case you may choose another measure of location, such as the median or mode, to represent the data.
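Two of these properties can be checked with a short sketch: the deviations from the mean sum to zero, and a single extreme value pulls the mean but not the median. The data values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median

# Hypothetical data set (illustrative only).
values = [2, 4, 6, 8, 10]
center = mean(values)

# Property: the deviations from the mean sum to zero.
deviations = [x - center for x in values]
total_deviation = sum(deviations)

# Weakness: replacing one value with an extreme value moves the mean,
# while the median stays where it was.
with_outlier = [2, 4, 6, 8, 100]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)

print(center, total_deviation, mean_with_outlier, median_with_outlier)
```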
The Median
MEDIAN The midpoint of the values after they have been ordered from the minimum to the maximum values.
The median is the value in the middle of a set of ordered data. It is often used to describe data sets with one or a few extreme values, such as real estate prices or household incomes in a particular area. On this slide are prices for condos in Palm Aire, with an arithmetic mean price of $110,000. The median, $70,000, is a better measure because it is not affected by the $275,000 unit. When finding the median, it doesn’t matter whether the values are sorted in ascending or descending order. Since there is an odd number of values in this data set, it is fairly easy to find the value that divides the set in half, with the same number of observations below $70,000 as above it. Remember, the data must be at least the ordinal level of measurement.
Characteristics of the Median
The median is the value in the middle of a set of ordered data
At least the ordinal scale of measurement is required
It is not influenced by extreme values
Fifty percent of the observations are larger than the median
It is unique to a set of data
In cases where the mean is not representative of the data, you may decide to use the median.
Finding the Median
To find the median for a data set with an even number of observations, sort the observations and calculate the average of the two middle values
The number of hours a sample of 10 adults used Facebook last month:
3 5 7 5 9 1 3 9 17 10
Arranging the data in ascending order gives:
1 3 3 5 5 7 9 9 10 17
Thus, the median is 6.
The median is found by averaging the two middle values. The middle values are 5 hours and 7 hours, and the mean of these two values is 6. We conclude that the typical adult Facebook user spent 6 hours at the website last month. Notice that the median is not one of the values. Also, half of the times are below the median and half are above it.
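The Facebook example can be reproduced with a short sketch. It sorts the ten values from the slide, averages the two middle values by hand, and compares the result with Python's standard-library `statistics.median`. The variable names are assumptions made for this example.

```python
from statistics import median

# Hours of Facebook use for the sample of 10 adults on the slide.
hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]

# Manual approach for an even number of observations:
# sort, then average the two middle values.
ordered = sorted(hours)
middle = len(ordered) // 2
manual_median = (ordered[middle - 1] + ordered[middle]) / 2

# Library approach, which handles both odd and even counts.
library_median = median(hours)

print(ordered, manual_median, library_median)
```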
The Mode
The mode can be found for nominal level data
A set of data can have more than one mode
A set of data could have no mode
MODE The value of the observation that occurs most frequently.
The mode is useful for summarizing nominal level data. Here a company has developed five bath oils and has conducted a marketing survey to find which bath oil consumers prefer. Most of the survey respondents favored Lamoure, which has the highest bar, so Lamoure is the mode. The mode can be determined for all levels of measurement, and it is not affected by extreme values. A disadvantage of the mode is that a data set may have no mode or more than one mode.
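A minimal sketch of finding the mode, including the case of more than one mode, using `statistics.multimode` from the Python standard library. The survey responses are hypothetical: only the name Lamoure comes from the slide, and the other labels are placeholders.

```python
from statistics import multimode

# Hypothetical survey responses (nominal level data).
responses = ["Lamoure", "Oil B", "Lamoure", "Oil C", "Lamoure", "Oil B"]
favorite = multimode(responses)

# Hypothetical numeric data with two values tied for most frequent.
two_modes = multimode([3, 5, 5, 9, 9])

print(favorite, two_modes)
```

Note that when every value occurs equally often, `multimode` returns all of the values, which corresponds to the "no mode" case described above.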
Relative Positions of Mean, Median, and Mode
Refer to the first chart on the left. It is a symmetric distribution with zero skewness, where mean = median = mode. The middle chart is positively skewed: mode < median < mean, and it has a tail on the right. The last chart is negatively skewed: mean < median < mode, and it has a tail on the left. If a distribution is highly skewed, the mean is probably not a representative measure of central tendency, and the median or mode should be used.
The Weighted Mean
The weighted mean is found by multiplying each observation, x, by its corresponding weight, w, summing the products, and dividing by the sum of the weights
The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees: 14 are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate
What is the mean hourly rate paid for the 26 employees?
The formula for the weighted mean is x̄w = Σ(wx) / Σw. You’ll discover that this is really just a shortcut method of computing the arithmetic mean, which we can use when we have recurring values in a data set. Here, (14 × $16.50 + 10 × $19.00 + 2 × $25.00) / 26 = $471 / 26, so the mean hourly rate is $18.12.
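The Carter Construction calculation can be sketched in a few lines. The function name `weighted_mean` is an assumption made for this example; the rates and employee counts come from the slide.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hourly rates and the number of employees paid at each rate.
rates = [16.50, 19.00, 25.00]
counts = [14, 10, 2]

mean_rate = weighted_mean(rates, counts)
print(round(mean_rate, 2))
```

Rounded to the nearest cent, the result agrees with the $18.12 reported on the slide.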
The Geometric Mean
The geometric mean is the nth root of the product of n positive values
The formula for geometric mean is: GM = (x1 × x2 × … × xn)^(1/n)
The geometric mean is also used to find the rate of change from one period to another
The geometric mean takes the nth root of the product of n positive values. To find the average rate of change over n periods, use GM = (value at end of period / value at start of period)^(1/n) − 1. The geometric mean is always equal to or less than the arithmetic mean.
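Both uses of the geometric mean can be sketched as follows. The function names and the input values are hypothetical, chosen so the results are easy to check by hand; they are not taken from the textbook.

```python
import math


def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and all positive")
    return math.prod(values) ** (1 / len(values))


def rate_of_change(start, end, periods):
    """Return the average per-period rate of change from start to end."""
    return (end / start) ** (1 / periods) - 1


# Hypothetical values: the geometric mean of 2 and 8 is the square root of 16.
gm = geometric_mean([2, 8])

# Hypothetical growth from 100 to 121 over 2 periods, about 10% per period.
rate = rate_of_change(100, 121, 2)

print(gm, round(rate, 4))
```

For the values 2 and 8 the arithmetic mean is 5, which illustrates that the geometric mean does not exceed the arithmetic mean.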
Why Study Dispersion?
The dispersion is the variation or spread in a set of data
Measures of dispersion include:
Range
Variance
Standard Deviation
A measure of location only describes the center of the data; it does not tell us anything about the spread of the data. We may need to know something about the variation in the data. Measures of dispersion also allow us to compare two or more distributions. Small measures of dispersion indicate the data are closely clustered around the mean and therefore the mean is representative of the data; large measures of dispersion indicate that the mean may not be representative of the data. We’ll learn how to compute the range, the variance, and the standard deviation as well as the major characteristics of each.
Range
The range is the difference between the maximum and minimum values in a set of data
The major characteristics of the range are
Only two values are used in its calculation
It is influenced by extreme values
It is easy to compute and to understand
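As a small illustration, the range of the Facebook hours used in the median example can be computed from just the largest and smallest values. The function name `data_range` is an assumption made for this sketch.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


# Hours of Facebook use from the earlier median example.
hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]
spread = data_range(hours)
print(spread)
```

Because only the extremes enter the calculation, changing the single value 17 would change the range even if every other value stayed the same.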
Population Variance
Major characteristics of the variance are:
All observations are used in the calculation
The units are somewhat difficult to work with; they are the original units squared
The population variance, another measure of dispersion, is σ² = Σ(x − μ)² / N. The variance is the mean of the squared deviations from the arithmetic mean.
Example: Population Variance
The number of traffic citations issued last year by month in Beaufort County, South Carolina is reported below.
Determine the population variance.
So, the population variance for the number of citations is 124.
The sample variance, s² = Σ(x − x̄)² / (n − 1), is used to estimate the population variance. Notice the change in the denominator: rather than dividing by the number of observations, we divide by n – 1 so that we do not underestimate the population variance.
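The difference between the two denominators can be seen in a short sketch. The data values are hypothetical (they are not the Beaufort County citations), and the function names are assumptions made for this example.

```python
def population_variance(values):
    """Mean of the squared deviations from the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7

print(pop_var, round(samp_var, 3))
```

Dividing by n – 1 always gives a slightly larger value than dividing by n, which is the adjustment described above.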
Standard Deviation
The major characteristics of the standard deviation are:
It is in the same units as the original data
It is the square root of the average squared distance from the mean
It cannot be negative
It is the most widely used measure of dispersion
These properties are useful for understanding both population standard deviations and sample standard deviations. The population standard deviation is the square root of the population variance and uses the Greek symbol sigma.
The standard deviation is the most widely reported measure of dispersion.
Sample Variance and Standard Deviation
The sample standard deviation is simply the square root of the sample variance and is represented by the symbol s.
Likewise, the population standard deviation (not shown) is the square root of the population variance and uses the Greek symbol sigma (σ).
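A minimal sketch of both standard deviations using `pstdev` (population) and `stdev` (sample) from Python's standard-library `statistics` module. The data values are hypothetical and chosen so that the population variance is exactly 4.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: square root of the population variance.
sigma = pstdev(data)

# Sample standard deviation: square root of the sample variance.
s = stdev(data)

print(sigma, round(s, 3))
```

Both results are in the same units as the original data, unlike the variances, which are in squared units.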
Interpretations and Uses of the Standard Deviation
Dupree Paint Company employees contribute a mean of $51.54 to the company’s profit-sharing plan every two weeks. The standard deviation of biweekly contributions is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean, that is, between $25.26 and $77.83?
We can use the standard deviation to describe a distribution. The Russian mathematician P. L. Chebyshev developed a theorem that allows us to determine the minimum proportion of values that lie within a specified number of standard deviations of the mean: for any k greater than 1, at least 1 – 1/k² of the values lie within k standard deviations of the mean. For example, consider 2 standard deviations. We find that 1 – 1/2² = 0.75, so at least 75% of the values lie within 2 standard deviations of the mean. This theorem can be used regardless of the shape of the distribution.
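Chebyshev's theorem can be applied to the Dupree Paint question with a short sketch. The function name `chebyshev_minimum` is an assumption made for this example; the mean, standard deviation, and k = 3.5 come from the slide.

```python
def chebyshev_minimum(k):
    """Return the minimum proportion of values within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


mean_contribution = 51.54
std_dev = 7.51
k = 3.5

# Interval of k standard deviations on either side of the mean.
low = mean_contribution - k * std_dev
high = mean_contribution + k * std_dev

proportion = chebyshev_minimum(k)
print(round(low, 3), round(high, 3), round(proportion, 3))
```

With k = 3.5 the bound is 1 – 1/12.25, so at least about 92% of the contributions lie within the stated interval.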
Interpretations and Uses of the Standard Deviation (2 of 2)
THE EMPIRICAL RULE For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean, about 95% of the observations will lie within plus or minus 2 standard deviations of the mean, and practically all (99.7%) will lie within 3 standard deviations of the mean.
If we have a symmetrical, bell-shaped distribution, we can use the Empirical Rule, sometimes called the Normal Rule, which allows us to be more precise than Chebyshev’s theorem. Here is a symmetrical distribution with a mean of 100 and a standard deviation of 10. Applying the Empirical Rule, we’ll find about 68% of the values between 90 and 110, about 95% of the values between 80 and 120, and about 99.7% of the values between 70 and 130.
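The intervals in this example can be generated with a small sketch. The function name `empirical_rule_intervals` is an assumption; the percentages are the approximate values stated in the Empirical Rule and apply only to symmetrical, bell-shaped distributions.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return {k: (low, high, approximate share)} for k = 1, 2, 3."""
    coverage = {1: 0.68, 2: 0.95, 3: 0.997}
    return {
        k: (mean - k * std_dev, mean + k * std_dev, share)
        for k, share in coverage.items()
    }


# Distribution from the slide: mean of 100, standard deviation of 10.
intervals = empirical_rule_intervals(100, 10)
for k, (low, high, share) in intervals.items():
    print(f"within {k} standard deviation(s): {low} to {high}, about {share:.1%}")
```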
Sample Mean of Grouped Data
This formula is used to estimate the mean of data in a frequency distribution; once the data have been grouped into classes, individual values are no longer available.
Let the midpoint of each class represent the values in that class. Multiply each midpoint by its class frequency, sum these products, and then divide by the total number of observations. See table 3-1 in the text for more detail.
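The steps above can be sketched as follows. The class limits and frequencies are hypothetical (they are not the values in the textbook table), and the function name `grouped_mean` is an assumption made for this example.

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean as sum(f * M) / n, where n is the total frequency."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n


# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(5, 15, 1), (15, 25, 2), (25, 35, 1)]

# The midpoint of each class stands in for the values in that class.
midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [f for _, _, f in classes]

estimated_mean = grouped_mean(midpoints, frequencies)
print(midpoints, estimated_mean)
```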
Standard Deviation of Grouped Data
To calculate the standard deviation of data grouped into a frequency distribution, we need to adjust formula (3–10) slightly. We weight each of the squared differences by the frequency of its class.
Calculating the Standard Deviation of Grouped Data
Applewood Auto Group frequency distribution: compute the standard deviation of the vehicle profits.
To calculate the standard deviation of grouped data, begin by estimating the mean (in this example, the mean is $1,851). Then find the deviation of each class midpoint from the mean, square each deviation, and multiply by the class frequency. Sum these products, divide by n – 1, and finally take the square root of the result.
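A minimal sketch of these steps follows. The midpoints and frequencies are hypothetical, not the Applewood Auto Group data, and the function name `grouped_std_dev` is an assumption made for this example.

```python
import math


def grouped_std_dev(midpoints, frequencies):
    """Estimate the sample standard deviation from grouped data."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need a total frequency of at least 2")
    # Step 1: estimate the mean from the class midpoints.
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    # Step 2: squared deviation of each midpoint from the mean,
    # weighted by the class frequency, then summed.
    weighted_squares = sum(
        f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)
    )
    # Step 3: divide by n - 1 and take the square root.
    return math.sqrt(weighted_squares / (n - 1))


# Hypothetical class midpoints and frequencies (estimated mean is 20).
midpoints = [10, 20, 30]
frequencies = [1, 2, 1]

estimated_sd = grouped_std_dev(midpoints, frequencies)
print(round(estimated_sd, 3))
```

Here the weighted squared deviations sum to 200, so the estimate is the square root of 200 / 3.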
Ethics and Reporting Results
Useful to know the advantages and disadvantages of mean, median, and mode as we report statistics and as we use statistics to make decisions
Important to maintain an independent and principled point of view
Statistical reporting requires objective and honest communication of any results
Chapter 3 Practice Problems
Question 19
The accounting firm of Rowatti and Koppel specializes in income tax returns for self-employed professionals, such as physicians, dentists, architects, and lawyers. The firm employs 11 accountants who prepare the returns. For last year, the number of returns prepared by each accountant was:
Find the mean, median, and mode for the number of returns prepared by each accountant. If you could report only one, which measure of location would you recommend reporting?
LO3-1
Question 25
Copyright 2018 by McGraw-Hill Education. All rights reserved.
The Loris Healthcare System employs 200 persons on the nursing staff. Fifty are nurse’s aides, 50 are practical nurses, and 100 are registered nurses. Nurse’s aides receive $12 an hour, practical nurses $20 an hour, and registered nurses $29 an hour. What is the weighted mean hourly wage?
LO3-2
Question 27
Compute the geometric mean of the following monthly percent increases: 8, 12, 14, 26, and 5.
LO3-3
Question 33
In 2011 there were 232.2 million cell phone subscribers in the United States. By 2017 the number of subscribers increased to 265.9 million.
What is the geometric mean annual percent increase for the period?
Further, the number of subscribers is forecast to increase to 276.7 million by 2020.
What is the rate of increase from 2017 to 2020?
Is the rate of increase expected to slow?
LO3-3
Question 37
Dave’s Automatic Door installs automatic garage door openers. The following list indicates the number of minutes needed to install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.
Calculate the following:
Range
Mean
Variance
LO3-1, 4
Question 45
Plywood Inc. reported these returns on stockholder equity for the past 5 years: 4.3, 4.9, 7.2, 6.7, and 11.6. Consider these as population values.
Compute the following:
Range
Arithmetic mean
Variance
Standard deviation
LO3-1, 4
Question 49
Dave’s Automatic Door installs automatic garage door openers. Based on a sample, following are the times, in minutes, required to install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.
Compute the sample variance.
Determine the sample standard deviation.
LO3-4
Question 53
According to Chebyshev’s theorem, at least what percent of any set of observations will be within 1.8 standard deviations of the mean?
LO3-5
Question 55
The distribution of the weights of a sample of 1,400 cargo containers is symmetric and bell-shaped. According to the Empirical Rule, what percent of the weights will lie:
LO3-5
Question 59
Estimate the mean and the standard deviation of the following frequency distribution showing the ages of the first 60 people in line on Black Friday at a retail store.
LO3-6