2.5 Measures of the Center of the Data
The center of a data set is also a way of describing location. The two most widely used measures of the center of the data are the mean (average) and the median.
The mode of a data set is the data value that occurs with the highest frequency. Since the mode is the most frequently occurring value, it can be used to describe a typical (central) value of a data set. A data set with two modes is called bimodal, one with three modes is called trimodal, and, more generally, one with several modes is called multimodal. A data set can also have no mode, for example when no value is repeated.
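Finding the mode is a matter of counting how often each value occurs and keeping the value(s) with the highest count. The Python sketch below does this with `collections.Counter`. The function name `modes` is our own, and it assumes the convention that a data set in which no value repeats has no mode. Some texts handle ties differently.

```python
from collections import Counter


def modes(data):
    """Return a sorted list of the mode(s) of data.

    Assumed convention: if no value is repeated, the data set has no mode,
    and an empty list is returned.
    """
    counts = Counter(data)          # frequency of each distinct value
    highest = max(counts.values())  # the highest frequency in the data
    if highest == 1:
        return []                   # no value repeats, so there is no mode
    # Keep every value whose count ties for the highest frequency
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([2, 3, 3, 5]))     # [3]     one mode
print(modes([1, 1, 4, 4, 6]))  # [1, 4]  bimodal
print(modes([7, 8, 9]))        # []      no mode
```

Python's standard library also provides `statistics.multimode` (Python 3.8+). However, when no value repeats, it returns every value rather than reporting that there is no mode.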
Review
Review: Statistical Language – Measures of Central Tendency
The linked resource discusses shapes of distribution that we’ll cover in the next section.
Optional: Finding mean, median, and mode @KhanAcademy
Review & Practice
Review: Statistics intro: Mean, median, and Mode
Please complete the practice exercise Calculating mean and median from data displays at the end of the review linked above.
Resistant measures: A numerical summary of data is said to be resistant if extreme values (observations that are very large or very small relative to the rest of the data) do not substantially affect its value.
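To see resistance in action, here is a small Python sketch that compares the mean and median before and after one value is replaced by an outlier. The numbers are made up purely for illustration. The median stays the same, while the mean is pulled toward the outlier. This is why the median is considered resistant and the mean is not.

```python
from statistics import mean, median

original = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # hypothetical: the largest value replaced by an outlier

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")

# original: mean = 3.4, median = 3
# with outlier: mean = 12.4, median = 3
```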
Video: Resistance & Measures of Center
Calculating the Mean and Median of a Frequency Distribution Table (FDT) or Grouped Frequency Distribution Table (GFDT)
Let’s look at the Frequency Distribution Table (FDT) below.
Number of times teenager is reminded | Frequency |
---|---|
0 | 2 |
1 | 5 |
2 | 8 |
3 | 14 |
4 | 7 |
5 | 4 |
How many data observations are there?
5, 6, or way more?
Note that frequency represents the count of a data value, and if we add up all of the counts (frequencies), we get the total count ([latex]n[/latex], the total number of observations). In this case, [latex]n = 2+5+8+14+7+4 = 40[/latex]. If there are [latex]40[/latex] values, the median must be the average of the data values in the [latex]20^{\text{th}}[/latex] and [latex]21^{\text{st}}[/latex] positions. But what are those values? We see from the table that the first two are [latex]0[/latex]’s and the next five are [latex]1[/latex]’s; that’s a total of [latex]7[/latex]. The next eight are [latex]2[/latex]’s, which brings us to [latex]15[/latex] values so far. If you thought that sounded like cumulative frequency, you’d be correct! The cumulative frequency of the data value [latex]2[/latex] is [latex]15[/latex]. The next 14 values, in positions 16 through 29, are all [latex]3[/latex]’s. That means the values in the [latex]20^{\text{th}}[/latex] and [latex]21^{\text{st}}[/latex] positions are both [latex]3[/latex]. So the median is [latex]3[/latex] as well, since the average of two [latex]3[/latex]’s is [latex]3[/latex].
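The cumulative-frequency walk described above can be sketched in Python. The function names `value_at_position` and `fdt_median` are our own. The sketch assumes that the data values in the table are listed in increasing order. It finds the observation in a given position by stepping through running totals of the frequencies, without expanding the table into raw data.

```python
from itertools import accumulate


def value_at_position(values, freqs, k):
    """Return the k-th smallest observation (1-indexed) from a frequency table.

    Assumes values are sorted in increasing order.
    """
    for value, cum in zip(values, accumulate(freqs)):  # cum = cumulative frequency
        if k <= cum:
            return value
    raise IndexError("position exceeds the total frequency")


def fdt_median(values, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        # Odd n: the median is the single middle observation
        return value_at_position(values, freqs, (n + 1) // 2)
    # Even n: average the two middle observations
    lower = value_at_position(values, freqs, n // 2)
    upper = value_at_position(values, freqs, n // 2 + 1)
    return (lower + upper) / 2


values = [0, 1, 2, 3, 4, 5]  # number of times the teenager is reminded
freqs = [2, 5, 8, 14, 7, 4]

print(list(accumulate(freqs)))    # [2, 7, 15, 29, 36, 40]
print(fdt_median(values, freqs))  # 3.0
```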
If we return to the raw data, this data set can be written as [latex]0, 0, \underbrace{1, 1, 1, 1, 1}_{\text{5 times}}, \underbrace{2, 2, 2, 2, 2, 2, 2, 2}_{\text{8 times}}, \ldots[/latex] and so on. Working with the expanded raw data means we don’t have to keep track of cumulative frequencies. To calculate the mean, we need to add up all [latex]40[/latex] data values. There are two 0’s, which add to 0; five 1’s, which add to 5; eight 2’s, which add to [latex]16 = 2\times 8[/latex]; and so on. In each case, the total contributed by a data value is the data value multiplied by its frequency. So, the mean of the data set is:
\[\bar x = \frac{0 \times 2+1 \times 5+2 \times 8+ 3 \times 14+ 4 \times 7+ 5 \times 4}{40} = \frac{111}{40} = 2.775\]
Wait, can I do this on a calculator?
Use One Variable Statistics from SUBEDI Calculators to compute the mean for frequency tables. For data input, make sure to select Frequency Table, then enter the data values in the first column and the corresponding frequencies in the second column before pressing the Calculate button.
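The same calculation can be cross-checked in a few lines of Python. This is our own choice of tool, not part of the course materials. The sketch applies the rule [latex]\bar x = \frac{\sum x \cdot f}{n}[/latex], where [latex]x[/latex] is a data value and [latex]f[/latex] is its frequency. It then expands the table back into raw data to confirm that both routes give the same mean. The variable names are illustrative only.

```python
from statistics import mean

# Data values and their frequencies from the FDT above
values = [0, 1, 2, 3, 4, 5]
freqs = [2, 5, 8, 14, 7, 4]

n = sum(freqs)                                     # total number of observations
total = sum(x * f for x, f in zip(values, freqs))  # each value times its frequency, added up
fdt_mean = total / n

# Cross-check: rebuild the raw data (each value repeated f times) and average it directly
raw = [x for x, f in zip(values, freqs) for _ in range(f)]

print(n, total, fdt_mean)  # 40 111 2.775
print(mean(raw))           # 2.775
```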
Grouped Frequency Distribution Tables (GFDT)
If a data set has many distinct data values, listing each value individually in an FDT may be impractical. In such cases, we can group the data values into classes (groups) to create a Grouped Frequency Distribution Table (GFDT). Here’s an example showing students’ scores on an exam:
Scores | Frequency |
---|---|
30–39 | 5 |
40–49 | 13 |
50–59 | 11 |
60–69 | 15 |
70–79 | 20 |
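Since the data values are given as ranges (classes) rather than as individual values, we first find the midpoint of each class. We then treat every observation in a class as if it were equal to that midpoint and use the method from the FDT above. Because the original values are not known, the results are estimates of the mean and median, not their exact values.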
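Here is a Python sketch of the midpoint approach applied to the exam-score table. It assumes that each class midpoint is the average of the class’s stated lower and upper limits (for example, [latex](30+39)/2 = 34.5[/latex]). Every observation is replaced by its class midpoint, and the FDT mean and median steps are reused. Some textbooks estimate the median of grouped data with an interpolation formula instead; that method is not shown here. The helper name `value_at_position` is our own.

```python
from itertools import accumulate

# Classes (lower limit, upper limit) and frequencies from the exam-score GFDT above
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [5, 13, 11, 15, 20]

# Assumption: a class midpoint is the average of its lower and upper limits
midpoints = [(low + high) / 2 for low, high in classes]

n = sum(freqs)
# Estimated mean: each midpoint times its class frequency, added up, divided by n
est_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n


def value_at_position(values, freqs, k):
    """Return the k-th observation (1-indexed) when values[i] appears freqs[i] times."""
    for value, cum in zip(values, accumulate(freqs)):
        if k <= cum:
            return value
    raise IndexError("position exceeds the total frequency")


# Estimated median: middle position(s) of the midpoint-based data
if n % 2 == 1:
    est_median = value_at_position(midpoints, freqs, (n + 1) // 2)
else:
    est_median = (value_at_position(midpoints, freqs, n // 2)
                  + value_at_position(midpoints, freqs, n // 2 + 1)) / 2

print(midpoints)                # [34.5, 44.5, 54.5, 64.5, 74.5]
print(n, est_mean, est_median)  # 64 59.5 64.5
```

With [latex]n = 64[/latex], the middle positions are the 32nd and 33rd. The cumulative frequencies are 5, 18, 29, 44, 64, so both of these positions fall in the 60–69 class, and the estimated median is that class’s midpoint.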
Practice