Data Levels of Measurement

There are four measurement scales: nominal, ordinal, interval and ratio. They are simply ways to categorize different types of variables. Knowing which scale a variable belongs to helps us choose the right statistical test and visualization technique, and guides our data analysis.

Qualitative Data

Nominal

Let’s start with the easiest one to understand. Nominal scales are used for labeling variables, without any quantitative value.

“Nominal” scales could simply be called “labels.”

Typical examples are labels such as hair color or country of residence. Notice that the categories in a nominal scale are mutually exclusive (no overlap) and none of them has any numerical significance. A good way to remember all of this is that “nominal” sounds a lot like “name” and nominal scales are kind of like “names” or labels.

At this level, we cannot perform any quantitative mathematical operations, such as addition or division. These would not make any sense.

We can, however, do basic counts using pandas’ value_counts method.

  • Because we can count at the nominal level, graphs such as bar charts and pie charts are available to us.
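To make the counting idea concrete, here is a minimal sketch using pandas. The `hair_color` column and its values are made up for illustration; they are not from any real dataset.

```python
import pandas as pd

# Hypothetical nominal column: labels with no order and no numeric meaning.
hair_color = pd.Series(["brown", "black", "blonde", "brown", "black", "brown"])

# Counting how often each label occurs is the basic operation at this level.
counts = hair_color.value_counts()
print(counts)

# The most frequent label (the mode) follows directly from the counts.
most_common = hair_color.mode()[0]
print(most_common)
```

These counts are exactly what a bar chart or pie chart of the column would display.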

Ordinal

With ordinal scales, the order of the values is what’s important and significant, but the differences between them are not really known.

For example, on a satisfaction scale, is the difference between “OK” and “Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say.

  • Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.

“Ordinal” is easy to remember because it sounds like “order”, and that’s the key thing to remember about ordinal scales: it is the order that matters, but that’s all you really get from these.

Advanced note: The best way to determine central tendency on a set of ordinal data is to use the mode or median; a purist will tell you that the mean cannot be defined from an ordinal set.

We can do basic counts as we do with nominal data. In addition, ordinal data can be compared and sorted.

This opens up new statistics and graphs at this level. We may still use bar and pie charts as we did at the nominal level, but because we now have ordering and comparisons, we can also calculate medians and percentiles.

  • With medians and percentiles, stem-and-leaf plots, as well as box plots, are possible.
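As a rough sketch of what ordering buys us, the snippet below stores survey answers as an ordered pandas categorical. The `responses` data and the `levels` list are hypothetical, and taking the median through the rank codes is just one simple way to do it for an odd number of answers.

```python
import pandas as pd

# Hypothetical satisfaction levels, listed from lowest to highest.
levels = ["Unhappy", "OK", "Happy", "Very Happy"]

responses = pd.Series(
    pd.Categorical(
        ["Happy", "OK", "Very Happy", "Unhappy", "Happy", "OK", "Happy"],
        categories=levels,
        ordered=True,  # tells pandas that the categories have an order
    )
)

# Comparisons are meaningful on an ordinal scale.
at_least_happy = int((responses >= "Happy").sum())

# Median: sort the rank codes and take the middle one (odd sample size).
codes = responses.cat.codes.sort_values().to_numpy()
median_label = levels[int(codes[len(codes) // 2])]

print(at_least_happy, median_label)
```

Note that the code never averages the labels: the median only relies on their order, not on the distance between them.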

Quantitative Data

Interval

Interval scales are numeric scales in which we know both the order and the exact differences between the values. In other words, at the interval level the differences between values are meaningful.

  • The classic example of an interval scale is Celsius temperature, because each one-degree step represents the same change in temperature. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees.

At the interval level, we have addition and subtraction to work with.

  • With the ability to add values together, we may introduce two familiar concepts, the arithmetic mean (referred to simply as the mean) and standard deviation.

The most common graph to utilize starting at this level would be the histogram. This graph is a cousin of the bar graph and visualizes buckets of quantities and shows frequencies of these buckets.
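Here is a small sketch of these interval-level tools, assuming a made-up series of daily temperatures in Celsius. It uses `numpy.histogram` to compute the bucket counts that a histogram would draw, so no plotting library is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperatures in degrees Celsius (interval data).
temps_c = pd.Series([12.0, 15.0, 15.0, 18.0, 20.0, 22.0, 24.0, 26.0])

# Addition and subtraction are meaningful, so the mean and the
# standard deviation are available.
mean_c = temps_c.mean()
std_c = temps_c.std()  # pandas uses the sample standard deviation (ddof=1)

# Buckets of 5 degrees: these counts are what a histogram visualizes.
bucket_counts, bucket_edges = np.histogram(
    temps_c.to_numpy(), bins=[10, 15, 20, 25, 30]
)

print(mean_c, std_c)
print(bucket_counts.tolist())
```

The bucket edges here are an arbitrary choice for the example; different edges give a different picture of the same data.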

One large advantage of having two or more columns of data at the interval level is that it opens us up to using scatter plots, where we graph two columns of data on our axes and visualize data points as literal points on the graph.

Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval” itself means “space in between,” which is the important thing to remember: interval scales tell us not only about order, but also about the size of the gap between items.

Here’s the problem with interval scales: they don’t have a “true zero”. For example, there is no such thing as “no temperature,” at least not with Celsius. In the case of interval scales, zero doesn’t mean the absence of value, but is actually just another number on the scale, like 0 degrees Celsius. Negative numbers also have meaning.

Without a true zero, it is impossible to compute meaningful ratios. With interval data, we can add and subtract values, but multiplying or dividing one value by another does not make sense.

Confused? OK, consider this: 10 °C + 10 °C = 20 °C. No problem there. But 20 °C is not twice as hot as 10 °C. Converting to Fahrenheit makes this clear: 10 °C = 50 °F and 20 °C = 68 °F, and 68 is clearly not twice 50. I hope that makes sense.
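The same arithmetic can be checked in a few lines of Python. The helper names `c_to_f` and `c_to_k` are just illustrative. The Kelvin comparison is an addition of mine: Kelvin does have a true zero, so it shows what the ratio looks like on a scale where ratios are valid.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius: float) -> float:
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return celsius + 273.15


cold, warm = 10.0, 20.0

# The "ratio" depends on the unit, so it carries no physical meaning.
ratio_c = warm / cold                  # 2.0 in Celsius
ratio_f = c_to_f(warm) / c_to_f(cold)  # 68 / 50 in Fahrenheit
ratio_k = c_to_k(warm) / c_to_k(cold)  # 293.15 / 283.15 in kelvins

print(ratio_c, ratio_f, ratio_k)
```

Three units give three different "ratios" for the same two temperatures, which is why dividing interval values tells us nothing.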

Bottom line: interval scales are great, but we cannot calculate ratios, which brings us to our last measurement scale…

Ratio

Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero–which allows for a wide range of both descriptive and inferential statistics to be applied.

Good examples of ratio variables include height and weight.

Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied and divided (to form ratios).

Central tendency can be measured by the mode, median, or mean. Measures of dispersion, such as the standard deviation and the coefficient of variation, can also be calculated from ratio scales.
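As an illustration, the sketch below computes these measures for a made-up list of body weights. The coefficient of variation is taken here as the sample standard deviation divided by the mean, and the conversion factor to pounds is rounded.

```python
import pandas as pd

# Hypothetical body weights in kilograms (ratio data: 0 kg means no weight).
weights_kg = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])

mean_kg = weights_kg.mean()
std_kg = weights_kg.std()  # sample standard deviation (ddof=1)

# Coefficient of variation: spread relative to the mean, a unitless number.
cv = std_kg / mean_kg

# Ratios are meaningful: the heaviest weight is 1.8 times the lightest.
heaviest_to_lightest = weights_kg.max() / weights_kg.min()

# Changing the unit rescales every value but leaves the CV unchanged.
weights_lb = weights_kg * 2.20462  # approximate kg-to-lb factor
cv_lb = weights_lb.std() / weights_lb.mean()

print(mean_kg, cv, heaviest_to_lightest, cv_lb)
```

Unlike the Celsius case, the ratio and the coefficient of variation stay the same whether we measure in kilograms or pounds, because both units share the same true zero.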

Summary

In summary, nominal variables are used to “name,” or label, a series of values. Ordinal scales provide good information about the order of choices, such as in a customer satisfaction survey. Interval scales give us the order of values plus the ability to quantify the difference between each one. Finally, ratio scales give us the ultimate: order, interval values, plus the ability to calculate ratios, since a “true zero” can be defined.

I hope you now have a good understanding of the levels of measurement of data.

Amir Masoud Sefidian
Data Scientist, Researcher, Software Developer