Stop Mistaking Correlation For Causation In Your Engagement Survey Results

May 23, 2022, 5-min. read

If you have any experience with surveys, or if you've ever worked with data, you've probably come across the term correlation. Correlation describes the degree to which two variables (values) move in coordination with one another, or in other words, how strongly they are related.

A correlation is expressed as a number that ranges from -1 to +1. A value of +1 tells us that the two figures move perfectly in the same direction (a positive correlation), a value of -1 tells us that they move perfectly in opposite directions (a negative correlation), and a value near 0 tells us that there is little or no linear relationship between them. Thus, with a positive correlation, if one value goes up, the other goes up as well, and with a negative correlation, an increase in one value is related to a decrease in the other.
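To make the -1 to +1 range concrete, here is a minimal sketch of the standard Pearson correlation coefficient in plain Python. The function name `pearson` and the sample lists are made up for illustration; they are not data from any real survey.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same constant factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, chosen so the relationships are perfectly linear.
baseline = [1, 2, 3, 4, 5]
moves_together = [2, 4, 6, 8, 10]   # rises whenever baseline rises
moves_opposite = [10, 8, 6, 4, 2]   # falls whenever baseline rises

print(pearson(baseline, moves_together))  # perfect positive correlation
print(pearson(baseline, moves_opposite))  # perfect negative correlation
```

Real survey data will almost never produce exactly +1 or -1; values somewhere in between are the norm.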

A correlation can thus give the impression that there is a causal relationship between the two values, i.e. that one causes the other to change. However, assuming so can lead to fallacies which, in practice, mean that you invest a lot of effort, time, and money in something that does not lead to the expected result.

So if you want to work well with your survey results, it is useful to understand what correlation truly means and what it tells you.


Why is correlation not causation?

Let's try to illustrate this with a simple example. If the data shows you that there is a correlation between phenomenon A and phenomenon B, you know that there is a relationship between the two. But you don't know what that relation is. The reality may be that phenomenon A affects phenomenon B. In that case, we say that A is the independent variable, whereas B is the dependent variable (dependent on A). 

Examples of such variables are age (A) and height (B) in children. It is quite indisputable that the older children are, the taller they are. The reverse relationship does not hold, i.e. body height does not affect age. In this case, causality (what affects what) can be determined from deeper knowledge and experience. It is logical.

In another case, however, the relationship may be reversed, i.e. B affects A. And there may even be a correlation between two values that don't actually affect each other at all, because a third (fourth, fifth, etc.) variable enters the picture and affects both.

A long time ago, research showed that children's weight correlates with intelligence. So what does this mean? What is the dependent variable and what is the independent variable? Is it that the heavier a child is, the smarter they are, or is it the other way around, that the smarter a child is, the more they weigh? Neither sounds very logical. How can weight, meaning obesity, be related to intelligence? And yet there was a high correlation between the two. Coincidence? No. The variable the researchers forgot was age. The older the children studied were, the more they weighed and the better they performed on intelligence tests. In this instance, age is another variable (C) that affects both A and B without any causal relationship between A and B.
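The effect of a forgotten third variable can be reproduced with simulated data. The sketch below is not the original study: the numbers are invented, and both weight and test score are generated from age alone, with no link between them. It then compares the raw correlation with a partial correlation, a standard way to ask how strongly two variables are related once a third is held constant. All names (`pearson`, `partial_correlation`, `raw`, `adjusted`) are illustrative.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Simulated children aged 6 to 12. Weight and score each depend on age plus
# independent noise; neither depends on the other (assumed, made-up model).
ages = [rng.uniform(6, 12) for _ in range(2000)]
weights = [20 + 3 * age + rng.gauss(0, 3) for age in ages]
scores = [10 * age + rng.gauss(0, 10) for age in ages]

raw = pearson(weights, scores)
adjusted = partial_correlation(weights, scores, ages)

print(f"weight vs. score, ignoring age:    {raw:.2f}")
print(f"weight vs. score, controlling age: {adjusted:.2f}")
```

In this simulation the raw correlation is strong, while the age-adjusted one is close to zero, which is the pattern you would expect when a third variable drives both.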

What does this mean for your engagement surveys? 

If you are working with survey results, be sure not to draw a false conclusion about cause and effect. Take, for example, variables such as motivation and performance. If you find that there is a correlation between the two, what does that tell you? Does it mean that motivated people perform better or that high performers feel more motivated? It brings to mind the question of which came first: the chicken or the egg?

What to do about it? And how to make the most out of correlations? Often, a deeper knowledge of the phenomenon and logic can help, as in the example where we know that age affects the height of children.


For example, in surveys, it might be the correlation between eNPS (the willingness to recommend a company as an employer) and the company's index of Employee care. If a positive correlation shows up between the two (as it often does), logic tells us that the more people feel a company cares about them, the more they are willing to recommend it as an employer. The reverse causality is not logical, though not impossible. So here again, you can only establish causality if you have enough information and therefore deeper knowledge.

But what if you don't have a wider context and deeper understanding and have to rely on data? Then hypothesis generation and testing can help. In the example of Employee care and eNPS, you can test the hypothesis that people who are cared for will recommend the company more. So you give some people more care than they have been getting and see if and how that translates into their willingness to recommend the business versus people who don't get any extra treatment. 
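Such an experiment could be evaluated along the lines of the sketch below. It rests on several assumptions that are not stated in this article: eNPS is computed with the common convention of percentage of promoters (answers 9 or 10 on a 0 to 10 scale) minus percentage of detractors (answers 0 to 6), people are assigned to the two groups at random, and the difference is checked with a simple one-sided permutation test. The answers and the function names `enps` and `permutation_p_value` are hypothetical.

```python
import random

def enps(answers):
    """eNPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for a in answers if a >= 9)
    detractors = sum(1 for a in answers if a <= 6)
    return 100.0 * (promoters - detractors) / len(answers)

def permutation_p_value(control, treatment, n_permutations=5000, seed=0):
    """Share of random regroupings whose eNPS gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = enps(treatment) - enps(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend group labels were assigned by chance
        gap = enps(pooled[n_control:]) - enps(pooled[:n_control])
        if gap >= observed:
            hits += 1
    return hits / n_permutations

# Hypothetical answers to "How likely are you to recommend us as an employer?"
control = [6, 7, 8, 5, 9, 7, 6, 8, 10, 4]      # no change in care
treatment = [9, 8, 10, 7, 9, 6, 10, 8, 9, 7]   # received extra care

print("control eNPS:  ", enps(control))
print("treatment eNPS:", enps(treatment))
print("p-value:       ", permutation_p_value(control, treatment))
```

A small p-value suggests the gap is unlikely to be due to chance alone. With groups as small as these, treat any result as a hint rather than proof.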


You can test the reverse hypothesis by offering some people a benefit for recommending the firm and others not, and you can observe whether and how this translates into realistic Employee care ratings.

But both experiments may show that there is no direct relationship between these values, i.e. changing one does not change the other, and both depend on something else. For example, on how people perceive their manager. If I am satisfied with my boss, I rate more positively how the company takes care of me and am more likely to recommend the company as an employer. 

So the lesson is to form hypotheses, test them, and only use the ones that are confirmed going forward

The ideal way to test hypotheses in this way is not to work with surveys once in a while, but to measure continuously. In that case, you get clear feedback fairly quickly on whether and how other parameters change at all when you put effort into changing one of them. 

Moreover, such continuous measurement gives you information about the evolution of correlations. These are usually not static and unchanging and can shift over time. Recall the initial fairly clear example that children's height depends on age. However, even this dependence changes over time. Children can literally grow by leaps and bounds during puberty, and it is not so much the exact age at that point that is crucial, but rather the onset of puberty and the level of hormones affecting growth. Typically, you will then see that gender is also a significant determinant of growth at this age, as girls have an earlier onset of puberty than boys. 
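One way to watch a correlation evolve is to compute it over a sliding window of consecutive measurements. The following is a minimal sketch with invented numbers; the metric names, the window size of 4, and the function `rolling_correlation` are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rolling_correlation(xs, ys, window):
    """Correlation of xs and ys inside each run of `window` consecutive points."""
    return [
        pearson(xs[start:start + window], ys[start:start + window])
        for start in range(len(xs) - window + 1)
    ]

# Hypothetical scores from eight consecutive survey rounds.
care_index = [1, 2, 3, 4, 5, 6, 7, 8]
recommendation = [2, 4, 6, 8, 8, 6, 4, 2]  # tracks care at first, then reverses

trend = rolling_correlation(care_index, recommendation, window=4)
for round_number, r in enumerate(trend, start=1):
    print(f"window starting at round {round_number}: r = {r:.2f}")
```

Here the correlation starts out strongly positive and ends strongly negative, a shift that a single correlation computed over all eight rounds would hide.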

In corporate practice, try repeatedly postponing people's bonus payments, for example, and you'll find that no matter how much care you give them, they still won't recommend you as an employer 😉.
