When can you assume causation?

The following example is taken from Wikipedia: suppose one could run an experiment on identical twins who were known to consistently get the same grades on their tests.

One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation. Still, correlation alone is not sufficient for causation: one can get around the Wikipedia example by imagining that those twins always cheated on their tests by using a device that gave them the answers.

The twin that goes to the amusement park loses the device, hence the low grade. A good way to get this stuff straight is to think of the structure of the Bayesian network that may be generating the measured quantities, as done by Pearl in his book Causality. His basic point is to look for hidden variables.

If there is a hidden variable that happens not to vary in the measured sample, then the correlation would not imply causation. Expose all hidden variables and you have causation. I'll add some additional comments about causality as viewed from an epidemiological perspective. Most of these arguments are taken from Practical Psychiatric Epidemiology, by Prince et al. Causation, or the causal interpretation of associations, is by far the most difficult aspect of epidemiological research.

Cohort and cross-sectional studies, for example, might both suffer from confounding effects. Asher, in Causal Modeling (Sage), initially proposed a set of criteria to be fulfilled, the last of which concerns the temporal ordering of cause and effect.

While the first two criteria can easily be checked using a cross-sectional or time-ordered cross-sectional study, the latter can only be assessed with longitudinal data, except for biological or genetic characteristics for which temporal order can be assumed without longitudinal data. Of course, the situation becomes more complex in the case of a non-recursive (reciprocal) causal relationship.

I also like the following illustration (Chapter 13 in the aforementioned reference), which summarizes the approach promulgated by Hill, comprising nine different criteria for assessing a causal effect, as also cited by James.

The original article was indeed entitled "The environment and disease: association or causation?". Some further references, roughly taken from an online course in epidemiology, are also very interesting. Finally, this review offers a larger perspective on causal modeling: J. Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3 (2009).

At the heart of your question is the question "when is a relationship causal?" A common approach starts from the experimental ideal, where we are able to randomise the "treatment" under study in some fashion, and then moves on to alternative methods for generating this randomisation in order to draw causal inferences. This begins with the study of so-called natural experiments.

One of the first examples of a natural experiment being used to identify causal relationships is Angrist's paper "Lifetime Earnings and the Vietnam Era Draft Lottery". A key problem with estimating the causal effect of military service on earnings is that certain types of people may be more likely to enlist, which may bias any measurement of the relationship. Angrist uses the natural experiment created by the Vietnam draft lottery to effectively "randomly assign" the treatment "military service" to a group of men.

So when do we have causality? Under experimental conditions. When do we get close? Under natural experiments. There are also other techniques that get us close to "causality": they include regression discontinuity designs, difference-in-differences, and so on.

There is also a problem with the opposite case, when lack of correlation is used as proof of the lack of causation. The problem is nonlinearity: when looking at correlation, people usually check Pearson's correlation coefficient, which measures only linear association and so is only the tip of the iceberg.
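To illustrate the point about nonlinearity, here is a minimal sketch with made-up data: the relationship between the two variables is deterministic, yet Pearson's r comes out near zero because it is not linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is symmetric around zero; y depends on x perfectly, but not linearly.
x = rng.uniform(-1.0, 1.0, size=10_000)
y = x ** 2  # deterministic and strongly dependent, yet not linear

# Pearson's r measures only linear association, so it comes out near zero.
print(f"Pearson r for y = x^2: {np.corrcoef(x, y)[0, 1]:.3f}")  # close to 0

# A linear relationship, for contrast, gives r close to +1.
z = 2.0 * x + rng.normal(scale=0.1, size=x.shape)
print(f"Pearson r for a linear relationship: {np.corrcoef(x, z)[0, 1]:.3f}")
```

A near-zero r here clearly does not mean "no causal or even statistical relationship"; it only means no linear one.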

Your example is that of a controlled experiment. The only other context that I know of where a correlation can imply causation is that of a natural experiment. Basically, a natural experiment takes advantage of an assignment of some respondents to a treatment that happens naturally in the real world. Since the assignment of respondents to treatment and control groups is not controlled by the experimenter, the extent to which correlation implies causation is somewhat weaker. Researchers using nonrandomized designs have an extra obligation to explain the logic behind covariates included in their designs and to alert the reader to plausible rival hypotheses that might explain their results.

Even in randomized experiments, attributing causal effects to any one aspect of the treatment condition requires support from additional experimentation.

In the twins example it is not just the correlation that suggests causality, but also the associated information or prior knowledge. Suppose I add one further piece of information: assume that the diligent twin spent six hours studying for a stats exam, but, due to an unfortunate error, the exam was in history.

Would we still conclude that studying was the cause of any superior performance? Determining causality is as much a philosophical question as a scientific one, hence the tendency to invoke philosophers such as David Hume and Karl Popper when causality is discussed. Not surprisingly, medicine has made significant contributions to establishing causality through heuristics, such as Koch's postulates for establishing the causal relationship between microbes and disease.

These have been extended to "molecular Koch's postulates", required to show that a gene in a pathogen encodes a product that contributes to the disease caused by the pathogen.

In most observational settings, the real reason behind an association is anybody's guess. But it's very rare to have only a correlation between two variables: often you also know something about what those variables are, and you have a theory, or theories, suggesting why there might be a causal relationship between them.

Suppose, for instance, that we plot the growth of Facebook alongside the deepening of the Greek debt crisis over the same period and find that the two rise together. Despite this strong correlation, it would not be wise to conclude that the success of Facebook has somehow caused the current Greek debt crisis, nor that the Greek debt crisis has caused the adoption of Facebook! The standard scientific answer to the question of how causality can be established is that, with some caveats, we can infer it from a well-designed randomized controlled experiment. Unfortunately, such experiments are often impossible, impractical, or unethical, so we would like more general procedures for inferring causal relationships. And, given that we can find more general procedures for inferring causal relationships, what does causality mean, anyway, for how we reason about a system?

It might seem that the answers to such fundamental questions would have been settled long ago. In fact, they turn out to be surprisingly subtle questions.

Over the past few decades, a group of scientists have developed a theory of causal inference intended to address these and other related questions. This theory can be thought of as an algebra or language for reasoning about cause and effect.

Many elements of the theory have been laid out in a famous book by one of the main contributors to the theory, Judea Pearl. Although the theory of causal inference is not yet fully formed, and is still undergoing development, what has already been accomplished is interesting and worth understanding.

In this post I will describe one small but important part of the theory of causal inference, a causal calculus developed by Pearl.

This causal calculus is a set of three simple but powerful algebraic rules which can be used to make inferences about causal relationships. The post is a little technically detailed at points; however, the first three sections of the post are non-technical, and I hope will be of broad interest. The post also contains a few exercises and problems, and you may find it informative to work through them. Before diving in, one final caveat: I am not an expert on causal inference, nor on statistics.

The reason I wrote this post was to help me internalize the ideas of the causal calculus. Occasionally, one finds a presentation of a technical subject which is beautifully clear and illuminating, a presentation where the author has seen right through the subject and is able to convey that crystallized understanding to others. This post is not such a presentation; it is a learner's set of notes. Nonetheless, I hope others will find my notes useful, and that experts will speak up to correct any errors or misapprehensions on my part.

Let me start by explaining two example problems to illustrate some of the difficulties we run into when making inferences about causality. The first concerns voting on the US Civil Rights Act of 1964, for which, overall, a larger fraction of Republican than Democratic members of Congress voted in favour. You might think that we could conclude from this that being Republican, rather than Democrat, was an important factor in causing someone to vote for the Civil Rights Act.

However, the picture changes if we include an additional factor in the analysis, namely, whether a legislator came from a Northern or Southern state. If we include that extra factor, the situation completely reverses, in both the North and the South. Yes, you read that right: in both the North and the South, a larger fraction of Democrats than Republicans voted for the Act, despite the fact that overall a larger fraction of Republicans than Democrats voted for the Act.

You might wonder how this can possibly be true. The resolution lies in the fact that support for the Act was much lower among Southern legislators than Northern ones, and the South was overwhelmingly Democratic: at the time, the Southern states sent 94 Democrats and only 10 Republicans to the House of Representatives. (The numbers above are for the House; the numbers were different in the Senate, but the same overall phenomenon occurred.) If we take a naive causal point of view, this result looks like a paradox. As I said above, the overall voting pattern seems to suggest that being Republican, rather than Democrat, was an important causal factor in voting for the Civil Rights Act.

So two variables which appear correlated can become anticorrelated when another factor is taken into account. You might wonder if results like those we saw in voting on the Civil Rights Act are simply an unusual fluke. But, in fact, this kind of reversal (an instance of Simpson's paradox) is not that uncommon, and in each case where it occurs, understanding the causal relationships turns out to be much more complex than one might at first think. Imagine you suffer from kidney stones, and your doctor offers you two choices: treatment A or treatment B.

Your doctor tells you that the two treatments have been tested in a trial, and treatment A was effective for a higher percentage of patients than treatment B. (Keep in mind that this really happened.) Suppose you divide the patients in the trial up into those with large kidney stones and those with small kidney stones. Then, even though treatment A was effective for a higher overall percentage of patients than treatment B, treatment B was effective for a higher percentage of patients in both groups, i.e., both among patients with large stones and among patients with small stones.
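The reversal is easy to reproduce with a little arithmetic. The counts below are invented to match the pattern described in the text (they are not the figures from the actual trial): treatment A looks better overall, yet treatment B is better within each kidney-stone group.

```python
# Illustrative counts, made up for this sketch: (successes, patients).
data = {
    "A": {"small stones": (243, 270), "large stones": (56, 80)},
    "B": {"small stones": (81, 87),   "large stones": (192, 263)},
}

for treatment, groups in data.items():
    total_s = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    print(f"Treatment {treatment}: overall {total_s / total_n:.1%}")
    for group, (s, n) in groups.items():
        print(f"  {group}: {s / n:.1%}")

# The output shows treatment A ahead overall, while treatment B is ahead in
# *both* subgroups.  The reversal happens because, in these made-up numbers,
# treatment B was given mostly to the harder (large-stone) cases.
```

The group sizes, not the treatments themselves, drive the overall comparison, which is exactly why the naive overall percentage is misleading.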

I find it more than a little mind-bending that my heuristics about how to behave on the basis of statistical evidence are obviously not just a little wrong, but utterly, horribly wrong. Or, to put it another way, those heuristics have not the first clue about statistics. Partial evidence may be worse than no evidence if it leads to an illusion of knowledge, and so to overconfidence and certainty where none is justified.

As a second example of the difficulties in establishing causality, consider the relationship between cigarette smoking and lung cancer. In 1964 the US Surgeon General issued a report concluding that cigarette smoking causes lung cancer. Unfortunately, according to Pearl, the evidence in the report was based primarily on correlations between cigarette smoking and lung cancer.

As a result, the conclusion was disputed, notably by the tobacco companies. They claimed that there could be a hidden factor — maybe some kind of genetic factor — which caused both lung cancer and a desire to smoke. If that were true, then while smoking and lung cancer would be correlated, the decision to smoke or not smoke would have no impact on whether you got lung cancer.

Now, you might scoff at this notion, but it cannot be dismissed on the basis of correlational evidence alone. One way of demonstrating a direct causal connection is to do a randomized, controlled experiment. We suppose there is some experimenter who has the power to intervene with a person, literally forcing them either to smoke or not according to the whim of the experimenter.

The experimenter takes a large group of people and randomly divides them into two halves. One half are forced to smoke, while the other half are forced not to smoke. By doing this the experimenter can break the relationship between smoking and any hidden factor causing both smoking and lung cancer. By comparing the cancer rates in the group who were forced to smoke to those in the group who were forced not to smoke, it would then be possible to determine whether or not there is truly a causal connection between smoking and lung cancer.
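A small simulation shows why randomization helps. The hidden factor, probabilities, and effect sizes below are all invented for illustration; in this toy world smoking has no causal effect at all, yet the observational comparison suggests it does, while the randomized comparison does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Invented toy world: a hidden factor raises both the chance of smoking and
# the chance of cancer; smoking itself has NO causal effect on cancer here.
hidden = rng.random(n) < 0.3

def cancer_given(smokes, hidden):
    # `smokes` is deliberately ignored: in this toy world smoking is not a cause.
    p = np.where(hidden, 0.20, 0.05)
    return rng.random(len(p)) < p

# Observational world: smoking is influenced by the hidden factor.
smokes_obs = rng.random(n) < np.where(hidden, 0.8, 0.2)
cancer_obs = cancer_given(smokes_obs, hidden)
print("Observational  P(cancer | smoker)     =", round(cancer_obs[smokes_obs].mean(), 3))
print("Observational  P(cancer | non-smoker) =", round(cancer_obs[~smokes_obs].mean(), 3))

# Randomized experiment: smoking is assigned by coin flip, independent of
# the hidden factor, so the hidden factor is balanced across the two arms.
smokes_rct = rng.random(n) < 0.5
cancer_rct = cancer_given(smokes_rct, hidden)
print("Randomized     P(cancer | smoker)     =", round(cancer_rct[smokes_rct].mean(), 3))
print("Randomized     P(cancer | non-smoker) =", round(cancer_rct[~smokes_rct].mean(), 3))
```

The observational comparison shows a large spurious gap; the randomized comparison shows essentially none, matching the true (null) causal effect built into the toy world.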

In the case of smoking, this kind of experiment would probably be illegal today, and I suspect it would have been even decades ago. To help address problems like the two examples just discussed, Pearl introduced a causal calculus. In the remainder of this post, I will explain the rules of the causal calculus, and use them to analyse the smoking-cancer connection.

To state the rules of the causal calculus we will need three background ideas: causal models (covered in this section), causal conditional probabilities, and d-separation, respectively. To understand causal models, consider the following graph of possible causal relationships between smoking, lung cancer, and some unknown hidden factor (say, a hidden genetic factor):
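Since the original figure is not reproduced here, the following sketch writes the same graph down in code using the networkx library; the vertex names are mine.

```python
import networkx as nx

# The causal graph described in the text: the hidden factor may cause both
# smoking and cancer, and smoking may also cause cancer directly.
g = nx.DiGraph()
g.add_edges_from([
    ("hidden factor", "smoking"),
    ("hidden factor", "cancer"),
    ("smoking", "cancer"),
])

print(sorted(g.edges()))
print("acyclic:", nx.is_directed_acyclic_graph(g))      # True: a valid causal DAG
print("parents of cancer:", sorted(g.predecessors("cancer")))
```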

This is a quite general model of causal relationships, in the sense that it includes both the suggestion of the US Surgeon General (smoking causes cancer) and also the suggestion of the tobacco companies (a hidden factor causes both smoking and cancer). Indeed, it also allows a third possibility: that perhaps both smoking and some hidden factor contribute to lung cancer.

This combined relationship could potentially be quite complex: it could be, for example, that smoking alone actually reduces the chance of lung cancer, but the hidden factor increases the chance of lung cancer so much that someone who smokes would, on average, see an increased probability of lung cancer.

But at the very least this is an interesting causal model, since it encompasses both the US Surgeon General's and the tobacco companies' suggestions. Mathematically speaking, what do the arrows of causality in the diagram above mean? It helps to start by moving away from the specific smoking-cancer model, and to allow a causal model to be based on a more general directed graph indicating possible causal relationships between a number of variables.

Each vertex j in this causal model has an associated random variable X_j. In the graph just described, for example, one variable represents smoking and another lung cancer, while the other variables would refer to other potential dependencies in this somewhat more complex model of the smoking-cancer connection. (We use the same symbol for a vertex and for its random variable; it should be clear from context which is meant.) For the notion of causality to make sense we need to constrain the class of graphs that can be used in a causal model.

In particular, we cannot allow a variable to be, even indirectly, a cause of itself; at least, not without a time machine. Because of this we constrain the graph to be a directed acyclic graph, meaning a directed graph which has no directed cycles in it. It sounds like a very complicated notion, at least to my ear, when what it means is very simple: a graph with no loops. Our picture so far is that a causal model consists of a directed acyclic graph whose vertices are labelled by random variables. To complete our definition of causal models we need to capture the allowed relationships between those random variables.

Intuitively, what causality means is that for any particular X_j, the only random variables which directly influence the value of X_j are its parents pa(X_j), i.e., the variables whose vertices have edges pointing into the vertex for X_j. For instance, in the smoking-cancer graph described earlier, the parents of the cancer vertex are the smoking vertex and the hidden factor.

Now, of course, vertices further back in the graph — say, the parents of the parents — could also influence the value of X_j. But that influence would be indirect, mediated through the parent vertices. Motivated by the above discussion, one way we could define causal influence would be to require that X_j be a function of its parents: X_j = f_j(pa(X_j)). However, that would make everything deterministic, and we want to allow X_j some randomness of its own. We do this by requiring that X_j be expressible in the form X_j = f_j(pa(X_j), Y_j). The intuition is that the Y_j are a collection of auxiliary random variables which inject some extra randomness into X_j (and, through X_j, its descendants), but which are otherwise independent of the variables in the causal model.

Summing up, a causal model consists of a directed acyclic graph G whose vertices are labelled by random variables X_j, each of which is expressible in the form X_j = f_j(pa(X_j), Y_j) for some function f_j. The Y_j are independent of one another, and each Y_j is independent of every variable X_k, except when X_k is X_j itself or a descendant of X_j. In practice, we will not work directly with the functions f_j or the auxiliary random variables Y_j.
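To make the structural form X_j = f_j(pa(X_j), Y_j) concrete, here is a minimal sketch of such a model in code, using the smoking-cancer graph from above. The functions and probabilities are invented purely for illustration; the fresh randomness drawn inside each step plays the role of the auxiliary variables Y_j.

```python
import random

def sample_once(rng=random):
    # Each variable is a function of its parents plus its own fresh randomness
    # (the auxiliary Y_j of the definition), here just calls to rng.random().
    hidden = rng.random() < 0.3                        # no parents
    smoking = rng.random() < (0.8 if hidden else 0.2)  # parent: hidden
    cancer = rng.random() < (                          # parents: hidden, smoking
        0.25 if (hidden and smoking) else
        0.15 if hidden else
        0.10 if smoking else
        0.02
    )
    return hidden, smoking, cancer

samples = [sample_once() for _ in range(100_000)]
p_cancer = sum(c for _, _, c in samples) / len(samples)
print(f"P(cancer) in this made-up model: {p_cancer:.3f}")
```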

All the arrows in a causal model indicate is the possibility of a direct causal influence. This results in two caveats on how we think about causality in these models. First, it may be that a child random variable is actually completely independent of the value of one or more of its parent random variables. This is, admittedly, a rather special case, but it is perfectly consistent with the definition. For example, in a causal model like the smoking-cancer graph above, the arrow from the hidden factor to cancer is consistent with cancer turning out, in fact, to be unaffected by the hidden factor.

The second caveat in how we think about the arrows and causality is that the arrows only capture the direct causal influences in the model.

It is possible in a causal model for one variable to influence another to which it has no direct arrow, provided there is a directed path between them. This would be an indirect causal influence, mediated by other random variables, but it would still be a causal influence.

The notion of ordinary conditional probabilities is no doubt familiar to you: p(cancer | smoking), for instance, is the probability of cancer among people observed to smoke. But this is not the same thing as the probability of cancer if we could force people to smoke. In a randomized controlled experiment of the kind described earlier, you really could see if there was a causal influence by looking at what fraction of people who were made to smoke got cancer. Such an experiment is usually impossible or impractical, but Pearl had what turns out to be a very clever idea: to imagine a hypothetical world in which it really is possible to force someone to (for example) smoke, or not smoke.

In particular, he introduced a causal conditional probability p(cancer | do(smoking)), which is the conditional probability of cancer in this hypothetical world, where do(smoking) indicates that smoking has been imposed by intervention. Now, at first sight this appears a rather useless thing to do. But what makes it a clever imaginative leap is that although it may be impossible or impractical to do a controlled experiment to determine p(cancer | do(smoking)), Pearl was able to establish a set of rules — a causal calculus — that such causal conditional probabilities should obey.

And, by making use of this causal calculus, it turns out to sometimes be possible to infer the value of probabilities such as p(cancer | do(smoking)), even when a controlled, randomized experiment is impossible. Suppose, then, that we have a causal model of some phenomenon, represented as a directed acyclic graph over variables X_1, ..., X_n.

Now suppose we introduce an external experimenter who is able to intervene to deliberately set the value of a particular variable X_j to a chosen value x̂_j. In other words, the experimenter can override the other causal influences on that variable. This is equivalent to having a new causal model, described as follows.

All edges from the parents of X_j are cut off, i.e., every edge pointing into X_j is deleted from the graph. Note that the edges to the children of X_j are left undisturbed. This model has no vertex explicitly representing the experimenter; rather, the relation X_j = f_j(pa(X_j), Y_j) is replaced by the relation X_j = x̂_j. We will denote this graph by G with an overbar on X_j, indicating the graph in which all edges pointing to X_j have been deleted.

We will call this a perturbed graph, and the corresponding causal model a perturbed causal model. In the perturbed causal model the only changes are to delete the edges pointing into X_j, and to replace the relation X_j = f_j(pa(X_j), Y_j) by the relation X_j = x̂_j.
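Here is a small sketch of that graph surgery, again using networkx and the smoking-cancer graph from earlier; the helper name `perturb` is mine.

```python
import networkx as nx

def perturb(graph: nx.DiGraph, intervened: str) -> nx.DiGraph:
    """Return the perturbed graph: all edges pointing INTO `intervened`
    are deleted, while edges out of it are left undisturbed."""
    g = graph.copy()
    g.remove_edges_from([(parent, intervened)
                         for parent in list(g.predecessors(intervened))])
    return g

g = nx.DiGraph([("hidden factor", "smoking"),
                ("hidden factor", "cancer"),
                ("smoking", "cancer")])
g_do_smoking = perturb(g, "smoking")
print(sorted(g_do_smoking.edges()))
# [('hidden factor', 'cancer'), ('smoking', 'cancer')]:
# the edge from the hidden factor into smoking has been cut.
```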

Our aim is to use this perturbed causal model to compute the conditional causal probability p(x_1, ..., x_{j-1}, x_{j+1}, ..., x_n | do(x_j = x̂_j)). In this expression the x_j term is omitted before the conditioning bar, since its value is set by the intervention on the right. By definition, the causal conditional probability is just the value of the corresponding probability distribution in the perturbed causal model.

To compute the value of the probability in the perturbed causal model, note that the probability distribution in the original causal model was given by the usual factorization over the graph: p(x_1, ..., x_n) = Π_j p(x_j | pa(x_j)).

This expression remains true for the perturbed causal model, but a single term on the right-hand side changes: the conditional probability for the x_j term. In particular, this term gets changed from p(x_j | pa(x_j)) to 1 when x_j = x̂_j (and to 0 otherwise), since we have fixed the value of x_j to be x̂_j.

As a result we have

p(x_1, ..., x_n | do(x_j = x̂_j)) = Π_{k ≠ j} p(x_k | pa(x_k)),

evaluated with x_j set to x̂_j (and equal to 0 whenever x_j ≠ x̂_j). This equation is a fundamental expression, capturing what it means for an experimenter to intervene to set the value of some particular variable in a causal model. It can easily be generalized to a situation where we partition the variables into two disjoint sets, X and Y, where X are the variables we suppose have been set by intervention in a (possibly hypothetical) randomized controlled experiment, and Y are the remaining variables:

p(y | do(x)) = Π_{j : X_j in Y} p(x_j | pa(x_j)).   [1]

Note that on the right-hand side of [1] the values of the parents pa(x_j) are assumed to be given by the appropriate values from x and y. The expression [1] can be viewed as a definition of causal conditional probabilities. But although this expression is fundamental to understanding the causal calculus, it is not always useful in practice.
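Before turning to that problem, here is a quick sanity check of [1] on a toy model in which we pretend the hidden factor is observable, so every term can be computed. All the probability tables are invented; the point is only to show how p(cancer | do(smoking)) from the truncated product differs from the ordinary p(cancer | smoking).

```python
# Invented probability tables for the model hidden -> smoking, hidden -> cancer,
# smoking -> cancer.
p_hidden = {True: 0.3, False: 0.7}
p_smoking_given_hidden = {True: 0.8, False: 0.2}            # P(smoking=1 | hidden)
p_cancer_given = {(True, True): 0.25, (True, False): 0.15,  # P(cancer=1 | hidden, smoking)
                  (False, True): 0.10, (False, False): 0.02}

def p_joint(h, s, c):
    """Ordinary factorization: product of each variable given its parents."""
    ps = p_smoking_given_hidden[h] if s else 1 - p_smoking_given_hidden[h]
    pc = p_cancer_given[(h, s)] if c else 1 - p_cancer_given[(h, s)]
    return p_hidden[h] * ps * pc

def p_joint_do_smoking(h, s, c, s_hat=True):
    """Truncated factorization: the P(smoking | hidden) term is dropped and
    smoking is fixed to the intervened value s_hat."""
    if s != s_hat:
        return 0.0
    pc = p_cancer_given[(h, s)] if c else 1 - p_cancer_given[(h, s)]
    return p_hidden[h] * pc

# p(cancer | smoking): condition in the ordinary way.
num = sum(p_joint(h, True, True) for h in (True, False))
den = sum(p_joint(h, True, c) for h in (True, False) for c in (True, False))
print(f"p(cancer | smoking)     = {num / den:.3f}")

# p(cancer | do(smoking)): sum the truncated product over everything else.
p_do = sum(p_joint_do_smoking(h, True, True) for h in (True, False))
print(f"p(cancer | do(smoking)) = {p_do:.3f}")
# The two differ: conditioning picks out people who *chose* to smoke (who are
# more likely to carry the hidden factor), while do() forces smoking on everyone.
```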

The problem is that the values of some of the variables on the right-hand side may not be known, and cannot be determined by experiment. Consider, for example, the case of smoking and cancer.

Recall our causal model for smoking and cancer: the hidden factor is, by assumption, not something we can observe, so the terms involving it on the right-hand side of [1] cannot be estimated from data. All is not lost, however: knowing the structure of the causal model still constrains what can happen, and the causal calculus can sometimes exploit that structure to re-express causal probabilities in terms of observable ones. Before developing that, it helps to pin down what we mean by causal influence. Suppose we have a causal model, and X and Y are distinct random variables (or disjoint subsets of random variables). Then we say X has a causal influence over Y if there are values x and x' of X and y of Y such that p(y | do(x)) ≠ p(y | do(x')). In other words, an external experimenter who can intervene to change the value of X can cause a corresponding change in the distribution of values at Y. The following exercise gives an information-theoretic justification for this definition of causal influence: it shows that an experimenter who can intervene to set X can transmit information to Y if and only if the above condition for causal influence is met.

What does this mean? Returning to the smoking-cancer example, it seems that we would say that smoking causes cancer provided p(cancer | do(smoking)) > p(cancer | do(no smoking)), so that if someone makes the choice to smoke, uninfluenced by other causal factors, then they would increase their chance of cancer.

Intuitively, it seems to me that this notion of events causing one another should be related to the notion of causal influence just defined above; the first problem below suggests a conjecture in this direction. We turn now to the third background idea, d-separation, which is a graph-based criterion for when knowledge of one variable can give us information about another. Consider first a model in which X is an ancestor of Y, connected to it by a directed path. Clearly, knowing X can in general tell us something about Y in this kind of causal model, and so in this case X and Y are not d-separated.

Consider, by comparison, a model in which X and Y both have edges pointing into a middle vertex, and have no other connection: in that case, knowing X tells us nothing about Y. A useful piece of terminology is to say that a vertex like this middle vertex is a collider for the path from X to Y, meaning a vertex at which both edges along the path are incoming. If the middle vertex is instead a common ancestor of X and Y, it is possible that knowing X will tell us something about Y, because of their common ancestry. By contrast, a path that contains no colliders is called an unblocked path. Note that, by the above exercise, an unblocked path must contain either one fork or no forks.

In general, we define X and Y to be d-connected if there is an unblocked path between them, and d-separated if there is no such unblocked path. These definitions depend only on the structure of the graph, so you can determine d-separation or d-connectedness simply by inspecting the graph.
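The following sketch checks (unconditional) d-connectedness exactly this way: it enumerates the paths between two vertices in the skeleton of the DAG and looks for a collider on each. It is a naive implementation meant only to mirror the definitions in the text, and the helper names are mine.

```python
import networkx as nx

def has_collider(dag: nx.DiGraph, path):
    """True if some interior vertex of `path` has both path-edges incoming."""
    for prev, v, nxt in zip(path, path[1:], path[2:]):
        if dag.has_edge(prev, v) and dag.has_edge(nxt, v):
            return True
    return False

def d_connected(dag: nx.DiGraph, x, y):
    """Unconditional d-connectedness: is there any path with no collider?"""
    skeleton = dag.to_undirected()
    return any(not has_collider(dag, p)
               for p in nx.all_simple_paths(skeleton, x, y))

# Example: a collider (x -> z <- y) versus a common cause (x <- w -> y).
collider = nx.DiGraph([("x", "z"), ("y", "z")])
fork = nx.DiGraph([("w", "x"), ("w", "y")])
print(d_connected(collider, "x", "y"))  # False: the only path is blocked
print(d_connected(fork, "x", "y"))      # True: an unblocked path exists
```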

This fact — that d-separation and d-connectedness are determined by the graph — also holds for the more sophisticated notions of d-separation and d-connectedness we develop below. This is a connection you can optionally develop through the following exercises. So far, this is pretty simple stuff. It gets more complicated, however, when we extend the notion of d-separation to cases where we are conditioning on already knowing the value of one or more random variables in the causal model.

Consider, for example, a path from X to Y that passes through an intermediate vertex Z with one path-edge coming in and one going out (call this Figure A), and suppose we condition on knowing the value of Z. Knowing Z already tells us everything this path can carry, so knowing X gives us no additional information about Y along it. So it makes sense to say that Z blocks this path from X to Y, even though in the unconditioned case this path would not have been considered blocked. It is helpful to give a name to vertices like the middle vertex in Figure A, i.e., vertices with one path-edge incoming and one outgoing: we call such a vertex a traverse. Using this language, the lesson of the above discussion is that if the conditioned variable Z sits at a traverse along a path from X to Y, then the path is blocked.

Things change if Z blocks one path between X and Y while another path remains open. In this case, knowing X will in general give us additional information about Y, even if we know Z. This is because, while Z blocks one path from X to Y, there is another unblocked path from X to Y.

And so we say that X and Y are d-connected, given Z. Now suppose instead that Z sits at a fork along the path, i.e., it is a common cause with edges pointing outwards towards both X and Y, and that we condition on Z. Once Z is known, the common cause can pass no further information between X and Y along this path, so again, in this example, X and Y are d-separated, given Z. The lesson of this model is that if Z is located at a fork along a path from X to Y, then conditioning on Z blocks the path. Colliders behave in exactly the opposite way: conditioning on a collider can unblock a path which, in the unconditioned case, would have been considered a blocked path. Pearl gives the example of a graduate school in music which will admit a student (a possibility encoded in the value of Z) if either they have high undergraduate grades (encoded in X) or some other evidence that they are exceptionally gifted at music (encoded in Y).

It would not be surprising if these two attributes were anticorrelated amongst students in the program, e.g., a student admitted mainly on the strength of exceptional musical gifts is more likely to have had unexceptional grades. And so, in this case, knowledge of exceptional gifts (Y) would give us knowledge that the student is likely to have low grades (X), conditioned on the knowledge that they were accepted into the program (Z). For a stark numerical illustration of the same phenomenon, consider a causal model in which X and Y are independent random bits, 0 or 1, chosen with equal probabilities. We suppose that Z = X + Y mod 2, where + is addition modulo 2. This causal model does, indeed, have the structure of Figure B: X and Y both point into the collider Z.

Unconditionally, X tells us nothing about Y. But given that we know the value z, knowing the value of x tells us everything about y, since y = x + z mod 2. The immediate lesson from the graph of Figure B is that X and Y can tell us something about one another, given Z, if there is a path between X and Y where the only collider is at Z. In fact, the same phenomenon can occur even in a graph where X and Y point into an unlabelled intermediate vertex, which in turn points into Z.

To see this, suppose we choose X and Y as in the example just described above, i.e., as independent random bits. We will let the unlabelled vertex be W = X + Y mod 2. And, finally, we choose Z to be equal to W. Then, just as before, conditioning on Z makes X and Y informative about one another, even though the collider on the path between them is W, an ancestor of Z, rather than Z itself.
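The XOR example is easy to simulate. The sketch below (sample size invented, nothing else assumed) shows that X and Y look independent on their own, but become perfectly informative about each other once we condition on the value of Z, the descendant of the collider.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

x = rng.integers(0, 2, n)          # independent fair bits
y = rng.integers(0, 2, n)
w = (x + y) % 2                    # the collider (the unlabelled vertex)
z = w                              # a variable downstream of the collider

# Marginally, x tells us nothing about y: these two rates are about equal.
print("P(y=1 | x=0) =", y[x == 0].mean().round(3))
print("P(y=1 | x=1) =", y[x == 1].mean().round(3))

# Conditioned on the collider's descendant z, x determines y exactly.
mask = z == 0
print("P(y=1 | x=0, z=0) =", y[(x == 0) & mask].mean().round(3))  # 0.0
print("P(y=1 | x=1, z=0) =", y[(x == 1) & mask].mean().round(3))  # 1.0
```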

Stepping back from causal models, it is worth recalling what correlation itself measures. For two variables, statistical correlation is commonly measured by a correlation coefficient, represented by the symbol r, a single number that describes the degree of relationship between the two variables. If the correlation coefficient has a negative value (below 0), it indicates a negative relationship between the variables: they move in opposite directions, i.e., when one increases the other decreases, and vice versa. If the correlation coefficient has a positive value (above 0), it indicates a positive relationship: the variables move in tandem, i.e., when one increases so does the other. Where the correlation coefficient is 0, this indicates there is no linear relationship between the variables: one variable can remain constant while the other increases or decreases.

While the correlation coefficient is a useful measure, it has its limitations: Correlation coefficients are usually associated with measuring a linear relationship. For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear or straight line relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges based on an initial call-out fee and an hourly fee which progressively decreases the longer the job goes on, the relationship between hours worked and income would be non-linear, and the correlation coefficient may be closer to 0.
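A quick sketch of the two billing schemes just described (with invented rates) shows the effect on r: the flat hourly rate gives exactly r = 1, while the call-out fee plus a progressively decreasing hourly rate gives an r below 1 because the relationship is no longer a straight line. Note that r stays fairly high here, since income still rises with every extra hour; it is for non-monotonic relationships, like the earlier y = x^2 example, that r can fall much closer to 0.

```python
import numpy as np

hours = np.arange(1, 11)

# Scheme 1: flat hourly rate (invented figure) -> perfectly linear income.
income_linear = 80.0 * hours

# Scheme 2: call-out fee plus an hourly rate that decays each hour (invented).
rates = 100.0 * 0.85 ** np.arange(10)            # rate charged for hour 1, 2, ...
income_nonlinear = 50.0 + np.cumsum(rates)[hours - 1]

print("r (flat hourly rate):        ", np.corrcoef(hours, income_linear)[0, 1].round(4))
print("r (call-out + decaying rate):", np.corrcoef(hours, income_nonlinear)[0, 1].round(4))
```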

Care is needed when interpreting the value of r. It is possible to find correlations between many variables; however, the relationship can be due to other factors and have nothing to do with the two variables being considered. For example, sales of ice cream and sales of sunscreen can increase and decrease across a year in a systematic manner, but the relationship would be due to the effects of the season (hotter weather sees an increase in people wearing sunscreen as well as eating ice cream) rather than due to any direct relationship between sales of sunscreen and ice cream.

The correlation coefficient should not be used to say anything about a cause-and-effect relationship. By examining the value of r, we may conclude that two variables are related, but that r value does not tell us whether one variable was the cause of a change in the other.

How can causation be established?


