Research diaries #13: statistical DAGs


After the last adventure in statistics, here is another exploration. Today we will be looking at DAGs.



DAGs are a systematic way of finding the link between correlation and causation. Suppose we have a phenomenon X and a phenomenon Y, and we want to see how X influences Y. If X directly influences Y, denoted by X → Y, and nothing else is involved, everything is simple. Normally, however, other variables are also in play; in the last post we saw an example of this. These other variables can distort the link between correlation and causation.

To help us find causation in the correlation we can use DAGs, or Directed Acyclic Graphs. A graph here means vertices connected by lines. Because these are directed graphs, the lines have a direction, so they are arrows. Finally, because the graph is acyclic, when we follow the arrows we never go round in a circle. This was maybe all a bit technical, so let's look at an example.
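To make the "acyclic" part concrete, here is a minimal Python sketch, assuming we store a DAG as a dictionary that maps each node to the nodes its arrows point to. The example graph (A → X, A → Y, X → Y) is taken from the example discussed in this post; the function name `is_acyclic` is a hypothetical helper, not from any library. It uses Kahn's algorithm: repeatedly remove nodes with no incoming arrows, and if some nodes can never be removed, they lie on a cycle.

```python
from collections import deque

# Example DAG (assumed from the text): A -> X, A -> Y, X -> Y.
# Each key maps a node to the list of nodes its arrows point to.
example_dag = {"A": ["X", "Y"], "X": ["Y"], "Y": []}


def is_acyclic(graph: dict[str, list[str]]) -> bool:
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    # Count incoming arrows for every node, including nodes that only appear as targets.
    in_degree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            in_degree[t] = in_degree.get(t, 0) + 1

    # Start from the nodes with no incoming arrows and peel them off one by one.
    queue = deque(node for node, d in in_degree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for t in graph.get(node, []):
            in_degree[t] -= 1
            if in_degree[t] == 0:
                queue.append(t)

    # Nodes that never lost all their incoming arrows sit on a cycle.
    return removed == len(in_degree)


print(is_acyclic(example_dag))               # True
print(is_acyclic({"X": ["Y"], "Y": ["X"]}))  # False: X -> Y -> X goes round in a circle
```

A side effect of this choice is that the order in which nodes are removed is a topological order, i.e. an ordering in which every arrow points forward.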


[Figure: example DAG with arrows A → X, A → Y and X → Y]

We want to see how X influences Y (in a statistics book X is typically called the predictor variable and Y the outcome variable). There is a direct link between X and Y, but there is also another path between X and Y as a result of A:


X ← A → Y

This other path is problematic because A influences both X and Y, so if we directly measure the connection between X and Y we don't know how much of it is due to A.
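Finding such paths can be done mechanically. The following Python sketch lists every path between X and Y in the example DAG while ignoring arrow direction, but remembering which way each arrow points. Paths whose first arrow points into X, like X ← A → Y, are usually called backdoor paths. The edge list is assumed from the example, and `all_paths` and `describe` are hypothetical helpers written for this illustration.

```python
# Example DAG (assumed from the text), stored as directed edges (from, to).
edges = [("A", "X"), ("A", "Y"), ("X", "Y")]


def all_paths(edges, start, end):
    """Return every simple path from start to end, ignoring arrow direction.

    Each path is a list of (node, arrow, next_node) steps, so the direction
    of every arrow along the way is kept.
    """
    # Undirected neighbour list that remembers the direction of each arrow.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append((b, "→"))  # reads "a → b"
        neighbours.setdefault(b, []).append((a, "←"))  # reads "b ← a"

    paths = []

    def walk(node, visited, steps):
        if node == end:
            paths.append(steps)
            return
        for nxt, arrow in neighbours.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, steps + [(node, arrow, nxt)])

    walk(start, {start}, [])
    return paths


def describe(path):
    """Render a path as text, e.g. 'X ← A → Y'."""
    text = path[0][0]
    for _, arrow, nxt in path:
        text += f" {arrow} {nxt}"
    return text


for p in all_paths(edges, "X", "Y"):
    # A backdoor path starts with an arrow pointing INTO X.
    kind = "backdoor" if p[0][1] == "←" else "starts out of X"
    print(describe(p), "-", kind)
```

For the example DAG this prints the backdoor path X ← A → Y and the direct path X → Y.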

So how can we remove this influence? The trick is to follow how the arrows run along each path and to condition, in the proper way, on a variable in that path. Conditioning on a variable means that, in a sense, I am going to hold it fixed. For example, if a variable is gender, then conditioning on that variable means that I will look at male and female separately. In the example we can condition on A; this closes the path X ← A → Y and gives us the ability to obtain the causal relation between X and Y.
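A small simulation shows what conditioning buys us. This Python sketch uses made-up numbers, which are assumptions for illustration only: A is binary (playing the role of the two groups in the gender example), X = 2A + noise and Y = 0.5X + 3A + noise, so the true effect of X on Y is 0.5. It compares the least-squares slope of Y on X with A ignored against the slopes computed separately within each value of A.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.5  # assumed causal effect of X on Y in this made-up simulation


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Simulate the DAG A -> X, A -> Y, X -> Y with a binary confounder A.
data = []
for _ in range(20_000):
    a = random.randint(0, 1)                             # two groups
    x = 2.0 * a + random.gauss(0, 1)                     # A pushes X up
    y = TRUE_EFFECT * x + 3.0 * a + random.gauss(0, 1)   # A also pushes Y up
    data.append((a, x, y))

# Naive estimate: A is ignored, so the path X <- A -> Y stays open.
naive = slope([x for _, x, _ in data], [y for _, _, y in data])

# Conditioning on A: estimate the slope separately within each group.
within = [
    slope([x for a, x, _ in data if a == g], [y for a, _, y in data if a == g])
    for g in (0, 1)
]

print(f"naive slope:      {naive:.2f}")
print(f"slope in A=0/A=1: {within[0]:.2f} / {within[1]:.2f}")
```

With these assumed numbers the naive slope should land around Cov(X, Y) / Var(X) = 2.5 / 2 = 1.25, far from the true 0.5, while both within-group slopes should be close to 0.5.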

In real statistical problems there are many issues that can make discovering causality more difficult. For example, certain variables might not be observable: we know that a certain variable is present and we know how it relates to the others, but we cannot measure it, or the data is so bad that it does not constitute a proper measurement. Even in these cases you might be in luck, in the sense that you can condition on another variable to close the path, as in the example.
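Here is a sketch of that situation, built on a hypothetical DAG that is not from the post: U → Z → X, U → Y and X → Y. U is the unmeasurable variable, but the observed Z sits on the path X ← Z ← U → Y, so conditioning on Z should close it. The coefficients are assumptions chosen for illustration, and "conditioning" is done here by regressing Y on both X and Z (solving the 2×2 least-squares normal equations directly) rather than by stratifying, since Z is continuous.

```python
import random

random.seed(2)

TRUE_EFFECT = 0.5  # assumed effect of X on Y in this made-up example

# Hypothetical DAG: U -> Z -> X, U -> Y, X -> Y.
# U is never used in the analysis below; only X, Y and Z are "measured".
xs, ys, zs = [], [], []
for _ in range(20_000):
    u = random.gauss(0, 1)                               # unobserved
    z = u + random.gauss(0, 1)                           # observed, on the backdoor path
    x = z + random.gauss(0, 1)
    y = TRUE_EFFECT * x + 2.0 * u + random.gauss(0, 1)
    xs.append(x)
    ys.append(y)
    zs.append(z)


def centred(values):
    """Subtract the mean so the regressions need no intercept term."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


cx, cy, cz = centred(xs), centred(ys), centred(zs)

# Naive regression of Y on X alone: the path X <- Z <- U -> Y is open.
naive = dot(cx, cy) / dot(cx, cx)

# Regression of Y on X and Z: Cramer's rule on the 2x2 normal equations
# gives the coefficient of X with Z held fixed.
sxx, szz, sxz = dot(cx, cx), dot(cz, cz), dot(cx, cz)
sxy, szy = dot(cx, cy), dot(cz, cy)
adjusted = (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)

print(f"naive: {naive:.2f}, adjusted for Z: {adjusted:.2f}")
```

Under these assumptions the naive slope should be around 3.5 / 3 ≈ 1.17, while the coefficient adjusted for Z should be close to 0.5, even though U itself was never measured.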

References: This post is based on Chapter 6 of McElreath's Statistical Rethinking, which is a must-read. There is also a lovely series of lectures by him on the book.


Cat tax

[Image: cat photo]




Then conditioning on that variable means that I will look at male and female separately.

Good luck with that in this political climate :P

I wonder if there's ever an 'X directly causes Y' relation. Seems like some other conditions must be there for anything to happen.


I wonder if there's ever an 'X directly causes Y' relation

It is true: there could always be hidden variables that you didn't account for, and they mess up your DAG and consequently the causality D:
