The Problem of Confounding Bias in AI

Using interventions with the backdoor criterion to mitigate it (a conceptual framework)

After reading ‘The Book of Why’ by AI researcher Judea Pearl, a sudden curiosity gripped me about the non-trivial challenge of incorporating causal reasoning into AI systems, so I decided to write a series of articles on the key concepts the book teaches. Since the field of AI is a cohort of many subfields, my frame of reference for AI in this write-up will be Natural Language Processing (NLP), as it happens to be my area of interest.

I also believe it is worth pondering that AI model reliability and fairness should be evaluated using non-conventional methods, of which one important example is causal inference.

What do we mean by confounding bias in causality?

Confounding bias refers to a situation where a variable Z exerts undue influence over both the treatment variable X and the outcome variable Y. This is highly undesirable in any statistical experiment, or when analyzing how a particular treatment directly affects the target variable, because the confounding variable obscures the evaluation of the outcome of applying the treatment.
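Before the restaurant illustration, here is a minimal numerical sketch (all numbers are invented) of how a confounder inflates a naive estimate, and how stratifying on the confounder recovers the true effect:

```python
# Minimal confounding simulation: a binary confounder Z raises the chance
# of both the treatment X and the outcome Y, so the naive contrast of Y
# across X is biased. All parameters below are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

z = rng.binomial(1, 0.5, n)                      # confounder
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))  # treatment depends on Z
y = rng.binomial(1, 0.2 + 0.10 * x + 0.30 * z)   # true effect of X is +0.10

# Naive (confounded) contrast: E[Y | X=1] - E[Y | X=0]
naive = y[x == 1].mean() - y[x == 0].mean()

# Adjusted contrast: stratify on Z, then average over P(Z)
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)

print(f"naive estimate:    {naive:.3f}")    # ~0.28, inflated by Z
print(f"adjusted estimate: {adjusted:.3f}") # ~0.10, close to the true effect
```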

This concept is more intuitively understood with an illustration. Say a fast-food restaurant chain wants to assess how customers rate its food quality in order to improve it. We assume the company has a mobile app for online orders and food reviews, and that the rating scale is binary (Happy or Unhappy) to make the analysis less ambiguous. The NLP module has an embedded Named Entity Recognition system with which the language model can detect the food items mentioned and map them to the items offered on the menu.

For a deep-learning sentiment classifier to predict whether the customer is happy with the food, it would first be trained on word embeddings; then, for each review under a unique customer ID, the classifier would look for words like “Delicious”, “Exciting”, “Amazing”, essentially every word that corresponds to a happy sentiment. However, there can be instances where the mere presence of positive words does not translate into the customer actually being happy with the food itself, which is what we are aiming to know.
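As a minimal sketch of such a classifier (a real system would use learned embeddings such as BERT; TF-IDF plus logistic regression, with invented reviews and labels, is used here only to keep the sketch self-contained):

```python
# A toy stand-in for the sentiment classifier described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "The burger was delicious and amazing",  # Happy
    "Soggy fries, gross and greasy",         # Unhappy
    "Exciting new menu, tasty wings",        # Happy
    "Cold food, sad experience",             # Unhappy
]
labels = [1, 0, 1, 0]  # 1 = Happy, 0 = Unhappy

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reviews, labels)

print(clf.predict(["delicious crispy chicken"]))  # most likely [1]
```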

Delving Deeper…

If we think critically about the above statement, there could be extraneous factors, like weather, ambience, or personal reasons, that entice the customer to enjoy the meal regardless of the food's quality or taste.

In the above diagram we can see that ambience, weather, and personal reasons act as confounders, which not only perturb the evaluation of the direct effect of the food but also bias the estimate. Now, framing this problem in the language of causality:

What we see = P(Y | Xf, Xa, Xw, Xp)

What we want = P(Y | do(Xf))

where Xf, Xa, Xw, and Xp denote the food, ambience, weather, and personal-reason features respectively.

Here ‘do’ denotes the intervention used to measure the Average Treatment Effect (ATE) of food quality Xf on the sentiment, giving a realistic evaluation of sentiments: what we record as True Positives (TP) could in actual reality be False Positives (FP). The ATE mentioned above is formulated as

ATE = E[Y1] - E[Y0]

Y1i = outcome for customer i after treatment

Y0i = outcome for customer i without treatment

Here, eliminating confounders is the actual treatment, and not getting treatment means deconfounding is not done. In reality it is very tough to get the true ATE, because every treatment has at its core a hypothesis that is itself not entirely verified.
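As a toy illustration of the formula (the potential outcomes below are invented, and in practice both are never observed for the same customer):

```python
# ATE = E[Y1] - E[Y0], computed on made-up potential outcomes. In a real
# study each customer contributes only one of the two columns.
import numpy as np

y1 = np.array([1, 1, 0, 1, 0, 1])  # sentiment had customer i been treated
y0 = np.array([1, 0, 0, 1, 0, 0])  # sentiment had customer i not been treated

ate = y1.mean() - y0.mean()
print(f"ATE = {ate:.3f}")  # 0.667 - 0.333 = 0.333
```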

Now, all of this might be counterintuitive to those of us who customarily use LSTMs and BERT (or their respective variants) for text sentiment classification but seldom wonder whether the customer is actually happy with the main product. Disambiguating the true effect of the food is what causal techniques help us accomplish through deconfounding.

Key underpinnings for performing causal modelling

Interventions are treatments applied to the independent variable whose true effect on the target variable we are trying to isolate. This approach opens a portal to a parallel universe in which we can assess the counterfactual counterpart of the outcome we actually got; however, we will discuss counterfactual evaluation in a future article.

You might now wonder: why go through this laborious process to find out what we think could be the true effect?

The point is that it is important to develop new hypotheses that enable us to question the measured effects of the input features, so that we can assess them critically by applying treatments. We are assuming, then, that there could be confounders in the data set; below is a dummy data set I prepared for your intuitive understanding.
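As a hypothetical stand-in for that table (every value below is invented), it might look like this:

```python
# A hypothetical dummy data set: one row per review, with the app's binary
# rating as the label. Confounder mentions (weather, mood, occasion) sit
# inside the review text alongside the food words.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "review": [
        "Delicious wings, lovely rainy evening",
        "Greasy and soggy fries",
        "Amazing food, was celebrating my promotion",
        "Tasty burger, quick service",
    ],
    "rating": ["Happy", "Unhappy", "Happy", "Happy"],
})
print(df)
```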

Representing the text data in a graphical manner at the level of each customer ID not only bolsters the inference of causal models but also aids model explainability. Structural Causal Models are built on Directed Acyclic Graphs (DAGs) to explain causation among the elements in the graph and its sub-graphs. However, preprocessing is a very important stage even for graph representations, so every review of a unique customer will be split into sentences and those sentences into words. Words will be nodes and bi-grams will be edges; stopwords and punctuation shall be removed.
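A minimal sketch of this graph construction, assuming a tiny illustrative stopword list (a real pipeline would use a proper one, e.g. NLTK's):

```python
# Build a word graph per review: words become nodes, adjacent word pairs
# (bi-grams) become directed edges, stopwords and punctuation are dropped.
import re
import networkx as nx

STOPWORDS = {"the", "was", "were", "and", "a", "an", "my", "i", "it"}

def review_to_graph(review):
    g = nx.DiGraph()
    for sentence in re.split(r"[.!?]", review):
        words = [w for w in re.findall(r"[a-z]+", sentence.lower())
                 if w not in STOPWORDS]
        g.add_nodes_from(words)
        g.add_edges_from(zip(words, words[1:]))  # bi-gram edges
    return g

g = review_to_graph("The wings were delicious. It was a lovely rainy evening!")
print(sorted(g.nodes()), sorted(g.edges()))
```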

Using graphs enables us to capture important local contextual information that even attention-based pre-trained models such as BERT lag in providing.

[Figure: an example DAG, taken from the Journal of Clinical Epidemiology]

This image is just an example of a DAG for your understanding. Since our primary emphasis is deconfounding, I won't go too deep into NLP or graphs, just enough to understand the backdoor criterion invented by Judea Pearl. I am not giving a full coding treatment as of yet, but I will be sure to publish an article soon on the same topic with a coding illustration.

Interventions & The Backdoor Criterion

An important precursor to applying an intervention and using the backdoor criterion is ensuring we have sufficient data on the confounding variables. A simple way to verify this is to extract the confounding variables from the text by writing a search program, using any NLP library, that identifies the words corresponding to the treatment variable and the confounding variable and stores them in separate columns. Examples of the two types of words would be:

Treatment (X): tasty, delicious, greasy, gross, consistency, soggy, crisp

Confounding (Z): mood, good day, service, climate, celebration time, sad

Note: since we are dealing with text data, the features for both variables could be single words or even n-grams. A sketch of such a search program follows.
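Here is that sketch (the lexicons and reviews are illustrative, not exhaustive):

```python
# Scan each review for treatment-related and confounder-related words and
# store the hits in separate columns, as described in the text.
import pandas as pd

TREATMENT_WORDS = {"tasty", "delicious", "greasy", "gross", "soggy", "crisp"}
CONFOUNDER_WORDS = {"mood", "service", "climate", "celebration", "sad", "rainy"}

def extract(review, lexicon):
    tokens = (t.strip(".,!?") for t in review.lower().split())
    return [t for t in tokens if t in lexicon]

df = pd.DataFrame({"review": [
    "Delicious wings on a rainy evening",
    "Soggy fries and terrible service",
]})
df["treatment_X"] = df["review"].apply(lambda r: extract(r, TREATMENT_WORDS))
df["confounder_Z"] = df["review"].apply(lambda r: extract(r, CONFOUNDER_WORDS))
print(df)
```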

Now, using the do() operator, we intervene on X such that the flow of information from Z to X is completely blocked; that is, every non-causal path between X and Y is blocked. In a nutshell, the backdoor criterion seals every path from X to Y that starts with an arrow pointing into X, until X and Y are completely deconfounded. Statistically speaking, we control for the variables Z so that no further spurious correlations are induced; however, we must be cautious that no member of Z is a descendant of X on a causal path, as controlling for it might risk closing that path off. All these operations take place on a DAG, so going through concepts like colliders, forks, and chains will improve your understanding of controlling.
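One way to picture the do() operator is as surgery on the DAG: deleting every arrow pointing into X removes the backdoor path X <- Z -> Y. A minimal sketch, with node names taken from the running example:

```python
# Graph surgery for do(X): copy the DAG and sever all arrows into X, which
# blocks the backdoor path through the confounder Z.
import networkx as nx

dag = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "Y")])  # Z confounds X -> Y

def do(graph, node):
    g = graph.copy()
    g.remove_edges_from(list(g.in_edges(node)))
    return g

mutilated = do(dag, "X")
print(sorted(mutilated.edges()))  # [('X', 'Y'), ('Z', 'Y')]: backdoor blocked
```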

Important clarifications…

Intervention and the backdoor criterion are not two entirely separate things, because when we intervene we can use various techniques, among which the backdoor criterion is one. Moreover, interventions are also used for other purposes, such as counterfactual evaluation of outcomes, and alongside mediation analysis when the effect of a variable occurs through some mediator or similar mechanism.

The end output we desire is the true sentiment of the customer, based solely on the text features that correspond to food quality.

Ultimately, our choice of causal tools is guided by the problem statement we are trying to solve, how the data is collected and organized, and whether we have data for certain variables. All the steps described above can be carried out using relevant software libraries in Python 3.

Conclusion

The important takeaway, I am willing to wager, is that causality, even though it rests on highly sophisticated, intriguing, and intricate mathematics, is more about thought experimentation, inasmuch as it requires us to think in an abstract manner. Judea Pearl, the AI scientist who revived the research area of causal inference, maintains that retrospective thinking in terms of alternate scenarios gives us true perspective on what affects what, through what, and what does not. I have tried to foster this principle throughout the article.

I will admit that the causal concepts explained here in terms of NLP might not have appeared entirely unambiguous, since causal NLP research has not yet gained good momentum and is still at an infant stage. Nevertheless, it has the potential to scale wide horizons.
