Using Control Variables

category_specifier : "Causal Inference"

Reference Docs: Frisch Waugh Theorem | AB Testing Framework With Regression | Omitted Variable Bias | Standard Error | Linear Regression and Coefficient

Motivation

control_variables_summary

💡 What should we do if an A/B test shows low significance?

Adding control variables is one of three ways to improve the precision of the AB Testing Framework With Regression (that is, to lower the SE of the regression coefficient estimate \(\beta\)):
Choosing a 50:50 split between control and treatment to maximize \(Var(X)\)
Choosing a sufficient sample size to lower the Standard Error
Adding control variables to the AB test regression
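
As a rough sketch of why these three levers work, assume a single-regressor model with homoskedastic errors (an assumption made here for illustration). The standard error of the treatment coefficient is then approximately:

\[ SE(\hat{\beta}) \approx \sqrt{\frac{\sigma_e^2}{n \cdot Var(X)}}, \qquad Var(X) = p(1-p) \text{ for a binary treatment with treatment probability } p \]

\(Var(X) = p(1-p)\) is largest at \(p = 0.5\), a larger sample size \(n\) shrinks the denominator's inverse, and control variables that explain the target reduce the residual variance \(\sigma_e^2\).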

If you don’t, or can’t, run a randomized experiment, observational data alone does not support a causal interpretation.
How can we make the best causal claim from observational data if we cannot run an experiment?
Adding control variables helps isolate the random variation in each variable by removing its ‘non-random’ part. This can reduce endogeneity to some extent.
Adding control variables can help reduce Omitted Variable Bias.

\[ Target = \beta_0 + \beta_1 * treatment + e \]

\[ Target = \gamma_0 + \gamma_1 * treatment + \gamma_2X_2 + \gamma_3X_3 + ... + u \]

\(Target\) could be a target variable such as sales, while \(X_2, X_3, ...\) could be any other available variables (data we have: e.g. )

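Here, \(Var(\beta_1) \geq Var(\gamma_1)\) typically holds when the controls explain part of \(Target\) and are unrelated to the randomized treatment, which means adding control variables can reduce the error (improve the precision) of the A/B test coefficient. A control that explains nothing about \(Target\) does not help and costs a degree of freedom.
CAUTION: In a randomized AB test, we do not add control variables to fix Omitted Variable Bias. There is no OVB when treatment is randomly assigned.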
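
The precision gain can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not a real experiment: the covariate `x2`, the effect sizes, and the helper `ols` are all hypothetical choices made for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and classical (homoskedastic) standard errors.
    X must already contain an intercept column."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000
treatment = rng.integers(0, 2, size=n).astype(float)  # 50:50 random assignment
x2 = rng.normal(size=n)                               # hypothetical pre-treatment covariate
target = 1.0 + 0.5 * treatment + 2.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
coef_short, se_short = ols(np.column_stack([ones, treatment]), target)
coef_long, se_long = ols(np.column_stack([ones, treatment, x2]), target)

print(f"no control  : estimate={coef_short[1]:.3f}, SE={se_short[1]:.3f}")
print(f"with control: estimate={coef_long[1]:.3f}, SE={se_long[1]:.3f}")
```

Because `x2` is independent of the randomized treatment, both regressions target the same effect; the control only absorbs residual variance, which is what lowers the standard error.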

\[ y =\beta_0 + \beta_1 * treatment+ e \]

Suppose you don’t have AB test results but only observational data, which is likely to suffer from Omitted Variable Bias because treatment assignment is not random with respect to the other variables.
Here, \(\beta_1\) represents the effect of the treatment, while \(e\) contains all the variation not explained by the treatment.

\[ y =\beta_0' + \beta_1' * treatment + \beta_2X_2 +e' \]

By adding the omitted variable \(X_2\) as a control, \(\beta_1'\) will be less biased, because this removes the bias caused by the relationship between treatment and \(X_2\).

Since \(OVB = \beta_2 * \frac{Cov(treatment, X_2)}{Var(treatment)}\) (Omitted Variable Bias), if \(Cov(treatment, X_2)\) is not 0 (and \(\beta_2\) is not 0), \(\beta_1'\) will differ from \(\beta_1\).
Since treatment is not randomly assigned with respect to \(X_2\), \(Cov(treatment, X_2)\) is unlikely to be 0.
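
The bias can be checked numerically. The sketch below uses synthetic data with a made-up assignment rule (units with higher `x2` are more likely to be treated). It verifies that the gap between the coefficient without the control and the coefficient with it equals the estimated \(X_2\) coefficient times \(Cov(treatment, X_2) / Var(treatment)\), the slope from regressing \(X_2\) on treatment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x2 = rng.normal(size=n)
# Non-random assignment (assumed for illustration): higher x2 -> more likely treated
treatment = (x2 + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.5 * treatment + 2.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
# Short regression: y on treatment only (x2 omitted)
b_short, *_ = np.linalg.lstsq(np.column_stack([ones, treatment]), y, rcond=None)
# Long regression: y on treatment and x2
b_long, *_ = np.linalg.lstsq(np.column_stack([ones, treatment, x2]), y, rcond=None)

# Sample version of the omitted variable bias formula
cov_tx = np.cov(treatment, x2)[0, 1]
var_t = np.var(treatment, ddof=1)
ovb = b_long[2] * cov_tx / var_t

print(f"short coefficient : {b_short[1]:.3f}")
print(f"long coefficient  : {b_long[1]:.3f}")
print(f"short - long      : {b_short[1] - b_long[1]:.3f}")
print(f"OVB from formula  : {ovb:.3f}")
```

With sample moments, the identity holds exactly rather than approximately, so the last two printed numbers agree up to floating-point error.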

\(\tilde{X_1}\) represents the portion of \(X_1\) (treatment) that is not correlated with the other variables, that is, the residual from regressing \(X_1\) on the controls (see Frisch Waugh Theorem).
Using only this portion lets us estimate the treatment effect more accurately.
By adding a control variable, we isolate the part of \(X_1\) that behaves as if random, removing the confounding influence of that control.
Specifically, controlling for \(X_2\) ensures that the variation in \(X_1\) used for estimation is uncorrelated with \(X_2\), which removes the bias due to \(X_2\). Bias from other omitted variables can remain.
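
The residualization idea can be sketched in a few lines. The data and variable names (`x1`, `x2`, `x1_tilde`) are synthetic and chosen for illustration. The code regresses \(X_1\) on the control, keeps the residual \(\tilde{X_1}\), and checks that regressing \(y\) on \(\tilde{X_1}\) alone gives the same coefficient as the full regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x2 = rng.normal(size=n)
# Treatment correlated with x2 (assumed non-random assignment)
x1 = (x2 + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
controls = np.column_stack([ones, x2])

# Step 1: regress x1 on the controls and keep the residual (x1_tilde)
g, *_ = np.linalg.lstsq(controls, x1, rcond=None)
x1_tilde = x1 - controls @ g

# Step 2: regress y on x1_tilde alone (no intercept needed: x1_tilde has mean zero)
beta_fwl = (x1_tilde @ y) / (x1_tilde @ x1_tilde)

# Full regression of y on x1 and x2 for comparison
b_full, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)

print(f"coefficient via residualized x1: {beta_fwl:.6f}")
print(f"coefficient in full regression : {b_full[1]:.6f}")
print(f"corr(x1_tilde, x2)             : {np.corrcoef(x1_tilde, x2)[0, 1]:.2e}")
```

The residual `x1_tilde` is uncorrelated with `x2` by construction, which is the sense in which the control "removes" the part of the treatment that moves together with `x2`.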