Regression with Binary Outcomes
This app was developed with the assistance of AI tools, specifically ChatGPT (version 5.1) and GitHub Copilot. These tools were used to generate formulas and code snippets, support debugging, and provide suggestions for improving the app’s functionality.
Binary Outcomes
We define the binary outcome variable \(Y\) as
\[ Y_i = \begin{cases} 1, & \text{if the outcome occurs} \\ 0, & \text{if the outcome does not occur} \end{cases} \tag{1}\]
Examples include: success vs. failure, yes vs. no, transition vs. no transition, click vs. no click, and similar binary events.
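In code, such an outcome is simply a Bernoulli draw. The snippet below is an illustrative Python sketch; the success probability 0.75 and the seed are arbitrary choices, not values used by the app.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed fixed only for reproducibility
p = 0.75                              # assumed probability that the outcome occurs
y = rng.binomial(n=1, p=p, size=10)   # ten draws: 1 = outcome occurs, 0 = it does not
print(y)                              # roughly 75% of the entries are 1
```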
Key Terms and Formulas Used in Logistic and Probit Regression
| Term | Formula | Example |
|---|---|---|
| Conditional Probability | \[P(Y = 1 \mid X) \tag{2}\] | \(P(Y = 1 \mid X = 1) = 0.75\) |
| Odds | \[\dfrac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)} \;=\; \dfrac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)} \tag{3}\] | \(\text{Odds} = \frac{0.75}{0.25} = 3\) |
| Logit / log-odds | \[\log\!\left(\dfrac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}\right) \;=\; \log\!\left(\dfrac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}\right) \tag{4}\] | \(\text{logit} = \log\!\left(\frac{0.75}{0.25}\right) \approx 1.10\) |
| Regression Coefficient | \[\beta_1 = \log\!\left(\dfrac{P(Y = 1 \mid X+1)}{1 - P(Y = 1 \mid X+1)}\right) - \log\!\left(\dfrac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}\right) \tag{5}\] | \(\beta_1 = \log(3) - \log(1.5) = \log(2) \approx 0.693\) |
| Odds Ratio (OR) | \[\text{OR} = \dfrac{\dfrac{P(Y = 1 \mid X+1)}{P(Y = 0 \mid X+1)}}{\dfrac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}} = e^{\beta_1} \tag{6}\] | \(\text{OR} = \frac{3}{1.5} = e^{0.693} \approx 2.00\) |
| Linear Predictor | \[\eta = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k \tag{7}\] | \(\eta = 0 + 0.693 \cdot 1 = 0.693\) |
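A few lines of Python reproduce the worked examples in the table. This is a minimal sketch; the baseline probability \(P(Y = 1 \mid X) = 0.60\) is an assumed value chosen so that the odds at \(X\) equal 1.5, matching the example above.

```python
import math

p1 = 0.75  # P(Y = 1 | X + 1), from the table's example
p0 = 0.60  # assumed P(Y = 1 | X): gives odds of 1.5 and logit log(1.5) ≈ 0.405

odds1 = p1 / (1 - p1)                     # 3.0  (Equation 3)
odds0 = p0 / (1 - p0)                     # 1.5
beta = math.log(odds1) - math.log(odds0)  # ≈ 0.693 (Equation 5)
odds_ratio = odds1 / odds0                # 2.0, equal to exp(beta) (Equation 6)

print(round(beta, 3), round(odds_ratio, 2), round(math.exp(beta), 2))
```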
Model Equation with \(k\) Predictors
Let the linear predictor be defined as in Equation 7:
\[ \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
We model the conditional probability of a binary outcome variable \(Y = 1\) given predictor variables \(X_1, \ldots, X_k\) using the logistic function, where \(e \approx 2.71828\) is Euler’s number:
\[ P(Y = 1 \mid X_1, \ldots, X_k) = \frac{e^{\eta}}{1 + e^{\eta}} \tag{8}\]
Taking the logarithm of the odds (see Equation 3) gives a linear relationship:
\[ \log\left( \dfrac{P(Y = 1 \mid X_1, \ldots, X_k)} {1 - P(Y = 1 \mid X_1, \ldots, X_k)} \right) = \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \tag{9}\]
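As an illustrative sketch (not the app’s implementation), the following Python code simulates data from this model and recovers the coefficients with statsmodels. The true values \(\beta_0 = 0\) and \(\beta_1 = 0.693\) mirror the table’s example; the sample size and seed are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)
n = 5_000
x = rng.binomial(1, 0.5, size=n)      # one binary predictor
eta = 0.0 + 0.693 * x                 # linear predictor (Equation 7)
p = np.exp(eta) / (1 + np.exp(eta))   # logistic function (Equation 8)
y = rng.binomial(1, p)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # logit link by default
print(fit.params)                     # estimates close to [0, 0.693]
print(np.exp(fit.params[1]))          # odds ratio, close to 2
```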
In probit regression, we instead model the conditional probability of \(Y = 1\) using the standard normal cumulative distribution function \(\Phi(\cdot)\):
\[ P(Y = 1 \mid X_1, \ldots, X_k) = \Phi(\eta) \tag{10}\]
Under the latent-variable view, probit regression assumes
\[ Y_i^* = \eta_i + \varepsilon_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal N(0,1) \]
and the observed binary outcome is generated by a threshold at zero:
\[ Y_i = \begin{cases} 1, & \text{if } Y_i^* > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{11}\]
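This threshold mechanism can be simulated directly. In the sketch below, the coefficients \(\beta_0 = -0.5\) and \(\beta_1 = 1\) are arbitrary illustration values; a probit fit recovers them from the observed 0/1 outcomes alone.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=2)
n = 5_000
x = rng.normal(size=n)
y_star = -0.5 + 1.0 * x + rng.normal(size=n)  # Y* = eta + eps, eps ~ N(0, 1)
y = (y_star > 0).astype(int)                  # threshold at zero (Equation 11)

X = sm.add_constant(x)
fit = sm.Probit(y, X).fit(disp=0)
print(fit.params)                             # estimates close to [-0.5, 1.0]
```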
The logistic and probit models differ in how the linear predictor \(\eta\) is mapped to the probability scale (i.e., to values between 0 and 1).
Interactive Playground
Inputs for Data Generation
This directed acyclic graph (DAG) represents the data-generating process, which can be modified via the sliders in the data-generation sidebar.
Importantly, the coefficients estimated in the structural equation model (SEM) are identical to those from a GLM with a probit link, except that the threshold parameter must be multiplied by −1.
In addition, logit coefficients can be approximately converted into probit coefficients and vice versa. The conversion factors are:
\[ \beta_{\text{probit}} \approx \frac{\beta_{\text{logit}}}{1.6} \]
\[ \beta_{\text{logit}} \approx \beta_{\text{probit}} \times 1.6 \]
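The approximation can be checked empirically by fitting both links to the same data; the ratio of the slope estimates is typically close to 1.6. The simulation settings in this sketch are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=3)
n = 20_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.2 + 0.7 * x)))  # true model on the logit scale
y = rng.binomial(1, p)

X = sm.add_constant(x)
b_logit = sm.Logit(y, X).fit(disp=0).params[1]
b_probit = sm.Probit(y, X).fit(disp=0).params[1]
print(b_logit / b_probit)               # roughly 1.6
```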
Adapting the x-axis is coming soon…