Regression with Binary Outcomes
This app was developed with the assistance of AI tools, specifically ChatGPT (version 5.1) and GitHub Copilot. These tools were used to generate formulas and code snippets, support debugging, and provide suggestions for improving the app’s functionality.
Binary Outcomes
We define the binary outcome variable \(Y\) as
\[ Y_i = \begin{cases} 1, & \text{if the outcome occurs} \\ 0, & \text{if the outcome does not occur} \end{cases} \tag{1}\]
Examples include: success vs. failure, yes vs. no, transition vs. no transition, click vs. no click, and similar binary events.
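In code, such an outcome is simply a Bernoulli draw. The snippet below is an illustrative Python sketch; the success probability 0.75 and the seed are arbitrary choices, not values used by the app.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed fixed only for reproducibility
p = 0.75                              # assumed probability that the outcome occurs
y = rng.binomial(n=1, p=p, size=10)   # ten draws: 1 = outcome occurs, 0 = it does not
print(y)                              # roughly 75% of the entries are 1
```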
Key Terms and Formulas Used in Logistic and Probit Regression
| Term | Formula | Example |
|---|---|---|
| Conditional Probability | \[P(Y = 1 \mid X) \tag{2}\] | \(P(Y = 1 \mid X = 1) = 0.75\) |
| Odds | \[\dfrac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)} \;=\; \dfrac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)} \tag{3}\] | \(\text{Odds} = \frac{0.75}{0.25} = 3\) |
| Logit / log-odds | \[\log\!\left(\dfrac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}\right) \;=\; \log\!\left(\dfrac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}\right) \tag{4}\] | \(\text{logit} = \log\!\left(\frac{0.75}{0.25}\right) \approx 1.10\) |
| Regression Coefficient | \[\beta_1 = \log\!\left(\dfrac{P(Y = 1 \mid X+1)}{1 - P(Y = 1 \mid X+1)}\right) - \log\!\left(\dfrac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}\right) \tag{5}\] | \(\beta_1 = \log(3) - \log(1.5) = \log(2) \approx 0.693\) |
| Odds Ratio (OR) | \[\text{OR} = \dfrac{\dfrac{P(Y = 1 \mid X+1)}{P(Y = 0 \mid X+1)}}{\dfrac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}} = e^{\beta_1} \tag{6}\] | \(\text{OR} = \frac{3}{1.5} = e^{0.693} \approx 2.00\) |
| Linear Predictor | \[\eta = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k \tag{7}\] | \(\eta = 0 + 0.693 \cdot 1 = 0.693\) |
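A few lines of Python reproduce the worked examples in the table. This is a minimal sketch; the baseline probability \(P(Y = 1 \mid X) = 0.60\) is an assumed value chosen so that the odds at \(X\) equal 1.5, matching the example above.

```python
import math

p1 = 0.75  # P(Y = 1 | X + 1), from the table's example
p0 = 0.60  # assumed P(Y = 1 | X): gives odds of 1.5 and logit log(1.5) ≈ 0.405

odds1 = p1 / (1 - p1)                     # 3.0  (Equation 3)
odds0 = p0 / (1 - p0)                     # 1.5
beta = math.log(odds1) - math.log(odds0)  # ≈ 0.693 (Equation 5)
odds_ratio = odds1 / odds0                # 2.0, equal to exp(beta) (Equation 6)

print(round(beta, 3), round(odds_ratio, 2), round(math.exp(beta), 2))
```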
Model Equation with \(k\) Predictors
Let the linear predictor be defined as in Equation 7:
\[ \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]
We model the conditional probability of a binary outcome variable \(Y = 1\) given predictor variables \(X_1, \ldots, X_k\) using the logistic function, where \(e \approx 2.71828\) is Euler’s number:
\[ P(Y = 1 \mid X_1, \ldots, X_k) = \frac{e^{\eta}}{1 + e^{\eta}} \tag{8}\]
Taking the logarithm of the odds (see Equation 3) gives a linear relationship:
\[ \log\left( \dfrac{P(Y = 1 \mid X_1, \ldots, X_k)} {1 - P(Y = 1 \mid X_1, \ldots, X_k)} \right) = \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \tag{9}\]
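As an illustrative sketch (not the app’s implementation), the following Python code simulates data from this model and recovers the coefficients with statsmodels. The true values \(\beta_0 = 0\) and \(\beta_1 = 0.693\) mirror the table’s example; the sample size and seed are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)
n = 5_000
x = rng.binomial(1, 0.5, size=n)      # one binary predictor
eta = 0.0 + 0.693 * x                 # linear predictor (Equation 7)
p = np.exp(eta) / (1 + np.exp(eta))   # logistic function (Equation 8)
y = rng.binomial(1, p)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # logit link by default
print(fit.params)                     # estimates close to [0, 0.693]
print(np.exp(fit.params[1]))          # odds ratio, close to 2
```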
In probit regression, we instead model the conditional probability of \(Y = 1\) using the standard normal cumulative distribution function \(\Phi(\cdot)\):
\[ P(Y = 1 \mid X_1, \ldots, X_k) = \Phi(\eta) \tag{10}\]
Under the latent-variable view, probit regression assumes
\[ Y_i^* = \eta_i + \varepsilon_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal N(0,1) \]
and the observed binary outcome is generated by a threshold at zero:
\[ Y_i = \begin{cases} 1, & \text{if } Y_i^* > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{11}\]
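This threshold mechanism can be simulated directly. In the sketch below, the coefficients \(\beta_0 = -0.5\) and \(\beta_1 = 1\) are arbitrary illustration values; a probit fit recovers them from the observed 0/1 outcomes alone.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=2)
n = 5_000
x = rng.normal(size=n)
y_star = -0.5 + 1.0 * x + rng.normal(size=n)  # Y* = eta + eps, eps ~ N(0, 1)
y = (y_star > 0).astype(int)                  # threshold at zero (Equation 11)

X = sm.add_constant(x)
fit = sm.Probit(y, X).fit(disp=0)
print(fit.params)                             # estimates close to [-0.5, 1.0]
```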
The logistic and probit models differ in how the linear predictor \(\eta\) is mapped to the probability scale (i.e., to values between 0 and 1).
Interactive Playground
Inputs for Data Generation
This directed acyclic graph (DAG) represents the data-generating process, which can be modified via the sliders in the data-generation sidebar.
Importantly, the coefficients estimated in the structural equation model (SEM) are identical to those from a GLM with a probit link, except that the threshold parameter must be multiplied by −1.
In addition, logit coefficients can be approximately converted into probit coefficients and vice versa. The conversion factors are:
\[ \beta_{\text{probit}} \approx \frac{\beta_{\text{logit}}}{1.6} \]
\[ \beta_{\text{logit}} \approx \beta_{\text{probit}} \times 1.6 \]
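The approximation can be checked empirically by fitting both links to the same data; the ratio of the slope estimates is typically close to 1.6. The simulation settings in this sketch are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=3)
n = 20_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.2 + 0.7 * x)))  # true model on the logit scale
y = rng.binomial(1, p)

X = sm.add_constant(x)
b_logit = sm.Logit(y, X).fit(disp=0).params[1]
b_probit = sm.Probit(y, X).fit(disp=0).params[1]
print(b_logit / b_probit)               # roughly 1.6
```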
Adapting the x-axis is coming soon…