## 5.1 Assumptions for logistic regression

• The response variable $$Y$$ is a binomial random variable with a single trial and success probability $$\pi$$. Thus, $$Y=1$$ corresponds to “success” and occurs with probability $$\pi$$, and $$Y=0$$ corresponds to “failure” and occurs with probability $$1-\pi$$.
• The predictor or explanatory variables $$x=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$$ are fixed (not random) and can be discrete, continuous, or a combination of both. As with classical regression, two or more of these may be indicator variables modeling the nominal categories of a single predictor, and others may represent interactions between two or more explanatory variables.
• The data for the $$i$$th individual are collected in the vector $$\left(x_{1 i}, \ldots, x_{k i}, Y_{i}\right)$$, for $$i=1, \ldots, n$$. The observations are assumed independent by the sampling mechanism. This also allows us to combine or group the data, which we do below, by summing over trials for which $$\pi$$ is constant. In this section of the notes, we focus on a single explanatory variable $$x$$.
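As a small illustration of grouping, the sketch below (with made-up data; the variable names and values are assumptions, not from the notes) sums the binary responses over trials that share the same value of $$x$$, and hence the same $$\pi$$:

```python
import numpy as np

# Hypothetical ungrouped binary data: observations with the same x
# share the same success probability pi, so they can be grouped.
x = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
y = np.array([0, 1, 1, 0, 1, 1, 1, 0, 1])

levels = np.unique(x)                                    # distinct x values
trials = np.array([(x == v).sum() for v in levels])      # trials per group
successes = np.array([y[x == v].sum() for v in levels])  # successes per group

print(levels)     # [1 2 3]
print(trials)     # [3 2 4]
print(successes)  # [2 1 3]
```

The grouped counts (successes out of trials at each $$x$$) carry the same likelihood information as the ungrouped 0/1 responses.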

The model is expressed as

$\log \left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0}+\beta_{1} x_{i}$

or, by solving for $$\pi_{i}$$, we have the equivalent expression

$\pi_{i}=\frac{\exp \left(\beta_{0}+\beta_{1} x_{i}\right)}{1+\exp \left(\beta_{0}+\beta_{1} x_{i}\right)}$

To estimate the parameters, we substitute this expression for $$\pi_{i}$$ into the joint pmf for $$Y_{1}, \ldots, Y_{n}$$,

$\prod_{i=1}^{n} \pi_{i}^{y_{i}}\left(1-\pi_{i}\right)^{1-y_{i}}$

which gives us the likelihood function $$L\left(\beta_{0}, \beta_{1}\right)$$ of the regression parameters. Maximizing this likelihood over all possible $$\beta_{0}$$ and $$\beta_{1}$$ yields the maximum likelihood estimates (MLEs) $$\hat{\beta}_{0}$$ and $$\hat{\beta}_{1}$$. Extending this to include additional explanatory variables is straightforward.
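The maximization has no closed-form solution, but Newton-Raphson iteration (equivalent to Fisher scoring for this model) converges quickly. A minimal sketch, using illustrative data (the variable names and values are assumptions, not from the notes):

```python
import numpy as np

# Hypothetical data: x is the single explanatory variable, y the 0/1 response.
x = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
              2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
              1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
beta = np.zeros(2)                         # start at beta0 = beta1 = 0
for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))       # fitted probabilities
    score = X.T @ (y - pi)                       # gradient of log-likelihood
    info = X.T @ (X * (pi * (1 - pi))[:, None])  # Fisher information
    beta = beta + np.linalg.solve(info, score)   # Newton step

b0_hat, b1_hat = beta
print(b0_hat, b1_hat)
```

Each iteration solves the score equations' linearization; with a single predictor the update is a 2-by-2 linear solve, and the loop typically converges in well under 25 steps.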

## Binary Logistic Regression

Binary logistic regression models how the odds of “success” for a binary response variable $$Y$$ depend on a set of explanatory variables:

$\operatorname{logit}\left(\pi_{i}\right)=\log \left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0}+\beta_{1} x_{i}$

• Random component: The distribution of the response variable is assumed to be binomial with a single trial and success probability $$E(Y)=\pi$$.
• Systematic component: $$x$$ is the explanatory variable (continuous or discrete), and the model is linear in the parameters. As with the example above, this can be extended to multiple variables or non-linear transformations of them.
• Link function: The log-odds or logit link, $$\eta_{i}=g\left(\pi_{i}\right)=\log \left(\frac{\pi_{i}}{1-\pi_{i}}\right)$$, is used.
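The logit link and its inverse can be sketched directly; the inverse recovers $$\pi$$ from the linear predictor $$\eta$$ (the function names here are illustrative):

```python
import math

def logit(pi):
    """Log-odds: eta = log(pi / (1 - pi))."""
    return math.log(pi / (1 - pi))

def inv_logit(eta):
    """Inverse link: pi = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

print(logit(0.5))             # 0.0: probability 1/2 means even odds
print(inv_logit(logit(0.3)))  # approximately 0.3: the link round-trips
```

Because the logit maps $$(0,1)$$ onto the whole real line, the linear predictor $$\beta_{0}+\beta_{1} x_{i}$$ is unconstrained while the fitted $$\pi_{i}$$ always stays in $$(0,1)$$.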