Linear Probability, Logistic and Choice Model

Tapas Mahanta
4 min read · Apr 24, 2020


In the linear probability model, the dependent variable is binary, and the probability that it equals 1 is assumed to be a linear function of the regressors.

Because we model a probability with a linear model (a straight line), the predictions can be greater than 1 or less than 0, which is meaningless for a probability. The errors of such a model are also heteroscedastic, since their variance depends on the regressors. Still, the linear probability model is sometimes preferred because it is easy to interpret.
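As a minimal sketch of the out-of-range problem, the snippet below fits a straight line to a binary outcome with ordinary least squares. The data and the helper names (`fit_line`, `predict`) are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with a single regressor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: x is a regressor, y is a binary outcome (0 or 1).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_line(xs, ys)

def predict(x):
    """Predicted 'probability' from the linear probability model."""
    return intercept + slope * x

for x in (1, 4, 8, 12):
    print(x, round(predict(x), 3))
```

With this data the fitted line already dips below 0 at x = 1 and rises above 1 at x = 8, so even in-sample predictions cannot all be read as probabilities.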

To avoid predicting probabilities above 1 or below 0, we need a non-linear (sigmoid) model.

Now our desired Y/outcome variable stays between 0 and 1, but we lose the simplicity of linear models.
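A small sketch of the sigmoid idea: the linear predictor can take any real value, but the sigmoid squeezes it into the interval between 0 and 1. The coefficients `b0` and `b1` are hypothetical.

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 1.0

for x in (-10, 0, 4, 10, 20):
    z = b0 + b1 * x          # linear predictor, unbounded
    p = sigmoid(z)           # predicted probability, bounded
    print(x, z, round(p, 4))
```

The price of this bound is that a one-unit change in x no longer shifts the probability by a constant amount, which is the loss of simplicity mentioned above.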

With basic algebra, we can interpret logistic regression in terms of a ratio of probabilities rather than a single probability, using either Equation 1 or 2.

NOTE: The ratio of the probability that an event occurs to the probability that it does not, p/(1-p), is the odds of the event.
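To make the odds interpretation concrete, here is a short sketch showing that under a logistic model the odds equal the exponential of the linear predictor, so the log of the odds is linear again. The value of `z` is an arbitrary illustration.

```python
import math

def sigmoid(z):
    """Logistic function: probability implied by linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of an event: probability it happens over probability it does not."""
    return p / (1.0 - p)

z = 0.8  # hypothetical value of the linear predictor b0 + b1 * x
p = sigmoid(z)

print(round(p, 4))                    # the probability
print(round(odds(p), 4))              # the odds p / (1 - p)
print(round(math.exp(z), 4))          # matches the odds
print(round(math.log(odds(p)), 4))    # log-odds recover z
```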

Interpretation

SAS provides these two interpretations by default.

exp(Bj) is always positive. It is greater than 1 if Bj is positive and less than 1 if Bj is negative. That is, if the coefficient Bj is positive, a one-unit increase in the regressor multiplies the odds by a factor greater than 1, so “Event 1” becomes more likely. If Bj is negative, the odds are multiplied by a factor less than 1, so “Event 1” becomes less likely.
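The following sketch checks this numerically: raising the regressor by one unit multiplies the odds by exp(Bj), whatever the starting value. The intercept and coefficients are hypothetical.

```python
import math

def odds(b0, bj, x):
    """Odds of "Event 1" under a logistic model with linear predictor b0 + bj * x."""
    return math.exp(b0 + bj * x)

b0 = -1.0  # hypothetical intercept

for bj in (0.7, -0.7):  # one positive and one negative coefficient
    # Effect on the odds of raising x from 2 to 3.
    ratio = odds(b0, bj, 3.0) / odds(b0, bj, 2.0)
    print(bj, round(ratio, 4), round(math.exp(bj), 4))
```

In both cases the printed ratio agrees with exp(Bj): above 1 for the positive coefficient, below 1 for the negative one.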

OLS is no longer a valid method for estimating the coefficients. Most software uses the maximum likelihood method instead.
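As a rough sketch of what maximum likelihood means here, the code below climbs the log-likelihood of a one-regressor logistic model with plain gradient ascent. Statistical packages typically use faster Newton-type iterations; the data, step size, and function names are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log of the probability of the observed 0/1 outcomes under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, rate=0.01, steps=20000):
    """Gradient ascent on the log-likelihood, starting from b0 = b1 = 0."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += rate * sum(residuals)
        b1 += rate * sum(r * x for r, x in zip(residuals, xs))
    return b0, b1

# Hypothetical data: a binary outcome that becomes more likely as x grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(round(b0_hat, 3), round(b1_hat, 3))
print(round(log_likelihood(0.0, 0.0, xs, ys), 3))
print(round(log_likelihood(b0_hat, b1_hat, xs, ys), 3))
```

The fitted coefficients give a higher log-likelihood than the starting point, which is the criterion being maximized.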

Latent Utility Theory/Choice Model

For a multinomial dependent variable (one with more than two qualitative values), we prefer a more general version of the logistic model. To study it, we need to define the “choice/benefit/utility” of a consumer.

Utility: Consumers choose the product with the highest utility/benefit.

So when we say a customer chooses option J, we mean she gets the maximum utility/benefit from buying J among all available choices. Based on this, we try to predict the utilities of the different brands (and hence the brand with maximum utility) for a new customer.

NOTE: Utility is a relative term and is defined only within the given set of choices.

Now that we have defined utility (which determines the probability of purchasing a brand), we can extend the logistic model to handle more than two choices/brands.

The multinomial logistic model is a generalization of the logistic model.
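A minimal sketch of the multinomial logit formula: each option's probability is its exponentiated utility divided by the sum over all options. The brand names and utility values are hypothetical.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P(j) = exp(U_j) / sum over k of exp(U_k)."""
    highest = max(utilities.values())  # subtracted for numerical stability
    exp_u = {name: math.exp(u - highest) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Hypothetical utilities for three brands.
probs = choice_probabilities({"brand_a": 1.2, "brand_b": 0.4, "brand_c": 0.0})
for name, p in probs.items():
    print(name, round(p, 4))

# With only two options, one of them at utility 0, the formula
# reduces to the ordinary logistic (sigmoid) model.
binary = choice_probabilities({"event": 0.8, "no_event": 0.0})
print(round(binary["event"], 4))
```

The two-option case is why the multinomial model can be seen as a generalization of logistic regression.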

Interpretation of the Choice Model

Because utility is a relative term, we need to fix a reference: the intercept of one brand is set to 0, and the other brands are compared against it.

Brand Equity Effect: The intercept gives the user's preference for a brand when everything else is the same.

Because all the products are low priced, we do not expect price to be a great differentiator, so we choose the same price coefficient for all three brands.
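The setup described above can be sketched as follows: one brand serves as the reference with intercept 0, the other intercepts capture brand equity, and a single price coefficient is shared by all brands. All names and numbers are hypothetical, not estimates from any data.

```python
import math

# brand_c is the reference brand, so its intercept is fixed at 0.
intercepts = {"brand_a": 0.6, "brand_b": 0.2, "brand_c": 0.0}
price_coef = -1.5  # one price coefficient shared by all three brands

def choice_probabilities(prices):
    """Multinomial logit with utility = brand intercept + price_coef * price."""
    utilities = {b: intercepts[b] + price_coef * prices[b] for b in intercepts}
    highest = max(utilities.values())
    exp_u = {b: math.exp(u - highest) for b, u in utilities.items()}
    total = sum(exp_u.values())
    return {b: e / total for b, e in exp_u.items()}

# Same price everywhere: preferences follow the intercepts (brand equity).
equal_prices = choice_probabilities({"brand_a": 1.0, "brand_b": 1.0, "brand_c": 1.0})
# Raising brand_a's price lowers its utility and its share.
a_costs_more = choice_probabilities({"brand_a": 1.4, "brand_b": 1.0, "brand_c": 1.0})

print({b: round(p, 4) for b, p in equal_prices.items()})
print({b: round(p, 4) for b, p in a_costs_more.items()})
```

Adding the same constant to every intercept would leave the probabilities unchanged, which is why one intercept has to be pinned at 0.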

Limitations of the Choice Model

Example

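Suppose travelers initially split evenly between the train and Red Bus, 50% each. With the introduction of the new Blue Bus company, which offers essentially the same service as Red Bus, we would expect the market share of Red Bus to halve to 25%, with Blue Bus taking the other 25% and the market share of the train remaining the same. But the model suggests each market share will be equal at 33.33%. This limitation is known as the independence of irrelevant alternatives (IIA).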
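The bus example can be reproduced with a few lines, assuming all options have the same utility (set to 0 here for convenience). The function name and option labels are illustrative.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P(j) = exp(U_j) / sum over k of exp(U_k)."""
    highest = max(utilities.values())
    exp_u = {name: math.exp(u - highest) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Before: train and Red Bus are equally attractive.
before = choice_probabilities({"train": 0.0, "red_bus": 0.0})
# After: Blue Bus enters with the same utility as Red Bus.
after = choice_probabilities({"train": 0.0, "red_bus": 0.0, "blue_bus": 0.0})

print({k: round(v, 4) for k, v in before.items()})  # 0.5 each
print({k: round(v, 4) for k, v in after.items()})   # 0.3333 each

# The ratio of train share to Red Bus share is the same before and after.
print(before["train"] / before["red_bus"], after["train"] / after["red_bus"])
```

The model keeps the ratio between any two options fixed when a third is added, so the train loses share to a newcomer that in reality competes only with Red Bus.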
