Fitting a logistic regression
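Fitting a logistic regression on all the variables in the data, with good as the response and the eleven remaining variables (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, and alcohol) as predictors, produces an AIC of 1679.6, a residual deviance of 1655.6, and a null deviance of 2209.0. The null deviance measures the lack of fit of a model containing only an intercept, while the residual deviance measures the lack of fit that remains after adding the explanatory variables. Adding variables never raises the deviance, so what matters is the size of the drop: here it is 553.4 on 11 degrees of freedom, which indicates that the explanatory variables, taken together, improve the model significantly.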
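The relationship between the three reported figures can be checked with a little arithmetic. The sketch below assumes a binary (0/1) response, for which the residual deviance equals minus twice the log-likelihood, so that AIC = residual deviance + 2k with k the number of fitted coefficients. The chi-square critical value is a standard table value hard-coded for illustration; the variable names are not from the original analysis.

```python
# Figures reported for the full model.
aic = 1679.6
residual_deviance = 1655.6
null_deviance = 2209.0

# Assumption: binary response, so AIC = residual deviance + 2 * k,
# where k is the number of fitted coefficients (intercept included).
k = round((aic - residual_deviance) / 2)

# Drop in deviance relative to the intercept-only model.
# It is compared against a chi-square distribution with k - 1 degrees of freedom.
deviance_drop = null_deviance - residual_deviance
df = k - 1

# Upper 5% point of the chi-square distribution with 11 df (table value).
CHI2_CRIT_11DF = 19.675
jointly_significant = deviance_drop > CHI2_CRIT_11DF

print(f"k = {k}, df = {df}, deviance drop = {deviance_drop:.1f}, "
      f"significant at 5%: {jointly_significant}")
```

Under that assumption k works out to 12, which matches an intercept plus eleven predictors, and the deviance drop is far above the 5% critical value.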
The bestglm model selection technique selects eight statistically significant variables (fixed acidity, volatile acidity, citric acid, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, sulphates, and alcohol), each with a p-value < 0.05, to build a better model than the one containing all the predictor variables.
The backward selection technique selects eight statistically significant variables (fixed acidity, volatile acidity, citric acid, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, sulphates, and alcohol), each with a p-value < 0.05, to build a better model than the one containing all the predictor variables.
The forward selection technique selects eight statistically significant variables (fixed acidity, volatile acidity, citric acid, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, sulphates, and alcohol), each with a p-value < 0.05, to build a better model than the one containing all the predictor variables.
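The three selection strategies above differ only in how they search the space of variable subsets. The following is a minimal sketch of that difference, not the procedure used in the analysis: the score function `toy_aic`, the gain table `TOY_GAIN`, and the variable names x1 to x4 are hypothetical stand-ins for refitting the logistic regression and reading off its AIC (lower is better).

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

Score = Callable[[FrozenSet[str]], float]

def best_subset(variables: Sequence[str], score: Score) -> FrozenSet[str]:
    """Exhaustive search: score every subset and keep the lowest."""
    subsets = (frozenset(c)
               for r in range(len(variables) + 1)
               for c in combinations(variables, r))
    return min(subsets, key=score)

def backward(variables: Sequence[str], score: Score) -> FrozenSet[str]:
    """Start from the full model; drop one variable at a time while the score improves."""
    current = frozenset(variables)
    while current:
        best_drop = min((current - {v} for v in sorted(current)), key=score)
        if score(best_drop) >= score(current):
            break
        current = best_drop
    return current

def forward(variables: Sequence[str], score: Score) -> FrozenSet[str]:
    """Start from the empty model; add one variable at a time while the score improves."""
    current: FrozenSet[str] = frozenset()
    while len(current) < len(variables):
        remaining = sorted(set(variables) - current)
        best_add = min((current | {v} for v in remaining), key=score)
        if score(best_add) >= score(current):
            break
        current = best_add
    return current

# Hypothetical score: each variable lowers the "deviance" by a fixed gain,
# and every included variable costs a penalty of 2, as in AIC.
TOY_GAIN = {"x1": 10.0, "x2": 0.5, "x3": 6.0, "x4": 1.0}

def toy_aic(subset: FrozenSet[str]) -> float:
    return 100.0 - sum(TOY_GAIN[v] for v in subset) + 2.0 * len(subset)

names = list(TOY_GAIN)
selected = {
    "best subset": best_subset(names, toy_aic),
    "backward": backward(names, toy_aic),
    "forward": forward(names, toy_aic),
}
for method, subset in selected.items():
    print(method, sorted(subset))
```

On this toy score all three searches keep only the variables whose gain exceeds the penalty, so they agree. With correlated predictors the greedy forward and backward searches can diverge from the exhaustive one.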
Optimal results from the lasso regression model are obtained at a lambda value of 0.0033. At this point, the lasso penalty has shrunk the coefficients of the less important variables to exactly zero, so the model employs only the most important variables in the data.
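The reason the lasso can discard variables is the soft-thresholding step at the heart of its solution. The sketch below illustrates it under a simplifying assumption (standardized, uncorrelated predictors, where the lasso estimate is the soft-thresholded unpenalized estimate). The coefficient values are hypothetical and are not taken from the fitted model; only the lambda value comes from the text.

```python
def soft_threshold(coef: float, lam: float) -> float:
    """Shrink coef toward zero by lam; return exactly 0.0 if |coef| <= lam."""
    magnitude = abs(coef) - lam
    if magnitude <= 0.0:
        return 0.0
    return magnitude if coef > 0 else -magnitude

LAMBDA_LASSO = 0.0033  # lambda value reported for the lasso model

# Hypothetical unpenalized coefficients, for illustration only.
unpenalized = [0.85, -0.002, 0.0015, -0.31]
lasso_coefs = [soft_threshold(b, LAMBDA_LASSO) for b in unpenalized]

# Coefficients smaller in magnitude than lambda are set to exactly zero,
# which removes the corresponding variables from the model.
kept = [i for i, b in enumerate(lasso_coefs) if b != 0.0]
print(lasso_coefs, kept)
```

Larger values of lambda zero out more coefficients, which is why the choice of lambda determines how many variables the lasso model retains.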
Optimal ridge regression results are obtained at a lambda value of 0.02173. Unlike the lasso, ridge regression does not remove variables: at this point all predictors remain in the model, with their coefficients shrunk toward zero.
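The behaviour of the ridge penalty can be illustrated in the same simplified setting (standardized, uncorrelated predictors), where each ridge coefficient is the unpenalized coefficient divided by 1 + lambda. This is a sketch under that assumption, with hypothetical coefficient values; only the lambda value comes from the text.

```python
LAMBDA_RIDGE = 0.02173  # lambda value reported for the ridge model

def ridge_shrink(coef: float, lam: float) -> float:
    """Proportional shrinkage: scales coef by 1 / (1 + lam), never to exactly zero."""
    return coef / (1.0 + lam)

# Hypothetical unpenalized coefficients, for illustration only.
unpenalized = [0.85, -0.002, 0.0015, -0.31]
ridge_coefs = [ridge_shrink(b, LAMBDA_RIDGE) for b in unpenalized]

# Every coefficient is pulled toward zero, but none is removed.
print(ridge_coefs)
```

Because the shrinkage is proportional, a nonzero coefficient stays nonzero for any finite lambda, so ridge regression by itself performs no variable selection.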