Finding the correlation between socioeconomic variables and the number of COVID-19 cases

We evaluated the correlation between several socioeconomic indices and the number of confirmed COVID-19 cases, and tried to predict the number of cases in each state using a machine learning method. Several machine learning methods were tried, and Random Forest was selected as the best-performing model. Twenty-one socioeconomic indices were retrieved, and nine of them were chosen for modeling using the MRMR (Maximum Relevance, Minimum Redundancy) feature selection method. The selected indices are: percent of population with no high school diploma, percent of population with limited English, percent of population institutionalized, percent of population not insured, percent of population over the age of 65, percent of population under the age of 17, percent minority, percent of population with no vehicle, and number of housing structures per person. At each time step, 30 states were used to train the model, and the remaining states were used to evaluate its accuracy. As can be seen in the figures, the model accuracy was low at the first time steps, but as time moves forward and the number of cases increases, the accuracy improves, indicating a stronger relation between the predictors and the number of cases.
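The feature selection step described above can be sketched as a greedy MRMR loop. This is a minimal illustration and not the study's actual code. It assumes absolute Pearson correlation as the measure of both relevance and redundancy (MRMR is often formulated with mutual information instead). The column names (`pct_uninsured`, `pct_uninsured_copy`, `pct_over_65`) and all values are synthetic placeholders invented for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def mrmr_select(features, target, k):
    """Greedily pick k features, maximizing relevance minus redundancy.

    features: dict mapping a feature name to its list of per-state values.
    target:   list of per-state case counts.
    Relevance is |corr(feature, target)|; redundancy is the mean
    |corr(feature, already selected feature)| (an assumed simplification).
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data for 50 hypothetical states (NOT the study's data).
rng = random.Random(0)
n_states = 50
pct_uninsured = [rng.gauss(0, 1) for _ in range(n_states)]
pct_over_65 = [rng.gauss(0, 1) for _ in range(n_states)]
# A near-duplicate index, included to show how redundancy is penalized.
pct_uninsured_copy = [v + 0.05 * rng.gauss(0, 1) for v in pct_uninsured]
total_cases = [
    a + b + 0.1 * rng.gauss(0, 1) for a, b in zip(pct_uninsured, pct_over_65)
]

features = {
    "pct_uninsured": pct_uninsured,
    "pct_uninsured_copy": pct_uninsured_copy,
    "pct_over_65": pct_over_65,
}
selected = mrmr_select(features, total_cases, k=2)
print(selected)
```

In this toy setup the two near-identical indices are almost perfectly correlated with each other, so once one of them is selected, the other receives a large redundancy penalty and the unrelated index is preferred. In the study, the same idea would reduce the 21 retrieved indices to 9 before fitting the Random Forest.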

 

 

[Figures not shown. Recoverable axis label: total_cases]