Generalized Linear Model for Automobile Fatality Rate Prediction in R
variable grouping; GIGO; 80/20; interaction term; k-fold cross validation; 75/25 validation; holdout sample; backward selection; forward selection; calendar year validation; stepwise variable selection
Open Source Software for Statistical Analysis of Big Data: Emerging Research and Opportunities
This chapter demonstrates the descriptive and statistical modeling functions in R. Fatal automobile accident data for the United States are extracted from the Fatality Analysis Reporting System (FARS). The model is used to identify the factors that contribute significantly to death when a fatal crash occurs. First, descriptive analysis is performed with basic R functions and packages. Then, a generalized linear model (GLM) with a logit link function is explored and constructed. Finally, multiple validation metrics are introduced and calculated to check the reasonableness and accuracy of the predictions. The focus of this chapter is to demonstrate, through a real-data analysis, the power and flexibility of R, among the most popular open source statistical software (OSSS).
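The workflow outlined above (a GLM with a logit link, checked against a holdout sample) can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the chapter's actual model: the data are simulated, and the column names `age`, `restraint`, `speeding`, and `died` are hypothetical stand-ins rather than real FARS fields.

```r
# Simulated stand-in data; these columns are hypothetical, not FARS variables.
set.seed(42)
n <- 1000
crash <- data.frame(
  age       = round(runif(n, 16, 90)),
  restraint = factor(sample(c("belted", "unbelted"), n, replace = TRUE)),
  speeding  = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Simulate a binary death indicator from an assumed logistic relationship.
eta <- -2 + 0.02 * crash$age +
  1.0 * (crash$restraint == "unbelted") +
  0.5 * (crash$speeding == "yes")
crash$died <- rbinom(n, size = 1, prob = plogis(eta))

# 75/25 split into a training sample and a holdout sample.
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train   <- crash[train_idx, ]
holdout <- crash[-train_idx, ]

# Binomial GLM with a logit link, fitted on the training sample only.
fit <- glm(died ~ age + restraint + speeding,
           family = binomial(link = "logit"), data = train)
summary(fit)

# Predicted death probabilities for the holdout sample.
holdout$p_hat <- predict(fit, newdata = holdout, type = "response")

# A simple validation check: observed vs. mean predicted death rate.
c(observed = mean(holdout$died), predicted = mean(holdout$p_hat))
```

Fitting on 75% of the records and scoring the remaining 25% keeps the validation data out of the estimation step, which is the idea behind the 75/25 validation and holdout sample listed in the keywords. Comparing the observed and mean predicted rates is only one of several possible checks.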