Title
Generalized Linear Model for Automobile Fatality Rate Prediction in R
Document Type
Book Chapter
Keywords
variable grouping; GIGO; 80/20; interaction term; k-fold cross validation; 75/25 validation; holdout sample; backward selection; forward selection; calendar year validation; stepwise variable selection
Identifier Data
https://doi.org/10.4018/978-1-7998-2768-9.ch005
Publisher
IGI Global
Publication Source
Open Source Software for Statistical Analysis of Big Data: Emerging Research and Opportunities
Abstract
This chapter demonstrates the descriptive and statistical modeling functions of R. Data on fatal automobile accidents in the United States are extracted from the Fatality Analysis Reporting System (FARS). The model is used to identify the significant factors that contribute to a death once a fatal crash occurs. First, descriptive analysis is performed with base R functions and packages. Then, a generalized linear model (GLM) with a logit link function is explored and constructed. Finally, multiple validation metrics are introduced and calculated to assess the reasonableness and accuracy of the predictions. The focus of this chapter is to demonstrate the power and flexibility of the most popular open source statistical software (OSSS) through the analysis of real data.
Comments
Is part of Open Source Software for Statistical Analysis of Big Data