Generalized Linear Model for Automobile Fatality Rate Prediction in R

Document Type

Book Chapter


Is part of Open Source Software for Statistical Analysis of Big Data


variable grouping; GIGO; 80/20; interaction term; k-fold cross validation; 75/25 validation; holdout sample; backward selection; forward selection; calendar year validation; stepwise variable selection

Identifier Data



IGI Global

Publication Source

Open Source Software for Statistical Analysis of Big Data: Emerging Research and Opportunities


This chapter demonstrates the descriptive and statistical modeling functions of R. Data on fatal automobile accidents in the United States are extracted from the Fatality Analysis Reporting System (FARS). The resulting model is used to understand the significant factors contributing to automobile accident deaths when a fatal crash occurs. First, a descriptive analysis is performed with basic R functions and packages. Then, a generalized linear model (GLM) with a logit link function is explored and constructed. Finally, multiple validation metrics are introduced and calculated to assess the reasonableness and accuracy of the predictions. The focus of this chapter is to demonstrate the power and flexibility of the most popular open source statistical software (OSSS) through a real data analysis.
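The workflow in the abstract (fit a logit-link GLM, then check its predictions with validation metrics) can be sketched in base R as below. This is a minimal illustration, not the chapter's own code: the data are simulated, and the variable names (speeding, belt_use, age, fatal) are hypothetical stand-ins rather than actual FARS fields. The techniques follow the chapter's keywords (80/20 holdout, interaction term, backward selection, k-fold cross validation); holdout accuracy and AUC are assumed here as two example validation metrics.

```r
# Minimal sketch: logistic GLM with an 80/20 holdout split, backward selection,
# and k-fold cross validation.
# ASSUMPTION: simulated data; speeding, belt_use, age, fatal are hypothetical
# variables, not actual FARS fields.
set.seed(2024)

n <- 2000
crash <- data.frame(
  speeding = rbinom(n, 1, 0.3),       # hypothetical indicator: speeding involved
  belt_use = rbinom(n, 1, 0.7),       # hypothetical indicator: seat belt used
  age      = round(runif(n, 16, 85))  # hypothetical occupant age
)
# Simulate a binary fatality outcome from a known logit model
eta <- -1 + 0.8 * crash$speeding - 1.2 * crash$belt_use + 0.02 * crash$age
crash$fatal <- rbinom(n, 1, plogis(eta))

# 80/20 holdout split
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- crash[train_idx, ]
test  <- crash[-train_idx, ]

# GLM with logit link, including an interaction term
full_fit <- glm(fatal ~ speeding * belt_use + age,
                family = binomial(link = "logit"), data = train)

# Backward stepwise selection by AIC (step() keeps main effects
# while their interaction remains in the model)
sel_fit <- step(full_fit, direction = "backward", trace = 0)

# AUC via the Mann-Whitney rank statistic (ties get half credit)
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Validation on the 20% holdout sample
p_hat       <- predict(sel_fit, newdata = test, type = "response")
accuracy    <- mean(as.integer(p_hat >= 0.5) == test$fatal)  # 0.5 cutoff
holdout_auc <- auc(test$fatal, p_hat)

# 5-fold cross validation of the selected formula on the training set
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(train)))
cv_auc <- sapply(seq_len(k), function(i) {
  fit_i <- glm(formula(sel_fit), family = binomial, data = train[folds != i, ])
  p_i   <- predict(fit_i, newdata = train[folds == i, ], type = "response")
  auc(train$fatal[folds == i], p_i)
})

print(summary(sel_fit)$coefficients)
cat(sprintf("Holdout accuracy: %.3f, holdout AUC: %.3f, mean 5-fold AUC: %.3f\n",
            accuracy, holdout_auc, mean(cv_auc)))
```

Replacing the simulated data frame with real FARS extracts would also require choosing predictors and handling missing codes, which this sketch does not attempt.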
