-[[!meta date="2017-10-10 16:56:56 +0800"]]
-[[!tag R analysis]]
-
-Road fatalities in Australia
-----------------------------
-
-Recently inspired to do a little analysis again, I landed on a
-dataset from
-<https://bitre.gov.au/statistics/safety/fatal_road_crash_database.aspx>,
-which I downloaded on 5 Oct 2017. Publishing open datasets like this is a
-great example of how governments are moving with the times!
-
-Trends
-------
-
-I started by looking at the trends - what is the approximate number of
-road fatalities a year, and how is it evolving over time? Are there any
-differences noticeable between states? Or by gender?
-
-[[Overall trend line|/pics/explore-AU-road-fatalities_files/fatalitiesTrends-1.png]][[Trend lines by Australian state|/pics/explore-AU-road-fatalities_files/fatalitiesTrends-2.png]][[Trend lines by gender|/pics/explore-AU-road-fatalities_files/fatalitiesTrends-3.png]]
-
-What age group is most at risk in city traffic?
------------------------------------------------
-
-Next, I wondered if there were any particular ages that were more at
-risk in city traffic. I opted to quickly bin the data to produce a
-histogram.
-
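-    # dplyr for the pipe and filter(), ggplot2 for plotting,
-    # ggthemes for theme_economist()
-    library(dplyr)
-    library(ggplot2)
-    library(ggthemes)
-
-    fatalities %>%
-      filter(Year != 2017, Speed_Limit <= 50) %>%
-      ggplot(aes(x = Age)) +
-      geom_histogram(binwidth = 5) +
-      labs(title = "Australian road fatalities by age group",
-           y = "Fatalities") +
-      theme_economist()
-
-    ## Warning: Removed 2 rows containing non-finite values (stat_bin).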
-
-[[histogram|/pics/explore-AU-road-fatalities_files/fatalities.cityTraffic-1.png]]
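-
-To make the binning step explicit, here is a minimal base R sketch of
-counting ages in 5-year bands. The `ages` vector is invented purely for
-illustration and is not taken from the dataset, and the left-closed
-bands starting at 0 are my own assumption rather than what the histogram
-above necessarily uses.
-
-```r
-# Toy ages, made up for illustration (not from the BITRE data)
-ages <- c(3, 17, 18, 22, 24, 40, 67, 67, 71, 89)
-
-# Left-closed 5-year bands: [0,5), [5,10), ..., [85,90)
-bands <- cut(ages, breaks = seq(0, 90, by = 5), right = FALSE)
-
-# Count the ages per band and show only the non-empty bands
-counts <- table(bands)
-counts[counts > 0]
-```
-
-If I read the ggplot2 documentation correctly, `geom_histogram()` places
-bin edges at half the bin width by default, so passing `boundary = 0`
-should align the bars with bands like these.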
-
-Hypothesis
-----------
-
-Based on the above, I wondered - are people aged 65 and over more likely
-to die in slow traffic areas? To make this a bit easier, I added two
-variables to the dataset - one splitting people into those younger than
-65 and those 65 or older, and one based on whether the speed limit in the
-area of the crash was at most 50 km per hour (city traffic in Australia)
-or higher.
-
-    fatalities.pensioners <- fatalities %>%
-      filter(Speed_Limit <= 110) %>% # less than 2% has this - determine why
-      mutate(Pensioner = Age >= 65,
-             Slow_Traffic = Speed_Limit <= 50) %>%
-      filter(!is.na(Pensioner))
-
-To answer the question, I produced a density plot and a boxplot.
-
-[[density plot|/pics/explore-AU-road-fatalities_files/fatalitiesSegmentation-1.png]][[box plot|/pics/explore-AU-road-fatalities_files/fatalitiesSegmentation-2.png]]
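-
-The plotting code is not part of this post, so the following is only a
-guess at how such plots could be drawn with ggplot2. The `toy` data
-frame is a randomly generated stand-in for `fatalities.pensioners`, and
-the aesthetics chosen are assumptions.
-
-```r
-library(ggplot2)
-
-# Toy stand-in for fatalities.pensioners; values are invented
-set.seed(1)
-toy <- data.frame(
-  Age = round(runif(100, 0, 95)),
-  Slow_Traffic = rep(c(TRUE, FALSE), each = 50)
-)
-
-# Age distribution per speed group as overlaid densities
-p.density <- ggplot(toy, aes(x = Age, fill = Slow_Traffic)) +
-  geom_density(alpha = 0.5)
-
-# The same comparison as a boxplot
-p.box <- ggplot(toy, aes(x = Slow_Traffic, y = Age)) +
-  geom_boxplot()
-```
-
-Printing `p.density` or `p.box` renders the respective plot.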
-
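-Some further statistical analysis does confirm the hypothesis! About 26%
-of the fatalities in slow traffic were aged 65 or over, against about 16%
-in faster traffic.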
-
-    # Build a contingency table and perform prop test
-    cont.table <- table(select(fatalities.pensioners, Slow_Traffic, Pensioner))
-    cont.table
-
-    ##             Pensioner
-    ## Slow_Traffic FALSE  TRUE
-    ##        FALSE 36706  7245
-    ##        TRUE   1985   690
-
-    prop.test(cont.table)
-
-    ##
-    ##  2-sample test for equality of proportions with continuity
-    ##  correction
-    ##
-    ## data:  cont.table
-    ## X-squared = 154.11, df = 1, p-value < 2.2e-16
-    ## alternative hypothesis: two.sided
-    ## 95 percent confidence interval:
-    ##  0.07596463 0.11023789
-    ## sample estimates:
-    ##    prop 1    prop 2
-    ## 0.8351573 0.7420561
-
-    # Alternative approach to using prop test
-    pensioners <- c(nrow(filter(fatalities.pensioners, Slow_Traffic == TRUE, Pensioner == TRUE)),
-                    nrow(filter(fatalities.pensioners, Slow_Traffic == FALSE, Pensioner == TRUE)))
-    everyone <- c(nrow(filter(fatalities.pensioners, Slow_Traffic == TRUE)),
-                  nrow(filter(fatalities.pensioners, Slow_Traffic == FALSE)))
-    prop.test(pensioners, everyone)
-
-    ##
-    ##  2-sample test for equality of proportions with continuity
-    ##  correction
-    ##
-    ## data:  pensioners out of everyone
-    ## X-squared = 154.11, df = 1, p-value < 2.2e-16
-    ## alternative hypothesis: two.sided
-    ## 95 percent confidence interval:
-    ##  0.07596463 0.11023789
-    ## sample estimates:
-    ##    prop 1    prop 2
-    ## 0.2579439 0.1648427
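-
-To see where the two sets of sample estimates come from, the
-proportions can be recomputed by hand from the four counts in the
-contingency table. This is a small base R sketch that rebuilds the table
-as a plain matrix; the name `row.props` is mine, and the final
-`chisq.test` call assumes its default continuity correction on a 2x2
-table matches the one `prop.test` applies.
-
-```r
-# Counts copied from the contingency table above
-cont.table <- matrix(c(36706, 1985, 7245, 690), nrow = 2,
-                     dimnames = list(Slow_Traffic = c("FALSE", "TRUE"),
-                                     Pensioner = c("FALSE", "TRUE")))
-
-# Row-wise proportions: share of each age group within each speed group
-row.props <- prop.table(cont.table, margin = 1)
-row.props
-
-# Difference in pensioner share between slow and faster traffic
-row.props["TRUE", "TRUE"] - row.props["FALSE", "TRUE"]
-
-# Chi-squared statistic for the same 2x2 table
-chisq.test(cont.table)$statistic
-```
-
-The `Pensioner == FALSE` column of `row.props` corresponds to the
-estimates of the first test, the `Pensioner == TRUE` column to those of
-the second.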
-
-Conclusion
-----------
-
-It's possible to conclude that older people are over-represented in the
-fatalities in lower speed zones. One idea for further investigation is
-the impact of the driving age limit on the fatalities. The position in
-the car of the person killed (driver or passenger) was not yet considered
-in this quick look at the contents of the dataset.
-
-[[quantile-quantile plot|/pics/explore-AU-road-fatalities_files/fatalitiesDistComp-1.png]]