The best approach to unlock the value from your datasets: statistical modelling or machine learning?
Today, a huge volume of transport data exists. In addition to data on road casualties, datasets capture travel behaviour and opinions, and are generated by sensors embedded in the environment, by the use of mobile apps and through surveys.
These datasets are collected independently, and for very different reasons. Each plays a vital role in supporting transport system operation, but their use in combination, or for purposes other than the one for which they were originally collected, is relatively uncommon. With the general rise in connectivity, there has been an explosion in both the availability and richness of data. Increases in computing power and bandwidth also mean that many datasets are available in real time.
With a wealth of information comes opportunity – access to big, open and connected data means that we have the ability to solve increasingly complex transport challenges.
In this paper we ask: how can we best unlock the value from these datasets? Their value is lost without sound analysis and interpretation, and this is creating a growing demand for data science capabilities.
Predictive models use historical data to predict future outcomes; two of the main approaches are statistical modelling and machine learning. This paper compares the two approaches and, for each, presents a case study of its high-impact use at TRL, before providing guidance on when to select each approach.