Prediction of passenger load on busses in Oslo using data from Automatic Data Collection-systems
- Master's theses (RealTek) 
Public transport is key to reducing the use of private vehicles and, by extension, carbon emissions in urban areas. Ruter is responsible for planning and coordinating public transport in Oslo. Through different Automatic Data Collection-systems (ADC-systems) they have access to data about the performance of all vehicles in operation. In this thesis we explore the possibility of using data from Automatic Vehicle Location- and Automatic Passenger Counting-systems to predict passenger load on busses in Oslo. Passengers can use load predictions when planning a trip and may choose a departure where the predicted load is lower. This serves a dual purpose: it gives the passenger a more pleasant trip, and it reduces the pressure on public transport by encouraging a better distribution of the load. Load predictions can also be used by those monitoring public transport, helping to inform decisions when resolving incidents that affect public transport. Two operational situations are explored in this thesis: one where predictions are based only on plan data, and one where real-time location data is included. For the first operational situation, the best-performing model yielded a mean absolute error (MAE) in predicted passenger load of 7.10, providing a reasonable prediction of load when no major delays or other factors were affecting the flow of traffic. Models developed for the second operational situation managed to account for differing passenger behaviour caused by deviations from planned trips. The best-performing model in this situation had an MAE of 6.26. ADC-systems for public transport are complex systems with many potential sources of error. Emphasis is therefore put on how to prepare data for analysis. A machine learning method, isolation forest, is used for automatic detection of trips with erroneous data. 
This method is compared to manual screening based on errors observed in the data, with the result that model performance was slightly better when models were trained on data screened using isolation forest.
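For readers unfamiliar with the metric, the mean absolute error reported above is the average of the absolute differences between observed and predicted load. The following is a minimal sketch of that calculation only; the function name and the passenger counts are hypothetical illustrations and are not taken from the thesis data or its code.

```python
def mean_absolute_error(actual, predicted):
    """Return the mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical passenger counts for four departures (not thesis data).
observed_load = [12, 30, 45, 8]
predicted_load = [10, 36, 40, 15]

mae = mean_absolute_error(observed_load, predicted_load)
print(f"MAE: {mae:.2f} passengers")
```

Under this definition, an MAE of 6.26 means that the predicted load differed from the counted load by about six passengers on average.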