By Alex Levering
The road network is an active environment that is continuously affected by incidents and disruptions, resulting in delays and economic damage. As car ownership and road transport increase, so too does the pressure on the road network. This increases the impact of incidents and disruptions, as more individuals and businesses are affected. Meanwhile, a recent increase in interest in Autonomous Vehicles (AVs) offers opportunities to lessen the impact of incidents. AVs require up-to-date information on the road network in order to make routing decisions, which promotes the implementation of connected traffic data ecosystems. Such ecosystems allow vehicles to communicate with one another, or with the infrastructure at large, to rapidly disseminate information across the grid. In the context of incidents, this means that all connected vehicles can be informed of road closures as soon as they are detected. However, there is a gap in the literature on incident detection from AVs as a domain. While existing research considers individual incidents in specific circumstances, none has attempted to classify incidents as a domain or as groupings. This holds true for data on incidents as seen from vehicles as well. As such, a study on incident detection from vehicles that considers the full breadth of possible incidents is needed, so that incidents can be disseminated between vehicles faster and their impact on the road network lessened in the future. In this thesis we assess the use of Convolutional Neural Networks (CNNs) to classify unsigned physical (non-placarded, tangible) incidents from street images. We do this by first gathering a dataset of images, and second by training a CNN to distinguish between images containing unsigned physical incidents and images without such incidents.
Applicable incident classes were determined by a grouping study that used Formal Concept Analysis, which resulted in a taxonomy of incidents. In total we targeted 8 classes: Vehicle Crash, Road Collapse, Fire, Animal on Road, Treefall, Snowy Road, Flooded Road, and Landslides, as well as negatives (images of normal driving conditions). We first collected 7,759 images of incidents by web harvesting from Google, Flickr, and Bing, as well as images supplied by the Geograph UK project. As searching depth-wise (i.e. returning hundreds of images per query) returned poor results in initial experiments, we decided to query breadth-wise by searching for combination pairs between synonyms of various concepts. For instance, pairing the terms street and road with the terms landslide and rockslide yields 4 possible query pairs. 40,063 images were collected over 118 queries, of which 5,844 images were included in the final dataset. Additionally, we submitted queries in various non-English languages to expand the dataset further. We searched for images in Dutch, Farsi, Mandarin, Croatian, and Slovak, asking colleagues to supply the most effective queries in their own language. In total, we collected 12,630 images over 63 queries, of which 1,641 were included in the final dataset. 5,145 images from the Geograph project were included. Suitable images were selected manually by the author to rigorously control the quality of the input images.
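The breadth-wise querying described above can be sketched in a few lines of Python. This is a minimal illustration only, not the harvesting code used in the thesis: the function name `build_query_pairs` and the choice to join the two terms with a space are assumptions made for the example.

```python
from itertools import product


def build_query_pairs(context_terms, incident_terms):
    """Return one search string per (context, incident) combination.

    Hypothetical helper: joining the terms with a space is an assumption,
    not a detail taken from the thesis.
    """
    return [
        f"{context} {incident}"
        for context, incident in product(context_terms, incident_terms)
    ]


# Synonym lists taken from the example in the text.
queries = build_query_pairs(["street", "road"], ["landslide", "rockslide"])
print(queries)
# ['street landslide', 'street rockslide', 'road landslide', 'road rockslide']
```

Two synonyms per concept give 2 x 2 = 4 queries, so the number of queries grows with the product of the synonym list lengths while each individual query can stay shallow.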
After selection of the positive examples, each class contains the following number of images: [summary of image numbers]. We aggregated a true-negatives dataset of 40,000 images by combining images from Berkeley Deep Drive (20,000), Cityscapes (10,000), and Geograph images tagged with road transportation (10,000). We also retained 200 negative boundary cases of the class snow during the cleaning of Geograph images to help determine whether the model has learned the correct visual cues. We distributed this dataset into training, validation, and testing splits containing 70%, 20%, and 10% of all images respectively. We created a second dataset, on which we trained a second model, to test the sensitivity of unsigned physical incident detection to unseen data from different geographical regions. For this dataset we used images supplied by the Geograph project and distributed them into a 72.5/22.5/5% training, validation, and testing split based on the geotags supplied with the images. The training and validation splits contain images from England, Ireland, and Scotland, while images from Wales form the testing split.
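The two splitting strategies described above can be illustrated as follows. This is a sketch under stated assumptions, not the procedure used in the thesis: the function names, the fixed random seed, the `region` field on each record, and the validation fraction in the usage example are all hypothetical.

```python
import random


def random_split(items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and cut them into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    # Whatever remains after train and validation becomes the test part.
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def region_split(records, test_regions, val_fraction, seed=0):
    """Hold out whole regions for testing; split the rest into train/validation."""
    test = [r for r in records if r["region"] in test_regions]
    rest = [r for r in records if r["region"] not in test_regions]
    random.Random(seed).shuffle(rest)
    n_val = round(val_fraction * len(rest))
    return rest[n_val:], rest[:n_val], test


# 70/20/10 split over 1,000 placeholder image ids.
train, val, test = random_split(range(1000))
print(len(train), len(val), len(test))

# Region-based split over toy records; "region" is an assumed field name.
records = [
    {"id": i, "region": region}
    for i, region in enumerate(["England", "Ireland", "Scotland", "Wales"] * 5)
]
geo_train, geo_val, geo_test = region_split(records, {"Wales"}, val_fraction=0.2)
print(len(geo_train), len(geo_val), len(geo_test))
```

Holding out an entire region means no test image shares a region with any training image, which is what makes the second experiment a check on geographic generalisation and not just another random hold-out.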
Incident detection was performed by training a CNN with the ResNet-34 architecture, which performs multiclass classification over the 8 target classes and the negatives class. The best model achieved a top-1 accuracy of 97.15% and an unweighted average F1-score of 0.8909. We trained and evaluated a second ResNet-34 model on the geographically stratified dataset. This model achieved a top-1 accuracy of 92.9% during testing, with an unweighted average F1-score of 0.9169. Assessment of the fully-connected layer of the ResNet-34 model using t-SNE clustering reveals that the model is easily able to tell classes apart. The assessment further revealed a notable overlap between negative and positive images gathered from the Geograph platform. The results of this thesis indicate that unsigned incidents as a domain can be learned with high accuracy. Further research should expand the gathered dataset, consider more incident classes, improve the generated models, perform rigorous bias testing, and experiment with the spatial relatedness of features in images (e.g. an animal on the road versus an animal next to the road).
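To make the two reported metrics concrete, the sketch below computes top-1 accuracy and an unweighted (macro) average F1-score from lists of true and predicted labels. It is a plain-Python illustration with made-up labels, not the evaluation code of the thesis. The function names are assumptions, and scoring a class with no true or predicted samples as 0 is a convention chosen for the example.

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 here when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not results from the thesis.
y_true = ["fire", "fire", "flood", "negative", "negative", "negative"]
y_pred = ["fire", "flood", "flood", "negative", "negative", "fire"]

print(round(top1_accuracy(y_true, y_pred), 4))  # 0.6667
print(round(macro_f1(y_true, y_pred), 4))       # 0.6556
```

Because every class contributes equally to the macro average regardless of how many images it contains, this score can differ noticeably from top-1 accuracy when one class, such as a large negatives class, dominates the dataset.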