Retrieving critical location based information from Twitter for disaster management using LDA topic modelling; A case study with a chemical fire at Moerdijk

Organised by Laboratory of Geo-information Science and Remote Sensing

Fri 28 September 2018 09:00 to 09:30

Venue Gaia, building number 101
Room 1

By Rony Nedkov (the Netherlands)

Twitter has been recognized as a valuable information source for disaster management because its content comes with a spatial component. Raw Twitter data is unstructured and consists of both relevant and irrelevant data, so classification and information retrieval techniques are needed to make it useful for the emergency services. There is a lack of integration between information retrieval methods and spatio-temporal analysis, which limits practical application. The objective was therefore to determine what critical location-based information can be derived from Twitter during the response phase of a disaster to support the decision-making process in disaster management. The chemical fire at Chemi-pack was chosen as a case study to demonstrate the effectiveness of Latent Dirichlet Allocation (LDA) topic modelling. Self-Organizing Maps (SOM) and word clouds were used to analyze and interpret the results in more depth. The dataset consists of Tweets that were retrieved from Twitter several days after the incident. Before topic discovery, additional location information was retrieved from Twitter to geocode Tweets without a location and enlarge the spatial dataset. The topic modelling yielded several topics that could be matched with real-world events. One of these topics, related to the toxic plume, was considered of interest for emergency services. We found that half of the Tweets in this topic contained actionable information and that one third of the Tweets were correctly geocoded. It was then possible to visualize the event spatially at the granularity of residential areas. In conclusion, with the additional location information provided by Twitter it is possible to significantly increase the size of the dataset with an acceptable accuracy. LDA topic modelling has limited performance when used on Twitter data, but the use of SOM and word clouds makes its results easier to analyze and interpret.
The various steps in this research and the limited performance of LDA topic modelling resulted in insufficient Tweet classification; as a result, a majority of the Tweets remained unused. We recommend extending the contextual information in Tweets by means of tweet pooling. Additionally, tokenization and word stemming can be applied to improve the performance of topic modelling.
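The recommended preprocessing can be illustrated with a minimal Python sketch. It is not the pipeline used in this research: the function names, the choice to pool by hashtag, and the toy suffix-stripping stemmer are all assumptions made for illustration. A real application would more likely use a proper stemmer for the language of the Tweets and might pool by author or time window instead.

```python
import re
from collections import defaultdict

# Hypothetical patterns: words, hashtags and mentions are kept; URLs are dropped.
TOKEN_RE = re.compile(r"[#@]?\w+")
URL_RE = re.compile(r"https?://\S+")


def tokenize(text):
    """Lowercase the text, remove URLs, and split it into tokens."""
    return TOKEN_RE.findall(URL_RE.sub(" ", text.lower()))


def naive_stem(token, suffixes=("ing", "es", "s")):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    if token.startswith(("#", "@")):
        return token  # leave hashtags and mentions untouched
    for suffix in suffixes:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def pool_by_hashtag(tweets):
    """Merge Tweets that share a hashtag into one longer pseudo-document."""
    pools = defaultdict(list)
    for text in tweets:
        tokens = [naive_stem(t) for t in tokenize(text)]
        hashtags = {t for t in tokens if t.startswith("#")} or {"<no_hashtag>"}
        for tag in hashtags:
            pools[tag].extend(tokens)
    return dict(pools)


if __name__ == "__main__":
    # Invented example Tweets, not taken from the dataset.
    sample = [
        "Smoke drifting from #moerdijk http://t.co/x",
        "Windows closed #moerdijk",
        "Sirens heard nearby",
    ]
    for tag, tokens in pool_by_hashtag(sample).items():
        print(tag, tokens)
```

Each pooled token list can then serve as a single document for a topic model, which gives LDA more word co-occurrence context than an individual short Tweet.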

Keywords: Twitter; LDA; Topic Modeling; Disaster Management; Self-Organizing Maps; Actionable Information