Managing Big Geospatial Data with Apache Spark

Organizer Laboratory of Geo-information Science and Remote Sensing

Fri 9 March 2018 15:30 to 16:00

Location Gaia, building number 101
Droevendaalsesteeg 3
6708 PB Wageningen
+31 317 48 16 00
Room 2

By Hector Muro (Spain)

Apache Spark is one of the most widely used and fastest-evolving cluster-computing frameworks for big data. As most environmental modeling applications involve spatial data, this research investigates the state of the art in managing big geospatial data. Because Apache Spark is a relatively new platform and its geospatial extensions are mostly still works in progress, three packages for handling geospatial data in Apache Spark were investigated: GeoSpark, GeoPySpark, and Magellan. The study first describes their functionality, then evaluates their performance with annoyingly big geospatial datasets, and finally compares their performance with that of a relational database management system. Conclusions are drawn about the maturity of the libraries and the scalability of solutions in Apache Spark, and opportunities for large-scale environmental modeling are discussed.