Geospatial Data Science
A hands-on approach for building geospatial applications using linked data technologies



Manolis Koubarakis (editor)
Chapter contributors: Konstantina Bereta, Dimitris Bilidas, Theofilos Ioannidis, Nikolaos Karalis, George Mandilaras, Charalampos Nikolaou, Despina-Athanasia Pantazi, George Papadakis, Dharmen Punjani, George Stamoulis, Eleni Tsalapati.

  1. Introduction
  2. Geospatial Data Modeling
  3. Legacy Geospatial Data Technologies
  4. Ontologies and Linked Data
  5. Geospatial Ontologies
  6. Linked Geospatial Data
  7. Querying Geospatial Data Expressed in RDF
  8. Visualizing Linked Geospatial Data
  9. Transforming Geospatial Data into RDF
  10. Interlinking Geospatial Data Sources
  11. Geospatial Ontology-based Data Access
  12. Incomplete Geospatial Information
  13. Geospatial RDF Stores
  14. Geospatial Knowledge Graphs
  15. Question Answering Engines for Geospatial Knowledge Graphs
  16. Putting it All Together: a Data Science Pipeline for Linked Earth Observation Data
  17. Conclusions

The purpose of this book is to teach readers how to develop geospatial applications easily, based on the principles and software tools of geospatial data science. Geospatial data science is the science of collecting, organizing, analyzing, and visualizing geospatial data. The book introduces a new generation of geospatial technologies that have emerged from the development of the Semantic Web and the linked data paradigm, and shows how data scientists can use them to build environmental applications easily.

This book takes the view that data scientists only need to be experts in semantic and linked data technologies. These technologies are typically covered at the advanced undergraduate or graduate level, either in a course on “Semantic Web and Linked Data” or in dedicated lectures within a course on “Database Management Systems”. Semantic technologies are not specific to the geospatial domain, but they have recently been extended for modeling geospatial domains. Their strong point is that they do not deal with data formats or other low-level details of the data. Instead, they allow data scientists to model their application at the conceptual level, using well-known concepts such as objects, classes, and properties that most data scientists and software developers are familiar with today. They also enable data scientists to interlink datasets containing information about the same thing (e.g., a dataset containing information about roads in Crete can be interlinked with a dataset containing land cover information about Crete).

Once the geospatial semantic technologies covered in the book (geospatial ontologies, stRDF, stSPARQL, GeoSPARQL, OBDA mappings) are mastered, data scientists can use them to model their data as linked data. If the original data is not in linked data form, it can be transformed into linked data easily using the right tool. There is also the option of not transforming the dataset at all, and yet accessing it as if it were a linked data source! Semantic technologies can then be used to analyze and visualize the data with the help of appropriate linked data tools. Applications can also be built very easily. Some applications will be just a sequence of GeoSPARQL queries!
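To give a flavor of what "an application as a sequence of GeoSPARQL queries" might look like, here is a minimal sketch based on the roads and land cover example. It is not taken from the book. The `geo:` and `geof:` prefixes and the terms `geo:hasGeometry`, `geo:asWKT` and `geof:sfIntersects` come from the OGC GeoSPARQL standard. The `ex:` namespace, the class names `ex:Road` and `ex:LandCoverArea`, and the helper `build_intersection_query` are hypothetical, introduced only for illustration. The helper only assembles the query text; sending it to a GeoSPARQL-capable RDF store is left out.

```python
"""Sketch: compose a GeoSPARQL query pairing features whose geometries intersect."""

# geo: and geof: are the standard OGC GeoSPARQL namespaces.
# ex: is a hypothetical namespace used only for this illustration.
PREFIXES = (
    "PREFIX geo:  <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
    "PREFIX ex:   <http://example.org/ontology#>\n"
)


def build_intersection_query(class_a: str, class_b: str, limit: int = 10) -> str:
    """Return the text of a SELECT query that pairs instances of two classes
    whose WKT geometries intersect. Nothing is sent to an endpoint here."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        PREFIXES
        + "SELECT ?a ?b WHERE {\n"
        + f"  ?a a {class_a} ; geo:hasGeometry ?geomA .\n"
        + "  ?geomA geo:asWKT ?wktA .\n"
        + f"  ?b a {class_b} ; geo:hasGeometry ?geomB .\n"
        + "  ?geomB geo:asWKT ?wktB .\n"
        + "  FILTER(geof:sfIntersects(?wktA, ?wktB))\n"
        + "}\n"
        + f"LIMIT {limit}\n"
    )


if __name__ == "__main__":
    # Hypothetical classes: roads and land cover areas (e.g., for Crete).
    print(build_intersection_query("ex:Road", "ex:LandCoverArea", limit=5))
```

The query text produced this way could be submitted to any store that implements GeoSPARQL. Which store, endpoint URL and ontology terms apply depends on the actual datasets and is not specified here.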


We would like to acknowledge the support of our employer, the National and Kapodistrian University of Athens. In addition, we would like to thank the European Commission, the Greek General Secretariat for Research and Technology and the Hellenic Foundation for Research and Innovation for their generous funding of the authors’ research in geospatial data science since January 2010 through the projects TELEIOS, LEO, MELODIES, Optique, SCARE, BigDataEurope, Copernicus App Lab, GeoQA and ExtremeEarth. We would like to thank the many collaborators from these research projects who provided insightful comments on our work and used our geospatial data systems in their institutions. We are also grateful to the students of the graduate course “Knowledge Technologies” at the Dept. of Informatics and Telecommunications of the National and Kapodistrian University of Athens from 2012 until today. They are the ones who first faced the exercises in this book and on its Web site. Manolis Koubarakis would also like to thank his colleagues Izambo Karali and Takis Stamatopoulos for taking over the teaching of “Knowledge Technologies” and “Artificial Intelligence” in 2014 and 2018, when he was on sabbatical leave, on the second occasion to write this book.

Finally, we would like to thank the people at ACM for giving us the opportunity to write this book and for guiding us through to completion.


This part of our Web page contains all datasets that are used in this book.

CORINE Land Cover 2012

The CORINE Land Cover dataset of year 2012 is provided by the European Environment Agency.

Global Administrative Areas

The Global Administrative Areas dataset contains information about the administrative boundaries of all areas in the world.


The Kallikratis plan is the most recent administrative organization of Greece that came into effect in 2011. There are 7 decentralized administrations, 13 regions and 325 municipalities.


OpenStreetMap is a gazetteer that contains information about a wide variety of points of interest.

Leaf Area Index

This global database of Leaf Area Indices (LAIs) is derived using input from the Moderate Resolution Imaging Spectroradiometer (MODIS) operational reflectance product.