Data Integration

Get More Value out of Geospatial Data

In the ever-expanding world of IoT, no industry is left untouched and the growth potential shows no sign of slowing down. According to a report published by GlobalData in May 2021, the global Internet of Things market is expected to reach more than a trillion dollars by 2024. 

Predictive maintenance, optimized energy consumption, road traffic management – the use cases are virtually endless! Meanwhile, that connectivity is generating a goldmine of data for your business.  

This article series is designed to show you exactly what’s possible in the world of IoT - starting with technical use cases that leverage Talend to tap into that potential. Let’s dive into:

Real-time Geospatial Calculation 

When listening to customer feedback at different events, I realized that a common Talend platform use case is getting value out of the GPS coordinates of an object (e.g., a vehicle or a mobile device). For example, this is used in cases of:

  • Fleet management: (Where are my vehicles?) 

  • Theft prevention: (Has my object left the security zone that I have defined?) 

  • Improved customer experience: (Which points of interest, e.g., shops or parking lots, are around my customer's current position?)

Where does Talend come into real-time geospatial search?

Let’s use the example of a train running on line H near Paris. You want to generate alerts when that train reaches a level crossing, a tunnel, or a station. The goal is to detect, in real time, infrastructure elements within a restricted perimeter around the head end of the train.

SNCF (France’s railway company) provides a large number of geographical files with the location of its infrastructures here.

(Note that this example is implemented with version 8 Enterprise of Talend. Talend is a scalable solution thanks to its Remote Engines, which can be organized into clusters.)

To keep the example simple, we won’t dig into the "telecommunications - networks" layer of a real IoT architecture: we assume that the position of the train is published to Google PubSub. We will use the same mechanism for producing and consuming alert messages.

Disy, a Talend partner, offers a geospatial module, but for this tutorial I’ll be using the geospatial indexing capabilities of MongoDB (tutorial here). Note that PostgreSQL, Snowflake, and many other solutions offer this feature as well.

The first step is to load all of the SNCF infrastructure data, in GeoJSON format, into MongoDB.

Loading GeoJSON shapes into MongoDB 

We are going to load into a MongoDB collection the set of GeoJSON shapes that represent the points of interest I want to detect. We will then add a geospatial index on the GeoJSON data stored in that collection.

Talend makes it easy to use MongoDB for: 

  • Bulk Loading 

  • Index creation 
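To make these two operations concrete outside of Talend, here is a minimal Python sketch of the same idea. It is not the Talend job itself: it assumes a collection object that behaves like a pymongo Collection (insert_many and create_index), and the "geometry" field name, the sample features, and their coordinates are all made up for illustration.

```python
import json

# Hypothetical sample standing in for an SNCF GeoJSON file
# (names and coordinates are illustrative, not real data).
SAMPLE_GEOJSON = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"kind": "station", "name": "Example station"},
         "geometry": {"type": "Point", "coordinates": [2.3456, 48.9362]}},
        {"type": "Feature",
         "properties": {"kind": "level_crossing", "name": "Example crossing"},
         "geometry": {"type": "Point", "coordinates": [2.3081, 48.9905]}},
    ],
})


def features_to_documents(geojson_text):
    """Flatten a GeoJSON FeatureCollection into one document per feature."""
    collection = json.loads(geojson_text)
    documents = []
    for feature in collection["features"]:
        doc = dict(feature.get("properties") or {})
        # Keep the GeoJSON geometry intact so MongoDB can index it as-is.
        doc["geometry"] = feature["geometry"]
        documents.append(doc)
    return documents


def load_and_index(collection, documents):
    """Bulk-load documents, then create a 2dsphere index on 'geometry'.

    `collection` is assumed to behave like a pymongo Collection.
    """
    collection.insert_many(documents)
    return collection.create_index([("geometry", "2dsphere")])
```

With pymongo installed and a MongoDB server running, the collection argument could be something like MongoClient()["sncf"]["infrastructure"], where the database and collection names are placeholders.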

Train movement simulation 

Here, I downloaded the layout of the SNCF tracks, and I extracted line H (which I use regularly!). I transformed the line into a sequence of points. 

The idea is to generate a PubSub message for each GPS location, as if the train were passing through these points.
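As an illustration of this simulation step, the sketch below turns a track polyline into a denser sequence of points and emits one message per point. It rests on several assumptions: the track is a list of (lon, lat) vertices, plain linear interpolation is good enough between nearby vertices, the message fields (train_id, seq, lon, lat) are hypothetical, and publish is any callable that sends bytes.

```python
import json


def densify(line, steps_per_segment):
    """Turn a polyline of (lon, lat) vertices into a denser sequence of points.

    Uses plain linear interpolation, a reasonable approximation only
    because consecutive track vertices are close together.
    """
    points = []
    for (lon1, lat1), (lon2, lat2) in zip(line, line[1:]):
        for step in range(steps_per_segment):
            t = step / steps_per_segment
            points.append((lon1 + t * (lon2 - lon1), lat1 + t * (lat2 - lat1)))
    points.append(line[-1])  # include the final vertex exactly once
    return points


def position_messages(train_id, points):
    """Yield one UTF-8 encoded JSON payload per simulated GPS fix."""
    for seq, (lon, lat) in enumerate(points):
        payload = {"train_id": train_id, "seq": seq, "lon": lon, "lat": lat}
        yield json.dumps(payload).encode("utf-8")


def simulate(train_id, line, publish, steps_per_segment=4):
    """Send every simulated position through `publish`; return the count."""
    count = 0
    for message in position_messages(train_id, densify(line, steps_per_segment)):
        publish(message)
        count += 1
    return count
```

For a dry run, publish can simply be the append method of a list. With the google-cloud-pubsub client, it could instead wrap the publisher's publish call for a given topic path, passing each payload as the data argument.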

Trigger alerts

Now for the heart of the demonstration:  

  1. Retrieve the train's position each time it is transmitted (Google PubSub) using a Talend real-time route. 

  2. Write it to the data lake (not demonstrated here, but trivial). 

  3. Search for points of interest within a given perimeter using a geospatial query in MongoDB. 

  4. Publish PubSub messages reporting the infrastructure elements found. 
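The geospatial search and the alert production in the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Talend route: it relies on MongoDB's $nearSphere operator with a GeoJSON $geometry and a $maxDistance in meters (which requires a 2dsphere index on the queried field), while the "geometry" field name, the alert fields, and the helper names are hypothetical. The haversine function is only there to attach a distance to each alert.

```python
import json
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters


def near_query(lon, lat, radius_m, field="geometry"):
    """Build a MongoDB filter for documents within radius_m of a point."""
    return {
        field: {
            "$nearSphere": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": radius_m,
            }
        }
    }


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_alerts(position, nearby_docs):
    """Turn query results (Point geometries) into alert payloads."""
    alerts = []
    for doc in nearby_docs:
        lon, lat = doc["geometry"]["coordinates"]
        distance = haversine_m(position["lon"], position["lat"], lon, lat)
        alerts.append(json.dumps({
            "train_id": position["train_id"],
            "kind": doc.get("kind"),
            "name": doc.get("name"),
            "distance_m": round(distance),
        }).encode("utf-8"))
    return alerts
```

With pymongo, the filter returned by near_query can be passed directly to the collection's find method, and the resulting documents handed to build_alerts before publishing.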

Streaming alerts into a "streaming" table of the data warehouse

This step streams the alerts into a "streaming" table in Google BigQuery, using Talend's Pipeline Designer. Pipeline Designer lets you create streams, window messages, and write them to BigQuery as a stream.
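To give an idea of what windowing messages means, here is a small Python sketch of fixed (tumbling) windows that counts alerts per kind. It only illustrates the concept and says nothing about how Pipeline Designer implements it; the "ts" and "kind" fields and the 60-second window size are assumptions.

```python
from collections import Counter, defaultdict


def tumbling_windows(alerts, window_s=60):
    """Count alerts per kind inside fixed, non-overlapping time windows.

    Each alert is a dict with an epoch-seconds 'ts' and a 'kind'.
    Returns rows sorted by window start, ready to append to a table.
    """
    buckets = defaultdict(Counter)
    for alert in alerts:
        # Align the timestamp to the start of its window.
        window_start = int(alert["ts"] // window_s) * window_s
        buckets[window_start][alert["kind"]] += 1
    rows = []
    for window_start in sorted(buckets):
        for kind, count in sorted(buckets[window_start].items()):
            rows.append({"window_start": window_start, "kind": kind, "count": count})
    return rows
```

Each returned row corresponds to one line that could be written to the streaming table for a given window.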

You can then create a real-time dashboard in Google Data Studio / Kibana, or a more "static" one in Tableau. 

We can view all the alerts generated here, in blue for level crossings and in orange for stations. 

In Conclusion  

Talend offers a solution to accelerate projects related to geolocation:

  • Real-time management (streaming), with production of and subscription to message queues

  • Support for MongoDB (or another engine) for fast geospatial calculations

  • Data mapping and formatting

  • Streaming into BigQuery with Talend Pipeline Designer

The demonstration above runs only on my computer, but what about scalability? Talend offers several solutions for enterprise-level projects: use a Remote Engine cluster (Enterprise version of Talend) or use a Spark Streaming layer to distribute streaming calculations on your Spark cluster.

You can find the full demo here.

Stay tuned for the next article in the series, and thanks for joining!  
