Jan 27, 2022

Spatial Indexing aids Finding which Polygons contain a Point

Speed up your queries by using a spatial index to select fewer polygons before testing if a point is inside a polygon.
Javier Santana
Co-founder

Finding the polygons that contain a given point is a common geospatial operation but can be complex. Speed up finding polygons using the geohash trick. Geohash subdivides the Earth’s surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer the geohash string, the more precise the geographic location.

Code to List Polygons that Contain a Point

We use ClickHouse’s functions for working with geohash:

  • geohashEncode(longitude, latitude, [precision])
  • geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision)

The schema of the data used here is an integer id and a multipolygon as an array of single polygons, each an array of tuples.

First we generate the materialized view geo_index with a mapping geohash -> Array(geoname_id). That is, for each geohash there is an array of ids whose polygon(s) are partly in the geohash polygon.

In order to calculate all the geohash covered by a polygon, the polygon bounding box is calculated (longitude is wrapped to [-180, 180]) and the box is used with geohashesInBox to return a list of polygons.

Geohash level is decided based on the polygon area. In this example, polygons are pretty large, so a low number (3) is used. For small polygons, a higher level of detail should be used.

When querying data the geohash of the point is calculated from the geo_index table, the list of geoname_ids is retrieved and this list of candidates is used to get the actual polygons.

Note that the candidates list is not the final one, the point can have the same geohash but be outside of the polygon, so we need to check if the point is inside the polygon using pointInPolygon().

The steps are:

  • calculate the geohash of the current point. The geohash level should match the one used in geo_index_mv (here 3).
  • calculate the candidate polygons based on the geohash index table
  • check the candidate polygons one by one with {% code-line %}pointInPolygon, since geohash is a lossy filter. This function is pretty expensive but, thanks to the index, you only query the relevant part of the data.
Finding the polygon containing the selected point

Experimental Geo Types

ClickHouse now has experimental_geo_types:

  • Point: (x,y) - a tuple of Float64s
  • Ring: a simple polygon without holes - an array of points
  • Polygon: a polygon with holes - an array of rings. The first element of the outer array is the outer shape of the polygon and all the following elements are holes.
  • MultiPolygon: multiple polygons - an array of polygons.

These will open the door to even faster queries in the future.

Speed Up with Spatial Indexing

By using a spatial indexing you test far fewer polygons and so speed up your query. In the same way that indexing your Data Sources with sorting keys improves query speed, spatial indexing also increases efficiency.

If you have an interesting use case, write to us at hi@tinybird.co, sign up to Tinybird and easily build your own data apps. If you already have an account, join our community Slack. Looking forward to seeing you there.

Do you like this post?

Related posts

Real-time Data Visualization: How to build faster dashboards
A new way to create intermediate Data Sources in Tinybird
Tinybird
Team
Jun 15, 2023
Export data from Tinybird to Amazon S3 with the S3 Sink
Tinybird
Team
Mar 21, 2024
Tinybird: A ksqlDB alternative when stateful stream processing isn't enough
To the limits of SQL... and beyond
Automating data workflows with plaintext files and Git
Chatting GraphQL with Jamie Barton of Grafbase
Tinybird
Team
Apr 24, 2023
What it takes to build a real-time recommendation system
We launched an open source ClickHouse Knowledge Base
Tinybird
Team
Oct 11, 2022
The definition of real-time data

Build fast data products, faster.

Try Tinybird and bring your data sources together and enable engineers to build with data in minutes. No credit card required, free to get started.
Need more? Contact sales for Enterprise support.