Written by Paul
This article focuses on optimizing the handling of spatial data in MySQL from a service-oriented perspective.
While the methods described here are specific to MySQL, the basic principles can also apply to other databases (e.g., PostgreSQL).
Introduction
Handling spatial data from a service perspective can be challenging. Spatial data generally refers to two-dimensional map coordinates with latitude and longitude values.
So, why is this difficult? Here are a few reasons:
- Coordinate data often contains many decimal points, meaning that parsing exact data requires extensive pre- and post-processing.
- When working with large amounts of coordinate data, a lot of calculations are required, increasing exponentially with the number of rows.
This led me to wonder: is there a way to manage spatial data with elegance and simplicity from a service perspective?
Working with Coordinate Data from a Data Perspective
If we handle coordinate data purely from a data perspective (not from a service perspective), we might approach it as follows:
- Prioritizing accuracy over query speed.
- Extracting precise data as the highest priority.
MySQL offers several spatial data types and functions to handle coordinates.
Spatial Datatypes
MySQL provides several spatial datatypes for handling coordinates, including
POINT
, LINESTRING
, POLYGON
, MULTIPOINT
, MULTILINESTRING
, MULTIPOLYGON
, and GEOMETRYCOLLECTION
.For instance, to define and query coordinates in MySQL, we can use the
POINT
type:CREATE TABLE LOCATION ( location_point POINT ); SELECT ST_Y(location_point) AS latitude, ST_X(location_point) AS longitude FROM LOCATIONS;
With the
ST_Contains
operator, we can check if a POINT
data falls within a specific boundary:SELECT NOT ST_Contains( (SELECT shape FROM spatial_data WHERE id = 1), (SELECT shape FROM spatial_data WHERE id = 2) ) AS is_outside;
However, this approach can be computationally expensive, especially as the complexity of polygons increases. Using spatial indexing on
POINT
types can help optimize operations like ST_Contains
, making it better suited for applications requiring high precision in data science.But from my perspective, handling raw spatial queries increases complexity and may reduce developer experience (DX).
Handling Spatial Data from a Service Perspective (Using Geohash)
In most service contexts, the following aspects of spatial data are typically prioritized:
- High precision is not necessary for location data.
- Filtering by approximate location is sufficient (e.g., retrieving coordinates within a city or region).
For these cases, using MySQLโs spatial types may not be necessary, leading me to explore Geohash.
Geohash with Prefix
Geohashing enables us to represent a geographic location as a string of characters, where the prefix of the Geohash represents a broader area.
Hereโs a sample table and query structure using Geohash for approximate location matching:
CREATE TABLE locations ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), latitude DOUBLE, longitude DOUBLE, geohash CHAR(12), INDEX (geohash) ); INSERT INTO locations (name, latitude, longitude, geohash) VALUES ('Location A', 37.7749, -122.4194, 'wydm'), ('Location B', 37.7750, -122.4180, 'wydm4'), ('Location C', 37.7800, -122.4200, 'wydms');
Using the Geohash prefix for querying allows us to filter data within a specific boundary without complex operations:
SELECT * FROM locations WHERE geohash LIKE 'wydm%';
This approach keeps query code simple, improving DX and allowing ORM tools like Prisma to support the data types directly (e.g., using
DOUBLE
). Geohash also makes it easier for developers and planners to collaborate by using intuitive, human-understandable location precision levels.For example, zoom levels on a map could correspond to specific Geohash precision levels, creating a consistent understanding across teams. The precision length of a Geohash string reflects its granularity; by zooming in on an area, only locations within a higher-precision Geohash boundary are retrieved.
Implementation with Prisma ORM and Typescript for Clustering Data
Define Prisma Schema
model Location { id Int @id @default(autoincrement()) name String latitude Float longitude Float geohash String @db.VarChar(12) // Use a VARCHAR column for geohashes @@index([geohash]) // Index to optimize geohash queries }
Install Geohash Utility
yarn add ngeohash
Define Utility Functions
To encode geohashes in
geohashUtil.ts
:import * as ngeohash from 'ngeohash'; export function encodeGeohash(latitude: number, longitude: number, precision: number = 12): string { return ngeohash.encode(latitude, longitude, precision); }
Insert and Query Data
Inserting location data:
import { PrismaClient } from '@prisma/client'; import { encodeGeohash } from './geohashUtils'; const prisma = new PrismaClient(); async function insertLocation(name: string, latitude: number, longitude: number) { const geohash = encodeGeohash(latitude, longitude); await prisma.location.create({ data: { name, latitude, longitude, geohash } }); }
Querying locations by Geohash prefix:
export async function findLocationsByGeohashPrefix(prefix: string) { return await prisma.location.findMany({ where: { geohash: { startsWith: prefix, }, }, }); }
Grouping Data by Geohash Prefix
To retrieve and group data by Geohash prefix:
async function groupLocationsByGeohashPrefix() { const prefixes = ['wydm', 'wydn', 'wydp']; // Example prefixes const groupedLocations: { [key: string]: any[] } = {}; for (const prefix of prefixes) { const locations = await findLocationsByGeohashPrefix(prefix); groupedLocations[prefix] = locations; } return groupedLocations; } groupLocationsByGeohashPrefix().then(groups => { console.log('Grouped Locations by Geohash Prefix:', groups); });
Calculating Cluster Centroids
If we need to compute centroids for Geohash clusters:
type Location = { latitude: number; longitude: number }; type Cluster = { [key: number]: Location[] }; function computeCentroids(clusters: Cluster): { centroidLat: number; centroidLng: number }[] { return Object.values(clusters).map((locations) => { const totalLat = locations.reduce((sum, loc) => sum + loc.latitude, 0); const totalLng = locations.reduce((sum, loc) => sum + loc.longitude, 0); const count = locations.length; return { centroidLat: totalLat / count, centroidLng: totalLng / count, }; }); } async function exampleUsage() { const locations = await findLocationsByGeohashPrefix('wydm'); const clusterAssignments = { 0: [{ latitude: 37.7749, longitude: -122.4194 }, { latitude: 37.7750, longitude: -122.4180 }], 1: [{ latitude: 34.0522, longitude: -118.2437 }], 2: [{ latitude: 40.7128, longitude: -74.0060 }] }; const centroids = computeCentroids(clusterAssignments); console.log('Cluster Centroids:', centroids); }
Conclusion
This guide outlines the core approach to working with spatial data from a service perspective, using Geohash to simplify and optimize data processing.