shapely
Shapely - Planar Geometry
Shapely is the engine behind GeoPandas and many other GIS tools. It focuses on the geometry itself: calculating intersections, unions, distances, and checking spatial relationships (like "is this point inside this polygon?").
When to Use
- Precise manipulation of 2D geometric shapes.
- Performing set-theoretic operations (Intersection, Union, Difference).
- Checking spatial predicates (Contains, Within, Intersects, Touches).
- Cleaning and validating "dirty" geometry (fixing self-intersections).
- Calculating geometric properties (Area, Length, Centroid, Bounds).
- Generating buffers or simplifying complex lines.
- Linear referencing (finding points along a line).
Reference Documentation
Official docs: https://shapely.readthedocs.io/
GEOS (Engine): https://libgeos.org/
Search patterns: shapely.geometry, shapely.ops.unary_union, shapely.validation.make_valid
Core Principles
Geometric Objects
Objects are immutable. Once created, you don't change them; you perform an operation that returns a new object.
- Points: 0-dimensional.
- LineStrings: 1-dimensional curves.
- Polygons: 2-dimensional surfaces with optional holes.
Cartesian Geometry
Shapely operates in a Cartesian plane. It does not know about Earth's curvature, latitudes, or longitudes. Distance is sqrt(dx² + dy²).
Vectorization (Shapely 2.0+)
Modern Shapely supports vectorized operations on NumPy arrays of geometry objects, making it significantly faster than older versions.
Quick Reference
Installation
pip install shapely numpy
Standard Imports
import numpy as np
from shapely import Point, LineString, Polygon, MultiPoint, MultiPolygon
from shapely import ops, wkt, wkb
import shapely
Basic Pattern - Creation and Analysis
from shapely.geometry import Point, Polygon
# 1. Create objects
p = Point(0, 0)
poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
# 2. Check relationships
is_inside = p.within(poly) # True
is_on_border = p.touches(poly) # False (interior counts as within)
# 3. Calculate
print(f"Area: {poly.area}")
print(f"Distance: {p.distance(Point(10, 10))}")
Critical Rules
✅ DO
- Check Validity - Use
.is_validbefore complex operations. Invalid geometry (like a self-intersecting polygon) will cause errors. - Use unary_union - When merging many polygons,
ops.unary_union([list])is orders of magnitude faster than a loop ofp1.union(p2). - Prefer Vectorized Functions - Use
shapely.intersects(array_a, array_b)instead of loops for performance. - Use prepare() - If you are checking many points against the same polygon, use
shapely.prepare(poly)to speed up subsequent queries. - Simplify for Analysis - Use
.simplify(tolerance)for complex boundaries to improve performance if high precision isn't required.
❌ DON'T
- Mix Cartesian and Spherical - Don't calculate distance on Lat/Lon points; the result will be in meaningless "degrees".
- Assume Polygon Orientation - While Shapely handles it, remember that exterior rings should ideally be counter-clockwise and holes clockwise.
- Use for 3D Math - Shapely supports Z-coordinates for storage, but most operations (like area, intersection) ignore Z and project onto the XY plane.
- Loop over large collections - Use NumPy-style vectorization provided in Shapely 2.0+.
Anti-Patterns (NEVER)
from shapely.geometry import Point, Polygon
from shapely.ops import unary_union
# ❌ BAD: Merging geometries in a loop (O(n²) complexity)
result = geometries[0]
for g in geometries[1:]:
result = result.union(g)
# ✅ GOOD: Use unary_union (O(n log n) complexity)
result = unary_union(geometries)
# ❌ BAD: Checking many points without preparing
for p in many_points:
if complex_poly.contains(p): # Slow for complex shapes
pass
# ✅ GOOD: Prepare the geometry (Builds a spatial index)
from shapely import prepare
prepare(complex_poly)
for p in many_points:
if complex_poly.contains(p): # Much faster
pass
Geometry Types and Creation
Standard Primitives
# Point (x, y, z)
pt = Point(1.0, 2.0)
# LineString (Ordered sequence of points)
line = LineString([(0, 0), (1, 1), (2, 0)])
# Polygon (Shell, [Holes])
shell = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (2, 4), (4, 4), (4, 2)]
poly = Polygon(shell, [hole])
# Multi-Geometries (Collections)
points = MultiPoint([(0,0), (1,1)])
Spatial Predicates (Relationships)
Checking how objects relate
a = Point(1, 1).buffer(1.5) # A circle
b = Polygon([(0,0), (2,0), (2,2), (0,2)]) # A square
print(a.intersects(b)) # Shared space?
print(a.contains(b)) # B entirely inside A?
print(a.disjoint(b)) # No shared space?
print(a.overlaps(b)) # Same dimension, shared space, but not within?
print(a.touches(b)) # Only boundaries share space?
print(a.crosses(b)) # Line crossing a polygon?
Set-Theoretic Operations
Creating new geometries from old ones
poly1 = Point(0, 0).buffer(1)
poly2 = Point(1, 0).buffer(1)
# Intersection (Shared area)
inter = poly1.intersection(poly2)
# Union (Combined area)
union = poly1.union(poly2)
# Difference (Area in poly1 NOT in poly2)
diff = poly1.difference(poly2)
# Symmetric Difference (Area in either but NOT both)
sdiff = poly1.symmetric_difference(poly2)
Constructive Methods
Buffering, Splicing, and Simplifying
# Buffer: Expand/shrink geometry
# cap_style: 1=Round, 2=Flat, 3=Square
line_thick = line.buffer(0.5, cap_style=2)
# Centroid: Geometric center
center = poly.centroid
# Representative Point: Guaranteed to be INSIDE the geometry
# Useful for label placement in U-shaped polygons
label_pt = poly.representative_point()
# Simplify: Reduce number of vertices
simple_line = complex_line.simplify(tolerance=0.1, preserve_topology=True)
# Convex Hull: Smallest convex box containing all points
hull = MultiPoint(points).convex_hull
Linear Referencing
Working with positions along a LineString
line = LineString([(0, 0), (0, 10), (10, 10)])
# Find distance along line to the point nearest to (5, 5)
dist = line.project(Point(5, 5)) # returns 5.0 (it's at (0, 5))
# Find the actual point at a specific distance along the line
pt = line.interpolate(15.0) # returns Point(5, 10)
I/O: WKT, WKB, and NumPy
Serialization and Data Exchange
# WKT (Well-Known Text) - Human readable
text = "POINT (10 20)"
p = wkt.loads(text)
print(p.wkt)
# WKB (Well-Known Binary) - Fast and compact
binary = wkb.dumps(p)
p_new = wkb.loads(binary)
# NumPy Integration (Shapely 2.0)
points_array = np.array([Point(0,0), Point(1,1), Point(2,2)])
areas = shapely.area(points_array) # Returns array of zeros
dist_matrix = shapely.distance(points_array[:, np.newaxis], points_array)
Practical Workflows
1. Cleaning Invalid Geometries
from shapely.validation import make_valid
def safe_area(geom):
"""Calculates area even for invalid/self-intersecting polygons."""
if not geom.is_valid:
geom = make_valid(geom)
# After make_valid, a Polygon might become a MultiPolygon or GeometryCollection
return geom.area
2. Point-in-Polygon Search (Optimized)
from shapely import prepare
def find_points_in_poly(points, poly):
"""Efficiently filters points inside a complex polygon."""
prepare(poly) # Builds internal STRtree or spatial index
# Using vectorized intersection (much faster)
mask = shapely.contains(poly, points)
return points[mask]
3. Splitting a Polygon by a Line
from shapely.ops import split
def divide_land(polygon, line):
"""Splits a polygon into multiple parts using a LineString."""
result = split(polygon, line)
# Returns a GeometryCollection of the resulting parts
return list(result.geoms)
Performance Optimization
STRtree for Nearest Neighbors
If you have thousands of geometries and need to find which ones are near a point, use STRtree.
from shapely import STRtree
tree = STRtree(geometries)
# Find indices of geometries whose bounding boxes intersect the point's buffer
indices = tree.query(Point(0,0).buffer(10))
# Find the single nearest geometry index
nearest_idx = tree.nearest(Point(0,0))
Common Pitfalls and Solutions
The "Sliver Polygon" problem
Calculations like intersection can sometimes produce tiny, almost invisible polygons due to floating-point errors.
# ✅ Solution: Filter by area
intersection = p1.intersection(p2)
if intersection.area < 1e-9:
intersection = None
Latitude/Longitude Confusion
Points are (x, y). In GIS, this usually means (Longitude, Latitude).
# ❌ Error: Point(Latitude, Longitude)
# This will plot your maps sideways!
# ✅ Solution: Always use (Lon, Lat) to match (X, Y)
nyc = Point(-74.006, 40.7128)
GeometryCollections
Operations like split or intersection can return GeometryCollection. This is a container for mixed types.
# ❌ Problem: Calling .area on a collection with Lines and Polygons
# ✅ Solution: Filter for the type you want
polys = [g for g in collection.geoms if g.geom_type == 'Polygon']
Best Practices
- Always validate geometry with
.is_validbefore complex operations - Use
unary_unioninstead of looping over unions - Prepare geometries with
prepare()when checking many points against the same shape - Use vectorized operations in Shapely 2.0+ for performance
- Remember Shapely is Cartesian - don't use lat/lon directly for distance calculations
- Filter sliver polygons by area threshold after geometric operations
- Use STRtree for spatial indexing when working with many geometries
- Simplify complex geometries when high precision isn't required
- Handle GeometryCollections properly - filter by geometry type when needed
- Use representative_point() for guaranteed interior points in complex polygons
Shapely is a specialized, sharp tool. It doesn't care about your coordinate system or your file format — it only cares about the pure, mathematical relationship between shapes. Mastering it is the key to building advanced spatial algorithms.
More from tondevrel/scientific-agent-skills
xgboost-lightgbm
Industry-standard gradient boosting libraries for tabular data and structured datasets. XGBoost and LightGBM excel at classification and regression tasks on tables, CSVs, and databases. Use when working with tabular machine learning, gradient boosting trees, Kaggle competitions, feature importance analysis, hyperparameter tuning, or when you need state-of-the-art performance on structured data.
193opencv
Open Source Computer Vision Library (OpenCV) for real-time image processing, video analysis, object detection, face recognition, and camera calibration. Use when working with images, videos, cameras, edge detection, contours, feature detection, image transformations, object tracking, optical flow, or any computer vision task.
141ortools
Google Optimization Tools. An open-source software suite for optimization, specialized in vehicle routing, flows, integer and linear programming, and constraint programming. Features the world-class CP-SAT solver. Use for vehicle routing problems (VRP), scheduling, bin packing, knapsack problems, linear programming (LP), integer programming (MIP), network flows, constraint programming, combinatorial optimization, resource allocation, shift scheduling, job-shop scheduling, and discrete optimization problems.
73matplotlib
The foundational library for creating static, animated, and interactive visualizations in Python. Highly customizable and the industry standard for publication-quality figures. Use for 2D plotting, scientific data visualization, heatmaps, contours, vector fields, multi-panel figures, LaTeX-formatted plots, custom visualization tools, and plotting from NumPy arrays or Pandas DataFrames.
70scipy
Comprehensive guide for SciPy - the fundamental library for scientific and technical computing in Python. Use for integration, optimization, interpolation, linear algebra, signal processing, statistics, ODEs, Fourier transforms, and advanced scientific algorithms. Built on NumPy and essential for research and engineering.
51plotly
A high-level interactive graphing library for Python. Ideal for web-based visualizations, 3D plots, and complex interactive dashboards. Built on plotly.js, it allows users to zoom, pan, and hover over data points in a browser-based environment. Use for interactive charts, web applications, Jupyter notebooks, 3D data visualization, geographic maps, financial charts, animations, time-series analysis, and building production-ready dashboards with Dash.
50