block by jacquestardie e6074eb7f5fa4a152ccd90339e354e87

PostgreSQL & PostGIS cheatsheet (a work in progress)

PostgreSQL & PostGIS Cheatsheet

This is a collection of information on PostgreSQL and PostGIS for what I tend to use most often.

TOC

Installation

Postgres

Note that for OS X and Ubuntu you may need to run the above commands as a super user / using sudo.

Set Up

On Ubuntu you typically need to log in as the Postgres user and do some admin things:

For Mac OS X you can skip the above if you install with homebrew.

For Windows….

Starting the Postgres Database

On Mac OS X:

PostGIS

psql

psql is the interactive unix command line tool for interacting with Postgres/PostGIS.

Common Commands

Importing Data

Exporting Data

Joining Tables Using a Shared Key

From CartoDB’s tutorial Join data from two tables using SQL

Upgrading Postgres

This Tutorial was very helpful for upgrading on Mac OS X via homebrew.

WARNING: Back up your data before doing this incase you screw up like I did!

Basically the steps are:

  1. Shut down Postgresql:
    launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

  2. Create a new Postgresql9.x data directory:
    initdb /usr/local/var/postgres9.4 -E utf8

  3. Run the pg_upgrade command:

     pg_upgrade \
     -d /usr/local/var/postgres \
     -D /usr/local/var/postgres9.4 \
     -b /usr/local/Cellar/postgresql/9.3.5_1/bin/ \
     -B /usr/local/Cellar/postgresql/9.4.0/bin/ \
     -v
  4. Change kernel settings if necessary:

     sudo sysctl -w kern.sysv.shmall=65536
     sudo sysctl -w kern.sysv.shmmax=16777216
    • I also ran sudo vi /etc/sysctl.conf and entered the same values:

      kern.sysv.shmall=65536
      kern.sysv.shmmax=16777216
    • re-run the pg_upgrade command in step 3

  5. Move the new data directory into place:

     cd /usr/local/var
     mv postgres postgres9.2.4
     mv postgres9.3 postgres
  6. Start the new version of PostgreSQL: launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

    • check to make sure it worked:
    psql postgres -c "select version()"
    psql -l
  7. Cleanup:

    • vacuumdb --all --analyze-only
    • analyze_new_cluster.sh*
    • delete_old_cluster.sh*
    • brew cleanup postgresql
      (* scripts were generated in same the directory where pg_upgrade was ran)

PostGIS

PostGIS is the extension for Postgres that allows for working with geometry data types and doing GIS operations in Postgres.

Common Commands

Common Spatial Queries

You may view more of these in my intro to Visualizing Geospatial Data with CartoDB.

Find all polygons from dataset A that intersect points from dataset B:

SELECT a.*
FROM table_a_polygons a, table_b_points b
WHERE ST_Intersects(a.the_geom, b.the_geom);

Find all rows in a polygon dataset that intersect a given point:

-- note: geometry for point must be in the order lon, lat (x, y)
SELECT * FROM nyc_tenants_rights_service_areas
where
ST_Intersects(
  ST_GeomFromText(
   'Point(-73.982557 40.724435)', 4326
  ),
  nyc_tenants_rights_service_areas.the_geom    
);

Or using ST_Contains:

SELECT * FROM nyc_tenants_rights_service_areas
where
st_contains(
  nyc_tenants_rights_service_areas.the_geom,
  ST_GeomFromText(
   'Point(-73.917104 40.694827)', 4326
  )      
);

Counting points inside a polygon:

With ST_Containts():

SELECT us_counties.the_geom_webmercator,us_counties.cartodb_id,
count(quakes.the_geom)
AS total
FROM us_counties JOIN quakes
ON st_contains(us_counties.the_geom,quakes.the_geom)
GROUP BY us_counties.cartodb_id;

To update a column from table A with the number of points from table B that intersect table A’s polygons:

update noise.hoods set num_complaints = (
    select count(*)
    from noise.locations
    where
    ST_Intersects(
        noise.locations.geom,
        noise.hoods.geom
    )
);

Select data within a bounding box
Using ST_MakeEnvelope

HINT: You can use bboxfinder.com to easily grab coordinates of a bounding box for a given area.

SELECT * FROM some_table
where geom && ST_MakeEnvelope(-73.913891, 40.873781, -73.907229, 40.878251, 4326)

Select points from table a that do not fall within any polygons in table b
This method makes use of spatial indexes and the indexes on gid for better performance

SELECT
  a.gid,
  a.st_address,
  a.city,
  a.st_num,
  a.the_geom
FROM
  points AS a LEFT JOIN
  polygons AS b ON
  ST_Intersects(a.the_geom, b.the_geom)
WHERE b.gid IS NULL;

credit: Nicklas Avén

Make a line from a series of points

SELECT ST_MakeLine (the_geom ORDER BY id ASC)
AS the_geom, route
FROM points_table
GROUP BY route;

Order points in a table by distance to a given lat lon
This one uses CartoDB’s built-in function CDB_LatLng which is short hand for doing:
SELECT ST_Transform( ST_GeomFromText( 'Point(-73.982557 40.724435)',),4326)

SELECT * FROM table
ORDER BY the_geom <->
CDB_LatLng(42.5,-73) LIMIT 10;

Access the previous row of data and get value (time, value, number, etc) difference

WITH calc_duration AS (
 SELECT
 cartodb_id,
 extract(epoch FROM (date_time - lag(date_time, 1) OVER(ORDER BY date_time))) AS duration_in_seconds
 FROM tracking_eric
 ORDER BY date_time
)
UPDATE tracking_eric
SET duration_in_seconds = calc_duration.duration_in_seconds
FROM calc_duration
WHERE calc_duration.cartodb_id = tracking_eric.cartodb_id

Select population density

In this query we cast the geometry data type to the geography data type to get units of measure in meters.

SELECT pop_sqkm,
 round( pop / (ST_Area(the_geom::geography)/1000000))
 as psqkm
 FROM us_counties

Repair Invalid Geometries
Sometimes when data is imported into PostGIS geometries get screwed up. If you get an error message like:

ERROR:  GEOSIntersects: TopologyException: side location conflict at -116.03227135270012 33.309736898054787

You can try doing:

UPDATE tablename SET geom=ST_MAKEVALID(geom) WHERE NOT ST_ISVALID(geom);

Spatial Indexing

Makes queries hella fast. OSGeo has a good tutorial.

Importing Spatial Data to PostGIS

Using shp2pgsql

  1. Create an SQL file with a CREATE TABLE statement:

     shp2pgsql -I -s 4326 nyc-pediacities-hoods-v3-edit.shp noise.hoods > noise.sql

    Or for using the geography data type do:

     shp2pgsql -G -I nyc-pediacities-hoods-v3-edit.shp noise.nyc-pediacities-hoods-v3-edit_geographic > nyc_pediacities-hoods-v3-edit.sql

    If you’d like to transform your data from one SRID to another, you may pass two EPSG codes separated by a colon to the -s flag:

     shp2pgsql -I -s 2236:4326 shapefile_in_2236.shp schema.table-name > table-name.sql
  2. Then run the SQL using psql:

     psql -d nyc_noise -f noise.sql

    Or for the geography type above:

     psql -d nyc_noise -f nyc_pediacities-hoods-v3-edit.sql
  3. Alternatively do both of the above in a single command:

     shp2pgsql -I -s 4326 nyc-pediacities-hoods-v3-edit.shp noise.hoods | psql -d nyc_noise

Using osm2pgsql

To import an OpenStreetMap extract in PBF format do:
osm2pgsql -H localhost --hstore-all -d nyc_from_osm ~/Downloads/newyorkcity.osm.pbf

Using ogr2ogr

Example importing a GeoJSON file into a database called nyc_pluto:

ogr2ogr -f PostgreSQL \
PG:"host='localhost' user='chrislhenrick' port='5432' \
dbname='nyc_pluto' password=''" \
bk_map_pluto_4326.json -nln bk_pluto

Exporting Spatial Data from PostGIS

The two main tools used to export spatial data with more complex geometries from Postgres/PostGIS than points are pgsql2shp and ogr2ogr.

Using pgsql2shp

pgsql2shp is a tool that comes installed with PostGIS that allows for exporting data from a PostGIS database to a shapefile format. To use it you need to specify a file path to the output shapefile (just stating the basename with no extension will output in the current working directory), a host name (usually this is localhost), a user name, a password for the user, a database name, and an SQL query.

pgsql2shp -f <path to output shapefile> -h <hostname> -u <username> -P <password> databasename "<query>"

A sample export of a shapefile called my_data from a database called my_db looks like this:

pgsql2shp -f my_data -h localhost -u clhenrick -P 'mypassword' my_db "SELECT * FROM my_data "

Using ogr2ogr

Note: You may need to set the GDAL_DATA path if you git this error:

ERROR 4: Unable to open EPSG support file gcs.csv.
Try setting the GDAL_DATA environment variable to point to the
directory containing EPSG csv files.

If on Linux / Mac OS do this: export GDAL_DATA=/usr/local/share/gdal
If on Windows do this: C:\> set GDAL_DATA=C:\GDAL\data

To Export Data
Use ogr2ogr as follows to export a table (in this case a table called dob_jobs_2014) to a GeoJSON file (in this case a file called dob_jobs_2014_geocoded.geojson):

ogr2ogr -f GeoJSON -t_srs EPSG:4326 dob_jobs_2014_geocoded.geojson \
PG:"host='localhost' dbname='dob_jobs' user='chrislhenrick' password='' port='5432'" \
-sql "SELECT bbl, house, streetname, borough, jobtype, jobstatus, existheight, proposedheight, \
existoccupancy, proposedoccupany, horizontalenlrgmt, verticalenlrgmt, ownerbusinessname, \
ownerhousestreet, ownercitystatezip, ownerphone, jobdescription, geom \
FROM dob_jobs_2014 WHERE geom IS NOT NULL"

Other Methods of Interacting With Postgres/PostGIS

to do…

PGAdmin

Python

Node JS