09-Projecting-Data
import os
%load_ext sql
Connecting to database¶
host = "localhost"
database = "nyc"
user = os.getenv("SQL_USER")
password = os.getenv("SQL_PASSWORD")
connection_string = f"postgresql://{user}:{password}@{host}/{database}"
%sql $connection_string
'Connected: postgres@nyc'
%%sql
SELECT * from nyc_subway_stations LIMIT 5
* postgresql://postgres:***@localhost/nyc 5 rows affected.
id | geom | objectid | name | alt_name | cross_st | long_name | label | borough | nghbhd | routes | transfers | color | express | closed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
376 | 010100002026690000371775B5C3CE2141CBD2347771315141 | 1 | Cortlandt St | None | Church St | Cortlandt St (R,W) Manhattan | Cortlandt St (R,W) | Manhattan | None | R,W | R,W | YELLOW | None | None |
2 | 010100002026690000CBE327F938CD21415EDBE1572D315141 | 2 | Rector St | None | None | Rector St (1) Manhattan | Rector St (1) | Manhattan | None | 1 | 1 | RED | None | None |
1 | 010100002026690000C676635D10CD2141A0ECDB6975305141 | 3 | South Ferry | None | None | South Ferry (1) Manhattan | South Ferry (1) | Manhattan | None | 1 | 1 | RED | None | None |
125 | 010100002026690000F4CF3E3654032241B5704681A73C5141 | 4 | 138th St | Grand Concourse | Grand Concourse | 138th St / Grand Concourse (4,5) Bronx | 138th St / Grand Concourse (4,5) | Bronx | None | 4,5 | 4,5 | GREEN | None | None |
126 | 01010000202669000084DADF7AED0422410C380E6E3A3D5141 | 5 | 149th St | Grand Concourse | Grand Concourse | 149th St / Grand Concourse (4) Bronx | 149th St / Grand Concourse (4) | Bronx | None | 4 | 2,4,5 | GREEN | express | None |
Check SRID¶
- We’ve already encountered projections when we loaded our
nyc
data. (Recall thatpesky SRID 26918
).- Sometimes, however, you need to transform and re-project between spatial reference systems.
- PostGIS includes built-in support for changing the projection of data, using the
ST_Transform(geometry, srid)
function. - For managing the spatial reference identifiers on geometries, PostGIS provides the
ST_SRID(geometry)
andST_SetSRID(geometry, srid)
functions.
We can confirm the SRID
of our data with the ST_SRID
function:
%%sql
SELECT ST_SRID(geom) FROM nyc_streets LIMIT 1;
* postgresql://postgres:***@localhost/nyc 1 rows affected.
st_srid |
---|
26918 |
And what is definition of “26918”?
- As we saw in loading data section, the definition is contained in the
spatial_ref_sys
table. In fact, two definitions are there.- The
“well-known text” (WKT)
definition is in thesrtext
column, - and there is a second definition in “
proj.4
” format in theproj4text
column.
- The
%%sql
SELECT * FROM spatial_ref_sys
WHERE srid = 26918
* postgresql://postgres:***@localhost/nyc 1 rows affected.
srid | auth_name | auth_srid | srtext | proj4text |
---|---|---|---|---|
26918 | EPSG | 26918 | PROJCS["NAD83 / UTM zone 18N",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","26918"]] | +proj=utm +zone=18 +datum=NAD83 +units=m +no_defs |
In fact, for the internal PostGIS re-projection calculations, it is the contents of the proj4text
column that are used. For our 26918 projection, here is the proj.4 text:
%%sql
SELECT proj4text FROM spatial_ref_sys WHERE srid = 26918
* postgresql://postgres:***@localhost/nyc 1 rows affected.
proj4text |
---|
+proj=utm +zone=18 +datum=NAD83 +units=m +no_defs |
In practice, both the srtext
and the proj4text
columns are important:
- the
srtext
column is used by external programs likeGeoServer
,QGIS
, andFME
and others; - the
proj4text
column is used internally.
Comparing Data¶
Taken together, a coordinate and an SRID
define a location on the globe. Without an SRID
, a coordinate is just an abstract notion.
- A “Cartesian” coordinate plane is defined as a “flat” coordinate system placed on the surface of Earth.
- Because PostGIS functions work on such a plane, comparison operations require that both geometries be represented in the same
SRID
.
If you feed in geometries with differing SRIDs you will just get an error:
%%sql
SELECT ST_Equals(
ST_GeomFromText('POINT(0 0)', 4326),
ST_GeomFromText('POINT(0 0)', 26918)
)
* postgresql://postgres:***@localhost/nyc
--------------------------------------------------------------------------- InternalError_ Traceback (most recent call last) File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\base.py:1705, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1704 if not evt_handled: -> 1705 self.dialect.do_execute( 1706 cursor, statement, parameters, context 1707 ) 1709 if self._has_events or self.engine._has_events: File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\default.py:681, in DefaultDialect.do_execute(self, cursor, statement, parameters, context) 680 def do_execute(self, cursor, statement, parameters, context=None): --> 681 cursor.execute(statement, parameters) InternalError_: ST_Equals: Operation on mixed SRID geometries (Point, 4326) != (Point, 26918) The above exception was the direct cause of the following exception: InternalError Traceback (most recent call last) Cell In[8], line 1 ----> 1 get_ipython().run_cell_magic('sql', '', "\nSELECT ST_Equals(\n ST_GeomFromText('POINT(0 0)', 4326),\n ST_GeomFromText('POINT(0 0)', 26918)\n )\n") File d:\Anaconda3\envs\mybase\Lib\site-packages\IPython\core\interactiveshell.py:2478, in InteractiveShell.run_cell_magic(self, magic_name, line, cell) 2476 with self.builtin_trap: 2477 args = (magic_arg_s, cell) -> 2478 result = fn(*args, **kwargs) 2480 # The code below prevents the output from being displayed 2481 # when using magics with decodator @output_can_be_silenced 2482 # when the last Python token in the expression is a ';'. 2483 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False): File d:\Anaconda3\envs\mybase\Lib\site-packages\sql\magic.py:95, in SqlMagic.execute(self, line, cell, local_ns) 92 return self._persist_dataframe(parsed['sql'], conn, user_ns) 94 try: ---> 95 result = sql.run.run(conn, parsed['sql'], self, user_ns) 97 if result is not None and not isinstance(result, str) and self.column_local_vars: 98 #Instead of returning values, set variables directly in the 99 #users namespace. Variable names given by column names 101 if self.autopandas: File d:\Anaconda3\envs\mybase\Lib\site-packages\sql\run.py:340, in run(conn, sql, config, user_namespace) 338 else: 339 txt = sqlalchemy.sql.text(statement) --> 340 result = conn.session.execute(txt, user_namespace) 341 _commit(conn=conn, config=config) 342 if result and config.feedback: File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\base.py:1200, in Connection.execute(self, statement, *multiparams, **params) 1196 util.raise_( 1197 exc.ObjectNotExecutableError(statement), replace_context=err 1198 ) 1199 else: -> 1200 return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS) File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\sql\elements.py:313, in ClauseElement._execute_on_connection(self, connection, multiparams, params, execution_options) 309 def _execute_on_connection( 310 self, connection, multiparams, params, execution_options 311 ): 312 if self.supports_execution: --> 313 return connection._execute_clauseelement( 314 self, multiparams, params, execution_options 315 ) 316 else: 317 raise exc.ObjectNotExecutableError(self) File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\base.py:1389, in Connection._execute_clauseelement(self, elem, multiparams, params, execution_options) 1377 compiled_cache = execution_options.get( 1378 "compiled_cache", self.engine._compiled_cache 1379 ) 1381 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( 1382 dialect=dialect, 1383 compiled_cache=compiled_cache, (...) 1387 linting=self.dialect.compiler_linting | compiler.WARN_LINTING, 1388 ) -> 1389 ret = self._execute_context( 1390 dialect, 1391 dialect.execution_ctx_cls._init_compiled, 1392 compiled_sql, 1393 distilled_params, 1394 execution_options, 1395 compiled_sql, 1396 distilled_params, 1397 elem, 1398 extracted_params, 1399 cache_hit=cache_hit, 1400 ) 1401 if has_events: 1402 self.dispatch.after_execute( 1403 self, 1404 elem, (...) 1408 ret, 1409 ) File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\base.py:1748, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1745 branched.close() 1747 except BaseException as e: -> 1748 self._handle_dbapi_exception( 1749 e, statement, parameters, cursor, context 1750 ) 1752 return result File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\base.py:1929, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context) 1927 util.raise_(newraise, with_traceback=exc_info[2], from_=e) 1928 elif should_wrap: -> 1929 util.raise_( 1930 sqlalchemy_exception, with_traceback=exc_info[2], from_=e 1931 ) 1932 else: 1933 util.raise_(exc_info[1], with_traceback=exc_info[2]) File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\util\compat.py:211, in raise_(***failed resolving arguments***) 208 exception.__cause__ = replace_context 210 try: --> 211 raise exception 212 finally: 213 # credit to 214 # https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/ 215 # as the __traceback__ object creates a cycle 216 del exception, replace_context, from_, with_traceback File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\base.py:1705, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1703 break 1704 if not evt_handled: -> 1705 self.dialect.do_execute( 1706 cursor, statement, parameters, context 1707 ) 1709 if self._has_events or self.engine._has_events: 1710 self.dispatch.after_cursor_execute( 1711 self, 1712 cursor, (...) 1716 context.executemany, 1717 ) File d:\Anaconda3\envs\mybase\Lib\site-packages\sqlalchemy\engine\default.py:681, in DefaultDialect.do_execute(self, cursor, statement, parameters, context) 680 def do_execute(self, cursor, statement, parameters, context=None): --> 681 cursor.execute(statement, parameters) InternalError: (psycopg2.errors.InternalError_) ST_Equals: Operation on mixed SRID geometries (Point, 4326) != (Point, 26918) [SQL: SELECT ST_Equals( ST_GeomFromText('POINT(0 0)', 4326), ST_GeomFromText('POINT(0 0)', 26918) )] (Background on this error at: http://sqlalche.me/e/14/2j85)
Be careful of getting too happy with using ST_Transform
for on-the-fly conversion.
- Spatial indexes are built using
SRID
of the stored geometries.- If comparison are done in a different
SRID
, spatial indexes are (often) not used. It is best practice to choose oneSRID
for all the tables in your database. - Only use the transformation function when you are reading or writing data to external applications.
- If comparison are done in a different
Transforming Data¶
If we return to our proj4
definition for SRID 26918
, we can see that our working projection is UTM (Universal Transverse Mercator) of zone 18, with meters as the unit of measurement.
+proj=utm +zone=18 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
Let’s convert some data from our working projection to geographic coordinates – also known as “longitude/latitude”.
To convert data from one SRID
to another
- you must first verify that your geometry has a valid
SRID
. - Since we have already confirmed a valid
SRID
, we next need theSRID
of the projection to transform into. In other words, what is theSRID
of geographic coordinates?
The most common SRID for geographic coordinates is 4326
, which corresponds to “longitude/latitude on the WGS84 spheroid”. You can see the definition at the spatialreference.org site:
You can also pull the definitions from the spatial_ref_sys
table:
%%sql
SELECT srtext FROM spatial_ref_sys
WHERE srid = 4326;
* postgresql://postgres:***@localhost/nyc 1 rows affected.
srtext |
---|
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]] |
Let’s convert the coordinates of the ‘Broad St’ subway station into geographics:
%%sql
SELECT ST_AsText(geom)
FROM nyc_subway_stations
WHERE name = 'Broad St';
* postgresql://postgres:***@localhost/nyc 1 rows affected.
st_astext |
---|
POINT(583571.9059213118 4506714.341192182) |
%%sql
SELECT ST_AsText(ST_Transform(geom,4326))
FROM nyc_subway_stations
WHERE name = 'Broad St';
* postgresql://postgres:***@localhost/nyc 1 rows affected.
st_astext |
---|
POINT(-74.01067146887341 40.70710481558761) |
If you load data or create a new geometry without specifying an SRID
, the SRID
value will be 0.
- Recall in geometries, that when we created our geometries table we didn’t specify an
SRID
. - If we query our database, we should expect all the
nyc_
tables to have anSRID
of26918
- while the geometries table defaulted to an
SRID
of0
.
To view a table’s SRID
assignment, query the database’s geometry_columns table.
%%sql
SELECT f_table_name AS name, srid
FROM geometry_columns;
* postgresql://postgres:***@localhost/nyc 6 rows affected.
name | srid |
---|---|
nyc_subway_stations | 26918 |
nyc_neighborhoods | 26918 |
nyc_census_blocks | 26918 |
nyc_homicides | 26918 |
nyc_streets | 26918 |
geometries | 0 |
However, if you know what the SRID
of the coordinates is supposed to be, you can set it post-facto, using ST_SetSRID
on the geometry. Then you will be able to transform the geometry into other systems.
%%sql
SELECT ST_AsText(
ST_Transform(
ST_SetSRID(geom,26918),
4326)
)
FROM geometries;
* postgresql://postgres:***@localhost/nyc 5 rows affected.
st_astext |
---|
POINT(-79.48874388438705 0) |
LINESTRING(-79.48874388438705 0,-79.48873492539037 0.000009019375921,-79.48872596639353 0.000009019376033,-79.48872596639369 0.000018038752065) |
POLYGON((-79.48874388438705 0,-79.4887349253903 0,-79.48873492539037 0.000009019375921,-79.4887438843871 0.00000901937581,-79.48874388438705 0)) |
POLYGON((-79.48874388438705 0,-79.48865429441472 0,-79.48865429442024 0.000090193769243,-79.48874388439259 0.000090193758097,-79.48874388438705 0),(-79.48873492539037 0.000009019375921,-79.48873492539053 0.000018038751842,-79.48872596639369 0.000018038752065,-79.48872596639353 0.000009019376033,-79.48873492539037 0.000009019375921)) |
GEOMETRYCOLLECTION(POINT(-79.48872596639346 0),POLYGON((-79.48874388438705 0,-79.4887349253903 0,-79.48873492539037 0.000009019375921,-79.4887438843871 0.00000901937581,-79.48874388438705 0))) |
Function List¶
ST_AsText
: Returns the Well-Known Text (WKT) representation of the geometry/geography without SRID metadata.ST_SetSRID(geometry, srid)
: Sets the SRID on a geometry to a particular integer value.ST_SRID(geometry)
: Returns the spatial reference identifier for the ST_Geometry as defined in spatial_ref_sys table.ST_Transform(geometry, srid)
: Returns a new geometry with its coordinates transformed to the SRID referenced by the integer parameter.