01-PostgreSQL-Installation
What Can PostgreSQL/PostGIS Offer You?¶
Integration with Statistical Tools
PostgreSQL
/PostGIS
can seamlessly integrate with statistical packages likeR
, enabling you to write database procedures inProcedural Language(PL)/R
.- Extend functionality further with
PL/Python
orPL/JavaScript
, allowing direct interaction between spatial data and your favorite programming environments.
Spatial Analysis Beyond GIS
- While traditionally seen as a tool for
Geographic Information Systems (GIS)
,PostGIS
excels in spatial analysis. - Spatial analysis explores space and use of space, such as:
- Mapping undefined or unconventional locations (e.g., brain regions or nuclear plant sites).
- Visualizing complex, inherently non-visual models (e.g., process modeling).
- While traditionally seen as a tool for
Built-in Statistical Power
- Relational databases are ideal for managing large volumes of data, offering built-in
statistical/rollup functions
for quick summaries and detailed analyses. PostGIS
enhances spatial data visualization and analysis by adding geographic context to your research.
- Relational databases are ideal for managing large volumes of data, offering built-in
Ease of Data Collection
PostGIS
supports data input from sources likeGPS
, remote sensing devices, andIoT systems
.- With the decreasing cost of data collection tools like smartphones and drones,
PostGIS
helps process and store this data efficiently for scientific research.
Data Distribution Made Simple
- Use
PostGIS
to createdatamarts
ordata dispensers
for:- Easy sharing of research data subsets.
- Public downloads for broader accessibility.
- Use
Database and Spatial Database¶
What's Database¶
A
database
organizes and stores data for efficient management in support for multiple uses.A
database
is an abstraction of reality.A
database life cycle
is like a life cycle for software engineering design and consists of the following six stages, and it's called the waterfall modelAnalysis
of objectives and user requirements: data and functionDesign
of a database: conceptual design, logical design, and physical designImplementation
of a database management system (DBMS
): installation of software, populating data, customizing user interfaceTesting
usability, applications, and liabilityDevelopment
Maintenance
of hardware, software, data, and applications
A
database
specifies what data can be stored, the systematic encoding (e.g.data structure
) and organization (e.g.data model
) of the data, and functions for data management.Data model
: representation of reality
The stage of
conceptual database design
creates aconceptual data model
, such asEntity-Relationship (ER) diagrams
.A
relationship
describes the association between twoentities
.Relationships have
cardinality
that defines the number of instances from one entity associated with the number of instances in the other entity.- one to one
- one to many
- many to many
Example of
ERD
Data schema
: data organization and classificationData structure
: organizing (i.e., indexing) and encoding (i.e., ordering) of data values (linked lists, arrays, trees, graphs, hashing, etc.).- In
GIS
, data structures also includespatial partitioning
.
- In
Data type
: the type of data values: integers, floats, text (orchar
), points, lines, polygons, multi-polygons, rings,BLOB
, etc.- Three fundamental dataset types in a
geodatabase
- Three fundamental dataset types in a
A database management system shall include a
data definition language (DDL)
and adata manipulation language (DML)
.Relational database management systems (RDBMS)
commonly usestructural query language (SQL)
for bothDDL
tocreate
,drop
,alter
,truncate
,rename
a table andDML
toselect
,insert
,update
, anddelete
data entries.
Spatial Database¶
Spatial database
s are not just databases containing spatial data; they explicitly include location information with 2D or 3D coordinates.- The 2D or 3D coordinates make possible spatial representation of the information, and the spatial information enable the calculation of spatial measurements (e.g.
geometry
) and spatial relationships (e.g.topology
).
- The 2D or 3D coordinates make possible spatial representation of the information, and the spatial information enable the calculation of spatial measurements (e.g.
Setting Up Your First Spatial Database¶
Creating a Database Using psql
(Command Line)¶
Step 1: Create the Database
The first step is to create a new database. Open your
psql
and run the following command:This command creates a new database called
postgis_in_action
. You can name it anything, but keeping names meaningful helps with organization.
CREATE DATABASE postgis_in_action;
Step 2: Connect to the Database
Once the database is created, connect to it by typing:
This command switches your session to the newly created database. You’re now ready to enable
PostGIS
.
\connect postgis_in_action
Step 3: Enable PostGIS
- To enable PostGIS, run the following commands:
CREATE SCHEMA postgis; GRANT USAGE ON schema postgis TO public; CREATE EXTENSION postgis SCHEMA postgis; ALTER DATABASE postgis_in_action SET search_path=public,postgis,contrib;
Explanation of Commands:
CREATE SCHEMA postgis
: This creates a separate namespace, or "schema," within the database to storePostGIS
-related functions and data.GRANT USAGE ON schema postgis TO public
: This grants all database users access to thePostGIS
schema.CREATE EXTENSION postgis SCHEMA postgis
: Installs thePostGIS
extension and places its components in thepostgis
schema.ALTER DATABASE postgis_in_action SET search_path=public,postgis,contrib
: Sets the default search path, ensuringPostGIS
functions and data types are easily accessible without needing to specify the schema every time.
Why Add to Search Path?
The
search path
determines the order in whichPostgreSQL
looks for objects (tables, functions, etc.) when they are referenced in queries. By includingpostgis
in thesearch path
, you ensurePostGIS
functions are always found without explicitly prefixing them withpostgis.
in your queries.
Creating a Database Using pgAdmin
¶
If you prefer a graphical user interface, you can use pgAdmin
to create the database and enable PostGIS
. Follow these steps:
Step 1: Create the Database
- Open
pgAdmin
and log in to yourPostgreSQL
server. - Right-click on
Databases
in the tree view and selectCreate > Database
. - Enter
postgis_in_action
as the database name and clickSave
.
- Open
Step 2: Enable PostGIS
- Connect to your new database by refreshing the tree and selecting it.
- Open the query tool (
SQL Editor
) and run the following commands:
CREATE SCHEMA postgis; GRANT USAGE ON schema postgis TO public; CREATE EXTENSION postgis SCHEMA postgis; ALTER DATABASE postgis_in_action SET search_path=public,postgis,contrib;
Step 3: Verify Installation
- Refresh the database tree.
- Navigate to
Schemas > postgis > Functions
to see a list of PostGIS functions.
Loading spatial data and table¶
Importing the Shapefile using PostGIS Shapefile Import/Export Manager
¶
Step 1: Open the Shapefile Import/Export Manager
- Open
pgAdmin
and connect to yourPostgreSQL
database (postgis_in_action
). - In the toolbar, look for
PostGIS Shapefile Import/Export Manager
. If you don’t see it, ensure you’ve installed thePostGIS
extension correctly.
- Open
Step 2: Import the Shapefile
- Open the
Shapefile Import/Export Manager
. - In the
Shapefile
field, browse and selectUS_tract_2020.shp
. - Set the
SRID
to102003
andGeo Column
togeometry
(or theSRID
andGeo Column
match your shapefile). - Choose a target schema (e.g.,
postgis
) and set a table name (e.g.,us_tracts
). - Click
Import
. The tool will:- Create a new table.
- Insert shapefile data into the table.
- Create a spatial index for faster querying.
- Open the
Step 3: Verify the Import
Check if the shapefile was imported successfully:
- Open the query tool in
pgAdmin
. - Run the following query:
SELECT * FROM us_tract_2020 LIMIT 5;
- Open the query tool in
Importing the CSV File¶
Step 1: Create the Table
- This SQL script defines the
schema
for the data described in the metadata:
CREATE TABLE nhgis_tract_data ( GEO_ID VARCHAR(20) PRIMARY KEY, -- Census Geographic Area Identifier (Primary Key) GISJOIN VARCHAR(20), -- GIS Join Match Code "YEAR" VARCHAR(10), -- Data File Year (2018-2022) STUSAB VARCHAR(20), -- State Abbreviation STATE VARCHAR(50), -- State Name STATEA VARCHAR(50), -- State Code COUNTY VARCHAR(50), -- County Name COUNTYA INTEGER, -- County Code TRACTA VARCHAR(20), -- Census Tract Code TL_GEO_ID VARCHAR(20), -- TIGER/Line Shapefile Geographic Area Identifier NAME_E VARCHAR(255), -- Full Geographic Area Name (Estimates) AQNFE001 INTEGER, -- Total Population AQNGE001 INTEGER, -- Total Race Population AQNGE002 INTEGER, -- White Population AQNGE003 INTEGER, -- Black or African American Population AQNGE004 INTEGER, -- American Indian/Alaska Native Population AQNGE005 INTEGER, -- Asian Population AQNGE006 INTEGER, -- Native Hawaiian/Other Pacific Islander Population AQNGE007 INTEGER, -- Some Other Race Population AQNGE008 INTEGER, -- Two or More Races Population AQNGE009 INTEGER, -- Two Races Including Some Other Race AQNGE010 INTEGER, -- Two Races Excluding Some Other Race AQPKE001 INTEGER, -- Total Population (Educational Attainment) AQPKE002 INTEGER, -- No Schooling Completed AQPKE003 INTEGER, -- Nursery School AQPKE004 INTEGER, -- Kindergarten AQPKE005 INTEGER, -- 1st Grade AQPKE006 INTEGER, -- 2nd Grade AQPKE007 INTEGER, -- 3rd Grade AQPKE008 INTEGER, -- 4th Grade AQPKE009 INTEGER, -- 5th Grade AQPKE010 INTEGER, -- 6th Grade AQPKE011 INTEGER, -- 7th Grade AQPKE012 INTEGER, -- 8th Grade AQPKE013 INTEGER, -- 9th Grade AQPKE014 INTEGER, -- 10th Grade AQPKE015 INTEGER, -- 11th Grade AQPKE016 INTEGER, -- 12th Grade, No Diploma AQPKE017 INTEGER, -- Regular High School Diploma AQPKE018 INTEGER, -- GED or Alternative Credential AQPKE019 INTEGER, -- Some College, Less Than 1 Year AQPKE020 INTEGER, -- Some College, 1 or More Years, No Degree AQPKE021 INTEGER, -- Associate’s Degree AQPKE022 INTEGER, -- Bachelor’s Degree AQPKE023 INTEGER, -- Master’s Degree AQPKE024 INTEGER, -- Professional School Degree AQPKE025 INTEGER, -- Doctorate Degree NAME_M VARCHAR(255), -- Full Geographic Area Name (Margins of Error) AQNGM002 INTEGER, -- White Alone (Margin of Error) AQNGM003 INTEGER, -- Black or African American Alone (Margin of Error) AQNGM004 INTEGER, -- American Indian/Alaska Native Alone (Margin of Error) AQNGM005 INTEGER, -- Asian Alone (Margin of Error) AQNGM006 INTEGER, -- Native Hawaiian/Other Pacific Islander Alone (Margin of Error) AQNGM007 INTEGER, -- Some Other Race Alone (Margin of Error) AQNGM008 INTEGER, -- Two or More Races (Margin of Error) AQNGM009 INTEGER, -- Two Races Including Some Other Race (Margin of Error) AQNGM010 INTEGER, -- Two Races Excluding Some Other Race (Margin of Error) AQPKM001 INTEGER, -- Total Population (Margin of Error) AQPKM002 INTEGER, -- No Schooling Completed (Margin of Error) AQPKM003 INTEGER, -- Nursery School (Margin of Error) AQPKM004 INTEGER, -- Kindergarten (Margin of Error) AQPKM005 INTEGER, -- 1st Grade (Margin of Error) AQPKM006 INTEGER, -- 2nd Grade (Margin of Error) AQPKM007 INTEGER, -- 3rd Grade (Margin of Error) AQPKM008 INTEGER, -- 4th Grade (Margin of Error) AQPKM009 INTEGER, -- 5th Grade (Margin of Error) AQPKM010 INTEGER, -- 6th Grade (Margin of Error) AQPKM011 INTEGER, -- 7th Grade (Margin of Error) AQPKM012 INTEGER, -- 8th Grade (Margin of Error) AQPKM013 INTEGER, -- 9th Grade (Margin of Error) AQPKM014 INTEGER, -- 10th Grade (Margin of Error) AQPKM015 INTEGER, -- 11th Grade (Margin of Error) AQPKM016 INTEGER, -- 12th Grade, No Diploma (Margin of Error) AQPKM017 INTEGER, -- Regular High School Diploma (Margin of Error) AQPKM018 INTEGER, -- GED or Alternative Credential (Margin of Error) AQPKM019 INTEGER, -- Some College, Less Than 1 Year (Margin of Error) AQPKM020 INTEGER, -- Some College, 1 or More Years, No Degree (Margin of Error) AQPKM021 INTEGER, -- Associate’s Degree (Margin of Error) AQPKM022 INTEGER, -- Bachelor’s Degree (Margin of Error) AQPKM023 INTEGER, -- Master’s Degree (Margin of Error) AQPKM024 INTEGER, -- Professional School Degree (Margin of Error) AQPKM025 INTEGER -- Doctorate Degree (Margin of Error) );
- This SQL script defines the
Step 2: Import the CSV File using
pgAdmin
- Open
pgAdmin
and connect to your database (postgis_in_action
). - Navigate to the
nhgis_tract_data
table. - Right-click on the table and select
Import/Export
. - Choose the file format as
CSV
, select the file path, and check the Header option. - Click
OK
to load the data.
- Open
Step 3: Verify the Import
Run the following query to confirm the data has been successfully imported:
SELECT * FROM nhgis_tract_data LIMIT 10;
Basic SQL¶
SQL (Structured Query Language)
: SQL
is a standard language for managing relational databases. It provides a set of commands for performing various operations such as querying data, updating data, creating and modifying database schema, and managing access controls.
Key Types in SQL¶
Numeric Types¶
INTEGER
(orINT
): Represents whole numbers.NUMERIC
(orDECIMAL
): Represents fixed-point numbers with exact precision.CREATE TABLE Products ( ProductID SERIAL PRIMARY KEY, -- Auto-incrementing integer Price NUMERIC(10, 2) -- Fixed-point number with 10 digits in total, 2 after the decimal );
String Types¶
VARCHAR
: Represents variable-length character strings with a maximum length specified.TEXT
: Represents variable-length character strings with a maximum length that can be extremely large.CREATE TABLE Customers ( CustomerID SERIAL PRIMARY KEY, -- Auto-incrementing integer FirstName VARCHAR(50), -- Variable-length string up to 50 characters LastName VARCHAR(50), -- Variable-length string up to 50 characters Address TEXT -- Large variable-length text );
Date/Time Types¶
DATE
: Represents a date value without time.TIMESTAMP
: Represents date and time values.CREATE TABLE Orders ( OrderID SERIAL PRIMARY KEY, -- Auto-incrementing integer OrderDate DATE, -- Date only (e.g., '2025-01-01') LastUpdated TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Date and time, with a default value );
Boolean Type¶
BOOLEAN
: Represents a boolean value (TRUE
,FALSE
, orNULL
).CREATE TABLE Employees ( EmployeeID SERIAL PRIMARY KEY, -- Auto-incrementing integer IsActive BOOLEAN DEFAULT TRUE -- Boolean value, with a default of TRUE );
Database normalization¶
Database normalization
is a process used to organize a relational database into tables and columns to minimize redundancy
and dependency
. It aims to reduce data anomalies such as insertion, update, and deletion anomalies, which can occur when data is not properly organized.
Let's illustrate normalization with a simple example:
Suppose we have a database to store information about students
and the courses
they are enrolled in. Initially, we might design a single table like this:
StudentCourses Table:
StudentID | StudentName | Course1 | Course2 | Course3 |
---|---|---|---|---|
1 | Alice | Math | Physics | Chemistry |
2 | Bob | Physics | Biology | |
3 | Carol | Chemistry |
In this table, each row represents a student
, and each column represents a course
they are enrolled in. However, this table violates the principles of normalization because:
- Data Redundancy: The
course
names are repeated in multiple rows, leading to redundancy. - Insertion Anomaly: If a
student
wants to enroll in morecourses
than the available columns, we need to modify the table structure. - Deletion Anomaly: If we delete a course from the database, we may lose information about which
students
were enrolled in that course. - Update Anomaly: If we update the name of a
course
, we need to update it in multiple places, which can lead to inconsistencies.
To normalize this database, we can break it down into multiple tables. We'll create separate tables for Students
and Courses
and introduce a linking table to represent the many-to-many
relationship between them:
In this normalized schema:
- Students Table stores information about
students
. - Courses Table stores information about
courses
. - Enrollments Table represents the
many-to-many
relationship betweenstudents
andcourses
, indicating which student is enrolled in which course.
Students Table:
StudentID | StudentName |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Courses Table:
CourseID | CourseName |
---|---|
1 | Math |
2 | Physics |
3 | Chemistry |
4 | Biology |
Enrollments Table:
StudentID | CourseID |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 2 |
2 | 4 |
3 | 3 |
DDL (Data Definition Language)¶
DDL
is used to define the structure and schema of a database. It includes commands for creating, modifying, and deleting database objects such as tables
, indexes
, views
, and schemas
.
Common DDL commands include CREATE
, ALTER
, DROP
, TRUNCATE
, and RENAME
.
CREATE
¶
The
CREATE
command is used to create new database objects such astables
,indexes
,views
, etc.This command creates a new
table
namedEmployees
with columnsEmployeeID
,FirstName
,LastName
, andDepartment
.CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50), Department VARCHAR(50) );
ALTER
¶
The
ALTER
command is used to modify existing database objects.This command adds a new column named
Salary
to theEmployees
table.ALTER TABLE Employees ADD Salary DECIMAL(10, 2);
DROP
¶
The
DROP
command is used to delete existing database objects.This command deletes the
Employees
table and all its data from the database.DROP TABLE Employees;
TRUNCATE
¶
The
TRUNCATE
command is used to delete all rows from a table but keeps the table structure intact.This command removes all rows from the
Employees
table, but the table structure remains.TRUNCATE TABLE Employees;
RENAME
¶
The
RENAME
command is used to rename existing database objects.This command renames the column
FirstName
in theEmployees
table toFirst_Name
.ALTER TABLE Employees RENAME COLUMN FirstName TO First_Name;
CAST
and ::
¶
The
CAST
command is used to explicitly specify the data type to which you want to convert a value.SELECT CAST('10' AS INT); SELECT '10'::INT; -- Same SELECT * FROM Employees WHERE CAST(Salary AS DECIMAL) > 50000;
DML (Data Manipulation Language)¶
DML
is used to manipulate data stored in the database. It includes commands for inserting
, updating
, deleting
, and querying
data within tables.
Common DML commands include INSERT
, UPDATE
, DELETE
, and SELECT
.
INSERT¶
The
INSERT
command is used to add new rows to a table.CREATE TABLE Employees ( EmployeeID SERIAL PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50), Department VARCHAR(50), Salary NUMERIC(10, 2) ); INSERT INTO Employees (FirstName, LastName, Department, Salary) VALUES ('John', 'Doe', 'IT', 75000), ('Jane', 'Smith', 'HR', 65000), ('Alice', 'Johnson', 'Finance', 80000);
Output Table:
| EmployeeID | FirstName | LastName | Department | Salary | |------------|-----------|----------|------------|----------| | 1 | John | Doe | IT | 75000.00 | | 2 | Jane | Smith | HR | 65000.00 | | 3 | Alice | Johnson | Finance | 80000.00 |
UPDATE¶
The
UPDATE
command is used to modify existing rows in a table.UPDATE Employees SET Salary = 85000 WHERE EmployeeID = 1;
Output Table:
| EmployeeID | FirstName | LastName | Department | Salary | |------------|-----------|----------|------------|----------| | 1 | John | Doe | IT | 85000.00 | | 2 | Jane | Smith | HR | 65000.00 | | 3 | Alice | Johnson | Finance | 80000.00 |
DELETE¶
The
DELETE
command is used to remove rows from a table.DELETE FROM Employees WHERE EmployeeID = 2;
Output Table:
| EmployeeID | FirstName | LastName | Department | Salary | |------------|-----------|----------|------------|----------| | 1 | John | Doe | IT | 85000.00 | | 3 | Alice | Johnson | Finance | 80000.00 |
SELECT¶
The
SELECT
command is used to query data from a table.SELECT * FROM Employees;
Output Table:
| EmployeeID | FirstName | LastName | Department | Salary | |------------|-----------|----------|------------|----------| | 1 | John | Doe | IT | 85000.00 | | 3 | Alice | Johnson | Finance | 80000.00 |
Operators¶
In SQL
, operators are used to perform operations on data within queries. There are various types of operators in SQL
, including arithmetic operators
, comparison operators
, logical operators
, and more. Here's an introduction to some common SQL operators with examples:
Arithmetic Operators¶
- Addition (
+
), Subtraction (-
), Multiplication (*
), Division (/
), and Modulus (%
).
SELECT
10 + 5 AS Addition,
10 - 5 AS Subtraction,
10 * 5 AS Multiplication,
10 / 5 AS Division,
10 % 3 AS Modulus;
Comparison Operators¶
Equal to (=
), Not equal to (<>
or !=
), Greater than (>
), Less than (<
), Greater than or equal to (>=
), Less than or equal to (<
=).
SELECT * FROM Products WHERE Price > 100;
CONCAT
and SUBSTRING
Operator¶
Concatenates two or more strings together.
SELECT CONCAT(FirstName, ' ', LastName) AS FullName FROM Customers;
-- OR using ||
SELECT FirstName || ' ' || LastName AS FullName FROM Customers;
IN
Operator¶
Checks if a value matches any value in a list.
SELECT * FROM Products WHERE Category IN ('Electronics', 'Appliances');
BETWEEN
Operator¶
Checks if a value is within a specified range.
SELECT * FROM Orders WHERE OrderDate BETWEEN '2022-01-01' AND '2022-01-31';
LIKE
Operator¶
Compares a value to similar values using wildcard
characters (%
for zero or more characters, _
for a single character).
SELECT * FROM Products WHERE ProductName LIKE 'Apple%';
IS NULL
/ IS NOT NULL
Operators¶
Checks if a value is NULL or not NULL.
SELECT * FROM Employees WHERE ManagerID IS NULL;
SELECT * FROM Employees WHERE ManagerID IS NOT NULL;
ANY
/ ALL
Operators¶
The
ANY
operator returnstrue
if the comparison istrue
for at least one of the values returned by the subquery.SELECT * FROM Products WHERE Price > ANY (SELECT Price FROM SpecialOffers);
- This query selects all
Products
with aPrice
greater than anyPrice
listed in theSpecialOffers
table.
- This query selects all
The
ALL
operator returnstrue
if the comparison istrue
for all the values returned by the subquery.SELECT * FROM Orders WHERE TotalAmount > ALL (SELECT Budget FROM Departments WHERE DepartmentName = 'Sales');
- This query selects all
orders
with aTotalAmount
greater than theBudget
of the Sales department.
- This query selects all
What Is the Average Median Household Income Across All Census Tracts?¶
To calculate the average median household income:
SELECT AVG(aqpke001) AS avg_median_income
FROM nhgis_tract_data
WHERE aqpke001 IS NOT NULL;
Which Census Tract Has the Highest Percentage of Asian Population?¶
To determine the tract with the highest percentage of the Asian population:
SELECT GISJOIN, GEO_ID, AQNGE005, AQNFE001,
(AQNGE005::FLOAT / AQNFE001 * 100) AS asian_percentage
FROM nhgis_tract_data
WHERE AQNFE001 > 0 -- Ensure total population is not zero
ORDER BY asian_percentage DESC
LIMIT 1;
What Are the Top 5 Census Tracts by Educational Attainment (Bachelor’s Degree)?¶
To identify the tracts with the most residents holding a bachelor’s degree:
SELECT GISJOIN, GEO_ID, AQPKE022 AS bachelors_degree
FROM nhgis_tract_data
ORDER BY AQPKE022 DESC
LIMIT 5;
What Is the Total Population by County?¶
Find Top5 total population at the county level:
SELECT COUNTY, COUNTYA, SUM(AQNFE001) AS total_population
FROM nhgis_tract_data
GROUP BY COUNTY, COUNTYA
ORDER BY total_population DESC
LIMIT 5;
Which Census Tracts Have More Than 80% of the Population Identifying as Asian?¶
To find tracts where the majority population identifies as Asian:
SELECT GISJOIN, GEO_ID, AQNGE003, AQNFE001,
(AQNGE005::FLOAT / AQNFE001 * 100) AS asian_percentage
FROM nhgis_tract_data
WHERE AQNFE001 > 0 AND (AQNGE005::FLOAT / AQNFE001) > 0.8;