---
title: Geo-Data Intake and Operations
keywords: fastai
sidebar: home_sidebar
summary: "This notebook was made to demonstrate how to work with geographic data."
description: "This notebook was made to demonstrate how to work with geographic data."
nb_path: "notebooks/03_Map_Basics_Intake_and_Operations.ipynb"
---

This Coding Notebook is the third in a series.

An interactive version can be found here: Open In Colab.

This Colab and more can be found on our webpage.

  • Content covered in previous tutorials will be used in later tutorials.

  • New code and/or information should have explanations and/or descriptions attached.

  • Concepts or code covered in previous tutorials will be used without being explained in their entirety.

  • The Dataplay Handbook uses development techniques covered in the Datalabs Guidebook.

  • If content can not be found in the current tutorial and is not covered in previous tutorials, please let me know.

  • This notebook has been optimized for Google Colab run in a Chrome browser.

  • Statements found in the index page on views expressed, responsibility, errors and omissions, use at risk, and licensing extend throughout the tutorial.


About this Tutorial:

What's Inside?

The Tutorial

In this notebook, the basics of working with geographic data are introduced.

  • Reading in data (points/geoms)
    -- Convert lat/lng columns to point coordinates
    -- Geocoding addresses to coordinates
    -- Changing coordinate reference systems
    -- Connecting to PostGIS DBs
  • Basic Operations
  • Saving shape data
  • Get Polygon Centroids
  • Working with Points and Polygons
    -- Map Points and Polygons
    -- Get Points in Polygons
    -- Create Choropleths
    -- Create Heatmaps (KDE?)

Objectives

By the end of this tutorial users should have an understanding of:

  • How to read in and process geo-data as a geo-dataframe.
  • The Coordinate Reference System and Coordinate Encoding
  • Basic geo-visualization strategies

Background

Datatypes and Geo-data

Geographic data must be encoded properly in order to attain the full potential of its spatial nature.

If you have read in a dataset using pandas, its data type will be a Dataframe.

It may be converted into a Geo-Dataframe using Geopandas as demonstrated in the sections below.

You can check a variable's column datatypes at any time using the dtypes attribute:

yourGeoDataframe.dtypes
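
For instance, a minimal sketch (the file name here is a placeholder):

{% raw %}
import geopandas as gpd

# Read any spatial file into a geo-dataframe (placeholder file name)
gdf = gpd.read_file('some_boundaries.geojson')

# type() tells you whether you have a DataFrame or a GeoDataFrame,
# while .dtypes lists every column's datatype, including 'geometry'.
type(gdf)
gdf.dtypes
{% endraw %}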

Coordinate Reference Systems (CRS)

Make sure the appropriate spatial Coordinate Reference System (CRS) is used when reading in your data!

ala wiki:

A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities

CRS 4326 is the CRS most people are familiar with when referring to latitudes and longitudes.

Baltimore's coordinates in CRS 4326 are roughly (39.2, -76.6).

BNIA uses CRS 2248 internally. Additional information: https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

Ensure your geodataframes' coordinates are using the same CRS by checking the geopandas attribute:

yourGeoDataframe.crs
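
If two layers disagree, you can re-project one to match the other with to_crs. A minimal sketch, assuming gdf_a and gdf_b are geo-dataframes you have already loaded:

{% raw %}
# Inspect each layer's CRS via the .crs attribute
gdf_a.crs
gdf_b.crs

# Re-project gdf_b into gdf_a's CRS so the two layers line up
gdf_b = gdf_b.to_crs(gdf_a.crs)
{% endraw %}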

Coordinate Encoding

When first receiving a spatial dataset, the spatial column may need to be encoded to convert its 'text' data-type values into understood 'coordinate' data types before it can be processed accordingly.

Namely, there are two ways to encode text into coordinates:

  • df[geom] = df[geom].apply(lambda x: loads( str(x) ))
  • df[geom] = [Point(xy) for xy in zip(df.x, df.y)]

The first approach can be used for text taking the form "Point(-76, 39)" and will encode the text to coordinates. The second approach is useful when creating a point from two columns containing lat/lng information and will create Point coordinates from the two columns.

More on this later
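
As a quick preview, here is a minimal sketch of both encodings on made-up data (the column names x and y are placeholders):

{% raw %}
import pandas as pd
from shapely.wkt import loads
from shapely.geometry import Point

# Approach 1: decode WKT text such as "POINT (-76.61 39.29)" into geometry objects
df = pd.DataFrame({'geometry': ['POINT (-76.61 39.29)', 'POINT (-76.58 39.31)']})
df['geometry] = df['geometry'].apply(lambda x: loads( str(x) )) if False else df['geometry'].apply(lambda x: loads( str(x) ))

# Approach 2: build Point geometries from separate lng/lat columns
df2 = pd.DataFrame({'x': [-76.61, -76.58], 'y': [39.29, 39.31]})
df2['geometry'] = [Point(xy) for xy in zip(df2.x, df2.y)]
{% endraw %}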

Raster Vs Vector Data

There exist two types of geospatial data, raster and vector. Each has its own file formats.

This lab will only cover vector data.

Vector Data

Vector Data: Individual points stored as (x,y) coordinate pairs. These points can be joined to create lines or polygons.

Format of Vector data

Esri Shapefile — .shp, .dbf, .shx Description — Industry standard, most widely used. The three files listed above are needed to make a shapefile. Additional file formats may be included.

Geographic JavaScript Object Notation — .geojson, .json Description — Second most popular; GeoJSON is typically used in web-based mapping and stores the coordinates as JSON.

Geography Markup Language — .gml Description — Similar to GeoJSON, but GML files carry more markup for the same amount of information.

Google Keyhole Markup Language — .kml, .kmz Description — XML-based and predominantly used for Google Earth. KMZ is the newer, zipped version of KML.

Raster Data

Raster Data: Cell-based data where each cell represents geographic information. An aerial photograph is one such example, where each pixel has a color value.

Raster Data Files:
GeoTIFF — .tif, .tiff, .ovr
ERDAS Imagine — .img
IDRISI Raster — .rst, .rdc

Information Sourced From: https://towardsdatascience.com/getting-started-with-geospatial-works-1f7b47955438

Vector Data: Census Geographic Data:

Guided Walkthrough

SETUP:

Import Modules

{% raw %}
%%capture
! pip install geopy
! pip install geopandas
! pip install geoplot
! pip install dataplay
{% endraw %} {% raw %}
from dataplay import geoms
{% endraw %} {% raw %}
{% endraw %} {% raw %}
import psycopg2
{% endraw %}
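
The cells that follow also lean on pandas, numpy, geopandas, shapely, matplotlib, and geopy. A consolidated import cell might look like the sketch below; it assumes the helper functions documented later on this page (mergeDatasets, readInGeometryData, workWithGeometryData, map_points) live in dataplay's merge and geoms modules.

{% raw %}
import pandas as pd
import numpy as np
import geopandas as gpd
from geopandas import GeoDataFrame
from shapely.geometry import Point
from shapely.wkt import loads
import matplotlib.pyplot as plt

# dataplay helpers used later in this tutorial (assumed module layout)
from dataplay import merge
from dataplay.geoms import readInGeometryData, workWithGeometryData, map_points

# Nominatim geocoder, used in the geocoding section
from geopy.geocoders import Nominatim
{% endraw %}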

Configure Environment

{% raw %}
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# pd.set_option('display.expand_frame_repr', False)
# pd.set_option('display.precision', 2)
# pd.reset_option('max_colwidth')
pd.set_option('max_colwidth', 50)
# pd.reset_option('max_colwidth')
{% endraw %}

Retrieve GIS Data

As mentioned earlier:

When you use a pandas function to 'read-in' a dataset, the returned value is of a datatype called a 'Dataframe'.

We need a 'Geo-Dataframe', however, to effectively work with spatial data.

While Pandas does not support Geo-Dataframes, Geopandas does.

Geopandas has everything you love about pandas, but with added support for geo-spatial data.

Principal benefits of using Geopandas over Pandas when working with spatial data:

  • The geopandas plot function will render a map by default using your 'spatial-geometries' column.
  • Libraries exist for spatial operations and interactive map usage.

There are many ways to read spatial data into a geo-dataframe using geo-pandas.

Namely, it means reading in geo-spatial data from:

  1. a (.geojson or .shp) file directly, using Geo-pandas
  2. a (.csv, .json) file using Pandas, converting it to Geo-Pandas by
    • using a prepared 'geometry' column
    • transforming latitude and longitude columns into a 'geometry' column
    • acquiring coordinates from an address
    • mapping your non-spatial data to data-with-space
  3. connecting to a DB

We will review each one below

Approach 1: Reading in Data Directly

If you are using Geopandas, direct imports only work with GeoJSON and shapefiles.

Spatial coordinate data comes properly encoded in these file types, making them particularly easy to use.

You can perform this using geopandas' read_file() function.

{% raw %}
# BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
csa_gdf = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/1/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson")
{% endraw %}

As you can see, the resultant variable is of type GeoDataFrame.

{% raw %}
type(csa_gdf)
geopandas.geodataframe.GeoDataFrame
{% endraw %}

GeoDataFrames are only possible when one of the columns is of a 'geometry' datatype.

{% raw %}
csa_gdf.dtypes
OBJECTID            int64
CSA2010            object
hhchpov15         float64
hhchpov16         float64
hhchpov17         float64
hhchpov18         float64
hhchpov19         float64
Shape__Area       float64
Shape__Length     float64
geometry         geometry
dtype: object
{% endraw %}

Awesome. So that means you can now plot maps all pretty-like:

{% raw %}
csa_gdf.plot(column='hhchpov15')
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5ef6187d0>
{% endraw %}

And now let's take a peek at the raw data:

{% raw %}
csa_gdf.head(1)
OBJECTID CSA2010 hhchpov15 hhchpov16 hhchpov17 hhchpov18 hhchpov19 Shape__Area Shape__Length geometry
0 1 Allendale/Irvington/S. Hilton 38.93 34.73 32.77 35.27 32.6 6.38e+07 38770.17 POLYGON ((-76.65726 39.27600, -76.65726 39.276...
{% endraw %}

I'll show you more ways to save the data later, but for our example in the next section to work, we need a csv.

We can make one by saving the geo-dataframe above using the to_csv function.

The spatial data will be stored in an encoded text form (well-known text) that makes it easy to re-open in the future.

{% raw %}
csa_gdf.to_csv('example.csv')
{% endraw %}

Approach 2: Converting Pandas into Geopandas

Approach 2: Method 1: Convert using a pre-formatted 'geometry' column

This approach loads a map using a geometry column

In our previous example, we saved a geo-dataframe as a csv.

Now let's re-open it using pandas!

{% raw %}
url = "example.csv"
geom = 'geometry'
# An example of loading in an internal BNIA file
crs = {'init' :'epsg:2248'} 

# Read in the dataframe
csa_gdf = pd.read_csv(url)
{% endraw %}

Great!

But now what?

Well, for starters, regardless of the project you are working on, it's always a good idea to inspect your data.

This is particularly important if you don't know what you're working with.

{% raw %}
csa_gdf.head(1)
Unnamed: 0 OBJECTID CSA2010 hhchpov15 hhchpov16 hhchpov17 hhchpov18 hhchpov19 Shape__Area Shape__Length geometry
0 0 1 Allendale/Irvington/S. Hilton 38.93 34.73 32.77 35.27 32.6 6.38e+07 38770.17 POLYGON ((-76.65725742964381 39.276002083707, ...
{% endraw %}

Take notice of how the geometry column has a special formatting.

All spatial data must take on a similar encoding for it to be properly interpreted as a spatial data-type.

As far as I can tell, this is near-identical to the table I printed out in our last example.

BUT WAIT!

You'll notice that if I run the plot function, a pretty map will not automatically appear

{% raw %}
csa_gdf.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5da3f9dd0>
{% endraw %}

Why is this? Because you're not working with a geo-dataframe but just a dataframe!

Take a look:

{% raw %}
type(csa_gdf)
pandas.core.frame.DataFrame
{% endraw %}

Okay... So that's not right...

What can we do about this?

Well, for one, our spatial data (in the geometry column) is not of the right data-type, even though it takes on the right form.

{% raw %}
csa_gdf.dtypes
Unnamed: 0         int64
OBJECTID           int64
CSA2010           object
hhchpov15        float64
hhchpov16        float64
hhchpov17        float64
hhchpov18        float64
hhchpov19        float64
Shape__Area      float64
Shape__Length    float64
geometry          object
dtype: object
{% endraw %}

Ok. So how do we change it? Well, since it's already been properly encoded, you can convert a column's data-type from an object (or whatever) to a geometry using shapely's loads function.

In the example below, we convert the datatypes for all records in the 'geometry' column

{% raw %}
csa_gdf[geom] = csa_gdf[geom].apply(lambda x: loads( str(x) ))
{% endraw %}

That's all! Now let's see the geometry column's data-type and the entire table's data-type

{% raw %}
csa_gdf.dtypes
Unnamed: 0         int64
OBJECTID           int64
CSA2010           object
hhchpov15        float64
hhchpov16        float64
hhchpov17        float64
hhchpov18        float64
hhchpov19        float64
Shape__Area      float64
Shape__Length    float64
geometry          object
dtype: object
{% endraw %} {% raw %}
type(csa_gdf)
pandas.core.frame.DataFrame
{% endraw %}

As you can see, the geometry column now holds geometry objects (though pandas still reports the column as 'object'), but our table is still only just a dataframe.

But now, you are ready to convert your entire pandas dataframe into a geo-dataframe.

You can do that by running the following function:

{% raw %}
csa_gdf = GeoDataFrame(csa_gdf, crs=crs, geometry=geom)
/usr/local/lib/python3.7/dist-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
{% endraw %}

Aaaand BOOM.

{% raw %}
csa_gdf.plot(column='hhchpov18')
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5da2ef390>
{% endraw %}

goes the dy-no-mite

{% raw %}
type(csa_gdf)
geopandas.geodataframe.GeoDataFrame
{% endraw %}

Approach 2: Method 2: Convert Column(s) to Coordinate

Approach 2: Method 2: A Generic Outline

This is the generic example but it will not work since no URL is given.

{% raw %}
# If your data has coordinates in two columns run this cell
# It will create a geometry column from the two.
# A public dataset is not provided for this example and will not run.

# Load DF HERE. Accidently deleted the link. Need to refind. 
# Just rely on example 2 for now. 
"""
exe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')
exe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')
# exe_df = exe_df.replace(np.nan, 0, regex=True)

# An example of loading in an internal BNIA file
geometry=[Point(xy) for xy in zip(exe_df.x, exe_df.y)]
exe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry)
"""
"\nexe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')\nexe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')\n# exe_df = exe_df.replace(np.nan, 0, regex=True)\n\n# An example of loading in an internal BNIA file\ngeometry=[Point(xy) for xy in zip(exe_df.x, exe_df.y)]\nexe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry)\n"
{% endraw %}
Approach 2: Method 2: Example: Geoloom

Since I do not readily have a dataset with lat and lng columns, I will have to make one.

We can split the coordinates from a geodataframe like so...

{% raw %}
# Table: Geoloom, 
# Columns:  
geoloom_gdf = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson");
geoloom_gdf['POINT_X'] = geoloom_gdf['geometry'].centroid.x
geoloom_gdf['POINT_Y'] = geoloom_gdf['geometry'].centroid.y
# Now drop rows lacking geometry and save it to have our example dataset.
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
geoloom_gdf.to_csv('example.csv')
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:5: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  """
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:6: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  
{% endraw %}

The first thing you will want to do when given a dataset with a coordinates column is ensure its datatype.

{% raw %}
geoloom_df = pd.read_csv('example.csv')
# We already know the x and y columns because we just saved them as such.
geoloom_df['POINT_X'] = pd.to_numeric(geoloom_df['POINT_X'], errors='coerce')
geoloom_df['POINT_Y'] = pd.to_numeric(geoloom_df['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)

# And filter out for points only in Baltimore City. 
geoloom_df = geoloom_df[ geoloom_df['POINT_Y'] > 39.3  ]
geoloom_df = geoloom_df[ geoloom_df['POINT_Y'] < 39.5  ]
{% endraw %} {% raw %}
crs = {'init' :'epsg:2248'} 
geometry=[Point(xy) for xy in zip(geoloom_df['POINT_X'], geoloom_df['POINT_Y'])]
geoloom_gdf = gpd.GeoDataFrame( geoloom_df.drop(['POINT_X', 'POINT_Y'], axis=1), crs=crs, geometry=geometry)
# 39.2904° N, 76.6122°
/usr/local/lib/python3.7/dist-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
{% endraw %} {% raw %}
geoloom_gdf.head(1)
Unnamed: 0 OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments GlobalID geometry
3 4 5 Artists & Resources NaN Open Works Maker Space 1400 Greenmount Ave, Baltimore, MD, 21202, USA http://www.openworksbmore.com Alyce Myatt alycemyattconsulting@gmail.com One of Jane Brown's projects! 140e7db7-33f1-49cd-8133-b6f75dba5851 POINT (-76.60850 39.30593)
{% endraw %}

Here's a neat trick to make it more presentable, because those points mean nothing to me.

{% raw %}
ax = csa_gdf.plot(column='hhchpov18', edgecolor='black')

# now plot our points over it.
geoloom_gdf.plot(ax=ax, color='red')

plt.show()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5da270050>
{% endraw %}
Approach 2: Method 3: Using a Crosswalk (Need Crosswalk on Esri)

When you want to merge two datasets that do not share a common column, it is often useful to create a 'crosswalk' file that 'maps' records between the two datasets. We can use this to append spatial data when a direct merge is not readily evident.

Check out this next example, where we pull ACS Census data, use its 'tract' column to map each record to a community, aggregate the records along the communities they belong to, and map the result as a choropleth! A generic sketch of the idea appears directly below.
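
Conceptually, a crosswalk merge is just two ordinary pandas merges chained together, as in this minimal sketch on made-up data:

{% raw %}
import pandas as pd

# Left dataset: values keyed by tract
left = pd.DataFrame({'tract': [271002, 260402], 'medianIncome': [38358, 42231]})

# Crosswalk: maps each tract to the community (CSA) it belongs to
crosswalk = pd.DataFrame({'TRACTCE10': [271002, 260402],
                          'CSA2010': ['Greater Govans', 'Claremont/Armistead']})

# The first merge attaches the CSA label to each tract record;
# a second merge on 'CSA2010' could then attach geometries from a boundaries dataset.
labeled = left.merge(crosswalk, left_on='tract', right_on='TRACTCE10')
labeled.groupby('CSA2010')['medianIncome'].mean()
{% endraw %}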

We will set up our ACS query variables right here for easy changing

{% raw %}
# Changing these values to different geographic reference codes will change the query parameters
tract = '*'
county = '510' # '059' # 153 '510'
state = '24' #51

# Specify the download parameters the function will receive here
tableId = 'B19049' # 'B19001'
year = '17'
saveAcs = True
{% endraw %}

And now we will call the function with those variables and check out the result

{% raw %}
import dataplay
from dataplay import acsDownload
import IPython
retrieve_acs_data = acsDownload.retrieve_acs_data
# from IPython.core.display import HTML
IPython.core.display.HTML("<style>.rendered_html th {max-width: 200px; overflow:auto;}</style>")
# state, county, tract, tableId, year, saveOriginal, save 
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
df.head(1)
df.to_csv('tracts_data.csv')
Number of Columns 5
B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Total B19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_years B19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_years B19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_years B19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_over state county tract
NAME
Census Tract 2710.02 38358 -666666666 34219 40972 37143 24 510 271002
{% endraw %} {% raw %}
df['tract'].dtype
dtype('int64')
{% endraw %}

As you can see, the tract column is stored as an integer (int64).

{% raw %}
ls
24510_B19049_5y17_est.csv           CSA-to-Tract-2010.csv.2
24510_B19049_5y17_est_Original.csv  CSA-to-Tract-2010.csv.20
CSA-to-Tract-2010.csv               CSA-to-Tract-2010.csv.3
CSA-to-Tract-2010.csv.1             CSA-to-Tract-2010.csv.4
CSA-to-Tract-2010.csv.10            CSA-to-Tract-2010.csv.5
CSA-to-Tract-2010.csv.11            CSA-to-Tract-2010.csv.6
CSA-to-Tract-2010.csv.12            CSA-to-Tract-2010.csv.7
CSA-to-Tract-2010.csv.13            CSA-to-Tract-2010.csv.8
CSA-to-Tract-2010.csv.14            CSA-to-Tract-2010.csv.9
CSA-to-Tract-2010.csv.15            example.csv
CSA-to-Tract-2010.csv.16            Hhchpov.csv
CSA-to-Tract-2010.csv.17            Hhchpov.geojson
CSA-to-Tract-2010.csv.18            sample_data/
CSA-to-Tract-2010.csv.19            tracts_data.csv
{% endraw %} {% raw %}
!wget https://bniajfi.org/vs_resources/CSA-to-Tract-2010.csv
--2021-03-09 18:39:02--  https://bniajfi.org/vs_resources/CSA-to-Tract-2010.csv
Resolving bniajfi.org (bniajfi.org)... 172.67.185.89, 104.21.92.29, 2606:4700:3031::ac43:b959, ...
Connecting to bniajfi.org (bniajfi.org)|172.67.185.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8101 (7.9K) [text/csv]
Saving to: ‘CSA-to-Tract-2010.csv.21’

CSA-to-Tract-2010.c 100%[===================>]   7.91K  --.-KB/s    in 0s      

2021-03-09 18:39:02 (91.6 MB/s) - ‘CSA-to-Tract-2010.csv.21’ saved [8101/8101]

{% endraw %} {% raw %}
df['tract'].tail(1)
NAME
Baltimore City    10000
Name: tract, dtype: int64
{% endraw %} {% raw %}
crosswalk = pd.read_csv('CSA-to-Tract-2010.csv')
crosswalk.tail(1)
TRACTCE10 GEOID10 CSA2010
199 280500 24510280500 Oldtown/Middle East
{% endraw %} {% raw %}
Hhchpov = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/1/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson")
Hhchpov = Hhchpov[['CSA2010', 'hhchpov15',	'hhchpov16',	'hhchpov17',	'hhchpov18', 'geometry']]
Hhchpov.to_file("Hhchpov.geojson", driver='GeoJSON')
Hhchpov.to_csv('Hhchpov.csv')
gpd.read_file("Hhchpov.geojson").head(1)
CSA2010 hhchpov15 hhchpov16 hhchpov17 hhchpov18 geometry
0 Allendale/Irvington/S. Hilton 38.93 34.73 32.77 35.27 POLYGON ((-76.65726 39.27600, -76.65726 39.276...
{% endraw %}

A simple example of how this would work

{% raw %}
df.merge(crosswalk, left_on='tract', right_on='TRACTCE10')
B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Total B19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_years B19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_years B19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_years B19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_over state county tract TRACTCE10 GEOID10 CSA2010
0 38358 -666666666 34219 40972 37143 24 510 271002 271002 24510271002 Greater Govans
1 42231 -666666666 46467 45484 18750 24 510 260402 260402 24510260402 Claremont/Armistead
2 135441 -666666666 188571 120865 91354 24 510 271200 271200 24510271200 North Baltimore/Guilford/Homeland
3 39479 11806 38393 56369 37031 24 510 280404 280404 24510280404 Allendale/Irvington/S. Hilton
4 44904 -666666666 51324 42083 37269 24 510 90100 90100 24510090100 Greater Govans
... ... ... ... ... ... ... ... ... ... ... ...
195 103889 123750 113500 37778 71250 24 510 230300 230300 24510230300 South Baltimore
196 30688 -666666666 32256 24722 37054 24 510 250207 250207 24510250207 Cherry Hill
197 33833 28750 45152 28333 37604 24 510 250303 250303 24510250303 Morrell Park/Violetville
198 38347 38466 39271 39324 27768 24 510 260202 260202 24510260202 Cedonia/Frankford
199 31097 -666666666 -666666666 35852 19424 24 510 260302 260302 24510260302 Belair-Edison

200 rows × 11 columns

{% endraw %} {% raw %}
ls
24510_B19049_5y17_est.csv           CSA-to-Tract-2010.csv.20
24510_B19049_5y17_est_Original.csv  CSA-to-Tract-2010.csv.21
CSA-to-Tract-2010.csv               CSA-to-Tract-2010.csv.3
CSA-to-Tract-2010.csv.1             CSA-to-Tract-2010.csv.4
CSA-to-Tract-2010.csv.10            CSA-to-Tract-2010.csv.5
CSA-to-Tract-2010.csv.11            CSA-to-Tract-2010.csv.6
CSA-to-Tract-2010.csv.12            CSA-to-Tract-2010.csv.7
CSA-to-Tract-2010.csv.13            CSA-to-Tract-2010.csv.8
CSA-to-Tract-2010.csv.14            CSA-to-Tract-2010.csv.9
CSA-to-Tract-2010.csv.15            example.csv
CSA-to-Tract-2010.csv.16            Hhchpov.csv
CSA-to-Tract-2010.csv.17            Hhchpov.geojson
CSA-to-Tract-2010.csv.18            sample_data/
CSA-to-Tract-2010.csv.19            tracts_data.csv
CSA-to-Tract-2010.csv.2
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
# The attributes are what we will use.
in_crs = 2248 # The CRS we receive our data in
out_crs = 4326 # The CRS we would like our data represented as
geom = 'geometry' # The column where our spatial information lives.

# To create this dataset I had to perform a full outer join.
# This way, geometries are included even if the merge finds no direct match.
# It means at least one (near) empty record will exist for each community, containing (at minimum) the geographic information and name of the community.
# That way, even if no point-level information exists in a community, its geo-boundaries are still carried over during the merge.

# Primary Table
# Description: The ACS tract-level data we saved earlier in this tutorial.
# Table: tracts_data.csv
# Columns: B19049 median household income columns, state, county, tract
left_ds = 'tracts_data.csv'
left_col = 'tract'

# Crosswalk Table
# Table: Crosswalk Census Communities
# Columns: 'TRACTCE10', 'GEOID10', 'CSA2010'
crosswalk_ds = 'CSA-to-Tract-2010.csv'
use_crosswalk = True
crosswalk_left_col = 'TRACTCE10'
crosswalk_right_col = 'CSA2010'

# Secondary Table
# Table: Baltimore Boundaries => HHCHPOV
# Columns: 'CSA2010', 'hhchpov15' through 'hhchpov18', 'geometry'
right_ds = 'Hhchpov.geojson'
right_col ='CSA2010'

interactive = True
merge_how = 'outer'

# returns a pandas dataframe
mergedf = merge.mergeDatasets( left_ds=left_ds, left_col=left_col, 
              crosswalk_ds=crosswalk_ds,
              crosswalk_left_col = crosswalk_left_col, crosswalk_right_col = crosswalk_right_col,
              right_ds=right_ds, right_col=right_col, 
              merge_how=merge_how, interactive = interactive )
 Loading Left Dataset
getData :  tracts_data.csv

 Loading Right Dataset
getData :  Hhchpov.geojson
getData Not Interactive: readInGeometryData: 

son
True

 Validating the merge_how Parameter

 Loading Crosswalk... 

 Left:  TRACTCE10  Right:  CSA2010 


getData :  CSA-to-Tract-2010.csv

 coerceForMerge: Left-Crosswalk
cols :  tract TRACTCE10
BEFORE COERCE :  int64 int64
AFTER COERCE int64 int64 TRACTCE10

 coerceForMerge: Right-Crosswalk
cols :  CSA2010 CSA2010
BEFORE COERCE :  object object
AFTER COERCE object object CSA2010


 End Crosswalk Update. Coerceing complete. Status:  True 
 


PERFORMING MERGE : LEFT->CROSSWALK
first_col :  tract int64
how:  CSA2010
second_col :  TRACTCE10 int64

 Local Column Values Not Matched 
[10000]
1

 Crosswalk Unique Column Values
[ 10100  10200  10300  10400  10500  20100  20200  20300  30100  30200
  40100  40200  60100  60200  60300  60400  70100  70200  70300  70400
  80101  80102  80200  80301  80302  80400  80500  80600  80700  80800
  90100  90200  90300  90400  90500  90600  90700  90800  90900 100100
 100200 100300 110100 110200 120100 120201 120202 120300 120400 120500
 120600 120700 130100 130200 130300 130400 130600 130700 130803 130804
 130805 130806 140100 140200 140300 150100 150200 150300 150400 150500
 150600 150701 150702 150800 150900 151000 151100 151200 151300 160100
 160200 160300 160400 160500 160600 160700 160801 160802 170100 170200
 170300 180100 180200 180300 190100 190200 190300 200100 200200 200300
 200400 200500 200600 200701 200702 200800 210100 210200 220100 230100
 230200 230300 240100 240200 240300 240400 250101 250102 250103 250203
 250204 250205 250206 250207 250301 250303 250401 250402 250500 250600
 260101 260102 260201 260202 260203 260301 260302 260303 260401 260402
 260403 260404 260501 260604 260605 260700 260800 260900 261000 261100
 270101 270102 270200 270301 270302 270401 270402 270501 270502 270600
 270701 270702 270703 270801 270802 270803 270804 270805 270901 270902
 270903 271001 271002 271101 271102 271200 271300 271400 271501 271503
 271600 271700 271801 271802 271900 272003 272004 272005 272006 272007
 280101 280102 280200 280301 280302 280401 280402 280403 280404 280500]
PERFORMING MERGE : LEFT->RIGHT
first_col :  CSA2010 object
how:  outer
second_col :  CSA2010 object
{% endraw %} {% raw %}
mergedf.dtypes
NAME                                                                                                                                 object
B19049_001E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Total                               int64
B19049_002E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_under_25_years          int64
B19049_003E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_25_to_44_years          int64
B19049_004E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_45_to_64_years          int64
B19049_005E_Median_household_income_in_the_past_12_months_(in_2017_inflation-adjusted_dollars)_--_Householder_65_years_and_over       int64
state                                                                                                                                 int64
county                                                                                                                                int64
tract                                                                                                                                 int64
CSA2010                                                                                                                              object
hhchpov15                                                                                                                           float64
hhchpov16                                                                                                                           float64
hhchpov17                                                                                                                           float64
hhchpov18                                                                                                                           float64
geometry                                                                                                                           geometry
dtype: object
{% endraw %} {% raw %}
# mergedf[geom] = mergedf[geom].apply(lambda x: loads( str(x) ) ) 

# Process the dataframe as a geodataframe with a known CRS and geom column 
mergedGdf = GeoDataFrame(mergedf, crs=in_crs, geometry=geom) 
{% endraw %} {% raw %}
mergedGdf.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8ded4d0>
{% endraw %}
Approach 2: Method 4: Geocoding Addresses and Landmarks to Coordinates

Sometimes (usually) we just don't have the coordinates of a place, but we do know its address or that it is an established landmark.

In such cases we can attempt to 'geo-code' these points in an automated manner.

While convenient, this process is error prone, so be sure to check its work!
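
For a single address, the round trip looks roughly like this (a sketch; it requires network access, and Nominatim's answer should always be inspected before you trust it):

{% raw %}
from geopy.geocoders import Nominatim
from shapely.geometry import Point

geolocator = Nominatim(user_agent="my-application")

# Geocode one address and inspect what comes back
geol = geolocator.geocode('100 N. Holliday St, Baltimore, MD 21202', timeout=None)
if geol is not None:
    print(geol.address, geol.latitude, geol.longitude)
    # Note: Point takes (x, y), i.e. (longitude, latitude)
    pnt = Point(geol.longitude, geol.latitude)
{% endraw %}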

For this next example to take place, we need a dataset that has a bunch of addresses.

We can use the geoloom dataset from before in this example. We'll just drop the geospatial data.

{% raw %}
geoloom = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson");
geoloom = geoloom.dropna(subset=['geometry'])
geoloom = geoloom.drop(columns=['geometry','GlobalID', 'POINT_X',	'POINT_Y'])
geoloom.head(1)
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments
0 1 Artists & Resources None Joe Test 123 Market Pl, Baltimore, MD, 21202, USA
{% endraw %}

But if for whatever reason the link is down, you can use this example dataframe, which maps just some of the many malls in Baltimore.

{% raw %}
address_df = pd.DataFrame({ 
    'Location' : pd.Series([
    '100 N. Holliday St, Baltimore, MD 21202',
    '200 E Pratt St, Baltimore, MD',
    '2401 Liberty Heights Ave, Baltimore, MD',
    '201 E Pratt St, Baltimore, MD',
    '3501 Boston St, Baltimore, MD',
    '857 E Fort Ave, Baltimore, MD',
    '2413 Frederick Ave, Baltimore, MD'
  ]),
    'Address' : pd.Series([ 
    'Baltimore City Council',
    'The Gallery at Harborplace',
    'Mondawmin Mall',
    'Harborplace',
    'The Shops at Canton Crossing',
    'Southside Marketplace',
    'Westside Shopping Center'
  ])
})

address_df.head()
Location Address
0 100 N. Holliday St, Baltimore, MD 21202 Baltimore City Council
1 200 E Pratt St, Baltimore, MD The Gallery at Harborplace
2 2401 Liberty Heights Ave, Baltimore, MD Mondawmin Mall
3 201 E Pratt St, Baltimore, MD Harborplace
4 3501 Boston St, Baltimore, MD The Shops at Canton Crossing
{% endraw %}

You can use either the Location or Address column to perform the geo-coding on.

{% raw %}
address_df = geoloom.copy()
addrCol = 'Location'
{% endraw %}

This function takes a while; the fewer the columns/records, the faster it executes.

{% raw %}
# In this example we retrieve and map a dataset with no lat/lng but containing an address

# Our addresses are stored in the 'Location' attribute
geometry = []
geolocator = Nominatim(user_agent="my-application")

for index, row in address_df.iterrows():
  # We will try to geocode each address
  try: 
      # retrieve the geocoded information of our street address
      geol = geolocator.geocode(row[addrCol], timeout=None)

      # create a mappable coordinate point from the response object's lat/lang values.
      pnt = Point(geol.longitude, geol.latitude)
      
      # Append this value to the list of geometries
      geometry.append(pnt)
      
  except Exception: 
      # If no match was found, decide what to do here.
      # df.loc[index]['geom'] = Point(0,0) # Alternate method
      geometry.append(Point(0,0))
      
# Finally, we stuff the geometry data we created back into the dataframe
address_df['geometry'] = geometry
{% endraw %} {% raw %}
address_df.head(1)
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments geometry
0 1 Artists & Resources None Joe Test 123 Market Pl, Baltimore, MD, 21202, USA POINT (-76.60681 39.28759)
{% endraw %}

Awesome! Now convert the dataframe into a geodataframe and map it!

{% raw %}
gdf = gpd.GeoDataFrame( address_df, geometry=geometry)
gdf = gdf[ gdf.centroid.y > 39.3  ]
gdf = gdf[ gdf.centroid.y < 39.5  ]
{% endraw %} {% raw %}
ax = csa_gdf.plot(column='hhchpov18', edgecolor='black')

# now plot our geocoded points over it.
gdf.plot(ax=ax, color='red')
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8ded790>
{% endraw %}

A little later on, we'll see how to make this even more interactive.

Approach 3: Connecting to a PostGIS database

The following example pulls point geodata from a Postgres database.

We will pull the Postgres point data in two manners:

  • An SQL query that uses ST_Transform(the_geom, 4326) to transform the_geom's CRS from the database's binary encoding into standard lat/lng values
  • A plain SQL query, performing the conversion with gpd.io.sql.read_postgis() to pull the data in as 2248, then converting the CRS using .to_crs(epsg=4326)
  • These examples will not work in Colab, as there is no local database to connect to; they have been commented out for that reason
{% raw %}
'''
conn = psycopg2.connect(host='', dbname='', user='', password='', port='')

# DB Import Method One
sql1 = 'SELECT the_geom, gid, geogcode, ooi, address, addrtyp, city, block, lot, desclu, existing FROM housing.mdprop_2017v2 limit 100;'
pointData = gpd.io.sql.read_postgis(sql1, conn, geom_col='the_geom', crs=2248)
pointData = pointData.to_crs(epsg=4326)

# DB Import Method Two
sql2 = 'SELECT ST_Transform(the_geom,4326) as the_geom, ooi, desclu, address FROM housing.mdprop_2017v2;'
pointData = gpd.GeoDataFrame.from_postgis(sql2, conn, geom_col='the_geom', crs=4326)
pointData.head()
pointData.plot()
'''
"\nconn = psycopg2.connect(host='', dbname='', user='', password='', port='')\n\n# DB Import Method One\nsql1 = 'SELECT the_geom, gid, geogcode, ooi, address, addrtyp, city, block, lot, desclu, existing FROM housing.mdprop_2017v2 limit 100;'\npointData = gpd.io.sql.read_postgis(sql1, conn, geom_col='the_geom', crs=2248)\npointData = pointData.to_crs(epsg=4326)\n\n# DB Import Method Two\nsql2 = 'SELECT ST_Transform(the_geom,4326) as the_geom, ooi, desclu, address FROM housing.mdprop_2017v2;'\npointData = gpd.GeoDataFrame.from_postgis(sql2, conn, geom_col='the_geom', crs=4326)\npointData.head()\npointData.plot()\n"
{% endraw %}

Basic Operations

Inspection

{% raw %}
def geomSummary(gdf): return type(gdf), gdf.crs, gdf.columns;
# for p in df['Tract'].sort_values(): print(p)
geomSummary(csa_gdf)
(geopandas.geodataframe.GeoDataFrame, <Projected CRS: EPSG:2248>
 Name: NAD83 / Maryland (ftUS)
 Axis Info [cartesian]:
 - E[east]: Easting (US survey foot)
 - N[north]: Northing (US survey foot)
 Area of Use:
 - name: United States (USA) - Maryland - counties of Allegany; Anne Arundel; Baltimore; Calvert; Caroline; Carroll; Cecil; Charles; Dorchester; Frederick; Garrett; Harford; Howard; Kent; Montgomery; Prince Georges; Queen Annes; Somerset; St Marys; Talbot; Washington; Wicomico; Worcester.
 - bounds: (-79.49, 37.97, -74.97, 39.73)
 Coordinate Operation:
 - name: SPCS83 Maryland zone (US Survey feet)
 - method: Lambert Conic Conformal (2SP)
 Datum: North American Datum 1983
 - Ellipsoid: GRS 1980
 - Prime Meridian: Greenwich, Index(['Unnamed: 0', 'OBJECTID', 'CSA2010', 'hhchpov15', 'hhchpov16',
        'hhchpov17', 'hhchpov18', 'hhchpov19', 'Shape__Area', 'Shape__Length',
        'geometry'],
       dtype='object'))
{% endraw %}

Converting CRS

{% raw %}
# The gdf must be loaded with a known crs in order for the to_crs conversion to work
# We use this often to convert BNIA's custom CRS to the common type
out_crs = 4326
csa_gdf = csa_gdf.to_crs(epsg=out_crs)
{% endraw %}

Saving

{% raw %}
filename = 'TEST_FILE_NAME'
csa_gdf.to_file(f"{filename}.geojson", driver='GeoJSON')
{% endraw %} {% raw %}
csa_gdf = csa_gdf.to_crs(epsg=2248) #just making sure
csa_gdf.to_file(filename+'.shp', driver='ESRI Shapefile')
csa_gdf = gpd.read_file(filename+'.shp')
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:3: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile.
  This is separate from the ipykernel package so we can avoid doing imports until
{% endraw %}

Geometric Manipulations

Draw Tool

{% raw %}
import folium
from folium.plugins import Draw
# Draw tool. Create and export your own boundaries
m = folium.Map()
draw = Draw()
draw.add_to(m)
m = folium.Map(location=[-27.23, -48.36], zoom_start=12)
draw = Draw(export=True)
draw.add_to(m)
# m.save(os.path.join('results', 'Draw1.html'))
m
<folium.plugins.draw.Draw at 0x7fa5d8c9f590>
<folium.plugins.draw.Draw at 0x7fa5d8c90b50>
Make this Notebook Trusted to load map: File -> Trust Notebook
{% endraw %}

Boundary

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.boundary
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8d06b10>
{% endraw %}

envelope

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.envelope
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8b9b850>
{% endraw %}

convex_hull

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.convex_hull
newcsa.plot(column='CSA2010' )
# , cmap='OrRd', scheme='quantiles'
# newcsa.boundary.plot(  )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8b88b10>
{% endraw %}

simplify

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.simplify(30)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8aed850>
{% endraw %}

buffer

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.buffer(0.01)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d8a67050>
{% endraw %}

rotate

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.rotate(30)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d89d5450>
{% endraw %}

scale

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.scale(3, 2)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d89c7410>
{% endraw %}

skew

{% raw %}
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.skew(1, 10)
newcsa.plot(column='CSA2010' )
<matplotlib.axes._subplots.AxesSubplot at 0x7fa5d89bcc50>
{% endraw %}

Advanced

Create Geospatial Functions

Operations:

  • Reading in data (points/geoms)
    -- Convert lat/lng columns to point coordinates
    -- Geocoding addresses to coordinates
    -- Changing coordinate reference systems
    -- Connecting to PostGIS DBs
  • Basic Operations
  • Saving shape data
  • Get Polygon Centroids
  • Working with Points and Polygons
    -- Map Points and Polygons
    -- Get Points in Polygons

Input(s):

  • Dataset (points/bounds) URL
  • Points/bounds geometry column(s)
  • Points/bounds CRSs
  • Points/bounds mapping color(s)
  • New filename

Output: File

This function will handle common geospatial exploratory methods. It covers everything discussed in the basic operations and more!

{% raw %}

workWithGeometryData[source]

workWithGeometryData(method=False, df=False, polys=False, ptsCoordCol=False, polygonsCoordCol=False, polyColorCol=False, polygonsLabel='polyOnPoint', pntsClr='red', polysClr='white', interactive=False)

{% endraw %} {% raw %}
{% endraw %} {% raw %}

map_points[source]

map_points(df, lat_col='latitude', lon_col='longitude', zoom_start=11, plot_points=False, cluster_points=False, pt_radius=15, draw_heatmap=False, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15, popup=False)

Creates a map given a dataframe of points. Can also produce a heatmap overlay

Args:
    df: dataframe containing points to map
    lat_col: column containing latitude (string)
    lon_col: column containing longitude (string)
    zoom_start: integer representing the initial zoom of the map
    plot_points: add points to map (boolean)
    pt_radius: size of each point
    draw_heatmap: add heatmap to map (boolean)
    heat_map_weights_col: column containing heatmap weights
    heat_map_weights_normalize: normalize heatmap weights (boolean)
    heat_map_radius: size of heatmap point

Returns: folium map object

{% endraw %} {% raw %}
{% endraw %}

Processing geometry is tedious enough to merit its own handler

{% raw %}

readInGeometryData[source]

readInGeometryData(url=False, porg=False, geom=False, lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)

{% endraw %} {% raw %}
{% endraw %}

As you can see we have a lot of points. Let's see if there is any better way to visualize this.

Example: Using the advanced Functions

Playing with Points: Geoloom

Points In Polygons

The red dots from when we mapped the geoloom points above were a bit too noisy.

Lets create a choropleth instead!

We can do this by aggregating by CSA.

To do this, start off by finding which points are inside of which polygons!

Since the geoloom data does not have a CSA column, we will need to merge it with a dataset that does!

Let's use the childhood poverty link from example one and load it up, because it contains the geometry data and the CSA labels.

{% raw %}
# BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
csa_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
csa_gdf = readInGeometryData(url=csa_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=2248, out_crs=False)
RECIEVED url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson, 
 porg: g, 
 geom: geometry, 
 lat: False, 
 lng: False, 
 revgeocode: False, 
 in_crs: 2248, 
 out_crs: 2248
{% endraw %}

And now let's pull in our geoloom data. Be sure to drop records with empty geometries, or the function directly below will not work.

{% raw %}
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=4326, out_crs=False)
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
# geoloom_gdf = geoloom_gdf.drop(columns=['POINT_X','POINT_Y'])
geoloom_gdf.head(1)
RECIEVED url: https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson, 
 porg: g, 
 geom: geometry, 
 lat: False, 
 lng: False, 
 revgeocode: False, 
 in_crs: 4326, 
 out_crs: 4326
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments POINT_X POINT_Y GlobalID geometry
0 1 Artists & Resources None Joe Test 123 Market Pl, Baltimore, MD, 21202, USA -8.53e+06 4.76e+06 e59b4931-e0c8-4d6b-b781-1e672bf8545a POINT (-76.60661 39.28746)
{% endraw %}

And now use the point-in-polygon method 'pinp' to count, for each CSA polygon, how many geoloom points fall within it.

{% raw %}
geoloom_w_csas = workWithGeometryData(method='pinp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
{% endraw %}

You'll see you have a 'pointsinpolygon' column now.

{% raw %}
geoloom_w_csas[13:].head(1)
{% endraw %} {% raw %}
# This can be done programmatically, but I haven't added the code.
# The column needs to be changed from CSA to whatever the new tallied column is named.
geoloom_w_csas.plot( column='pointsinpolygon', legend=True)
{% endraw %} {% raw %}
geoloom_w_csas.head(1)
{% endraw %}
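
If you'd rather stick to plain geopandas, a spatial join gives a similar point-in-polygon labeling. A minimal sketch, assuming both layers share a CRS (older geopandas versions take op= where newer ones take predicate=):

{% raw %}
import geopandas as gpd

# Label each point with the CSA polygon it falls inside
pts_with_csa = gpd.sjoin(geoloom_gdf, csa_gdf[['CSA2010', 'geometry']],
                         how='left', op='within')

# Tally points per community; this series can drive a choropleth
pts_with_csa['CSA2010'].value_counts()
{% endraw %}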

Polygons in Points

Alternately, you can run the 'ponp' method and have the geoloom dataset returned, with each point labeled by the CSA it falls within.

{% raw %}
geoloom_w_csas = workWithGeometryData(method='ponp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
{% endraw %}

We can count the totals per CSA using value_counts

{% raw %}
geoloom_w_csas['POINT_Y'] = geoloom_w_csas.centroid.y
geoloom_w_csas['POINT_X'] = geoloom_w_csas.centroid.x
geoloom_w_csas.head(1)
{% endraw %} {% raw %}
geoloom_w_csas['CSA2010'].value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
{% endraw %}

Alternately, we could map the centroids of boundaries within another boundary to find boundaries within boundaries.
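
A minimal sketch of that idea: reduce each inner boundary to its centroid, then test which outer boundary contains it (outer_gdf is a placeholder for a second boundary layer):

{% raw %}
# Reduce each inner boundary to its centroid point
inner = csa_gdf.copy()
inner['geometry'] = inner.centroid

# A polygon 'belongs' to whichever outer boundary its centroid falls within
inner_in_outer = gpd.sjoin(inner, outer_gdf, how='left', op='within')
{% endraw %}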

{% raw %}
geoloom_w_csas['POINT_X'] = pd.to_numeric(geoloom_w_csas['POINT_X'], errors='coerce')
geoloom_w_csas['POINT_Y'] = pd.to_numeric(geoloom_w_csas['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)

# And filter out for points only in Baltimore City. 
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] > 39.3  ]
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] < 39.5  ]
{% endraw %} {% raw %}
geoloom_w_csas = geoloom_w_csas.dropna(subset=['POINT_X', 'POINT_Y'])
map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=False, 
           pt_radius=15, draw_heatmap=True, heat_map_weights_col='POINT_X', heat_map_weights_normalize=True, 
           heat_map_radius=15, popup='CSA2010')
{% endraw %}

But if that doesn't do it for you, we can also create heat maps

{% raw %}
!apt install libspatialindex-dev
!pip install rtree
{% endraw %} {% raw %}
import os
import folium
import geopandas as gpd
import pandas as pd
import numpy as np
from branca.colormap import linear
from folium.plugins import TimeSliderChoropleth
from folium.plugins import MarkerCluster
{% endraw %} {% raw %}
geoloom_w_csas.head(1)
{% endraw %} {% raw %}
geoloom_w_csas.head(1)
{% endraw %} {% raw %}
geoloom_w_csas.plot()
{% endraw %} {% raw %}
geoloom_w_csas['POINT_Y'] = geoloom_w_csas.centroid.y
geoloom_w_csas['POINT_X'] = geoloom_w_csas.centroid.x

# We already know the x and y columns because we just saved them as such.
geoloom_w_csas['POINT_X'] = pd.to_numeric(geoloom_w_csas['POINT_X'], errors='coerce')
geoloom_w_csas['POINT_Y'] = pd.to_numeric(geoloom_w_csas['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)

# And filter out for points only in Baltimore City. 
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] > 39.3  ]
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] < 39.5  ]

map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=True, 
           pt_radius=15, draw_heatmap=True, heat_map_weights_col='POINT_X', heat_map_weights_normalize=True, 
           heat_map_radius=15, popup='CSA2010')
{% endraw %} {% raw %}
geoloom_w_csas.head(1)
{% endraw %} {% raw %}
# https://github.com/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb
m = folium.Map(location=[39.28759453969165, -76.61278931706487], zoom_start=12)
marker_cluster = MarkerCluster().add_to(m)
stations = geoloom_w_csas.geometry.apply(lambda p: folium.Marker( location=[p.y, p.x], popup='Add popup text here.', icon=None ).add_to(marker_cluster) )
m
{% endraw %}

And Time Sliders

Choropleth Timeslider

To simulate data sampled at different times, we randomly sample data for n_periods rows of data. Note that the geodata and the randomly sampled data are linked through the feature_id, which is the index of the GeoDataFrame.

{% raw %}
periods = 10
datetime_index = pd.date_range('2010', periods=periods, freq='Y')
dt_index_epochs = ( datetime_index.astype(int) ).astype('U10')
datetime_index
{% endraw %} {% raw %}
styledata = {}
for country in geoloom.index:
    df = pd.DataFrame(
        {'color': np.random.normal(size=periods),
         'opacity':  [1,2,3,4,5,6,7,8,9,1] },
        index=dt_index_epochs
    )
    df = df.cumsum()
    styledata[country] = df
ax = df.plot()
{% endraw %} {% raw %}
df.head()
{% endraw %}

We see that we generated two series of data for each country; one for color and one for opacity. Let's plot them to see what they look like.

{% raw %}
max_color, min_color, max_opacity, min_opacity = 0, 0, 0, 0
for country, data in styledata.items():
    max_color = max(max_color, data['color'].max())
    min_color = min(min_color, data['color'].min())
    max_opacity = max(max_opacity, data['opacity'].max())
    min_opacity = min(min_opacity, data['opacity'].min())
linear.PuRd_09.scale(min_color, max_color)
{% endraw %}

We want to map the column named color to a hex color. To do this we use a normal colormap. To create the colormap, we calculate the maximum and minimum values over all the timeseries. We also need the max/min of the opacity column, so that we can map that column into a range [0,1].

{% raw %}
max_color, min_color, max_opacity, min_opacity = 0, 0, 0, 0
for country, data in styledata.items():
    max_color = max(max_color, data['color'].max())
    min_color = min(min_color, data['color'].min())
    max_opacity = max(max_opacity, data['opacity'].max())
    min_opacity = min(min_opacity, data['opacity'].min())
{% endraw %} {% raw %}
from branca.colormap import linear
cmap = linear.PuRd_09.scale(min_color, max_color)
def norm(x): return (x - x.min()) / (x.max() - x.min())
for country, data in styledata.items():
    data['color'] = data['color'].apply(cmap)
    data['opacity'] = norm(data['opacity'])
{% endraw %}

Finally we use pd.DataFrame.to_dict() to convert each dataframe into a dictionary, and place each of these in a map from country id to data.

{% raw %}
from folium.plugins import TimeSliderChoropleth
m = folium.Map([39.28759453969165, -76.61278931706487], zoom_start=12)
g = TimeSliderChoropleth(
    geoloom.to_json(),
    styledict={
      str(country): data.to_dict(orient='index') for
      country, data in styledata.items()
    }
).add_to(m)
m
{% endraw %}