---
title: Geo-Data Intake and Operations
keywords: fastai
sidebar: home_sidebar
summary: "This notebook was made to demonstrate how to work with geographic data."
description: "This notebook was made to demonstrate how to work with geographic data."
nb_path: "notebooks/03_Map_Basics_Intake_and_Operations.ipynb"
---
This Coding Notebook is the third in a series.
An interactive version can be found here.
This colab and more can be found on our webpage.
Content covered in previous tutorials will be used in later tutorials.
New code and information should have explanations or descriptions attached.
Concepts or code covered in previous tutorials will be used without being explained in their entirety.
The Dataplay Handbook uses development techniques covered in the Datalabs Guidebook.
If content can not be found in the current tutorial and is not covered in previous tutorials, please let me know.
This notebook has been optimized for Google Colab run in a Chrome browser.
Statements found on the index page regarding views expressed, responsibility, errors and omissions, use at your own risk, and licensing extend throughout this tutorial.
In this notebook, the basics of working with geographic data are introduced.
Geographic data must be encoded properly in order to attain the full potential of its spatial nature.
If you have read in a dataset using Pandas, its data type will be a DataFrame.
It may be converted into a Geo-Dataframe using Geopandas as demonstrated in the sections below.
You can check your dataframe's column data types at any time using the dtypes attribute:
yourGeoDataframe.dtypes
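For example, here is a minimal, self-contained sketch (the tiny one-row dataframe is made up purely for illustration):

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

df = pd.DataFrame({'name': ['Baltimore'], 'lng': [-76.61], 'lat': [39.29]})
gdf = gpd.GeoDataFrame(df, geometry=[Point(-76.61, 39.29)], crs='EPSG:4326')

print(type(df), type(gdf))  # pandas DataFrame vs. geopandas GeoDataFrame
print(gdf.dtypes)           # the spatial column shows up with the 'geometry' dtype
```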
Make sure the appropriate spatial Coordinate Reference System (CRS) is used when reading in your data!
ala wiki:
A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities
CRS 4326 is the CRS most people are familiar with when referring to latitudes and longitudes.
In CRS 4326, Baltimore sits at roughly (39.29, -76.61).
BNIA uses CRS 2248 internally. Additional information: https://docs.qgis.org/testing/en/docs/gentle_gis_introduction/coordinate_reference_systems.html
Ensure your geodataframes' coordinates are using the same CRS via the geopandas attribute:
yourGeoDataframe.crs
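For instance, a minimal sketch of checking and re-projecting a CRS, using the BNIA childhood-poverty layer that appears later in this notebook:

```python
import geopandas as gpd

# Read the BNIA childhood-poverty layer used later in this tutorial.
url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/1/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
gdf = gpd.read_file(url)

print(gdf.crs)                         # inspect the current CRS
gdf_2248 = gdf.to_crs(epsg=2248)       # re-project into BNIA's internal CRS
gdf_4326 = gdf_2248.to_crs(epsg=4326)  # and back to lat/lng (WGS84)
```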
When first receiving a spatial dataset, the spatial column may need to be encoded to convert its 'text' values into understood 'coordinate' data types before it can be processed accordingly.
Namely, there are two ways to encode text into coordinates:
The first approach can be used for text taking the form "Point(-76, 39)" and will encode the text to coordinates. The second approach is useful when creating a point from two columns containing lat/lng information and will create Point coordinates from the two columns. A small sketch of both appears below.
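Both approaches are demonstrated in full later on; as a preview, here is a minimal sketch (the column names wkt_text, lng, and lat are hypothetical):

```python
import pandas as pd
import geopandas as gpd
from shapely.wkt import loads
from shapely.geometry import Point

df = pd.DataFrame({
    'wkt_text': ['POINT (-76.61 39.29)'],  # approach 1: geometry stored as WKT text
    'lng': [-76.61],                       # approach 2: two numeric columns
    'lat': [39.29],
})

# Approach 1: parse the WKT text into shapely geometry objects.
geom_from_text = df['wkt_text'].apply(loads)

# Approach 2: build Point geometries from the two coordinate columns.
geom_from_cols = [Point(xy) for xy in zip(df['lng'], df['lat'])]

gdf = gpd.GeoDataFrame(df, geometry=geom_from_cols, crs='EPSG:4326')
```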
There exists two types of Geospatial Data, Raster and Vector. Both have different file formats.
This lab will only cover vector data.
Vector Data: Individual points stored as (x,y) coordinate pairs. These points can be joined to create lines or polygons.
Formats of vector data:
Esri Shapefile — .shp, .dbf, .shx Description - Industry standard, most widely used. The three files listed above are needed to make a shapefile. Additional file formats may be included.
Geographic JavaScript Object Notation — .geojson, .json Description — Second most popular. GeoJSON is typically used in web-based mapping and stores the coordinates as JSON.
Geography Markup Language — .gml Description — Similar to GeoJSON, but GML typically takes more text to represent the same information.
Google Keyhole Markup Language — .kml, .kmz Description — XML-based and predominantly used for Google Earth. KMZ is the newer, zipped version of KML.
Raster Data: Cell-based data where each cell represents geographic information. An aerial photograph is one such example, where each pixel has a color value.
Raster Data Files: GeoTIFF — .tif, .tiff, .ovr; ERDAS Imagine — .img; IDRISI Raster — .rst, .rdc
Information Sourced From: https://towardsdatascience.com/getting-started-with-geospatial-works-1f7b47955438
Vector Data: Census Geographic Data:
%%capture
! pip install geopy
! pip install geopandas
! pip install geoplot
! pip install dataplay
# Imports used throughout this notebook
import psycopg2
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from geopandas import GeoDataFrame
from shapely.geometry import Point
from shapely.wkt import loads
from geopy.geocoders import Nominatim
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# pd.set_option('display.expand_frame_repr', False)
# pd.set_option('display.precision', 2)
# pd.reset_option('max_colwidth')
pd.set_option('max_colwidth', 50)
# pd.reset_option('max_colwidth')
As mentioned earlier:
When you use a pandas function to 'read-in' a dataset, the returned value is of a datatype called a 'Dataframe'.
We need a 'Geo-Dataframe', however, to effectively work with spatial data.
While Pandas does not support geo-dataframes, GeoPandas does.
Geopandas has everything you love about pandas, but with added support for geo-spatial data.
The principal benefits of using Geopandas over Pandas when working with spatial data: reading and writing spatial file formats, a dedicated geometry column, CRS handling and re-projection, spatial operations (buffers, centroids, point-in-polygon tests), and built-in map plotting.
There are many ways to read spatial data into a geo-dataframe using Geopandas.
Namely, reading in geo-spatial data from: a GeoJSON or shapefile, a CSV with an encoded geometry column, a CSV with lat/lng columns, a table of addresses to geocode, or a PostGIS database.
We will review each one below.
If you are using Geopandas, direct imports only work with GeoJSON and shapefiles.
Spatial coordinate data comes properly encoded in these file types, which makes them particularly easy to use.
You can perform this using geopandas' read_file()
function.
# BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
csa_gdf = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/1/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson")
As you can see, the resultant variable is of type GeoDataFrame.
type(csa_gdf)
GeoDataFrames are only possible when one of the columns is of a 'geometry' data type.
csa_gdf.dtypes
Awesome. So that means you can now plot maps all pretty like:
csa_gdf.plot(column='hhchpov15')
And now let's take a peek at the raw data:
csa_gdf.head(1)
I'll show you more ways to save the data later, but for our example in the next section to work, we need a csv.
We can make one by saving the geo-dataframe above using the to_csv
function.
The spatial data will be stored in an encoded (text) form that will make it easy to re-open in the future.
csa_gdf.to_csv('example.csv')
This approach loads a map using a geometry column
In our previous example, we saved a geo-dataframe as a csv.
Now lets re-open it up using pandas!
url = "example.csv"
geom = 'geometry'
# An example of loading in an internal BNIA file
crs = 'EPSG:2248' # newer geopandas/pyproj prefer an EPSG string over the deprecated {'init': ...} dict
# Read in the dataframe
csa_gdf = pd.read_csv(url)
Great!
But now what?
Well, for starters, regardless of the project you are working on: It's always a good idea to inspect your data.
This is particularly important if you don't know what you're working with.
csa_gdf.head(1)
Take notice of how the geometry column has a special formatting.
All spatial data must take on a similar encoded form for it to be properly interpreted as a spatial data type.
As far as I can tell, this is near-identical to the table I printed out in our last example.
BUT WAIT!
You'll notice that if I run the plot function, a pretty map will not automatically appear.
csa_gdf.plot()
Why is this? Because you're not working with a geo-dataframe but just a dataframe!
Take a look:
type(csa_gdf)
Okay... So that's not right.
What can we do about this?
Well for one, our spatial data (in the geometry-column) is not of the right data-type even though it takes on the right form.
csa_gdf.dtypes
Ok. So how do we change it? Well, since it's already been properly encoded... you can convert a column's data type from an object (or whatever it may be) to a 'geometry' using shapely's loads
function.
In the example below, we convert the datatypes for all records in the 'geometry' column
csa_gdf[geom] = csa_gdf[geom].apply(lambda x: loads( str(x) ))
That's all! Now let's check the geometry column's data type and the entire table's type.
csa_gdf.dtypes
type(csa_gdf)
As you can see, we have a geometry column of the right datatype, but our table is still only just a dataframe.
But now, you are ready to convert your entire pandas dataframe into a geo-dataframe.
You can do that by running the following function:
csa_gdf = GeoDataFrame(csa_gdf, crs=crs, geometry=geom)
Aaaand BOOM.
csa_gdf.plot(column='hhchpov18')
goes the dy-no-mite
type(csa_gdf)
This is a generic example, but it will not run since no URL is given.
# If your data has coordinates in two columns run this cell
# It will create a geometry column from the two.
# A public dataset is not provided for this example and will not run.
# Load DF HERE. Accidentally deleted the link. Need to re-find it.
# Just rely on example 2 for now.
"""
exe_df['x'] = pd.to_numeric(exe_df['x'], errors='coerce')
exe_df['y'] = pd.to_numeric(exe_df['y'], errors='coerce')
# exe_df = exe_df.replace(np.nan, 0, regex=True)
# An example of loading in an internal BNIA file
geometry=[Point(xy) for xy in zip(exe_df.x, exe_df.y)]
exe_gdf = gpd.GeoDataFrame( exe_df.drop(['x', 'y'], axis=1), crs=crs, geometry=geometry)
"""
Since I do not readily have a dataset with lat and lng columns, I will have to make one.
We can split the coordinates from a geodataframe like so...
# Table: Geoloom,
# Columns:
geoloom_gdf = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson");
geoloom_gdf['POINT_X'] = geoloom_gdf['geometry'].centroid.x
geoloom_gdf['POINT_Y'] = geoloom_gdf['geometry'].centroid.y
# Now let's just drop rows with missing geometries and save it to have our example dataset.
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
geoloom_gdf.to_csv('example.csv')
The first thing you will want to do when given a dataset with a coordinates column is ensure its datatype.
geoloom_df = pd.read_csv('example.csv')
# We already know the x and y columns because we just saved them as such.
geoloom_df['POINT_X'] = pd.to_numeric(geoloom_df['POINT_X'], errors='coerce')
geoloom_df['POINT_Y'] = pd.to_numeric(geoloom_df['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)
# And filter out for points only in Baltimore City.
geoloom_df = geoloom_df[ geoloom_df['POINT_Y'] > 39.3 ]
geoloom_df = geoloom_df[ geoloom_df['POINT_Y'] < 39.5 ]
crs = 'EPSG:4326' # these POINT_X / POINT_Y values are lng/lat degrees, so WGS84 is the appropriate CRS label
geometry=[Point(xy) for xy in zip(geoloom_df['POINT_X'], geoloom_df['POINT_Y'])]
geoloom_gdf = gpd.GeoDataFrame( geoloom_df.drop(['POINT_X', 'POINT_Y'], axis=1), crs=crs, geometry=geometry)
# 39.2904° N, 76.6122°
geoloom_gdf.head(1)
Here's a neat trick to make it more presentable, because those points mean little on their own.
ax = csa_gdf.plot(column='hhchpov18', edgecolor='black')
# now plot our points over it.
geoloom_gdf.plot(ax=ax, color='red')
plt.show()
When you want to merge two datasets that do not share a common column, it is often useful to create a 'crosswalk' file that 'maps' records between two datasets. We can do this to append spatial data when a direct merge is not readily evident.
Check out this next example where we pull ACS Census data and use its 'tract' column to map each record to a community. We can then aggregate the points by the communities they belong to and map them in a choropleth! A minimal sketch of the idea appears below; the full walkthrough follows.
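Here is a minimal, self-contained sketch of the crosswalk idea using plain pandas; the tract codes and values are made up, and the column names TRACTCE10 and CSA2010 mirror the crosswalk file used below:

```python
import pandas as pd

# Hypothetical tract-level data and a toy tract-to-community crosswalk.
tract_df = pd.DataFrame({'tract': [101, 102, 103], 'value': [10, 20, 30]})
crosswalk = pd.DataFrame({'TRACTCE10': [101, 102, 103],
                          'CSA2010': ['Community A', 'Community A', 'Community B']})

# Map each tract onto its community, then aggregate up to one row per CSA.
merged = tract_df.merge(crosswalk, left_on='tract', right_on='TRACTCE10', how='left')
per_csa = merged.groupby('CSA2010', as_index=False)['value'].sum()
print(per_csa)
```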
We will set up our ACS query variables right here for easy changing
# Changing these values using different geographic reference codes will change what is downloaded
tract = '*'
county = '510' # '059' # 153 '510'
state = '24' #51
# Specify the download parameters the function will receive here
tableId = 'B19049' # 'B19001'
year = '17'
saveAcs = True
And now we will call the function with those variables and check out the result
import dataplay
from dataplay import acsDownload
import IPython
retrieve_acs_data = acsDownload.retrieve_acs_data
# from IPython.core.display import HTML
IPython.core.display.HTML("<style>.rendered_html th {max-width: 200px; overflow:auto;}</style>")
# state, county, tract, tableId, year, saveOriginal, save
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
df.head(1)
df.to_csv('tracts_data.csv')
df['tract'].dtype
As you can see, the tract column's data type is shown above; it will need to match the crosswalk's TRACTCE10 column when we merge in a moment.
ls
!wget https://bniajfi.org/vs_resources/CSA-to-Tract-2010.csv
df['tract'].tail(10)
crosswalk = pd.read_csv('CSA-to-Tract-2010.csv')
crosswalk.tail(10)
Hhchpov = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/1/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson")
Hhchpov.head(1)
Hhchpov = Hhchpov[['CSA2010', 'hhchpov15', 'hhchpov16', 'hhchpov17', 'hhchpov18', 'geometry']]
Hhchpov.to_file("Hhchpov.geojson", driver='GeoJSON')
Hhchpov.to_csv('Hhchpov.csv')
gpd.read_file("Hhchpov.geojson").head(1)
df.merge(crosswalk, left_on='tract', right_on='TRACTCE10')
ls
from dataplay import merge
# The attributes are what we will use.
in_crs = 2248 # The CRS we receive our data in
out_crs = 4326 # The CRS we would like to have our data represented as
geom = 'geometry' # The column where our spatial information lives.
# To create this dataset I had to perform a full outer join.
# That way, geometries are included even if the merge does not have a direct match.
# In practice it means at least one (nearly) empty record exists for each community, containing (at minimum) the geographic information and name of that community.
# So even if no point-level information exists for a community, its geoboundaries are still carried over by the merge.
# Primary Table
# Description: I created a public dataset from a google xlsx sheet 'Bank Addresses and Census Tract'.
# Table: FDIC Baltimore Banks
# Columns: Bank Name, Address(es), Census Tract
left_ds = 'tracts_data.csv'
left_col = 'tract'
# Crosswalk Table
# Table: Crosswalk Census Communities
# 'TRACT2010', 'GEOID2010', 'CSA2010'
crosswalk_ds = 'CSA-to-Tract-2010.csv'
use_crosswalk = True
crosswalk_left_col = 'TRACTCE10'
crosswalk_right_col = 'CSA2010'
# Secondary Table
# Table: Baltimore Boundaries => HHCHPOV
# 'TRACTCE10', 'GEOID10', 'CSA', 'NAME10', 'Tract', 'geometry'
right_ds = 'Hhchpov.geojson'
right_col ='CSA2010'
interactive = True
merge_how = 'outer'
# returns a pandas dataframe
mergedf = merge.mergeDatasets( left_ds=left_ds, left_col=left_col,
crosswalk_ds=crosswalk_ds,
crosswalk_left_col = crosswalk_left_col, crosswalk_right_col = crosswalk_right_col,
right_ds=right_ds, right_col=right_col,
merge_how=merge_how, interactive = interactive )
mergedf.dtypes
# mergedf[geom] = mergedf[geom].apply(lambda x: loads( str(x) ) )
# Process the dataframe as a geodataframe with a known CRS and geom column
mergedGdf = GeoDataFrame(mergedf, crs=in_crs, geometry=geom)
mergedGdf.plot()
Sometimes (usually) we just don't have the coordinates of a place, but we do know its address or that it is an established landmark.
In such cases we attempt to 'geo-code' these points in an automated manner.
While convenient, this process is error prone, so be sure to check its work!
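As a quick illustration before the full loop below, here is a minimal sketch using geopy's Nominatim geocoder wrapped in a rate limiter (Nominatim allows roughly one request per second); the address string is one of the examples further down:

```python
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from shapely.geometry import Point

geolocator = Nominatim(user_agent="my-application")
# Throttle calls so we stay within Nominatim's usage policy.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

location = geocode("100 N. Holliday St, Baltimore, MD 21202")
if location is not None:
    pnt = Point(location.longitude, location.latitude)
    print(location.address, pnt)
```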
For this next example to take place, we need a dataset that has a bunch of addresses.
We can use the geoloom dataset from before in this example. We'll just drop its geospatial data.
geoloom = gpd.read_file("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson");
geoloom = geoloom.dropna(subset=['geometry'])
geoloom = geoloom.drop(columns=['geometry','GlobalID', 'POINT_X', 'POINT_Y'])
geoloom.head(1)
But if for whatever reason the link is down, you can use this example dataframe listing a handful of Baltimore landmarks and shopping centers.
address_df = pd.DataFrame({
'Location' : pd.Series([
'100 N. Holliday St, Baltimore, MD 21202',
'200 E Pratt St, Baltimore, MD',
'2401 Liberty Heights Ave, Baltimore, MD',
'201 E Pratt St, Baltimore, MD',
'3501 Boston St, Baltimore, MD',
'857 E Fort Ave, Baltimore, MD',
'2413 Frederick Ave, Baltimore, MD'
]),
'Address' : pd.Series([
'Baltimore City Council',
'The Gallery at Harborplace',
'Mondawmin Mall',
'Harborplace',
'The Shops at Canton Crossing',
'Southside Marketplace',
'Westside Shopping Center'
])
})
address_df.head()
You can use either the Location or Address column to perform the geo-coding on.
# Here we geocode the geoloom records; comment out the next line to use the example address_df built above instead.
address_df = geoloom.copy()
addrCol = 'Location'
This function takes a while; the fewer columns and records, the faster it executes.
# In this example we retrieve and map a dataset with no lat/lng but containing an address
# In this example our data is stored in the 'STREET' attribute
geometry = []
geolocator = Nominatim(user_agent="my-application")
for index, row in address_df.iterrows():
    # We will try to geocode each record's address
    try:
        # retrieve the geocoded information for our street address
        geol = geolocator.geocode(row[addrCol], timeout=None)
        # create a mappable coordinate point from the response object's lat/lng values.
        pnt = Point(geol.longitude, geol.latitude)
        # Append this value to the list of geometries
        geometry.append(pnt)
    except Exception:
        # If no match was found, decide what to do here; we fall back to a (0, 0) placeholder.
        # df.loc[index]['geom'] = Point(0,0) # Alternate method
        geometry.append(Point(0,0))
# Finally, we stuff the geometry data we created back into the dataframe
address_df['geometry'] = geometry
address_df.head(1)
Awesome! Now convert the dataframe into a geodataframe and map it!
gdf = gpd.GeoDataFrame( address_df, geometry=geometry)
gdf = gdf[ gdf.centroid.y > 39.3 ]
gdf = gdf[ gdf.centroid.y < 39.5 ]
ax = csa_gdf.plot(column='hhchpov18', edgecolor='black')
# now plot our geocoded points over it.
gdf.plot(ax=ax, color='red')
plt.show()
A little later on, we'll see how to make this even more interactive.
The following example pulls point geodata from a Postgres database.
We will pull the postgres point data in two manners.
'''
conn = psycopg2.connect(host='', dbname='', user='', password='', port='')
# DB Import Method One
sql1 = 'SELECT the_geom, gid, geogcode, ooi, address, addrtyp, city, block, lot, desclu, existing FROM housing.mdprop_2017v2 limit 100;'
pointData = gpd.io.sql.read_postgis(sql1, conn, geom_col='the_geom', crs=2248)
pointData = pointData.to_crs(epsg=4326)
# DB Import Method Two
sql2 = 'SELECT ST_Transform(the_geom,4326) as the_geom, ooi, desclu, address FROM housing.mdprop_2017v2;'
pointData = gpd.GeoDataFrame.from_postgis(sql2, conn, geom_col='the_geom', crs=4326)
pointData.head()
pointData.plot()
'''
def geomSummary(gdf): return type(gdf), gdf.crs, gdf.columns;
# for p in df['Tract'].sort_values(): print(p)
geomSummary(csa_gdf)
# The gdf must be loaded with a known crs in order for the to_crs conversion to work
# We often use this to convert BNIA's custom CRS to the more common one
out_crs = 4326
csa_gdf = csa_gdf.to_crs(epsg=out_crs)
filename = 'TEST_FILE_NAME'
csa_gdf.to_file(f"{filename}.geojson", driver='GeoJSON')
csa_gdf = csa_gdf.to_crs(epsg=2248) #just making sure
csa_gdf.to_file(filename+'.shp', driver='ESRI Shapefile')
csa_gdf = gpd.read_file(filename+'.shp')
Draw Tool
import folium
from folium.plugins import Draw
# Draw tool. Create and export your own boundaries
m = folium.Map()
draw = Draw()
draw.add_to(m)
m = folium.Map(location=[-27.23, -48.36], zoom_start=12)
draw = Draw(export=True)
draw.add_to(m)
# m.save(os.path.join('results', 'Draw1.html'))
m
Boundary
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.boundary
newcsa.plot(column='CSA2010' )
envelope
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.envelope
newcsa.plot(column='CSA2010' )
convex_hull
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.convex_hull
newcsa.plot(column='CSA2010' )
# , cmap='OrRd', scheme='quantiles'
# newcsa.boundary.plot( )
simplify
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.simplify(30)
newcsa.plot(column='CSA2010' )
buffer
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.buffer(0.01)
newcsa.plot(column='CSA2010' )
rotate
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.rotate(30)
newcsa.plot(column='CSA2010' )
scale
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.scale(3, 2)
newcsa.plot(column='CSA2010' )
skew
newcsa = csa_gdf.copy()
newcsa['geometry'] = csa_gdf.skew(1, 10)
newcsa.plot(column='CSA2010' )
Operations:
Input(s): a points dataset, a polygons dataset, the method to run ('pinp' or 'ponp'), and display options (coordinate columns, colors, labels).
Output: File
This function will handle common geo-spatial exploratory methods. It covers everything discussed in the basic operations and more!
Processing geometry is tedious enough to merit its own handler.
THIS WILL NEED UPDATING TO SO
As you can see, we have a lot of points. Let's see if there is a better way to visualize this.
The red dots from when we mapped the geoloom points above were a bit too noisy.
Let's create a choropleth instead!
We can do this by aggregating by CSA.
To do this, start off by finding which points are inside of which polygons (a spatial join; see the sketch below)!
Since the geoloom data does not have a CSA column, we will need to merge it with a dataset that does!
Let's use the childhood-poverty link from example one and load it up, because it contains the geometry data and the CSA labels.
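Dataplay's helper does the point-in-polygon work for us below, but for reference, here is a minimal sketch of the same idea using geopandas' own sjoin (assuming both frames share a CRS; older geopandas versions use op= instead of predicate=):

```python
import geopandas as gpd

# Tag each geoloom point with the CSA polygon it falls within.
pts_with_csa = gpd.sjoin(geoloom_gdf, csa_gdf[['CSA2010', 'geometry']],
                         how='left', predicate='within')

# Count points per CSA and attach the tallies back onto the polygons.
counts = pts_with_csa.groupby('CSA2010').size().rename('pointsinpolygon')
csa_counts = csa_gdf.merge(counts, on='CSA2010', how='left')
csa_counts.plot(column='pointsinpolygon', legend=True)
```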
# BNIA ArcGIS Homepage: https://data-bniajfi.opendata.arcgis.com/
# readInGeometryData (and workWithGeometryData, used below) come from the dataplay package; the module path is assumed here.
from dataplay.geoms import readInGeometryData, workWithGeometryData
csa_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
csa_gdf = readInGeometryData(url=csa_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=2248, out_crs=False)
And now let's pull in our geoloom data. But to be sure, drop rows with empty geometries or the function directly below will not work.
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
# geoloom_gdf = geoloom_gdf.drop(columns=['POINT_X','POINT_Y'])
geoloom_gdf.head(1)
And now use the point-in-polygon method 'pinp' to tally how many geoloom points fall within each polygon of our CSA dataset.
geoloom_w_csas = workWithGeometryData(method='pinp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
You'll see you have a 'pointsinpolygon' column now.
geoloom_w_csas[13:].head(1)
# This could be done programmatically, but I haven't added the code.
# The plotted column needs to be changed from CSA to whatever the new tallied column is named.
geoloom_w_csas.plot( column='pointsinpolygon', legend=True)
geoloom_w_csas.head(1)
Alternately, you can run the 'ponp' method to get back the geoloom dataset with the CSA each point falls within attached to it.
geoloom_w_csas = workWithGeometryData(method='ponp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
We can count the totals per CSA using value_counts
geoloom_w_csas['POINT_Y'] = geoloom_w_csas.centroid.y
geoloom_w_csas['POINT_X'] = geoloom_w_csas.centroid.x
geoloom_w_csas.head(1)
geoloom_w_csas['CSA2010'].value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Alternately, we could take the centroids of one set of boundaries and test them against another set of boundaries to find which boundaries fall within which.
geoloom_w_csas['POINT_X'] = pd.to_numeric(geoloom_w_csas['POINT_X'], errors='coerce')
geoloom_w_csas['POINT_Y'] = pd.to_numeric(geoloom_w_csas['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)
# And filter out for points only in Baltimore City.
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] > 39.3 ]
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] < 39.5 ]
geoloom_w_csas = geoloom_w_csas.dropna(subset=['POINT_X', 'POINT_Y'])
map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=False,
pt_radius=15, draw_heatmap=True, heat_map_weights_col='POINT_X', heat_map_weights_normalize=True,
heat_map_radius=15, popup='CSA2010')
But if that doesn't do it for you, we can also create heat maps
!apt install libspatialindex-dev
!pip install rtree
import os
import folium
import geopandas as gpd
import pandas as pd
import numpy as np
from branca.colormap import linear
from folium.plugins import TimeSliderChoropleth
from folium.plugins import MarkerCluster
geoloom_w_csas.head(1)
geoloom_w_csas.plot()
geoloom_w_csas['POINT_Y'] = geoloom_w_csas.centroid.y
geoloom_w_csas['POINT_X'] = geoloom_w_csas.centroid.x
# We already know the x and y columns because we just saved them as such.
geoloom_w_csas['POINT_X'] = pd.to_numeric(geoloom_w_csas['POINT_X'], errors='coerce')
geoloom_w_csas['POINT_Y'] = pd.to_numeric(geoloom_w_csas['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)
# And filter out for points only in Baltimore City.
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] > 39.3 ]
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] < 39.5 ]
map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=True,
pt_radius=15, draw_heatmap=True, heat_map_weights_col='POINT_X', heat_map_weights_normalize=True,
heat_map_radius=15, popup='CSA2010')
geoloom_w_csas.head(1)
# https://github.com/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb
m = folium.Map(location=[39.28759453969165, -76.61278931706487], zoom_start=12)
marker_cluster = MarkerCluster().add_to(m)
stations = geoloom_w_csas.apply(lambda p: folium.Marker( location=[p.geometry.y, p.geometry.x], popup='Add popup text here.', icon=None ).add_to(marker_cluster), axis=1 )
m
And Time Sliders
To simulate data being sampled at different times, we randomly generate values for each of 10 yearly periods. Note that the geodata and the randomly sampled data are linked through the feature id, which is the index of the GeoDataFrame.
periods = 10
datetime_index = pd.date_range('2010', periods=periods, freq='Y')
dt_index_epochs = (datetime_index.astype(int) // 10**9).astype('U10') # epoch seconds, as strings
datetime_index
styledata = {}
# Build a random color/opacity time series for each feature, keyed by the feature's index.
for country in geoloom_w_csas.index:
    df = pd.DataFrame(
        {'color': np.random.normal(size=periods),
         'opacity': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]},
        index=dt_index_epochs
    )
    df = df.cumsum()
    styledata[country] = df
ax = df.plot()
df.head()
We see that we generated two series of data for each feature (the loop variable is named country because this snippet follows the folium example): one for color and one for opacity. The plot above shows what they look like.
We want to map the column named color to a hex color. To do this we use a normal colormap. To create the colormap, we calculate the maximum and minimum values over all the timeseries. We also need the max/min of the opacity column, so that we can map that column into a range [0,1].
max_color, min_color, max_opacity, min_opacity = 0, 0, 0, 0
# Find the overall max/min of each series across every feature.
for country, data in styledata.items():
    max_color = max(max_color, data['color'].max())
    min_color = min(min_color, data['color'].min())
    max_opacity = max(max_opacity, data['opacity'].max())
    min_opacity = min(min_opacity, data['opacity'].min())
from branca.colormap import linear
cmap = linear.PuRd_09.scale(min_color, max_color)
def norm(x): return (x - x.min()) / (x.max() - x.min())
# Convert each feature's color values to hex colors and squash opacity into [0, 1].
for country, data in styledata.items():
    data['color'] = data['color'].apply(cmap)
    data['opacity'] = norm(data['opacity'])
Finally we use pd.DataFrame.to_dict() to convert each dataframe into a dictionary, and place each of these in a map from feature id to data.
from folium.plugins import TimeSliderChoropleth
m = folium.Map([39.28759453969165, -76.61278931706487], zoom_start=12)
g = TimeSliderChoropleth(
    geoloom_w_csas.to_json(),
    styledict={
        str(country): data.to_dict(orient='index')
        for country, data in styledata.items()
    }
).add_to(m)
m