This package was made to help with common data-handling tasks: intake, merging, and mapping.
Included
Dataplay uses functions found in our VitalSigns Module and vice-versa.
You can use these docs to learn from, or as reference documentation when using the attached library.
Content covered in previous tutorials will be used in later tutorials.
New code and information are accompanied by explanations or descriptions.
Concepts or code covered in previous tutorials will be used without being explained in their entirety.
The Dataplay Handbook was developed using techniques covered in the Datalabs Guidebook.
If content cannot be found in the current tutorial and is not covered in previous tutorials, please let us know.
This notebook has been optimized for Google Colab run in a Chrome browser.
Statements found on the index page regarding views expressed, responsibility, errors and omissions, use at risk, and licensing extend throughout the tutorial.
By the end of this tutorial users should have an understanding of:
The code is on PyPI so you can install the scripts as a python library using the command:
```
!pip install dataplay geopandas
```
Important: Contributors should follow the maintenance instructions and will not need to run this step.
Their modules will be retrieved from the VitalSigns GDrive repo they have mounted into their Colab environment.
Then...
```python
from VitalSigns.acsDownload import retrieve_acs_data
retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
```
Now you could do something like merge it to another dataset!
```python
from dataplay.merge import mergeDatasets
mergeDatasets(left_ds=False, right_ds=False, crosswalk_ds=False, use_crosswalk=True,
              left_col=False, right_col=False, crosswalk_left_col=False,
              crosswalk_right_col=False, merge_how=False, interactive=True)
```
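Under the hood, a crosswalk merge is just two ordinary joins: the left dataset is joined to the crosswalk, and the result is joined to the right dataset. The pandas-only sketch below illustrates the idea; the column names and values are made up for illustration and are not from the library.

```python
import pandas as pd

# Hypothetical example data: tract-level values, a tract-to-CSA crosswalk,
# and a CSA-level dataset to attach. All names here are illustrative.
left_ds = pd.DataFrame({'tract': ['271002', '280101'], 'households': [1510, 980]})
crosswalk_ds = pd.DataFrame({'tract': ['271002', '280101'],
                             'CSA2010': ['Cherry Hill', 'Canton']})
right_ds = pd.DataFrame({'CSA2010': ['Cherry Hill', 'Canton'],
                         'hhchpov18': [42.1, 7.3]})

# A crosswalk merge is two plain joins: left -> crosswalk, then -> right.
merged = (left_ds
          .merge(crosswalk_ds, on='tract', how='left')
          .merge(right_ds, on='CSA2010', how='left'))
print(merged)
```

`mergeDatasets` wraps this pattern (plus column prompts when `interactive=True`), so the sketch is only the conceptual core, not the library's actual implementation.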
You can get information on the package, modules, and methods by using the help command.
Here we look at the package's modules:
```python
import dataplay
help(dataplay)
```

```
Help on package dataplay:

NAME
    dataplay

PACKAGE CONTENTS
    _nbdev
    corr
    geoms
    gifmap
    html
    intaker
    merge

VERSION
    0.0.28

FILE
    /usr/local/lib/python3.7/dist-packages/dataplay/__init__.py
```

Let's take a look at what functions the geoms module provides:
```python
import dataplay.geoms
help(dataplay.geoms)
```

```
Help on module dataplay.geoms in dataplay:

NAME
    dataplay.geoms - # AUTOGENERATED! DO NOT EDIT! File to edit: notebooks/03_Map_Basics_Intake_and_Operations.ipynb (unless otherwise specified).

FUNCTIONS
    map_points(data, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=False, pt_radius=15, draw_heatmap=False, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15, popup=False)
        Creates a map given a dataframe of points. Can also produce a heatmap overlay.
        Args:
            df: dataframe containing points to map
            lat_col: Column containing latitude (string)
            lon_col: Column containing longitude (string)
            zoom_start: Integer representing the initial zoom of the map
            plot_points: Add points to map (boolean)
            pt_radius: Size of each point
            draw_heatmap: Add heatmap to map (boolean)
            heat_map_weights_col: Column containing heatmap weights
            heat_map_weights_normalize: Normalize heatmap weights (boolean)
            heat_map_radius: Size of heatmap point
        Returns:
            folium map object

    readInGeometryData(url=False, porg=False, geom=False, lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)
        # reverseGeoCode, readFile, getGeoParams, main

    workWithGeometryData(method=False, df=False, polys=False, ptsCoordCol=False, polygonsCoordCol=False, polyColorCol=False, polygonsLabel='polyOnPoint', pntsClr='red', polysClr='white', interactive=False)
        # Work With Geometry Data
        # Description: geomSummary, getPointsInPolygons, getPolygonOnPoints, mapPointsInPolygons, getCentroids

DATA
    __all__ = ['workWithGeometryData', 'map_points', 'readInGeometryData']

FILE
    /usr/local/lib/python3.7/dist-packages/dataplay/geoms.py
```

And here we can look at an individual function and what it expects:
```python
import VitalSigns.acsDownload
help(VitalSigns.acsDownload.retrieve_acs_data)
```

```
Help on function retrieve_acs_data in module VitalSigns.acsDownload:

retrieve_acs_data(state, county, tract, tableId, year, save)
```

So here's an example:
Import your modules
```python
%%capture
import pandas as pd
from VitalSigns.acsDownload import retrieve_acs_data
from dataplay.geoms import workWithGeometryData
from dataplay.geoms import map_points
from dataplay.intaker import Intake
```

```python
#hide
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 6)
pd.set_option('display.width', 10)
pd.set_option('max_colwidth', 20)
```

Read in some data
Define our download parameters.
More information on these parameters can be found in the tutorials!
```python
tract = '*'
county = '510'
state = '24'
tableId = 'B19001'
year = '17'
saveAcs = False
```

And download the Baltimore City ACS data using the imported VitalSigns library.
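For context, `retrieve_acs_data` presumably queries the Census Bureau's ACS API with these parameters. The sketch below shows roughly what such a request URL looks like; the exact endpoint and query construction are assumptions about what the library does, not its actual implementation.

```python
# Sketch only: build the ACS 5-year API URL these parameters imply.
# (Assumption: retrieve_acs_data queries an endpoint like this; the real
# library may differ.)
state, county, tract = '24', '510', '*'
tableId, year = 'B19001', '17'

url = (
    f"https://api.census.gov/data/20{year}/acs/acs5"
    f"?get=NAME,group({tableId})"   # request the whole B19001 table
    f"&for=tract:{tract}"           # every tract...
    f"&in=state:{state}+county:{county}"  # ...in Baltimore City, MD
)
print(url)
```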
```python
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
```

| NAME | B19001_001E_Total | B19001_002E_Total_Less_than_$10,000 | B19001_003E_Total_$10,000_to_$14,999 | ... | state | county | tract |
|---|---|---|---|---|---|---|---|
| Census Tract 2710.02 | 1510 | 209 | 73 | ... | 24 | 510 | 271002 |

1 rows × 20 columns
Here we can import and display a geospatial dataset with special intake requirements.
Here we pull a list of Baltimore City's CSAs (Community Statistical Areas).
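The intake step for a boundary layer like this typically queries an ArcGIS FeatureServer for GeoJSON. The sketch below builds such a query URL; the service path is hypothetical (the real CSA layer endpoint may differ), and with geopandas installed the resulting URL could be read directly into a GeoDataFrame.

```python
from urllib.parse import urlencode

# Hypothetical FeatureServer layer for CSA boundaries; the real service
# endpoint may differ.
base = ("https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/"
        "Hhchpov/FeatureServer/0/query")
params = urlencode({
    'where': '1=1',           # no filter: return every feature
    'outFields': '*',         # all attribute columns
    'returnGeometry': 'true',
    'f': 'geojson',           # ask the server for GeoJSON
})
csa_url = f"{base}?{params}"
print(csa_url)

# With geopandas installed, the layer could then be read directly:
# import geopandas as gpd
# csa_gdf = gpd.read_file(csa_url)
```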
```python
help(csa_gdf.plot)
```

Now in this example we will load in a set of coordinates.
```python
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = dataplay.geoms.readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
# geoloom_gdf = geoloom_gdf.drop(columns=['POINT_X','POINT_Y'])
geoloom_gdf.head(1)
```

| | OBJECTID | Data_type | Attach | ... | POINT_Y | GlobalID | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Artists & Resources | None | ... | 4.762932e+06 | e59b4931-e0c8-4d... | POINT (-76.60661... |

1 rows × 14 columns
And here we get the number of points in each of our corresponding CSAs (polygons)
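Conceptually, the 'pinp' (points-in-polygons) method counts, for each polygon, how many points fall inside it. dataplay builds on geopandas/shapely for this; the pure-Python ray-casting sketch below (toy data, not from the library) just illustrates the underlying idea.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast rightward from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside

# Toy data: one square "CSA" and three points, two inside and one outside.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(1, 1), (3, 2), (9, 9)]
count = sum(point_in_polygon(x, y, square) for x, y in points)
print(count)  # 2 points fall inside the square
```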
```python
geoloom_w_csas = dataplay.geoms.workWithGeometryData(method='pinp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
```

And we plot it with a legend:
```python
geoloom_w_csas.plot(column='pointsinpolygon', legend=True)
```

What would happen if we wanted to create an interactive click map with the label of each CSA (polygon) attached to each point?
Well we just run the reverse operation!
```python
geoloom_w_csas = workWithGeometryData(method='ponp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
```

And then we can visualize it like:
```python
outp = map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=12, plot_points=True, cluster_points=False, pt_radius=1, draw_heatmap=True, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15, popup='CSA2010')
```

These interactive visualizations can be exported to HTML using a tool found later in this document.
It's how I made this page!
If you like what you see, there is more in the package; you will just have to explore.
Disclaimer
Views Expressed: All views expressed in this tutorial are the author's own and do not represent the opinions of any entity whatsoever with which they have been, are now, or will be affiliated.
Responsibility, Errors and Omissions: The author makes no assurance about the reliability of the information, and takes no responsibility for updating the tutorial or maintaining its performant status. Under no circumstances shall the author or their affiliates be liable for any indirect, incidental, consequential, special, or exemplary damages arising out of or in connection with this tutorial. Information is provided 'as is' with the distinct possibility of errors and omissions. Information found within the contents is attached with an MIT license. Please refer to the License for more information.
Use at Risk: Any action you take upon the information on this Tutorial is strictly at your own risk, and the author will not be liable for any losses and damages in connection with the use of this tutorial and subsequent products.
Fair Use: This site contains copyrighted material, the use of which has not always been specifically authorized by the copyright owner. While no intention is made to unlawfully use copyrighted work, circumstances may arise in which such material is made available in an effort to advance scientific literacy. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in Section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 108, the material in this tutorial is distributed without profit to those who have expressed a prior interest in receiving the included information for research and education purposes.
For more information go to: http://www.law.cornell.edu/uscode/17/107.shtml. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.
License
Copyright © 2019 BNIA-JFI
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
From a local copy of the git repo:

0. Clone the repo locally onto GDrive:

```
git clone https://github.com/BNIA/dataplay.git
```

Run `!nbdev_build_lib` to build the .py modules, and enable autoreload so edited modules are re-imported automatically:

```
%load_ext autoreload
%autoreload 2
```

To push .py changes back to their original notebooks, use `!nbdev_update_lib` (and `!relimport2name` if a .py filename or class name changes).

Build the docs:

```
!nbdev_build_docs --force_all True --mk_readme True
```

Commit your changes with `!git commit -m ...`, then publish:

```
%%capture
! pip install twine
```

```
!nbdev_bump_version
! make pypi
```
Not shown: the hidden setup cells below mount GDrive and run the nbdev build commands.

```python
#hide
!pip install nbdev
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My Drive/'Software Development Documents'/
%cd dataplay
%ls
```

```python
#hide
# this will reload imported modules whenever the .py file changes
# via nbdev_build_lib or _update_lib.
%load_ext autoreload
%autoreload 2
```

```python
#hide
# !nbdev_build_lib
# !nbdev_build_docs --force_all True --mk_readme True
# !nbdev_nb2md 'notebooks/index.ipynb' > README.md
```

```python
#hide
# https://nbdev.fast.ai/tutorial.html#Add-in-notebook-export-cell
# https://nbdev.fast.ai/sync#nbdev_update_lib
# first: builds the .py files from .ipynbs
# ____ !nbdev_build_lib # --fname filename.ipynb
# second: push .py changes back to their original .ipynbs
# ____ !nbdev_update_lib
# sometimes: update .ipynb import statements if the .py filename.classname changes
# ____ !relimport2name
# nbdev_build_docs builds the documentation from the notebooks
# ____ !nbdev_build_docs --force_all True --mk_readme True
```

```python
#hide
"""
! git add *
! git config --global user.name "bnia"
! git config --global user.email "charles.karpati@gmail.com"
! git commit -m "initial commit"
# git push -f origin master
! git push -u ORIGIN main
"""
```

```python
#hide
! pip install twine
# ! nbdev_bump_version
! make pypi
```