# default_exp intaker

⚠️ This writing is a work in progress. The functions work. ⚠️

This Coding Notebook is the 1.5th in a series.

An Interactive version can be found here Open In Colab.

This colab and more can be found on our webpage.

About this Tutorial:

What's inside?

The Tutorial

In this notebook, the basics of data-intake are introduced.

Objectives

By the end of this tutorial users should have an understanding of:

Background

Importing Data with Colabs:

Instructions: Read all text and execute all code in order.

How XYZ :

If you would like to ...

For this next example to work, we will need to import hypothetical CSV files.

Try It! Go ahead and try running the cell below.

#hide
!pip install nbdev
from nbdev.showdoc import *

Advanced

#export
import geopandas as gpd
import numpy as np
import pandas as pd
from dataplay import geoms

# hide
pd.set_option('max_colwidth', 20)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)

# Can read in a CSV URL but uses dataplay.geoms.readInGeometryData() for GeoJSON endpoints.
# Otherwise this tool assumes shp or pgeojson files have geom='geometry', in_crs=2248.
# Depending on interactivity the values should be
# coerce fillna(-1321321321321325)
# Returns

# export
class Intake:

  # 1. Recursively calls self/getData until something valid is given.
  # Calls readInGeometryData or pulls the CSV directly.
  # Returns df or False.
  @staticmethod
  def getData(url, interactive=False):
    escapeQuestionFlags = ["no", '', 'none']
    if Intake.isPandas(url): return url
    if str(url).lower() in escapeQuestionFlags: return False
    if interactive: print('Getting Data From: ', url)
    try:
      if [ele for ele in ['pgeojson', 'shp', 'geojson'] if ele in url]:
        df = geoms.readInGeometryData(url=url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=2248, out_crs=False)
      elif 'csv' in url:
        df = pd.read_csv(url)
      return df
    except Exception:
      # Read errors and unsupported sources (df never assigned) both land here.
      if interactive: return Intake.getData(input("Error: Try Again? ( URL/ PATH or 'NO'/ ) "), interactive)
      return False

  # 1ai. A misnomer: tuples are accepted too. Returns Bool.
  @staticmethod
  def isPandas(df):
    return isinstance(df, (pd.DataFrame, gpd.GeoDataFrame, tuple))

  # a1. Used by Merge Lib. Returns valid (df, column) or (df, False) or (False, False).
  @staticmethod
  def getAndCheck(url, col='geometry', interactive=False):
    df = Intake.getData(url, interactive) # Returns False or df
    if not Intake.isPandas(df):
      if interactive: print('No data was retrieved.', df)
      return False, False
    if isinstance(col, list):
      # Check every column in the list; stop at the first one that fails.
      checked = []
      for colm in col:
        newcol = Intake.getAndCheckColumn(df, colm, interactive)
        if not newcol:
          if interactive: print('Exiting. Error on the column: ', colm)
          return df, False
        checked.append(newcol)
      return df, checked
    newcol = Intake.getAndCheckColumn(df, col, interactive) # Returns False or col
    if not newcol:
      if interactive: print('Exiting. Error on the column: ', col)
      return df, False
    return df, newcol

  # a2. Returns Bool
  @staticmethod
  def checkColumn(dataset, column):
    return {column}.issubset(dataset.columns)

  # b1. Used by Merge Lib. Returns both datasets and the coerce status.
  @staticmethod
  def coerce(ds1, ds2, col1, col2, interactive):
    ds1, ldt, lIsNum = Intake.getdTypeAndFillNum(ds1, col1, interactive)
    ds2, rdt, rIsNum = Intake.getdTypeAndFillNum(ds2, col2, interactive)
    ds2 = Intake.coerceDtypes(lIsNum, rdt, ds2, col2, interactive)
    ds1 = Intake.coerceDtypes(rIsNum, ldt, ds1, col1, interactive)
    # Return the data and the coerce status
    return ds1, ds2, (ds1[col1].dtype == ds2[col2].dtype)

  # b2. Used by Merge Lib. Fills NA in numeric columns with a sentinel number.
  @staticmethod
  def getdTypeAndFillNum(ds, col, interactive):
    dt = ds[col].dtype
    isNum = dt == 'float64' or dt == 'int64'
    if isNum: ds[col] = ds[col].fillna(-1321321321321325)
    return ds, dt, isNum

  # b3. Used by Merge Lib. Converts an object key to float when the other key is numeric.
  @staticmethod
  def coerceDtypes(isNum, dt, ds, col, interactive):
    if isNum and dt == 'object':
      if interactive: print('Converting Key from Object to Int')
      ds[col] = pd.to_numeric(ds[col], errors='coerce')
      if interactive: print('Converting Key from Int to Float')
      ds[col] = ds[col].astype(float)
    return ds

  # a3. Returns False or col. Interactive calls self.
  @staticmethod
  def getAndCheckColumn(df, col, interactive):
    if Intake.checkColumn(df, col): return col
    if not interactive: return False
    print("Invalid column given: ", col)
    print(df.columns)
    print("Please enter a new column from the list above.")
    col = input("Column Name: ")
    return Intake.getAndCheckColumn(df, col, interactive)

u = Intake
rdf = Intake.getData('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
rdf.head(1)
   OBJECTID              CSA2010  hhchpov15  hhchpov16  hhchpov17  hhchpov18  hhchpov19  Shape__Area  Shape__Length             geometry
0         1  Allendale/Irving...      38.93      34.73      32.77      35.27       32.6     6.38e+07       38770.17  POLYGON ((-76.65...
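
The coerce, getdTypeAndFillNum, and coerceDtypes helpers in the class above prepare two key columns for a merge: missing numeric keys are filled with a sentinel, and a text key is parsed as float when the other key is numeric. The sketch below restates that idea with pandas only, so it can be run without dataplay. The helper name `align_keys` and the two small frames are hypothetical, and unlike the class it works on copies and uses `pd.api.types.is_numeric_dtype` instead of comparing against 'float64' and 'int64'.

```python
import pandas as pd

SENTINEL = -1321321321321325  # placeholder the Intake class uses for missing numeric keys


def align_keys(left, right, left_col, right_col):
    """Hypothetical helper mirroring the idea behind Intake.coerce, on copies."""
    left, right = left.copy(), right.copy()
    left_is_num = pd.api.types.is_numeric_dtype(left[left_col])
    right_is_num = pd.api.types.is_numeric_dtype(right[right_col])

    # Step 1: fill missing numeric keys with the sentinel.
    if left_is_num:
        left[left_col] = left[left_col].fillna(SENTINEL)
    if right_is_num:
        right[right_col] = right[right_col].fillna(SENTINEL)

    # Step 2: when only one side is numeric, parse the other side as float.
    if left_is_num and not right_is_num:
        right[right_col] = pd.to_numeric(right[right_col], errors="coerce").astype(float)
    if right_is_num and not left_is_num:
        left[left_col] = pd.to_numeric(left[left_col], errors="coerce").astype(float)

    # Step 3: report whether the two key columns now share a dtype.
    return left, right, left[left_col].dtype == right[right_col].dtype


# Made-up keys: a float column with a gap and a text column with a bad value.
left = pd.DataFrame({"id": [1.0, 2.0, None]})
right = pd.DataFrame({"id": ["1", "2", "x"]})
left2, right2, same_dtype = align_keys(left, right, "id", "id")
```

Here the gap in `left` becomes the sentinel, while the unparseable "x" in `right` becomes NaN, so the two can never match each other in a later merge.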

Here we can save the data so that it may be used in later tutorials.

# string = 'test_save_data_with_geom_and_csa'
# .to_csv(string+'.csv', encoding="utf-8", index=False, quoting=csv.QUOTE_ALL)
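
As a rough illustration of the commented-out save call above, the sketch below writes a small frame with `index=False` and `quoting=csv.QUOTE_ALL`, then reads it back. The frame contents are made up for the example, and an in-memory `io.StringIO` buffer stands in for the file name so that nothing is written to disk (which is also why `encoding` is left out here).

```python
import csv
import io

import pandas as pd

# Made-up rows; one name contains a comma to show why quoting helps.
df = pd.DataFrame({"name": ["Area A", "Area B, North"], "value": [1.5, 2.25]})

buffer = io.StringIO()  # in-memory stand-in for a file on disk
df.to_csv(buffer, index=False, quoting=csv.QUOTE_ALL)
csv_text = buffer.getvalue()

# Reading the text back recovers the same columns and values.
round_trip = pd.read_csv(io.StringIO(csv_text))
```

With `QUOTE_ALL` every field, including the header and the numbers, is wrapped in double quotes, so a comma inside a value cannot be mistaken for a column separator.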

Download data by:

You can upload this data into the next tutorial in one of two ways.

OR.

Here are some examples:

Using Esri and the Geoms handler directly:

import dataplay
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = dataplay.geoms.readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=4326, out_crs=False)
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
geoloom_gdf.head(1)
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments POINT_X POINT_Y GlobalID geometry
0 1 Artists & Resources None Joe Test 123 Market Pl, B... -8.53e+06 4.76e+06 e59b4931-e0c8-4d... POINT (-76.60661...

Again but with the Intake class:

u = Intake
Geoloom_Crowd, rcol = u.getAndCheck('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson')
Geoloom_Crowd.head(1)
OBJECTID Data_type Attach ProjNm Descript Location URL Name PhEmail Comments POINT_X POINT_Y GlobalID geometry
0 1 Artists & Resources None Joe Test 123 Market Pl, B... -8.53e+06 4.76e+06 e59b4931-e0c8-4d... POINT (-76.60661...
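
Both calls above end up in the geometry handler because the URL contains 'pgeojson'. The sketch below shows, in plain Python, how Intake.getData appears to choose a reader from substrings in the source string. The function `pick_reader` and its return labels are hypothetical and exist only for illustration; the real method loads the data rather than returning a label.

```python
GEO_MARKERS = ("pgeojson", "shp", "geojson")  # substrings getData looks for
ESCAPE_FLAGS = ("no", "", "none")             # answers that cancel the request


def pick_reader(source):
    """Hypothetical helper: name the reader getData would choose for a source."""
    text = str(source)
    if text.lower() in ESCAPE_FLAGS:
        return "cancel"
    if any(marker in text for marker in GEO_MARKERS):
        return "geometry"   # handled by geoms.readInGeometryData
    if "csv" in text:
        return "csv"        # handled by pd.read_csv
    return "unsupported"    # getData returns False, or asks again when interactive


choices = [pick_reader(s) for s in ("Hhpov.csv", "query?f=pgeojson", "NO", "data.xlsx")]
```

Because the check is a substring match rather than a file-extension test, the geometry markers are tried first and a URL only needs to contain one of them somewhere in its text.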

The getAndCheck function is useful for checking that a required field exists.

Hhpov, rcol = u.getAndCheck('https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson', 'hhpov19', True)
Hhpov = Hhpov[['CSA2010', 'hhpov15', 'hhpov16', 'hhpov17', 'hhpov18', 'hhpov19']]
# Hhpov.to_csv('Hhpov.csv')
Hhpov.head()

Getting Data From: https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson
               CSA2010  hhpov15  hhpov16  hhpov17  hhpov18  hhpov19
0  Allendale/Irving...    24.15    21.28    20.70    23.00    19.18
1  Beechfield/Ten H...    11.17    11.59    10.47    10.90     8.82
2        Belair-Edison    18.61    19.59    20.27    22.83    22.53
3  Brooklyn/Curtis ...    28.36    26.33    24.21    21.54    24.60
4               Canton     3.00     2.26     3.66     2.05     2.22
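
The value returned alongside the data is what signals whether the required field was found. According to the comment on getAndCheck, the pair is (df, column) on success, (df, False) when the column is missing, and (False, False) when no data was retrieved. The sketch below imitates that convention with a hypothetical, non-interactive helper named `check_required`; it is not the Intake implementation, and the one-row frame reuses the Canton value from the output above.

```python
import pandas as pd


def check_required(df, required):
    """Hypothetical helper: return (df, column), (df, False) or (False, False)."""
    if not isinstance(df, pd.DataFrame):
        return False, False          # nothing was loaded
    if required in df.columns:
        return df, required          # the required field exists
    return df, False                 # data loaded, field missing


frame = pd.DataFrame({"CSA2010": ["Canton"], "hhpov19": [2.22]})
_, found = check_required(frame, "hhpov19")
_, missing = check_required(frame, "hhpov20")
nothing, no_col = check_required(None, "hhpov19")
```

A caller can then branch on the second value, for example stopping early when it is False, before touching the frame.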

We could also retrieve from a file.

u = Intake
# rdf = u.getData('Hhpov.csv')
rdf.head()

   Unnamed: 0              CSA2010  hhpov15  hhpov16  hhpov17  hhpov18  hhpov19
0           0  Allendale/Irving...    24.15    21.28    20.70    23.00    19.18
1           1  Beechfield/Ten H...    11.17    11.59    10.47    10.90     8.82
2           2        Belair-Edison    18.61    19.59    20.27    22.83    22.53
3           3  Brooklyn/Curtis ...    28.36    26.33    24.21    21.54    24.60
4           4               Canton     3.00     2.26     3.66     2.05     2.22
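
The "Unnamed: 0" column in this output is what pandas produces when a frame is saved with its default index and then read back without naming an index column. The sketch below reproduces that round trip in memory and shows two common ways to avoid the extra column. The one-row frame reuses the Canton value from above, and `io.StringIO` buffers stand in for the Hhpov.csv file.

```python
import io

import pandas as pd

frame = pd.DataFrame({"CSA2010": ["Canton"], "hhpov19": [2.22]})

with_index = io.StringIO()
frame.to_csv(with_index)              # default index=True also writes the row labels
with_index.seek(0)
reloaded = pd.read_csv(with_index)    # the unlabeled first column becomes "Unnamed: 0"

with_index.seek(0)
reloaded_clean = pd.read_csv(with_index, index_col=0)  # read it back as the index

without_index = io.StringIO()
frame.to_csv(without_index, index=False)  # or skip writing the index at all
without_index.seek(0)
reloaded_no_index = pd.read_csv(without_index)
```

Either passing `index_col=0` when reading or `index=False` when writing keeps the reloaded frame to the original two columns.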