---
title: Utils
keywords: fastai
sidebar: home_sidebar
summary: "General utilities. Should probably split up into `utils.time` and `utils.download`"
description: "General utilities. Should probably split up into `utils.time` and `utils.download`"
nb_path: "notebooks/01_utils.ipynb"
---

Time format strings

First, we define the different format strings these utils convert from and to.

An identifier with `dt_format` in its name (`xxx_dt_format_xxx`) denotes a full datetime format, as opposed to a date-only format.

{% raw %}
{% endraw %}

NASA date to datetime and ISO

What we call a NASA date is the YYYY-JJJ format widely used in the Planetary Data System (PDS), which identifies a date by the running number of the day in the year, e.g. "2010-240".

{% raw %}
{% endraw %} {% raw %}

nasa_time_to_datetime[source]

nasa_time_to_datetime(inputstr)

User function to convert all kinds of NASA PDS datestrings with day_of_year into datetimes.
Type Default Details
inputstr inputstr of format YYYY-jjj, YYYY-jjjTHH:MM:SS or YYYY-jjjTHH:MM:SS.ffffff
Returns datetime
{% endraw %} {% raw %}
{% endraw %}
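
As a minimal sketch of how such strings can be parsed (not necessarily how `nasa_time_to_datetime` is implemented), the standard library's `%j` directive reads the day of the year directly. The helper name `parse_day_of_year` and the three format strings are assumptions made for illustration:

```python
import datetime as dt

# Assumed format strings, tried from most to least specific.
CANDIDATE_FORMATS = (
    "%Y-%jT%H:%M:%S.%f",  # YYYY-jjjTHH:MM:SS.ffffff
    "%Y-%jT%H:%M:%S",     # YYYY-jjjTHH:MM:SS
    "%Y-%j",              # YYYY-jjj
)


def parse_day_of_year(text: str) -> dt.datetime:
    """Hypothetical helper: parse a day-of-year based date string."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return dt.datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"Not a recognized day-of-year date string: {text!r}")


print(parse_day_of_year("2010-110"))  # day 110 of 2010 is April 20
```

Because `strptime` rejects strings with unconverted trailing data, a bare date only matches the last format and a full datetime only matches one of the first two.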

Example dates and times to test:

{% raw %}
nasa_date = "2010-110"
iso_date = "2010-4-20"
nasa_datetime = "2010-110T10:12:14"
nasa_datetime_with_ms = nasa_datetime + ".123000"
iso_datetime = "2010-04-20T10:12:14"
iso_datetime_with_ms = iso_datetime + ".123000"
nasa_times = [nasa_date, nasa_datetime, nasa_datetime_with_ms]
iso_times = [iso_date, iso_datetime, iso_datetime_with_ms]
{% endraw %} {% raw %}
assert nasa_time_to_datetime(nasa_date) == dt.datetime(2010, 4, 20, 0, 0)
{% endraw %} {% raw %}
assert nasa_time_to_datetime(nasa_datetime) == dt.datetime(2010, 4, 20, 10, 12, 14)
{% endraw %} {% raw %}
assert nasa_time_to_datetime(nasa_datetime_with_ms) == dt.datetime(
    2010, 4, 20, 10, 12, 14, 123000
)
{% endraw %} {% raw %}

nasa_time_to_iso[source]

nasa_time_to_iso(inputstr:str, with_hours:bool=False)

Convert the day-number based NASA datetime format to ISO
Type Default Details
inputstr str No Content
with_hours bool False Switch if return is wanted with hours (i.e. isoformat)
Returns str Datestring in ISO-format.
{% endraw %} {% raw %}
{% endraw %}

Conversion to ISO format; the time part is omitted if the input does not contain one:

{% raw %}
for t in nasa_times:
    print("Input:", t)
    print(nasa_time_to_iso(t))
Input: 2010-110
2010-04-20
Input: 2010-110T10:12:14
2010-04-20T10:12:14
Input: 2010-110T10:12:14.123000
2010-04-20T10:12:14.123000
{% endraw %}

If the time part is wanted in the ISO string even for date-only input, use `with_hours=True`:

{% raw %}
for t in nasa_times:
    print("Input:", t)
    print(nasa_time_to_iso(t, with_hours=True))
Input: 2010-110
2010-04-20T00:00:00
Input: 2010-110T10:12:14
2010-04-20T10:12:14
Input: 2010-110T10:12:14.123000
2010-04-20T10:12:14.123000
{% endraw %} {% raw %}
assert nasa_time_to_iso(nasa_date, with_hours=True) == "2010-04-20T00:00:00"
assert nasa_time_to_iso(nasa_date) == "2010-04-20"
{% endraw %}

ISO date to "NASA-format"

Again, by NASA format we mean the YYYY-JJJ format often used in PDS and mission files, e.g. "2010-240".
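
The reverse direction can be sketched with `strftime`, again relying on the `%j` directive. This is an illustration under assumptions, not the library's implementation; `iso_date_to_day_of_year` is a hypothetical name:

```python
import datetime as dt


def iso_date_to_day_of_year(iso_date: str) -> str:
    """Hypothetical helper: convert 'YYYY-MM-DD' to 'YYYY-JJJ'."""
    parsed = dt.datetime.strptime(iso_date, "%Y-%m-%d")
    # %j is the zero-padded day of the year (001-366).
    return parsed.strftime("%Y-%j")


print(iso_date_to_day_of_year("2010-04-20"))  # 2010-110
```

The day number is always three digits wide, so January 5 becomes `005` rather than `5`.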

{% raw %}

iso_to_nasa_time[source]

iso_to_nasa_time(inputstr:str)

Convert iso date to day-number based NASA date.
Type Default Details
inputstr str Date string of the form Y-m-d
Returns str Datestring in NASA standard yyyy-jjj
{% endraw %} {% raw %}
{% endraw %} {% raw %}

iso_to_nasa_datetime[source]

iso_to_nasa_datetime(dtimestr:str)

Convert iso datetime to day-number based NASA datetime.
Type Default Details
dtimestr str Datetime string of the form yyyy-mm-ddTHH:MM:SS
{% endraw %} {% raw %}
{% endraw %} {% raw %}
for t in iso_times:
    print("Input:", t)
    print(iso_to_nasa_time(t))
Input: 2010-4-20
2010-110
Input: 2010-04-20T10:12:14
2010-110T10:12:14
Input: 2010-04-20T10:12:14.123000
2010-110T10:12:14.123000
{% endraw %} {% raw %}
assert iso_to_nasa_time(iso_date) == nasa_date
{% endraw %} {% raw %}
assert nasa_time_to_iso(nasa_datetime) == iso_datetime
assert nasa_time_to_iso(nasa_datetime_with_ms) == iso_datetime_with_ms
{% endraw %} {% raw %}
assert iso_to_nasa_time(iso_datetime) == nasa_datetime
assert iso_to_nasa_time(iso_datetime_with_ms) == nasa_datetime_with_ms
{% endraw %} {% raw %}

replace_all_nasa_times[source]

replace_all_nasa_times(df:DataFrame)

Find all NASA times in dataframe and replace with ISO.

The incoming dataframe is modified in place!

This will be done for all columns with the word TIME in the column name.
Type Default Details
df DataFrame DataFrame with NASA time columns
{% endraw %} {% raw %}
{% endraw %}
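
To illustrate the in-place column replacement described for `replace_all_nasa_times`, here is a rough sketch using pandas. It assumes that every matching column holds strings of the form YYYY-jjjTHH:MM:SS and that the result should be ISO strings; the function name, the format constant and the column names are hypothetical:

```python
import datetime as dt

import pandas as pd

# Assumed input format of the time columns (day-of-year based datetime).
ASSUMED_NASA_DT_FORMAT = "%Y-%jT%H:%M:%S"


def replace_nasa_times_sketch(df: pd.DataFrame) -> None:
    """Hypothetical sketch: convert all columns with TIME in the name, in place."""
    for col in df.columns:
        if "TIME" in str(col):
            df[col] = df[col].map(
                lambda s: dt.datetime.strptime(s, ASSUMED_NASA_DT_FORMAT).isoformat()
            )


df = pd.DataFrame({"START_TIME": ["2010-110T10:12:14"], "NAME": ["a"]})
replace_nasa_times_sketch(df)
print(df)
```

Since the function assigns back into the dataframe it received, callers who need the original values should pass in a copy.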

Network utils

{% raw %}

parse_http_date[source]

parse_http_date(text:str)

Parse date string retrieved via urllib.request.
Type Default Details
text str datestring from urllib.request
Returns datetime dt.datetime object from given datetime string
{% endraw %} {% raw %}

get_remote_timestamp[source]

get_remote_timestamp(url:str)

Get the timestamp of a remote file.

Useful for checking if there's an updated file available.
Type Default Details
url str URL to check timestamp for
Returns datetime
{% endraw %} {% raw %}

check_url_exists[source]

check_url_exists(url)

{% endraw %} {% raw %}

url_retrieve[source]

url_retrieve(url:str, outfile:str, chunk_size:int=128)

Improved urlretrieve with progressbar, timeout and chunker.

This downloader has a built-in progress bar based on tqdm. By using the `requests`
package, it improves on standard `urllib` behavior by adding timeout capability.

I tested different chunk_sizes and most of the time 128 was actually fastest, YMMV.

Inspired by https://stackoverflow.com/a/61575758/680232
Type Default Details
url str The URL to download
outfile str The path where to store the downloaded file.
chunk_size int 128 The chunk size passed to the `iter_content` call of the `requests` response.
{% endraw %} {% raw %}

have_internet[source]

have_internet()

Fastest way to check for active internet connection.

From https://stackoverflow.com/a/29854274/680232
{% endraw %} {% raw %}
{% endraw %}
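
For the date parsing step, a minimal sketch using only the standard library is shown here. It assumes that the remote timestamp arrives as an HTTP date string such as the value of a `Last-Modified` header; the function name `parse_http_date_sketch` and the example header value are made up for illustration, and the actual `parse_http_date` may work differently:

```python
import datetime as dt
from email.utils import parsedate_to_datetime


def parse_http_date_sketch(text: str) -> dt.datetime:
    """Hypothetical stand-in: parse an HTTP date string into a datetime."""
    return parsedate_to_datetime(text)


# Example header value, not taken from a real server response.
last_modified = "Wed, 21 Oct 2015 07:28:00 GMT"
remote_stamp = parse_http_date_sketch(last_modified)

# A timezone-aware local timestamp to compare against.
local_stamp = dt.datetime(2015, 1, 1, tzinfo=dt.timezone.utc)
print("Remote file is newer:", remote_stamp > local_stamp)
```

The parsed value is timezone-aware for strings ending in `GMT`, so it should only be compared with other timezone-aware datetimes.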

Image processing helpers

{% raw %}

height_from_shadow[source]

height_from_shadow(shadow_in_pixels:float, sun_elev:float)

Calculate height of an object from its shadow length.

Note that your image might have been binned.
If so, you need to correct `shadow_in_pixels` for that.
Type Default Details
shadow_in_pixels float Measured length of shadow in pixels
sun_elev float Angle of sun over horizon in degrees
Returns float Height [meter]
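
The underlying geometry is that an object of height h under a sun at elevation angle e casts a shadow of length h / tan(e) on flat ground. The following sketch applies that relation; the `pixel_scale_m` parameter is a hypothetical addition for illustration, since how the documented function converts pixels to meters is not stated here:

```python
import math


def height_from_shadow_sketch(
    shadow_in_pixels: float, sun_elev: float, pixel_scale_m: float = 1.0
) -> float:
    """Hypothetical sketch: object height in meters from its shadow length.

    pixel_scale_m is an assumed ground size of one pixel in meters.
    """
    shadow_length_m = shadow_in_pixels * pixel_scale_m
    return shadow_length_m * math.tan(math.radians(sun_elev))


# At 45 degrees sun elevation, height and shadow length are equal.
print(height_from_shadow_sketch(10, 45))
```

In this sketch, a binned image is handled by multiplying the unbinned pixel scale by the binning factor.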
{% endraw %} {% raw %}

get_gdal_center_coords[source]

get_gdal_center_coords(imgpath:Union[str, Path])

Get the center row/column pixel coordinates of a GDAL-readable dataset.

Run `gdalinfo --formats` on the command line to see all formats that GDAL can open.
Type Default Details
imgpath Union[str, Path] Path to raster image that is readable by GDAL
Returns Tuple[int, int] center row/col coordinates.
{% endraw %} {% raw %}

file_variations[source]

file_variations(filename:Union[str, Path], extensions:list)

Create variations of a filename.

Generate a list of variations on a filename by replacing the extension with
each extension in the provided list.

Adapted from T. Olsen's `file_variations` in the pysis module to use pathlib.
Type Default Details
filename Union[str, Path] The original filename to use as a base.
extensions list No Content
Returns list list of Paths
{% endraw %} {% raw %}
{% endraw %} {% raw %}
fname = "abc.txt"
{% endraw %} {% raw %}
extensions = ".cub .cal.cub .map.cal.cub".split()
{% endraw %} {% raw %}
file_variations(fname, extensions)
[Path('abc.cub'), Path('abc.cal.cub'), Path('abc.map.cal.cub')]
{% endraw %} {% raw %}
assert len(extensions) == len(file_variations(fname, extensions))
{% endraw %}
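
The extension replacement shown above can be approximated with `pathlib` alone. This is a sketch under the assumption that each extension starts with a dot; `file_variations_sketch` is a hypothetical name and not the library function itself:

```python
from pathlib import Path
from typing import List, Union


def file_variations_sketch(
    filename: Union[str, Path], extensions: List[str]
) -> List[Path]:
    """Hypothetical sketch: swap the final suffix for each given extension."""
    base = Path(filename)
    # with_suffix replaces only the last suffix, e.g. ".txt" in "abc.txt".
    return [base.with_suffix(ext) for ext in extensions]


print(file_variations_sketch("abc.txt", [".cub", ".cal.cub"]))
```

Note that `with_suffix` only replaces the last suffix, so a base name like `abc.map.txt` would keep its `.map` part in every variation.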