---
title: Utils
keywords: fastai
sidebar: home_sidebar
summary: "General utilities. Should probably split up into `utils.time` and `utils.download`"
description: "General utilities. Should probably split up into `utils.time` and `utils.download`"
nb_path: "notebooks/01_utils.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %}

Time format strings

First, we define the different format strings these utils convert from and to.

An identifier with `xxx_dt_format_xxx` in its name denotes a full datetime format, as opposed to a date-only format.

{% raw %}
{% endraw %}
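To make the naming convention concrete, here is a minimal sketch of what such format strings could look like. The identifier names and values below are assumptions chosen for illustration, not necessarily the ones defined in the module; only the `strftime` directives (`%j` for day of year, `%m-%d` for month and day) are standard.

```python
import datetime as dt

# Hypothetical names following the convention described above;
# the module's actual identifiers and values may differ.
nasa_date_format = "%Y-%j"                        # year and day of year
nasa_dt_format = nasa_date_format + "T%H:%M:%S"   # "dt": adds a time part
iso_date_format = "%Y-%m-%d"                      # year, month, day
iso_dt_format = iso_date_format + "T%H:%M:%S"

# Format one moment with each string to compare the results.
moment = dt.datetime(2010, 4, 20, 10, 12, 14)
formatted = {
    "nasa_date": moment.strftime(nasa_date_format),
    "nasa_dt": moment.strftime(nasa_dt_format),
    "iso_date": moment.strftime(iso_date_format),
    "iso_dt": moment.strftime(iso_dt_format),
}
for name, text in formatted.items():
    print(name, text)
```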

NASA date to datetime and ISO

What we call a NASA date is the YYYY-JJJ format often used in the Planetary Data System, which identifies a date by the running number of the day in the year, e.g. "2010-240".

{% raw %}
{% endraw %} {% raw %}

nasa_time_to_datetime[source]

nasa_time_to_datetime(inputstr)

User function to convert all kinds of NASA PDS datestrings with day_of_year into datetimes.
{% endraw %} {% raw %}
{% endraw %}
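One way such a parser could work is to try a list of day-of-year formats until one matches. The sketch below is an assumption about the approach, not the module's implementation; `parse_nasa_time_sketch` and the format tuple are hypothetical names.

```python
import datetime as dt

# Candidate formats, from least to most detailed (assumed, not the module's own).
NASA_FORMATS = (
    "%Y-%j",                  # date only
    "%Y-%jT%H:%M:%S",         # date and time
    "%Y-%jT%H:%M:%S.%f",      # date, time and fractional seconds
)


def parse_nasa_time_sketch(text: str) -> dt.datetime:
    """Parse a day-of-year string by trying each known format in turn."""
    for fmt in NASA_FORMATS:
        try:
            return dt.datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"Not a recognized day-of-year time string: {text!r}")
```

Because `strptime` rejects strings with unconverted trailing text, each input matches at most one of the three formats.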

Example dates and times to test:

{% raw %}
nasa_date = "2010-110"
iso_date = "2010-4-20"
nasa_datetime = "2010-110T10:12:14"
nasa_datetime_with_ms = nasa_datetime + ".123000"
iso_datetime = "2010-04-20T10:12:14"
iso_datetime_with_ms = iso_datetime + ".123000"
nasa_times = [nasa_date, nasa_datetime, nasa_datetime_with_ms]
iso_times = [iso_date, iso_datetime, iso_datetime_with_ms]
{% endraw %} {% raw %}
assert nasa_time_to_datetime(nasa_date) == dt.datetime(2010, 4, 20, 0, 0)
{% endraw %} {% raw %}
assert nasa_time_to_datetime(nasa_datetime) == dt.datetime(2010, 4, 20, 10, 12, 14)
{% endraw %} {% raw %}
assert nasa_time_to_datetime(nasa_datetime_with_ms) == dt.datetime(
    2010, 4, 20, 10, 12, 14, 123000
)
{% endraw %} {% raw %}

nasa_time_to_iso[source]

nasa_time_to_iso(inputstr:str, with_hours:bool=False)

Convert the day-number based NASA datetime format to ISO
{% endraw %} {% raw %}
{% endraw %}

Conversion to ISO format. By default, the time part is omitted if the input does not contain one:

{% raw %}
for t in nasa_times:
    print("Input:", t)
    print(nasa_time_to_iso(t))
Input: 2010-110
2010-04-20
Input: 2010-110T10:12:14
2010-04-20T10:12:14
Input: 2010-110T10:12:14.123000
2010-04-20T10:12:14.123000
{% endraw %}

If the time part should always appear in the ISO string, use `with_hours=True`:

{% raw %}
for t in nasa_times:
    print("Input:", t)
    print(nasa_time_to_iso(t, with_hours=True))
Input: 2010-110
2010-04-20T00:00:00
Input: 2010-110T10:12:14
2010-04-20T10:12:14
Input: 2010-110T10:12:14.123000
2010-04-20T10:12:14.123000
{% endraw %} {% raw %}
assert nasa_time_to_iso(nasa_date, with_hours=True) == "2010-04-20T00:00:00"
assert nasa_time_to_iso(nasa_date) == "2010-04-20"
{% endraw %}
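The behavior shown above can be reproduced with a few lines of standard-library code. This is a sketch under the assumption that only the date part needs converting and the time part can be passed through unchanged; `nasa_to_iso_sketch` is a hypothetical stand-in, not the library function.

```python
import datetime as dt


def nasa_to_iso_sketch(text: str, with_hours: bool = False) -> str:
    """Hypothetical stand-in for nasa_time_to_iso, for illustration only."""
    # Split "YYYY-JJJ[Thh:mm:ss[.ffffff]]" into date and optional time.
    date_part, sep, time_part = text.partition("T")
    iso_date = dt.datetime.strptime(date_part, "%Y-%j").date().isoformat()
    if sep:
        # Keep the time part verbatim so fractional seconds survive unchanged.
        return f"{iso_date}T{time_part}"
    # Date-only input: append midnight only when explicitly requested.
    return f"{iso_date}T00:00:00" if with_hours else iso_date


print(nasa_to_iso_sketch("2010-110"))
print(nasa_to_iso_sketch("2010-110", with_hours=True))
```

Note that this sketch does not validate the time part; a stricter version would parse it as well.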

ISO date to "NASA-format"

Again, by NASA format we mean the YYYY-JJJ format often used in PDS and mission files, e.g. "2010-240".

{% raw %}

iso_to_nasa_time[source]

iso_to_nasa_time(inputstr:str)

Convert iso date to day-number based NASA date.
{% endraw %} {% raw %}
{% endraw %} {% raw %}

iso_to_nasa_datetime[source]

iso_to_nasa_datetime(dtimestr:str)

Convert iso datetime to day-number based NASA datetime.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
for t in iso_times:
    print("Input:", t)
    print(iso_to_nasa_time(t))
Input: 2010-4-20
2010-110
Input: 2010-04-20T10:12:14
2010-110T10:12:14
Input: 2010-04-20T10:12:14.123000
2010-110T10:12:14.123000
{% endraw %} {% raw %}
assert iso_to_nasa_time(iso_date) == nasa_date
{% endraw %} {% raw %}
assert nasa_time_to_iso(nasa_datetime) == iso_datetime
assert nasa_time_to_iso(nasa_datetime_with_ms) == iso_datetime_with_ms
{% endraw %} {% raw %}
assert iso_to_nasa_time(iso_datetime) == nasa_datetime
assert iso_to_nasa_time(iso_datetime_with_ms) == nasa_datetime_with_ms
{% endraw %} {% raw %}

replace_all_nasa_times[source]

replace_all_nasa_times(df:DataFrame)

Find all NASA times in dataframe and replace with ISO.

Changes are made in place on the incoming dataframe!

This will be done for all columns with the word TIME in the column name.
{% endraw %} {% raw %}
{% endraw %}
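To illustrate what "in place" and "columns with the word TIME" mean for `replace_all_nasa_times`, here is a small pandas sketch. It assumes the time columns hold strings; the helper names and the example column names are hypothetical, and the real function may select and convert columns differently.

```python
import datetime as dt

import pandas as pd


def nasa_to_iso(text: str) -> str:
    """Convert 'YYYY-JJJ[T...]' to ISO, keeping any time part unchanged."""
    date_part, sep, time_part = text.partition("T")
    iso_date = dt.datetime.strptime(date_part, "%Y-%j").date().isoformat()
    return f"{iso_date}T{time_part}" if sep else iso_date


def replace_nasa_times_sketch(df: pd.DataFrame) -> None:
    """Rewrite every column whose name contains 'TIME'; returns nothing."""
    for col in df.columns:
        if "TIME" in str(col):
            # Assigning back to df[col] modifies the caller's dataframe.
            df[col] = df[col].map(nasa_to_iso)


df = pd.DataFrame(
    {
        "START_TIME": ["2010-110T10:12:14", "2010-240T00:00:00"],
        "ORBIT": [1, 2],  # no 'TIME' in the name, so left untouched
    }
)
replace_nasa_times_sketch(df)
print(df)
```

Since the dataframe is modified in place, pass in `df.copy()` if the original values are still needed.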

Network utils

{% raw %}

parse_http_date[source]

parse_http_date(text:str)

Parse date string retrieved via urllib.request.
{% endraw %} {% raw %}

get_remote_timestamp[source]

get_remote_timestamp(url:str)

Get the timestamp of a remote file.

Useful for checking if there's an updated file available.
{% endraw %} {% raw %}

url_retrieve[source]

url_retrieve(url:str, outfile:str, chunk_size:int=128)

Improved urlretrieve with progressbar, timeout and chunker.

This downloader has a built-in progress bar using tqdm. By using the `requests`
package, it improves on standard `urllib` behavior by adding time-out capability.

I tested different chunk_sizes and most of the time 128 was actually fastest, YMMV.

Inspired by https://stackoverflow.com/a/61575758/680232
{% endraw %} {% raw %}

have_internet[source]

have_internet()

Fastest way to check for active internet connection.

From https://stackoverflow.com/a/29854274/680232
{% endraw %} {% raw %}
{% endraw %}
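As an illustration of the date-parsing step, HTTP headers such as `Last-Modified` carry dates in a form like "Wed, 21 Oct 2015 07:28:00 GMT", which the standard library can parse directly. The sketch below assumes this is the kind of string `parse_http_date` receives; it is not the module's implementation, and the local timestamp is a made-up value.

```python
import datetime as dt
from email.utils import parsedate_to_datetime


def parse_http_date_sketch(text: str) -> dt.datetime:
    """Parse an HTTP-style date string into a datetime."""
    return parsedate_to_datetime(text)


# Example header value; a real one would come from the server response.
remote = parse_http_date_sketch("Wed, 21 Oct 2015 07:28:00 GMT")

# Hypothetical timestamp of the local copy, for the update check.
local = dt.datetime(2015, 10, 20, tzinfo=dt.timezone.utc)
needs_update = remote > local
print(remote.isoformat(), needs_update)
```

Comparing the parsed remote timestamp with the time of a local copy is one way to decide whether a new download is worthwhile.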

Image processing helpers

{% raw %}

height_from_shadow[source]

height_from_shadow(shadow_in_pixels:float, sun_elev:float)

Calculate height of an object from its shadow length.

Note that your image might have been binned.
If so, you need to correct `shadow_in_pixels` for that.
{% endraw %} {% raw %}

get_gdal_center_coords[source]

get_gdal_center_coords(imgpath:Union[str, Path])

Get center rows/cols pixel coordinate for GDAL-readable dataset.

Check CLI `gdalinfo --formats` to see all formats that GDAL can open.
{% endraw %} {% raw %}

file_variations[source]

file_variations(filename:Union[str, Path], extensions:list)

Create a variation of file names.

Generate a list of variations on a filename by replacing the extension with
the provided list.

Adapted from T. Olsen's `file_variations` in the pysis module to use pathlib.
{% endraw %} {% raw %}
{% endraw %} {% raw %}
fname = "abc.txt"
{% endraw %} {% raw %}
extensions = ".cub .cal.cub .map.cal.cub".split()
{% endraw %} {% raw %}
file_variations(fname, extensions)
[Path('abc.cub'), Path('abc.cal.cub'), Path('abc.map.cal.cub')]
{% endraw %} {% raw %}
assert len(extensions) == len(file_variations(fname, extensions))
{% endraw %}
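For `height_from_shadow` described earlier in this section, the underlying geometry on flat terrain is `height = shadow_length * tan(sun_elevation)`. The sketch below assumes the elevation is given in degrees and adds a hypothetical `binning` parameter to show the correction mentioned in the docstring; neither assumption is confirmed by the source.

```python
import math


def height_from_shadow_sketch(
    shadow_in_pixels: float, sun_elev_deg: float, binning: int = 1
) -> float:
    """Object height in unbinned pixels, assuming flat terrain.

    `binning` is a hypothetical extra argument: with a binning factor of 2,
    each image pixel spans 2 unbinned pixels, so the shadow is scaled up.
    """
    shadow_unbinned = shadow_in_pixels * binning
    return shadow_unbinned * math.tan(math.radians(sun_elev_deg))


# At 45 degrees sun elevation, height equals shadow length.
print(height_from_shadow_sketch(10, 45))
print(height_from_shadow_sketch(10, 45, binning=2))
```

Multiplying the result by the image's ground resolution (meters per unbinned pixel) would give the height in meters.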