API Reference

class intake_parquet.source.ParquetSource(urlpath, metadata=None, storage_options=None, **parquet_kwargs)

Source to load parquet datasets.

Produces a dataframe.

A parquet dataset may be a single file, a set of files in a single directory, or a nested set of directories containing data files.

The current implementation uses fastparquet: the URL should point to a single file, a directory containing a _metadata file, or a list of data files.

Keyword parameters accepted by this Source (see the example sketch after this list):

  • columns: list of str or None
    Column names to load. If None, loads all columns.
  • index: str or None
    Column to make into the index of the dataframe. If None, the index may be inferred from the saved metadata in certain cases.
  • filters: list of tuples
    Row-group level filtering; a tuple like ('x', '>', 1) means that a row-group whose maximum value for the column x is less than 1 will be skipped. Row-level filtering is not performed.
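
A minimal construction sketch using these keyword parameters. The file path, column names, and filter threshold below are hypothetical, chosen only for illustration:

    from intake_parquet.source import ParquetSource

    # Hypothetical path and columns, for illustration only.
    source = ParquetSource(
        "data/mytable.parquet",       # single file, directory, or list of files
        columns=["x", "y"],           # load only these columns (None loads all)
        index="x",                    # use column 'x' as the dataframe index
        filters=[("x", ">", 1)],      # skip row-groups that cannot contain x > 1
    )
    df = source.read()                # materialize as a pandas dataframe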
Attributes:

  • cache_dirs
  • datashape
  • description
  • hvplot
    Returns an hvPlot object to provide a high-level plotting API.
  • plot
    Returns an hvPlot object to provide a high-level plotting API.

Methods

  • close() Close open resources corresponding to this data source.
  • discover() Open the resource and populate the source attributes.
  • read() Create a single pandas dataframe from the whole dataset.
  • read_chunked() Return an iterator over container fragments of the data source.
  • read_partition(i) Return an (offset_tuple, container) corresponding to the i-th partition.
  • to_dask() Create a lazy dask dataframe from the parquet data.
  • yaml([with_plugin]) Return a YAML representation of this data source.
  • set_cache_dir
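
A sketch of the typical access pattern with these methods, assuming a hypothetical local path; partition handling follows the general intake source convention:

    from intake_parquet.source import ParquetSource

    source = ParquetSource("data/mytable.parquet")  # hypothetical path

    info = source.discover()              # open the resource and populate attributes
    first = source.read_partition(0)      # data for the first partition only

    for chunk in source.read_chunked():   # iterate over the dataset fragment by fragment
        print(len(chunk))                 # each fragment is a pandas dataframe

    source.close()                        # release open resources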
read()

Create a single pandas dataframe from the whole dataset.

to_dask()

Create a lazy dask dataframe from the parquet data.
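
Where the dataset does not fit in memory, to_dask returns a lazy handle instead of reading eagerly; a sketch assuming a hypothetical path and a numeric column 'x':

    from intake_parquet.source import ParquetSource

    source = ParquetSource("data/mytable.parquet")   # hypothetical path
    ddf = source.to_dask()                           # lazy dask dataframe; no data read yet
    result = ddf["x"].mean().compute()               # reads happen only at compute time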