API Reference
class intake_parquet.source.ParquetSource(urlpath, metadata=None, storage_options=None, **parquet_kwargs)

Source to load parquet datasets. Produces a dataframe.
A parquet dataset may be a single file, a set of files in a single directory or a nested set of directories containing data-files.
Current implementation uses fastparquet: URL should either point to a single file, a directory containing a _metadata file, or a list of data files.
Keyword parameters accepted by this Source:

- columns: list of str or None
  Column names to load. If None, loads all columns.
- index: str or None
  Column to use as the index of the dataframe. If None, it may be inferred from the saved metadata in certain cases.
- filters: list of tuples
  Row-group level filtering; a tuple like ('x', '>', 1) means that if a row-group has a maximum value less than 1 for the column x, it will be skipped. Row-level filtering is not performed.
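The row-group filter semantics described above can be sketched in plain Python. This is an illustrative model only, not intake-parquet's actual implementation; the helper name `should_skip_rowgroup` and the `stats` layout are made up for the example:

```python
# Illustrative sketch of row-group level filtering as described above.
# NOT intake-parquet's real code: it only models the idea that a filter
# like ('x', '>', 1) skips a row-group whose column statistics show it
# cannot contain a matching row.

def should_skip_rowgroup(stats, filters):
    """stats: {column: (min_value, max_value)} for one row-group (assumed layout).
    filters: list of (column, op, value) tuples."""
    for col, op, value in filters:
        if col not in stats:
            continue
        lo, hi = stats[col]
        if op == '>' and hi < value:
            # Maximum is below the threshold: no row can satisfy x > value.
            return True
        if op == '<' and lo > value:
            # Minimum is above the threshold: no row can satisfy x < value.
            return True
        if op == '==' and not (lo <= value <= hi):
            # The target value falls outside this row-group's range.
            return True
    return False
```

Because only row-group statistics are consulted, a surviving row-group may still contain rows that fail the filter, which is why the documentation notes that row-level filtering is not performed.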
Attributes

- cache_dirs
- datashape
- description
- hvplot
  Returns an hvPlot object to provide a high-level plotting API.
- plot
  Returns an hvPlot object to provide a high-level plotting API.
Methods

- close(): Close open resources corresponding to this data source.
- discover(): Open the resource and populate the source attributes.
- read(): Create a single pandas dataframe from the whole dataset.
- read_chunked(): Return an iterator over container fragments of the data source.
- read_partition(i): Return a (offset_tuple, container) corresponding to the i-th partition.
- to_dask(): Create a lazy dask dataframe from the parquet data.
- yaml([with_plugin]): Return a YAML representation of this data source.
- set_cache_dir