transparentpath.gcsutils package¶
Submodules¶
transparentpath.gcsutils.methodtranslator module¶
-
class
transparentpath.gcsutils.methodtranslator.
MethodTranslator
(first_name: str, second_name: str, kwarg_names: Dict[str, str] = None)¶ Bases:
object
-
translate
(*args: Tuple, **kwargs: Dict) → [<class ‘str’>, typing.Tuple, typing.Dict]¶ translate the method
- Parameters
*args (Tuple) –
**kwargs (Dict) –
- Returns
The translated method name along with the given args and the translated kwargs
- Return type
[str, Tuple, Dict]
-
translate_str
(*args: Tuple, **kwargs: Dict) → str¶ Translate the method as a string
- Parameters
*args (Tuple) –
**kwargs (Dict) –
- Returns
The string of the translated method new_method(arg1, arg2…, kwargs1=val1, translated_kwargs2=val2…)
- Return type
str
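As a rough illustration of the translation idea, here is a minimal sketch. It is not the library's implementation: the class name SketchTranslator, the method names "old_method" and "new_method", the kwarg names, and the choice to pass unknown kwargs through unchanged are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


class SketchTranslator:
    """Hypothetical stand-in for illustration, not the library's MethodTranslator."""

    def __init__(self, first_name: str, second_name: str,
                 kwarg_names: Optional[Dict[str, str]] = None):
        self.first_name = first_name
        self.second_name = second_name
        # Maps kwarg names of the first method to kwarg names of the second.
        self.kwarg_names = kwarg_names or {}

    def translate(self, *args, **kwargs) -> Tuple[str, tuple, dict]:
        # Assumption: kwargs without a mapping are passed through unchanged.
        new_kwargs = {self.kwarg_names.get(k, k): v for k, v in kwargs.items()}
        return self.second_name, args, new_kwargs

    def translate_str(self, *args, **kwargs) -> str:
        name, new_args, new_kwargs = self.translate(*args, **kwargs)
        parts = [repr(a) for a in new_args]
        parts += [f"{k}={v!r}" for k, v in new_kwargs.items()]
        return f"{name}({', '.join(parts)})"


# Hypothetical method and kwarg names, chosen only for the example.
translator = SketchTranslator("old_method", "new_method", {"old_kw": "new_kw"})
name, args, kwargs = translator.translate("foo", old_kw=True, other=1)
call_string = translator.translate_str("foo", old_kw=True, other=1)
```

Here translate() yields the new method name with the positional arguments untouched and the kwargs renamed, while translate_str() renders the same call as text.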
-
-
class
transparentpath.gcsutils.methodtranslator.
MultiMethodTranslator
(first_name: str, cases: List[str], second_names: List[str], kwargs_names: [typing.List[typing.Dict[str, str]]] = None)¶ Bases:
object
-
translate
(case: str, *args: Tuple, **kwargs: Dict) → [<class ‘str’>, typing.Tuple, typing.Dict]¶ translate the method according to a case
- Parameters
case (str) – The name of the translation case to use
*args (Tuple) –
**kwargs (Dict) –
- Returns
The translated method name along with the given args and the translated kwargs
- Return type
[str, Tuple, Dict]
-
translate_str
(case: str, *args: Tuple, **kwargs: Dict) → str¶ Translate the method as a string according to a case
- Parameters
case (str) – The name of the translation case to use
*args (Tuple) –
**kwargs (Dict) –
- Returns
The string of the translated method new_method(arg1, arg2…, kwargs1=val1, translated_kwargs2=val2…)
- Return type
str
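The case-based variant can be sketched in the same spirit. This is a sketch under assumptions, not the library's code: the class name SketchMultiTranslator, the cases "local" and "gcs", and the method and kwarg names below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SketchMultiTranslator:
    """Hypothetical stand-in for illustration, not the library's class."""

    def __init__(self, first_name: str, cases: List[str], second_names: List[str],
                 kwargs_names: Optional[List[Dict[str, str]]] = None):
        if kwargs_names is None:
            kwargs_names = [{} for _ in cases]
        if not (len(cases) == len(second_names) == len(kwargs_names)):
            raise ValueError("cases, second_names and kwargs_names must have the same length")
        self.first_name = first_name
        # One (target method name, kwarg mapping) pair per case.
        self.table = {
            case: (name, mapping)
            for case, name, mapping in zip(cases, second_names, kwargs_names)
        }

    def translate(self, case: str, *args, **kwargs) -> Tuple[str, tuple, dict]:
        second_name, mapping = self.table[case]
        # Assumption: kwargs without a mapping are passed through unchanged.
        return second_name, args, {mapping.get(k, k): v for k, v in kwargs.items()}


# Hypothetical cases and names, chosen only for the example.
multi = SketchMultiTranslator(
    "mkdir",
    cases=["local", "gcs"],
    second_names=["local_mkdir", "remote_mkdir"],
    kwargs_names=[{"parents": "create_parents"}, {}],
)
local_call = multi.translate("local", "foo", parents=True)
gcs_call = multi.translate("gcs", "foo", parents=True)
```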
-
transparentpath.gcsutils.transparentpath module¶
-
exception
transparentpath.gcsutils.transparentpath.
MultipleExistenceError
(path, ls)¶ Bases:
Exception
Exception raised when a path’s destination already contains more than one element.
-
class
transparentpath.gcsutils.transparentpath.
MyHDFFile
(*args, remote: Optional[transparentpath.gcsutils.transparentpath.TransparentPath] = None, **kwargs)¶ Bases:
h5py._hl.files.File
Class to override h5py.File to handle files on GCS.
This allows one to do:
>>> from transparentpath import TransparentPath
>>> import numpy as np
>>> TransparentPath.set_global_fs("gcs", bucket="bucket_name", project="project_name")
>>> path = TransparentPath("chien.hdf5")
>>>
>>> with path.write() as ifile:
>>>     ifile["data"] = np.array([1, 2])
-
class
transparentpath.gcsutils.transparentpath.
MyHDFStore
(*args, remote: Optional[transparentpath.gcsutils.transparentpath.TransparentPath] = None, **kwargs)¶ Bases:
pandas.io.pytables.HDFStore
Same as MyHDFFile but for pd.HDFStore objects
-
class
transparentpath.gcsutils.transparentpath.
Myzipfile
(path, *args, **kwargs)¶ Bases:
zipfile.ZipFile
Overload of ZipFile class to handle files on GCS
-
class
transparentpath.gcsutils.transparentpath.
TransparentPath
(path: Union[pathlib.Path, transparentpath.gcsutils.transparentpath.TransparentPath, str] = '.', nocheck: bool = False, collapse: bool = True, fs: Optional[str] = None, bucket: Optional[str] = None, project: Optional[str] = None, **kwargs)¶ Bases:
os.PathLike
Class that allows one to use a path in a local file system or a gcs file system (more or less) in almost the same way one would use a pathlib.Path object. All instances of TransparentPath are absolute, even if created with relative paths.
Doing ‘isinstance(path, str)’ with a TransparentPath will return True (required to allow ‘open(TransparentPath())’ to work). If you want to check whether path is actually a TransparentPath and nothing else, use ‘type(path) == TransparentPath’ instead.
If using a local file system, you do not have to set anything, just instantiate your paths like that:
>>> # noinspection PyShadowingNames
>>> from transparentpath import TransparentPath as Path
>>> mypath = Path("foo") / "bar"
>>> other_path = mypath / "stuff"
If using GCS, you will have to provide a bucket, and a project name. You can either use the class ‘set_global_fs’ method, or specify the appropriate keywords when calling your first path. Then all the other paths will use the same file system.
So either do
>>> # noinspection PyShadowingNames
>>> from transparentpath import TransparentPath as Path
>>> Path.set_global_fs('gcs', bucket="my_bucket_name", project="my_project")
>>> mypath = Path("foo")  # will use GCS
>>> other_path = Path("foo2")  # will use GCS too
Or
>>> # noinspection PyShadowingNames
>>> from transparentpath import TransparentPath as Path
>>> mypath = Path("foo", fs='gcs', bucket="my_bucket_name", project="my_project")
>>> other_path = Path("foo2")  # will use GCS too
Note that your script must be able to authenticate to GCS somehow. I generally use a service account with credentials stored in a json file, and add the environment variable ‘GOOGLE_APPLICATION_CREDENTIALS=path_to_project_cred.json’ in my .bashrc. I haven’t tested any other method, but I guess that as long as gsutil works, TransparentPath will too.
Since the bucket name is provided in set_fs or set_global_fs, you must not specify it in your paths. Do not specify ‘gs://’ either, it is added when/if needed. Also, you should never create a directory with the same name as your current bucket.
If your directory structure on GCS is the same as locally up to some root directory, you can do:
>>> # noinspection PyShadowingNames
>>> from transparentpath import TransparentPath as Path
>>> Path.nas_dir = "/media/SERVEUR"  # it is the default value, but reset here for example
>>> Path.set_global_fs("gcs", bucket="my_bucket", project="my_project")
>>> p = Path("/media/SERVEUR") / "chien" / "chat"  # Will be gs://my_bucket/chien/chat
If the line ‘Path.set_global_fs(…’ is not commented out, the resulting path will be ‘gs://my_bucket/chien/chat’. If it is commented out, the resulting path will be ‘/media/SERVEUR/chien/chat’. This allows you to write code that runs identically both locally and on GCS, the only difference being the line ‘Path.set_global_fs(…’.
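The nas_dir substitution can be pictured with a small standalone sketch. It only illustrates the mapping described here and is not the library's code: the function name to_gcs_uri and its handling of paths outside nas_dir are assumptions.

```python
from pathlib import PurePosixPath


def to_gcs_uri(path: str, bucket: str, nas_dir: str = "/media/SERVEUR") -> str:
    """Hypothetical helper: map a local-looking path to a gs:// URI."""
    p = PurePosixPath(path)
    try:
        # Drop the nas_dir prefix when the path lives under it.
        relative = p.relative_to(nas_dir)
    except ValueError:
        # Assumption: other absolute paths simply lose their leading "/".
        relative = PurePosixPath(*p.parts[1:]) if p.is_absolute() else p
    return "gs://" + "/".join([bucket, *relative.parts])


uri = to_gcs_uri("/media/SERVEUR/chien/chat", bucket="my_bucket")
root_uri = to_gcs_uri("/media/SERVEUR", bucket="my_bucket")
```

With the values from the example, uri is the ‘gs://my_bucket/chien/chat’ path quoted in the text.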
Any method or attribute valid in fsspec.implementations.local.LocalFileSystem, gcs.GCSFileSystem or pathlib.Path can be used on a TransparentPath object. However, setting an attribute is not transparent : if, for example, you want to change the path’s name, you need to do
>>> p.path.name = "new_name"
instead of ‘p.name = “new_name”’. ‘p.path’ points to the underlying pathlib.Path object.
TransparentPath has built-in read and write methods that recognize the file’s suffix to call the appropriate method (csv, parquet, hdf5, json or open). It has a built-in override of open, which allows you to pass a TransparentPath to python’s open method.
- WARNINGS if you use GCS:
1: Remember that directories are not a thing on GCS.
2: The is_dir() method exists but, on GCS, only makes sense if tested on a part of an existing path, i.e. not on a leaf.
3: You do not need the parent directories of a file to create the file: they will be created if they do not exist (that is not true locally, however).
4: If you delete a file that was alone in its parent directories, those directories disappear.
5: Since is_dir() is mostly used to check whether a directory exists before writing into it, by default is_dir() returns True on GCS if the directory does not exist (see point 3). It still returns False in that case on a local file system.
6: On GCS, the only case where is_dir() returns False is when a file with the same name exists (locally, the behavior is straightforward).
7: To actually check whether the directory exists (for example, to read from it), pass the kwarg ‘exist=True’ to is_dir() if using GCS.
8: If a file exists with the same path as a directory, the class cannot tell which one is the file and which one is the directory, and raises a MultipleExistenceError at object creation. Multiplicity is also checked in almost every method, in case an external source created a duplicate of the file/directory.
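The is_dir() rules in points 5 to 8 can be modelled on a flat set of object keys. This is only a sketch of the described semantics, not the library's code: the function name is_dir_on_flat_store and the use of ValueError in place of MultipleExistenceError are assumptions.

```python
from typing import Set


def is_dir_on_flat_store(keys: Set[str], path: str, exist: bool = False) -> bool:
    """Hypothetical model of is_dir() on a store that has no real directories."""
    path = path.strip("/")
    is_object = path in keys
    has_children = any(key.startswith(path + "/") for key in keys)
    if is_object and has_children:
        # Point 8: the same path is both a file and a "directory".
        raise ValueError(f"{path} is both an object and a prefix")
    if is_object:
        return False         # point 6: a file with that name exists
    if exist:
        return has_children  # point 7: only True if something lives under it
    return True              # point 5: optimistic default


keys = {"data/a.csv", "data/sub/b.csv"}
default_missing = is_dir_on_flat_store(keys, "missing")
strict_missing = is_dir_on_flat_store(keys, "missing", exist=True)
```

In this model a path that does not exist counts as a directory by default, and only the exist=True check reports that nothing is stored under it.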
If a method in a package you did not create uses os.open(), you will have to create a class to override this method and anything using its output. Indeed, os.open returns a file descriptor, not an IO, and I did not find a way to access file descriptors on GCS. For example, in the FileLock package, the acquire() method calls the _acquire() method, which calls os.open(), so I had to do this:
>>> from filelock import FileLock
>>> from transparentpath import TransparentPath as Path
>>>
>>> class MyFileLock(FileLock):
>>>     def _acquire(self):
>>>         tmp_lock_file = self._lock_file
>>>         if not type(tmp_lock_file) == Path:
>>>             tmp_lock_file = Path(tmp_lock_file)
>>>         try:
>>>             fd = tmp_lock_file.open("x")
>>>         except (IOError, OSError, FileExistsError):
>>>             pass
>>>         else:
>>>             self._lock_file_fd = fd
>>>         return None
The original method was:
>>> def _acquire(self):
>>>     open_mode = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_TRUNC
>>>     try:
>>>         fd = os.open(self._lock_file, open_mode)
>>>     except (IOError, OSError):
>>>         pass
>>>     else:
>>>         self._lock_file_fd = fd
>>>     return None
I tried to implement a working version of any method valid in pathlib.Path or in file systems, but future changes in any of those will not be taken into account quickly.
-
append
(other: str) → transparentpath.gcsutils.transparentpath.TransparentPath¶
-
bucket
= None¶
-
cast_fast
(path: str) → transparentpath.gcsutils.transparentpath.TransparentPath¶
-
cast_slow
(path: str) → transparentpath.gcsutils.transparentpath.TransparentPath¶
-
cd
(path: Optional[str] = None) → transparentpath.gcsutils.transparentpath.TransparentPath¶ cd-like command
Will collapse double-dots (‘..’), so not compatible with symlinks. If path is absolute (starts with ‘/’ or bucket name or is empty), will return a path starting from root directory if FileSystem is local, from bucket if it is GCS. If passing None or “”, will have the same effect as “/” on GCS, will return the current working directory on local. If passing “.”, will return a path at the location of self. Will raise an error if trying to access a path before root or bucket.
- Parameters
path (str) – The path to cd to. Absolute, or relative to self. (Default value = None)
- Returns
newpath
- Return type
the absolute TransparentPath we cded to.
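Collapsing double-dots without touching the file system, which is why symlinks are not supported, can be sketched as follows. This is an illustration only: the function name collapse_double_dots and the choice of ValueError for climbing above the root are assumptions, not the library's behavior.

```python
from pathlib import PurePosixPath


def collapse_double_dots(path: str) -> str:
    """Hypothetical helper: resolve '..' lexically, without following symlinks."""
    parts = []
    for part in PurePosixPath(path).parts:
        if part == "..":
            if not parts or parts[-1] == "/":
                # Nothing left to climb: we would leave the root.
                raise ValueError(f"{path} goes above the root")
            parts.pop()
        else:
            parts.append(part)
    return str(PurePosixPath(*parts))


collapsed = collapse_double_dots("/a/b/../c")
try:
    collapse_double_dots("/..")
    went_above_root = False
except ValueError:
    went_above_root = True
```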
-
check_multiplicity
() → None¶ Checks if several objects correspond to the path. Raises MultipleExistenceError if so, does nothing if not.
-
cwd
= '/home/pcotte/Documents/git/transparentpath'¶
-
do_nothing
() → None¶ does nothing (you don’t say)
-
exist
()¶ To prevent typo of ‘exist()’ without an -s
-
exists
()¶
-
fs_kind
= ''¶
-
fss
= {}¶
-
get
(loc: Union[str, pathlib.Path, transparentpath.gcsutils.transparentpath.TransparentPath])¶ used to get a remote file to local.
self must be a gcs TransparentPath. If loc is a TransparentPath, it must be local. If it is a pathlib.Path or a str, it will be cast into a local TransparentPath.
-
get_absolute
() → transparentpath.gcsutils.transparentpath.TransparentPath¶ Returns self, since all TransparentPaths are absolute
- Returns
self
- Return type
-
glob
(wildcard: str = '/*', fast: bool = False) → Iterator[transparentpath.gcsutils.transparentpath.TransparentPath]¶ Returns a list of TransparentPath matching the wildcard pattern
By default, the wildcard is ‘/*’. The ‘/’ is important if your path is a dir and you want to glob inside the dir.
- Parameters
wildcard (str) – The wildcard pattern to match, relative to self (Default value = “/*”)
fast (bool) – If True, does not check multiplicity when converting output paths to TransparentPath, significantly speeding up the process (Default value = False)
- Returns
The list of items matching the pattern
- Return type
Iterator[TransparentPath]
-
is_dir
(exist: bool = False) → bool¶ Check if self is a directory
- Parameters
exist (bool) – If False and if using GCS, is_dir() returns True if the directory does not exist and no file with the same path exists. Otherwise, only returns True if the directory really exists (Default value = False).
- Returns
- Return type
bool
-
is_file
() → bool¶ Check if self is a file. On GCS, leaves are always files even if created with mkdir.
- Returns
- Return type
bool
-
ls
(path: str = '', fast: bool = False) → Iterator[transparentpath.gcsutils.transparentpath.TransparentPath]¶ ls-like method. Returns an Iterator of absolute TransparentPaths.
- Parameters
path (str) – relative path to ls. (Default value = “”)
fast (bool) – If True, does not check multiplicity when converting output paths to TransparentPath, significantly speeding up the process (Default value = False)
- Returns
- Return type
Iterator[TransparentPath]
-
method_path_concat
= []¶
-
method_without_self_path
= ['end_transaction', 'get_mapper', 'read_block', 'start_transaction', 'connect', 'load_tokens']¶
-
mkbucket
(name: Optional[str] = None) → None¶
-
mkdir
(present: str = 'ignore', **kwargs) → None¶ Creates the directory corresponding to self if it does not exist
Remember that leaves are always files on GCS, so a directory cannot be created there. Thus, the function will have no effect on GCS.
- Parameters
present (str) – What to do if there is already something at self. Can be “raise” or “ignore” (Default value = “ignore”)
kwargs – The kwargs to pass to file system’s mkdir method
- Returns
- Return type
None
-
mv
(other: Union[str, pathlib.Path, transparentpath.gcsutils.transparentpath.TransparentPath])¶ Used to move two files on the same file system.
-
nas_dir
= '/media/SERVEUR'¶
-
open
(*arg, **kwargs) → IO¶ Uses the file system open method
- Parameters
arg – Any args valid for the builtin open() method
kwargs – Any kwargs valid for the builtin open() method
- Returns
The IO buffer object
- Return type
IO
-
project
= None¶
-
put
(dst: Union[str, pathlib.Path, transparentpath.gcsutils.transparentpath.TransparentPath])¶ used to push a local file to the cloud.
self must be a local TransparentPath. If dst is a TransparentPath, it must be on GCS. If it is a pathlib.Path or a str, it will be cast into a GCS TransparentPath, so a gcs file system must have been set up once before.
-
read
(*args, get_obj: bool = False, use_pandas: bool = False, update_cache: bool = True, **kwargs) → Any¶ Method used to read the content of the file located at self
Will raise FileNotFound error if there is no file. Calls a specific method to read self based on the suffix of self.path:
1: .csv : will use pandas’s read_csv
2: .parquet : will use pandas’s read_parquet with pyarrow engine
3: .hdf5 or .h5 : will use h5py.File or pd.HDFStore (if use_pandas = True). Since neither supports remote file systems, the file will be downloaded locally to a tmp file, read, then removed.
4: .json : will use open() method to get file content then json.loads to get a dict
5: .xlsx : will use pd.read_excel
6: any other suffix : will return an IO buffer to read from if get_obj is True, or the string contained in the file if get_obj is False.
- Parameters
get_obj (bool) – Only relevant for files that are not csv, parquet nor HDF5. If True returns the IO Buffer, else the string contained in the IO Buffer (Default value = False)
use_pandas (bool) – Must pass it as True if hdf5 file was written using HDFStore and not h5py.File (Default value = False)
update_cache (bool) – FileSystem objects do not necessarily follow changes on the system in real time if they were not performed by them directly. If update_cache is True, the FileSystem will update its cache before trying to read anything. If False, it won’t, potentially saving some time but this might result in a FileNotFoundError. (Default value = True)
args – any args to pass to the underlying reading method
kwargs – any kwargs to pass to the underlying reading method
- Returns
- Return type
Any
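Dispatching on the file suffix can be sketched with the standard library alone. This is a simplified illustration, not the library's read(): it covers only .json and plain text, works on local paths, and the names READERS and read_by_suffix are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Union


def _read_json(path: Path) -> Any:
    return json.loads(path.read_text())


def _read_text(path: Path) -> str:
    return path.read_text()


# Suffix -> reader. Only .json is registered in this sketch.
READERS: Dict[str, Callable[[Path], Any]] = {".json": _read_json}


def read_by_suffix(path: Union[str, Path]) -> Any:
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    # Fall back to plain text for any suffix without a dedicated reader.
    reader = READERS.get(path.suffix, _read_text)
    return reader(path)


with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "config.json"
    json_path.write_text('{"a": 1}')
    text_path = Path(tmp) / "notes.txt"
    text_path.write_text("hello")
    loaded_json = read_by_suffix(json_path)
    loaded_text = read_by_suffix(text_path)
```

A lookup table keeps the dispatch in one place, so supporting another suffix means registering one more reader.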
-
read_csv
(update_cache: bool = True, **kwargs) → pandas.core.frame.DataFrame¶
-
read_excel
(update_cache: bool = True, **kwargs) → pandas.core.frame.DataFrame¶
-
read_hdf5
(update_cache: bool = True, use_pandas: bool = False, **kwargs) → Union[h5py._hl.files.File, pandas.io.pytables.HDFStore]¶ Reads a HDF5 file. Must have been created by h5py.File or pd.HDFStore (specify use_pandas=True if so)
Since h5py.File/pd.HDFStore does not support GCS, first copy it in a tmp file.
- Parameters
update_cache (bool) – FileSystem objects do not necessarily follow changes on the system if they were not performed by them directly. If update_cache is True, the FileSystem will update its cache before trying to read anything. If False, it won’t, potentially saving some time but this might result in a FileNotFoundError. (Default value = True)
use_pandas (bool) – To use HDFStore instead of h5py.File (Default value = False)
kwargs – The kwargs to pass to h5py.File/pd.HDFStore method
- Returns
Union[h5py.File, pd.HDFStore]
Opened h5py.File/pd.HDFStore
-
read_parquet
(update_cache: bool = True, **kwargs) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series]¶
-
read_text
(*args, get_obj: bool = False, update_cache: bool = True, **kwargs) → Union[str, IO]¶
-
rm
(absent: str = 'raise', ignore_kind: bool = False, **kwargs) → None¶ Removes the object pointed to by self if it exists. Remember that leaves are always files on GCS, so rm will remove the path if it is a leaf on GCS.
- Parameters
absent (str) – What to do if trying to remove an item that does not exist. Can be ‘raise’ or ‘ignore’ (Default value = ‘raise’)
ignore_kind (bool) – If True, will remove anything pointed to by self. If False, will raise an error if self points to a file and ‘recursive’ was specified in kwargs, or if self points to a dir and ‘recursive’ was not specified (Default value = False)
kwargs – The kwargs to pass to file system’s rm method
- Returns
- Return type
None
-
rmbucket
(name: Optional[str] = None) → None¶
-
rmdir
(absent: str = 'raise', ignore_kind: bool = False) → None¶ Removes the directory corresponding to self if it exists. Remember that leaves are always files on GCS, so rmdir will never remove a leaf on GCS.
- Parameters
absent (str) – What to do if trying to remove an item that does not exist. Can be ‘raise’ or ‘ignore’ (Default value = ‘raise’)
ignore_kind (bool) – If True, will remove anything pointed to by self. If False, will raise an error if self points to a file and ‘recursive’ was specified in kwargs, or if self points to a dir and ‘recursive’ was not specified (Default value = False)
-
set_fs
(fs: str, bucket: Optional[str] = None, project: Optional[str] = None, nas_dir: Optional[Union[transparentpath.gcsutils.transparentpath.TransparentPath, pathlib.Path, str]] = None) → None¶ Can be called to set the file system, if ‘fs’ keyword was not given at object creation. If not called, default file system is that of TransparentPath. If TransparentPath has no file system yet, creates a local one by default. If the first parameter is ‘local’, the file system is local, and bucket and project are not needed. If the first parameter is ‘gcs’, file system is GCS and bucket and project are needed.
- Parameters
fs (str) – ‘gcs’ will use GCSFileSystem, ‘local’ will use LocalFileSystem
bucket (str) – The bucket name if using gcs (Default value = None)
project (str) – The project name if using gcs (Default value = None)
nas_dir (Union[TransparentPath, Path, str]) – If specified, TransparentPath will delete any occurrence of ‘nas_dir’ at the beginning of created paths if fs is gcs (Default value = None).
- Returns
- Return type
None
-
classmethod
set_global_fs
(fs: str, bucket: Optional[str] = None, project: Optional[str] = None, make_main: bool = True, nas_dir: Optional[Union[transparentpath.gcsutils.transparentpath.TransparentPath, pathlib.Path, str]] = None) → None¶ To call before creating any instance to set the file system.
If not called, default file system is local. If the first parameter is ‘local’, the file system is local, and ‘bucket’ and ‘project’ are not needed. If the first parameter is ‘gcs’, file system is GCS and ‘bucket’ and ‘project’ are needed.
- Parameters
fs (str) – ‘gcs’ will use GCSFileSystem, ‘local’ will use LocalFileSystem
bucket (str) – The bucket name if using gcs (Default value = None)
project (str) – The project name if using gcs (Default value = None)
make_main (bool) – If True, any instance created after this call to set_global_fs will be fs. If False, just add the new file system to cls.fss, but do not use it as default file system. (Default value = True)
nas_dir (Union[TransparentPath, Path, str]) – If specified, TransparentPath will delete any occurrence of ‘nas_dir’ at the beginning of created paths if fs is gcs (Default value = None).
- Returns
- Return type
None
-
static
set_nas_dir
(obj, nas_dir)¶
-
stat
()¶ Calls file system’s stat method and translates the key to os.stat_result() keys
-
to_csv
(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, **kwargs) → None¶
-
to_excel
(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, **kwargs) → None¶
-
to_hdf5
(data: Any = None, set_name: str = None, update_cache: bool = True, use_pandas: bool = False, **kwargs) → Union[None, h5py._hl.files.File, pandas.io.pytables.HDFStore]¶ - Parameters
data (Any) – The data to store. Can be None, in that case an opened file is returned (Default value = None)
set_name (str) – The name of the dataset (Default value = None)
update_cache (bool) – FileSystem objects do not necessarily follow changes on the system if they were not performed by them directly. If update_cache is True, the FileSystem will update its cache before trying to read anything. If False, it won’t, potentially saving some time but this might result in a FileExistsError. (Default value = True)
use_pandas (bool) – To use pd.HDFStore object instead of h5py.File (Default = False)
**kwargs –
- Returns
- Return type
Union[None, pd.HDFStore, h5py.File]
-
to_json
(data: Any, overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, **kwargs)¶
-
to_parquet
(data: Union[pandas.core.frame.DataFrame, pandas.core.series.Series], overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, columns_to_string: bool = True, to_dataframe: bool = True, **kwargs) → None¶
-
touch
(present: str = 'ignore', create_parents: bool = True, **kwargs) → None¶ Creates the file corresponding to self if it does not exist.
Raises FileExistsError if there already is an object that is not a file at self. Default behavior is to create parent directories of the file if needed. This can be canceled by passing ‘create_parents=False’, but only if not using GCS, since directories are not a thing on GCS.
- Parameters
present (str) – What to do if there is already something at self. Can be “raise” or “ignore” (Default value = “ignore”)
create_parents (bool) – If False, raises an error if parent directories are absent. Else, create them. Always True on GCS. ( Default value = True)
kwargs – The kwargs to pass to file system’s touch method
- Returns
- Return type
None
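For the local case, the touch semantics described here can be approximated with pathlib. This is a sketch under assumptions rather than the library's code: the function name touch_local is hypothetical, and raising FileExistsError for present="raise" and FileNotFoundError for a missing parent are guesses at reasonable behavior.

```python
import tempfile
from pathlib import Path
from typing import Union


def touch_local(path: Union[str, Path], present: str = "ignore",
                create_parents: bool = True) -> None:
    """Hypothetical local-only model of touch()."""
    path = Path(path)
    if path.exists():
        if not path.is_file():
            raise FileExistsError(f"{path} exists and is not a file")
        if present == "raise":
            # Assumption: "raise" surfaces as FileExistsError.
            raise FileExistsError(f"{path} already exists")
        return  # present == "ignore": leave the existing file alone
    if create_parents:
        path.parent.mkdir(parents=True, exist_ok=True)
    elif not path.parent.is_dir():
        raise FileNotFoundError(f"Parent directory of {path} is absent")
    path.touch()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "a" / "b" / "file.txt"
    touch_local(target)          # creates a/ and a/b/ along the way
    created = target.is_file()
    touch_local(target)          # second call is a no-op with present="ignore"
    try:
        touch_local(target, present="raise")
        raised = False
    except FileExistsError:
        raised = True
```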
-
transform_path
(method_name: str, *args: Tuple) → Tuple¶ File system methods take self.path as first argument, so add its absolute path as first argument of args. Some, like ls or glob, are given a relative path to append to self.path, so we need to change the first element of args from args[0] to self.path / args[0]
- Parameters
method_name (str) – The method name, to check whether it needs to append self.path or not
args (Tuple) – The args to pass to the method
- Returns
Either the unchanged args, or args with the first element prepended by self, or args with a new first element (self)
- Return type
Tuple
-
translations
= {'mkdir': <transparentpath.gcsutils.methodtranslator.MultiMethodTranslator object>}¶
-
unlink
(**kwargs)¶ Alias of rm, to match pathlib.Path method
-
unset
= True¶
-
update_cache
()¶ Calls FileSystem’s invalidate_cache() to discard the cache then calls a non-disruptive method (fs.info(bucket)) to update it.
If local, no need to update the cache. Not even sure it needs to be invalidated…
-
with_suffix
(suffix: str) → transparentpath.gcsutils.transparentpath.TransparentPath¶ Returns a new TransparentPath object with a changed suffix. Uses the with_suffix method of pathlib.Path
- Parameters
suffix (str) – suffix to use, with the dot (‘.pdf’, ‘.py’, etc ..)
- Returns
- Return type
-
write
(*args, data: Any = None, set_name: str = 'data', use_pandas: bool = False, overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, **kwargs) → Union[None, pandas.io.pytables.HDFStore, h5py._hl.files.File]¶ Method used to write the content of the file located at self
Calls a specific method to write data based on the suffix of self.path:
1: .csv : will use pandas’s to_csv
2: .parquet : will use pandas’s to_parquet with pyarrow engine
3: .hdf5 or .h5 : will use h5py.File. Since it does not support remote file systems, the file will be created locally in a tmp file, written to, then uploaded and removed locally.
4: .json : will use jsonencoder.JSONEncoder class. Works with DataFrames and np.ndarrays too.
5: .xlsx : will use pandas’s to_excel
6: any other suffix : uses self.open to write to an IO Buffer
- Parameters
data (Any) – The data to write
set_name (str) – Name of the dataset to write. Only relevant if using HDF5 (Default value = ‘data’)
use_pandas (bool) – Must pass it as True if hdf file must be written using HDFStore and not h5py.File
overwrite (bool) – If True, any existing file will be overwritten. Only relevant for csv, hdf5 and parquet files, since others use the ‘open’ method, which args already specify what to do (Default value = True).
present (str) – Indicates what to do if overwrite is False and file is present. Here too, only relevant for csv, hdf5 and parquet files.
update_cache (bool) – FileSystem objects do not necessarily follow changes on the system if they were not performed by them directly. If update_cache is True, the FileSystem will update its cache before trying to read anything. If False, it won’t, potentially saving some time but this might result in a FileExistsError. (Default value = True)
args – any args to pass to the underlying writing method
kwargs – any kwargs to pass to the underlying writing method
- Returns
- Return type
Union[None, pd.HDFStore, h5py.File]
-
write_bytes
(data: Any, *args, overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, **kwargs)¶
-
write_stuff
(data: Any, *args, overwrite: bool = True, present: str = 'ignore', update_cache: bool = True, **kwargs)¶
-
transparentpath.gcsutils.transparentpath.
collapse_ddots
(path: Union[pathlib.Path, transparentpath.gcsutils.transparentpath.TransparentPath, str]) → transparentpath.gcsutils.transparentpath.TransparentPath¶ Collapses the double-dots (..) in the path
- Parameters
path (Union[Path, TransparentPath, str]) – The path containing double-dots
- Returns
The collapsed path.
- Return type
-
transparentpath.gcsutils.transparentpath.
get_fs
(gcs: str, project: str, bucket: str) → Union[gcsfs.core.GCSFileSystem, fsspec.implementations.local.LocalFileSystem]¶ Gets the FileSystem object of either gcs or local (Default)
- Parameters
gcs (str) – Returns GCSFileSystem if ‘gcs’, LocalFileSystem if ‘local’.
project (str) – project name for GCS
bucket (str) – bucket name for GCS
- Returns
The FileSystem object and the string ‘gcs’ or ‘local’
- Return type
Union[gcsfs.GCSFileSystem, LocalFileSystem]
-
transparentpath.gcsutils.transparentpath.
myisinstance
(obj1: Any, obj2) → bool¶ Will return True when testing whether a TransparentPath is a str (required to use open(TransparentPath())) and False when testing whether a pathlib.Path is a TransparentPath.
-
transparentpath.gcsutils.transparentpath.
myopen
(*args, **kwargs) → IO¶ Method overloading builtins’ ‘open’ method, allowing to open files on GCS using TransparentPath.
-
transparentpath.gcsutils.transparentpath.
mysmallisinstance
(obj1: Any, obj2) → bool¶ Will return True when testing whether a TransparentPath is a str (required to use open(TransparentPath())) or a TransparentPath, and False in every other case (even pathlib.Path).