Module: collector.py
- Purpose:
This module provides file collection functionality to the project.
Specifically, this module is called by
badsnakes.badsnakes.BadSnakes.main
to populate the ‘files list’, which holds all files to be analysed. The CLI argument
PATH
is passed into this module, which then traverses the list of files or the directory, or extracts the wheel, to determine which files should be analysed. These files are passed back to the caller via the
files
property.
- Platform:
Linux/Windows | Python 3.10+
- Developer:
J Berendt
- Email:
- Comments:
n/a
- Examples:
Collect plain-text files from a given directory:
>>> from badsnakes.libs.collector import Collector
>>> c = Collector(paths=['/path/to/files'])
>>> c.collect()
>>> c.files
[['/path/to/files/project.py', '/path/to/files/script.sh']]
Collect plain-text files from a Python wheel:
>>> from badsnakes.libs.collector import Collector
>>> c = Collector(paths=['/path/to/project-0.7.3-py3-none-any.whl'])
>>> c.collect()
>>> c.files
[['/tmp/tmpqnm6yka2/project/module00.py',
  '/tmp/tmpqnm6yka2/project/module01.py',
  '/tmp/tmpqnm6yka2/project/module02.py',
  ...,
  '/tmp/tmpqnm6yka2/project/script.sh',
  '/tmp/tmpqnm6yka2/project/file.txt',
  ...,
  '/tmp/tmpqnm6yka2/project/module08.py',
  '/tmp/tmpqnm6yka2/project/module09.py',
  '/tmp/tmpqnm6yka2/project/module10.py']]
- exception badsnakes.libs.collector.MixedTypesError[source]
Bases:
Exception
Custom error class raised for mixed
PATH
type errors.
- add_note()
Exception.add_note(note) – add a note to the exception
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class badsnakes.libs.collector._CollectorBase(path: str)[source]
Bases:
object
Private base class providing file collection functionality.
- Parameters:
path (str) – Full path to the module, directory or wheel for collection.
- property files: list
Accessor to the list of collected files.
- class badsnakes.libs.collector._CollectorDirectory(path: str)[source]
Bases:
_CollectorBase
Collect all files for analysis from the given directory.
This private class is not part of the public interface. Please call the
Collector
class instead.
- collect(path: str = None)[source]
Collect all files for this class-type.
- Parameters:
path (str, optional) – Directory path. This argument was originally implemented for use by
_CollectorWheel
to enable directory traversal using existing logic. Defaults to None.
- Logic:
Using
glob.glob
recursively, all files (including hidden files) are collected.
Next, using
filter
, remove any files which match the exclusion pattern or are not plain-text. See the Tip below.
Map
os.path.realpath
to all files to expand the filepaths.
Tip
The excluded directories are maintained by the list in
config.toml
under the
system.exclude_dirs
key.
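The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the names `EXCLUDE_DIRS`, `looks_like_text`, `is_excluded` and `collect_from_directory` are hypothetical, the exclusion list is a stand-in for the values in config.toml, and the plain-text test (no NUL byte in the first block) is an assumed heuristic.

```python
import glob
import os
import sys

# Hypothetical stand-in for the system.exclude_dirs list in config.toml.
EXCLUDE_DIRS = [".git", "__pycache__"]


def looks_like_text(path, blocksize=1024):
    """Assumed heuristic: a file is plain-text if its first block has no NUL byte."""
    with open(path, "rb") as f:
        return b"\x00" not in f.read(blocksize)


def is_excluded(path, root):
    """Return True if any path component below ``root`` is an excluded directory."""
    parts = os.path.relpath(path, root).split(os.sep)
    return any(part in EXCLUDE_DIRS for part in parts)


def collect_from_directory(root):
    """Collect plain-text files below ``root`` as a sorted list of real paths."""
    # include_hidden was added to glob.glob in Python 3.11;
    # on Python 3.10 this sketch does not pick up hidden files.
    kwargs = {"include_hidden": True} if sys.version_info >= (3, 11) else {}
    found = glob.glob(os.path.join(root, "**"), recursive=True, **kwargs)

    def keep(path):
        # Directories are dropped here, along with excluded and non-text files.
        return (os.path.isfile(path)
                and not is_excluded(path, root)
                and looks_like_text(path))

    return sorted(map(os.path.realpath, filter(keep, found)))
```

Calling `collect_from_directory('/path/to/files')` would return the real paths of the files that survive the filter.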
- property files: list
Accessor to the list of collected files.
- class badsnakes.libs.collector._CollectorWheel(path: str)[source]
Bases:
_CollectorBase
Collect all files for analysis from a Python wheel.
This private class is not part of the public interface. Please call the
Collector
class instead.
- Parameters:
path (str) – Full path to the wheel file.
- property tmpdir: TemporaryDirectory
Accessor to the temporary directory object.
- collect()[source]
Unzip a wheel file and collect files.
- Logic:
Create a temporary directory object (using
tempfile
).
Using
zipfile
, unzip the wheel into the temporary directory.
Create an instance of the
_CollectorDirectory
class and pass the path to the temp directory into the class for file collection.
Store the list of collected files into the
_files
attribute.
- Temp Directory:
The
tempfile.TemporaryDirectory
object created by this method is not explicitly closed, as the directory must exist for analysing the files. Therefore, the temp directory is removed when the
tmpdir
object has been destroyed, generally on program completion.
For this reason, the object must be kept ‘alive’ in the class instance, and therefore cannot be a local variable. To keep the object alive, the instance's temp directory object is appended to a list in the parent class.
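The extraction and keep-alive behaviour described above might look something like the sketch below. The class names `BaseSketch` and `WheelSketch`, the `extract` method and the `_tmpdirs` list are hypothetical illustrations; only `tempfile.TemporaryDirectory` and `zipfile.ZipFile` are taken from the description.

```python
import os
import tempfile
import zipfile


class BaseSketch:
    """Hypothetical parent class holding references to temp directories."""

    # Shared list; a live reference here stops the directory being cleaned up early.
    _tmpdirs = []


class WheelSketch(BaseSketch):
    """Hypothetical stand-in, not the library's _CollectorWheel class."""

    def __init__(self, path):
        self._path = path
        self._tmpdir = None

    def extract(self):
        """Unzip the wheel into a new temporary directory and return its path."""
        self._tmpdir = tempfile.TemporaryDirectory()
        # Deliberately not used as a context manager: the files must outlive this call.
        self._tmpdirs.append(self._tmpdir)
        with zipfile.ZipFile(self._path) as zf:
            zf.extractall(self._tmpdir.name)
        return self._tmpdir.name
```

Had the `TemporaryDirectory` object been a local variable, it could be finalised (and the directory deleted) as soon as `extract` returned, before any analysis took place.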
- property files: list
Accessor to the list of collected files.
- class badsnakes.libs.collector.Collector(paths: list)[source]
Bases:
object
Primary file collection interface class.
- Parameters:
paths (list) – A list of file paths or directories from the argument parser.
Note
On instantiation, all elements in the
paths
list argument are expanded to their realpath and tested to ensure they exist.
- property files: list
Accessor to the list of Python files to be analysed.
Note
This property is a list of lists.
Each inner list represents a wheel or a directory and holds the files contained therein.
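Because the property is nested, a caller who wants a single flat sequence of files needs to flatten it. A minimal sketch, using a hypothetical value shaped like the property's output (the paths are placeholders):

```python
from itertools import chain

# Hypothetical value shaped like Collector.files: one inner list per directory or wheel.
files = [
    ["/path/to/a/project.py", "/path/to/a/script.sh"],
    ["/path/to/b/module.py"],
]

# Flatten the list of lists into a single list, preserving order.
flat = list(chain.from_iterable(files))
```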
- collect()[source]
Collect files for analysis from the provided paths.
- Criteria:
Using the private
_identify()
method, the file collection is routed to the appropriate file collector based on the type of path provided to the
paths
argument on instantiation.
Directory: All paths in the
_paths
attribute must be directories.
Module: All paths in the
_paths
attribute must be plain-text files.
Wheel: All paths in the
_paths
attribute must be Python wheels, or zip files.
Only files of the same type (directory, module or wheel) can be collected at the same time, otherwise a
MixedTypesError
is raised.
- Raises:
MixedTypesError – Raised if the
_paths
attribute contains a mix of the types listed above.
- _checks()[source]
Perform pre-collection checks.
- Checks:
All files exist.
- Raises:
FileNotFoundError – Raised if any file in
paths
does not exist.
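The realpath expansion performed on instantiation, together with this existence check, could be approximated as below. The function name `expand_and_check` and the error message are hypothetical; this is a sketch of the described behaviour, not the method's source.

```python
import os


def expand_and_check(paths):
    """Expand each path to its realpath and raise if any does not exist."""
    expanded = [os.path.realpath(p) for p in paths]
    missing = [p for p in expanded if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"Path(s) not found: {', '.join(missing)}")
    return expanded
```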
- _collect_from_directory()[source]
Collect all plain-text files from a directory.
Before this method is called, all paths are tested to ensure they are directories.
- _collect_from_files()[source]
Collect all plain-text files.
As the realpath conversion and file exists check have already been performed, this method can simply append the
_paths
attribute to
_files
, for the caller’s use.
- _collect_from_wheel()[source]
Collect all plain-text files from wheels.
Before this method is called, all paths are tested to ensure they are wheels (or .zip files).
- _identify() str [source]
Identify the type of collection to take place.
- Returns:
One of the following strings is returned, based on the content of the
paths
argument:
Directory: ‘dir’
Python modules: ‘modules’
Wheel: ‘wheel’
Anything else: ‘invalid’
- Return type:
str
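A rough sketch of how such an all-or-nothing identification might be written is shown below. The function name `identify` is hypothetical, `zipfile.is_zipfile` is used here as a convenient stand-in for the wheel test, and a plain `os.path.isfile` check stands in for the plain-text test; treating an empty list as ‘invalid’ is also an assumption.

```python
import os
import zipfile


def identify(paths):
    """Return 'dir', 'modules', 'wheel' or 'invalid' for a list of paths."""
    if not paths:
        # Assumption: an empty list is treated as invalid.
        return "invalid"
    if all(os.path.isdir(p) for p in paths):
        return "dir"
    if all(zipfile.is_zipfile(p) for p in paths):
        return "wheel"
    # Simplification: any regular non-archive file counts as a module here.
    if all(os.path.isfile(p) and not zipfile.is_zipfile(p) for p in paths):
        return "modules"
    # Mixed types, or something that is none of the above.
    return "invalid"
```

Each test uses `all()`, so a list mixing, say, a directory and a module falls through to ‘invalid’.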
- _isdir() bool [source]
Test if all elements of
paths
are directories.
- Returns:
True if all paths are directories, otherwise False.
- Return type:
bool
- _istext() bool [source]
Test if all elements of
paths
are plain-text files.
- Returns:
True if all elements of
paths
are plain-text files, otherwise False.
- Return type:
bool
- _iswheel() bool [source]
Test if all elements of
paths
are Python wheels.
Note
A file is tested as a wheel by checking the first four bytes of the file itself, not using the file extension. As such, a
.zip
file will pass this test as well.
- Returns:
True if all elements of
paths
are Python wheels (or ZIP archives), otherwise False.
- Return type:
bool
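The four-byte test mentioned in the note can be illustrated as follows. The names `ZIP_MAGIC`, `is_wheel` and `all_wheels` are hypothetical; the sketch assumes the check compares against the standard ZIP local file header signature, which the documentation does not state explicitly.

```python
# Standard ZIP local file header signature: 'P', 'K', 0x03, 0x04.
ZIP_MAGIC = b"PK\x03\x04"


def is_wheel(path):
    """Return True if the file's first four bytes match the ZIP signature."""
    with open(path, "rb") as f:
        return f.read(4) == ZIP_MAGIC


def all_wheels(paths):
    """Return True only if every path passes the signature test."""
    return all(is_wheel(p) for p in paths)
```

Since only the leading bytes are inspected, any ZIP archive containing at least one entry passes, whatever its extension. An empty archive starts with a different signature and would not match this particular comparison.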