Module: collector.py

Purpose:

This module provides file collection functionality to the project.

Specifically, this module is called by badsnakes.badsnakes.BadSnakes.main to populate the ‘files list’ which holds all files to be analysed.

The CLI argument PATH is passed into this module, which then traverses the list of files or the directory, or extracts the wheel, to determine which files should be analysed. These files are passed back to the caller via the files property.

Platform:

Linux/Windows | Python 3.10+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

Examples:

Collect plain-text files from a given directory:

>>> from badsnakes.libs.collector import Collector

>>> c = Collector(paths=['/path/to/files'])
>>> c.collect()
>>> c.files

[['/path/to/files/project.py',
  '/path/to/files/script.sh']]

Collect plain-text files from a Python wheel:

>>> from badsnakes.libs.collector import Collector

>>> c = Collector(paths=['/path/to/project-0.7.3-py3-none-any.whl'])
>>> c.collect()
>>> c.files

[['/tmp/tmpqnm6yka2/project/module00.py',
  '/tmp/tmpqnm6yka2/project/module01.py',
  '/tmp/tmpqnm6yka2/project/module02.py',
  ...,
  '/tmp/tmpqnm6yka2/project/script.sh',
  '/tmp/tmpqnm6yka2/project/file.txt',
  ...,
  '/tmp/tmpqnm6yka2/project/module08.py',
  '/tmp/tmpqnm6yka2/project/module09.py',
  '/tmp/tmpqnm6yka2/project/module10.py']]

exception badsnakes.libs.collector.MixedTypesError

Bases: Exception

Custom error class raised for mixed PATH type errors.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class badsnakes.libs.collector._CollectorBase(path: str)

Bases: object

Private base class providing file collection functionality.

Parameters:

path (str) – Full path to the module, directory or wheel for collection.

property files: list

Accessor to the list of collected files.

collect()

Collect all files for this class-type.

class badsnakes.libs.collector._CollectorDirectory(path: str)

Bases: _CollectorBase

Collect all files for analysis from the given directory.

This private class is not part of the public interface. Please call the Collector class instead.

collect(path: str = None)

Collect all files for this class-type.

Parameters:

path (str, optional) – Directory path. This argument was originally implemented for use by _CollectorWheel to enable directory traversal using existing logic. Defaults to None.

Logic:
  1. Using glob.glob recursively, all files (including hidden files) are collected.

  2. Next, using filter, remove any files which match the exclusion pattern or which are not plain-text. See Tip below.

  3. Map os.path.realpath to all files to expand the filepaths.

Tip

The excluded directories are maintained by the list in config.toml under the system.exclude_dirs key.
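The three steps above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function names, the exclude_dirs argument (standing in for the system.exclude_dirs list) and the NUL-byte plain-text heuristic are all assumptions. Note that glob.glob only accepts include_hidden from Python 3.11 onward, so on 3.10 this sketch would miss hidden files.

```python
import glob
import os
import sys


def is_plain_text(path: str, blocksize: int = 1024) -> bool:
    """Hypothetical heuristic: no NUL byte in the first block of the file."""
    with open(path, 'rb') as f:
        return b'\x00' not in f.read(blocksize)


def collect_from_directory(path: str, exclude_dirs: list) -> list:
    """Return the real paths of all non-excluded, plain-text files under path."""
    # include_hidden was added to glob.glob in Python 3.11.
    kwargs = {'include_hidden': True} if sys.version_info >= (3, 11) else {}
    # Step 1: recursively collect everything below the directory.
    found = glob.glob(os.path.join(path, '**'), recursive=True, **kwargs)

    def keep(p: str) -> bool:
        # Step 2: drop directories, excluded directories and non-text files.
        if not os.path.isfile(p):
            return False
        if any(d in p.split(os.sep) for d in exclude_dirs):
            return False
        return is_plain_text(p)

    # Step 3: expand the surviving paths with os.path.realpath.
    return sorted(map(os.path.realpath, filter(keep, found)))
```

Matching excluded names against whole path components (rather than substrings) is a design choice of this sketch; the project's own exclusion pattern may behave differently.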

property files: list

Accessor to the list of collected files.

class badsnakes.libs.collector._CollectorWheel(path: str)

Bases: _CollectorBase

Collect all files for analysis from a Python wheel.

This private class is not part of the public interface. Please call the Collector class instead.

Parameters:

path (str) – Full path to the wheel file.

property tmpdir: TemporaryDirectory

Accessor to the temporary directory object.

collect()

Unzip a wheel file and collect files.

Logic:
  1. Create a temporary directory object (using tempfile).

  2. Using zipfile, unzip the wheel into the temporary directory.

  3. Create an instance of the _CollectorDirectory class and pass the path to the temp directory into the class for file collection.

  4. Store the list of collected files into the _files attribute.

Temp Directory:

The tempfile.TemporaryDirectory object created by this method is not explicitly closed, as the directory must exist for analysing the files. Therefore, the temp directory is removed when the tmpdir object has been destroyed, generally on program completion.

For this reason, the object must be kept ‘alive’ in the class instance, and therefore cannot be a local variable. To keep the object alive, the class’ instance of the temp directory object is appended to a list in the parent class.
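The extraction steps and the temp directory lifetime described above can be sketched as follows. The WheelExtractor class is a hypothetical stand-in, not the project's _CollectorWheel: it walks the extracted tree with os.walk instead of delegating to _CollectorDirectory, and it keeps the TemporaryDirectory object on the instance rather than in a list on a parent class.

```python
import os
import tempfile
import zipfile


class WheelExtractor:
    """Hypothetical stand-in illustrating the wheel extraction steps."""

    def __init__(self, path: str):
        self._path = path
        self._tmpdir = None   # Held on the instance so it is not garbage collected.
        self._files = []

    @property
    def files(self) -> list:
        return self._files

    @property
    def tmpdir(self) -> tempfile.TemporaryDirectory:
        return self._tmpdir

    def collect(self) -> None:
        # Step 1: create the temporary directory object.
        self._tmpdir = tempfile.TemporaryDirectory()
        # Step 2: unzip the wheel into the temporary directory.
        with zipfile.ZipFile(self._path) as zf:
            zf.extractall(self._tmpdir.name)
        # Steps 3 and 4: gather the extracted files and store them.
        self._files = sorted(
            os.path.join(root, name)
            for root, _, names in os.walk(self._tmpdir.name)
            for name in names
        )
```

Had the TemporaryDirectory been a local variable inside collect(), it would become eligible for cleanup as soon as the method returned, and the collected paths would point at files which no longer exist.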

property files: list

Accessor to the list of collected files.

class badsnakes.libs.collector.Collector(paths: list)

Bases: object

Primary file collection interface class.

Parameters:

paths (list) – A list of file paths or directories from the argument parser.

Note

On instantiation, all elements in the paths list argument are expanded to their realpath and tested to ensure they exist.

property files: list

Accessor to the list of files to be analysed.

Note

This property is a list of lists.

Each inner list corresponds to a single wheel or directory, and holds the files contained therein.
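As the property is a list of lists, a caller wanting a single flat sequence of files needs to flatten it. A minimal sketch, in which the nested list is illustrative data (the second directory is hypothetical) rather than real output:

```python
from itertools import chain

# Illustrative value of the files property for two directories.
files = [
    ['/path/to/files/project.py', '/path/to/files/script.sh'],
    ['/path/to/other/module.py'],   # Hypothetical second directory.
]

# Flatten the nested structure into a single list of file paths.
flat = list(chain.from_iterable(files))
```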

collect()

Collect files for analysis from the provided paths.

Criteria:

Using the private _identify() method, the file collection is routed to the appropriate file collector based on the type of path provided to the paths argument on instantiation.

  • Directory: All paths in the _paths attribute must be directories.

  • Module: All paths in the _paths attribute must be plain-text files.

  • Wheel: All paths in the _paths attribute must be Python wheels, or zip files.

Only paths of the same type (directory, module or wheel) can be collected at the same time, otherwise a MixedTypesError is raised.

Raises:

MixedTypesError – Raised if the _paths attribute contains a mix of the types listed above.
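The routing described above could be expressed as a dispatch table keyed on the string returned by the identification step. This is a sketch under assumptions: the route function and the handlers are hypothetical, and MixedTypesError is redefined locally so the example is self-contained.

```python
class MixedTypesError(Exception):
    """Local stand-in for badsnakes.libs.collector.MixedTypesError."""


def route(kind: str, handlers: dict) -> list:
    """Call the handler registered for kind, or raise MixedTypesError."""
    try:
        handler = handlers[kind]
    except KeyError:
        raise MixedTypesError(f'Unsupported or mixed path types: {kind!r}') from None
    return handler()


# Hypothetical handlers standing in for the private _collect_from_* methods.
handlers = {
    'dir': lambda: ['collected from directories'],
    'modules': lambda: ['collected from modules'],
    'wheel': lambda: ['collected from wheels'],
}
```

With this layout, any identification result without a registered handler (for example 'invalid') surfaces as a MixedTypesError at a single point.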

_checks()

Perform pre-collection checks.

Checks:
  • All files exist.

Raises:

FileNotFoundError – Raised if any file in paths does not exist.
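The realpath expansion performed on instantiation, followed by the existence check, can be sketched as below. The function name and the error message are assumptions made for illustration.

```python
import os


def resolve_and_check(paths: list) -> list:
    """Expand each path to its realpath and verify that it exists."""
    resolved = [os.path.realpath(p) for p in paths]
    missing = [p for p in resolved if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f'Path(s) not found: {missing}')
    return resolved
```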

_collect_from_directory()

Collect all plain-text files from a directory.

Before this method is called, all paths are tested to ensure they are directories.

_collect_from_files()

Collect all plain-text files.

As the realpath conversion and file-exists check have already been performed, this method can simply append the _paths attribute to _files, for the caller’s use.

_collect_from_wheel()

Collect all plain-text files from wheels.

Before this method is called, all paths are tested to ensure they are wheels (or .zip files).

_identify() → str

Identify the type of collection to take place.

Returns:

One of the following strings is returned, based on the content of the paths argument:

  • Directory: ‘dir’

  • Python modules: ‘modules’

  • Wheel: ‘wheel’

  • Anything else: ‘invalid’

Return type:

str
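A simplified sketch of how the four return values might be derived. Two substitutions are made here and should be read as assumptions: zipfile.is_zipfile is used in place of the module's own wheel test, and any regular file which is not a ZIP archive is treated as a module, omitting the plain-text test.

```python
import os
import zipfile


def identify(paths: list) -> str:
    """Return 'dir', 'modules', 'wheel' or 'invalid' for the given paths."""
    if not paths:
        return 'invalid'
    if all(os.path.isdir(p) for p in paths):
        return 'dir'
    if all(os.path.isfile(p) and zipfile.is_zipfile(p) for p in paths):
        return 'wheel'
    if all(os.path.isfile(p) and not zipfile.is_zipfile(p) for p in paths):
        return 'modules'
    # A mix of types, or something unrecognised.
    return 'invalid'
```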

_isdir() → bool

Test if all elements of paths are directories.

Returns:

True if all paths are directories, otherwise False.

Return type:

bool

_istext() → bool

Test if all elements of paths are plain-text files.

Returns:

True if all elements of paths are plain-text files, otherwise False.

Return type:

bool

_iswheel() → bool

Test if all elements of paths are Python wheels.

Note

A file is tested as a wheel by checking the first four bytes of the file itself, rather than the file extension. As such, a .zip file will pass this test as well.

Returns:

True if all elements of paths are Python wheels (or ZIP archives), otherwise False.

Return type:

bool
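A four-byte check of this kind might look like the sketch below. The function names are hypothetical, and comparing against the ZIP local file header signature (PK\x03\x04) is an assumption about which bytes the module expects.

```python
ZIP_MAGIC = b'PK\x03\x04'   # Local file header signature of a non-empty ZIP archive.


def looks_like_wheel(path: str) -> bool:
    """Test the first four bytes of the file against the ZIP signature."""
    with open(path, 'rb') as f:
        return f.read(4) == ZIP_MAGIC


def all_wheels(paths: list) -> bool:
    """True if every path looks like a wheel (or other ZIP archive)."""
    return bool(paths) and all(looks_like_wheel(p) for p in paths)
```

One consequence of checking only this signature is that an empty ZIP archive, which begins with the end-of-central-directory signature (PK\x05\x06) instead, would not pass.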