Module: analysers.py

Purpose:

This module provides the AST node analysers for the project.

The analyser classes in this module iterate over the AST-node-specific badsnakes.libs.containers objects, which store the details extracted from each AST node. Using the values defined in config.toml, the severity flag in each container object is set according to the severity of the node analysis.

Platform:

Linux/Windows | Python 3.10+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

class badsnakes.libs.analysers._BaseAnalyser[source]

Bases: object

Base analyser class.

This class contains base functionality which is designed to be inherited and specialised as required by the node-type-specific classes.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

analyse(nodes: list)[source]

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.
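For orientation, a minimal usage sketch is shown below. It uses one of the specialised subclasses documented further down (CallAnalyser is assumed here to be constructable without arguments), and the call_nodes list is a stand-in for the badsnakes.libs.containers objects produced by the extraction step, which is outside the scope of this sketch.

    from badsnakes.libs.analysers import CallAnalyser

    # call_nodes: a list of badsnakes.libs.containers objects extracted
    # from a module's ast.Call nodes (extraction not shown here).
    analyser = CallAnalyser()
    analyser.analyse(nodes=call_nodes)

    # Flagged statements are exposed through the properties documented below.
    for node in analyser.suspect:
        print('SUSPECT:', node)
    for node in analyser.dangerous:
        print('DANGEROUS:', node)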

_dangerous()[source]

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This is a generalised dangerous search method which looks at the node.value attribute only.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list[source]

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list
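Purely as an illustration of the behaviour described above, the method can be approximated by filtering node values whose string length exceeds a configured maximum. The MAX_LEN threshold and the severity attribute name are assumptions; the real values come from config.toml and the container definitions.

    MAX_LEN = 2048  # hypothetical threshold; the real value is read from config.toml

    def find_long_strings(items: list, category) -> list:
        """Return containers whose string value exceeds MAX_LEN, tagged with category.

        category is expected to be a Severity enum member, e.g. Severity.SUSPECT.
        """
        flagged = []
        for item in items:
            if isinstance(item.value, str) and len(item.value) > MAX_LEN:
                item.severity = category  # the 'severity' attribute name is assumed
                flagged.append(item)
        return flagged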

_no_dangerous_items_found()[source]

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()[source]

Report that no suspect items were found in the quick search.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This is a generalised quick search method which looks at the node.value attribute only.
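The set-intersection test described above amounts to something like the following sketch; the function and parameter names are illustrative only.

    def quick_search(blacklist: set[str], nodes: list) -> bool:
        """Return True if any blacklisted keyword appears in the extracted node values."""
        extracted = {node.value for node in nodes if isinstance(node.value, str)}
        # Further analysis is only worthwhile if the two sets intersect.
        return bool(blacklist & extracted)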

_suspect()[source]

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This is a generalised suspect search method which looks at the node.value attribute only.

class badsnakes.libs.analysers.ArgumentAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the Argument node class.

_dangerous()

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This is a generalised dangerous search method which looks at the node.value attribute only.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This is a generalised quick search method which looks at the node.value attribute only.

_suspect()

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This is a generalised suspect search method which looks at the node.value attribute only.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.AssignmentAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the Assignment node class.

_dangerous()

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This is a generalised dangerous search method which looks at the node.value attribute only.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This is a generalised quick search method which looks at the node.value attribute only.

_suspect()

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This is a generalised suspect search method which looks at the node.value attribute only.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.AttributeAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the Attribute node class.

_dangerous()[source]

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This method is specific to the node type, which looks at the node.name and node.value attributes.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This method is specific to the node type, which looks at the node.name and node.value attributes.

_suspect()[source]

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This method is specific to the node type, which looks at the node.name and node.value attributes.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.CallAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the Call node class.

_dangerous()[source]

Search for dangerous code elements for this node type.

Any statements flagged as dangerous (which are not in the whitelist) are written to the _d attribute.

Note

This method is specific to the node type, which looks at the node.name and [node.module].[node.name] attributes.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This method is specific to the node type, which looks at the node.name and (node.module and node.name) attributes.

_suspect()[source]

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This method is specific to the node type, which looks at the node.module and node.name attributes.

Both the bare function name and the fully qualified [module.function] form are tested. Additionally, if a function name is only a single character in length, it is flagged.
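A rough sketch of that test is shown below; the container attribute names (node.module, node.name) follow the descriptions above, and the suspect_names set is a stand-in for the configured keyword list.

    def is_suspect_call(node, suspect_names: set[str]) -> bool:
        """Test a single Call container against the configured suspect names."""
        qualified = f'{node.module}.{node.name}' if node.module else node.name
        if node.name in suspect_names or qualified in suspect_names:
            return True
        # Single-character function names are flagged regardless of the lists.
        return len(node.name) == 1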

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.CodeTextAnalyser[source]

Bases: _BaseAnalyser

Analyse the textual code itself for suspect activity.

This analyser class is different from the other analysers in this module as the checks performed are based on the textual code base, rather than on parsed AST node extractions.

Checks:

The checks performed by this class are:

  • _test_high_pct_long_lines()

  • _test_high_pct_semi_colons()

  • _test_no_function_defs_no_imports()

  • _test_single_line_module()

The docstring for each of the linked methods provides a description of the analysis.

As these techniques are often used by threat actors to obfuscate code, these checks should pass for well-written, PEP-oriented Python code and are expected to trigger for suspect or poorly written code.

analyse(module: Module)[source]

Run the code text analyser against this module.

Parameters:

module (module.Module) – A Module class containing the stream of textual code to be analysed, and the module.nodeclasses object for further inspection.

Code analysis:

Please refer to the docstring for the CodeTextAnalyser for a description of the various analyses carried out.

_get_line_count()[source]

Count the lines of code in the module.

The number of lines is determined by traversing the top level of the AST and storing each node's (node.lineno, node.end_lineno) pair in a set (to prevent duplication), then summing (end_lineno - lineno + 1) across the set to give the total number of coded lines in the module.

This approach is used as it skips comments and blank lines, accounting only for the actual lines of code.
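A minimal sketch of this counting approach, assuming a parsed ast.Module tree is available:

    import ast

    def count_code_lines(tree: ast.Module) -> int:
        """Count coded lines by spanning each top-level node's line range."""
        spans = {(node.lineno, node.end_lineno) for node in tree.body}
        return sum(last - first + 1 for first, last in spans)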

_no_function_defs() bool[source]

Test if the code has no function definitions.

Returns:

True if the badsnakes.libs.module.Module._NodeFunctionDefs.items list is empty, otherwise False.

Return type:

bool

_no_imports() bool[source]

Test if the code has no imports.

Returns:

True if the badsnakes.libs.module.Module._NodeImports.items list is empty, otherwise False.

Return type:

bool

_set_items()[source]

Set the .items attribute with the results.

Typically, the module object sets the items on build, with the container objects populated by the extractor. However, as code text analysis operates differently, the items are set here using the containers populated during analysis.

_test_high_pct_long_lines()[source]

Test if the code has a high percentage of long lines.

The maximum line length is defined in config.toml along with the percentage used by the test.

Logic:
  • Rewind the code stream to the beginning.

  • Create a filter object containing lines whose length is greater than the maximum allowed, and obtain the length of the result.

  • Divide the result by the number of lines in the module and test if the ratio is greater than or equal to the allowed limit.

  • If the result of the test is True, a badsnakes.libs.containers.CodeText container is appended to the _s suspect attribute with the relevant details.

Note

For an overview of how a module’s lines are counted, refer to the docstring for the _get_line_count() method.
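The logic above might look roughly like the following sketch; the two thresholds are hypothetical stand-ins for the values read from config.toml.

    MAX_LINE_LENGTH = 79  # hypothetical; defined in config.toml
    MAX_PCT_LONG = 0.5    # hypothetical; defined in config.toml

    def high_pct_long_lines(stream, n_lines: int) -> bool:
        """Return True if the ratio of long lines meets or exceeds the limit."""
        stream.seek(0)  # rewind the code stream to the beginning
        n_long = len(list(filter(lambda line: len(line.rstrip()) > MAX_LINE_LENGTH, stream)))
        return (n_long / n_lines) >= MAX_PCT_LONG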

_test_high_pct_semi_colons()[source]

Test for the use of semi-colons relative to line count.

This test is designed to detect semi-colons which are used as statement separators in the code itself, as the ConstantAnalyser class detects semi-colons in strings.

The percentage of semi-colons relative to the number of lines is defined in the config.toml file.

Logic:
  • Rewind the code stream to the beginning.

  • Iterate through the code stream and count the number of semi-colons on each line, and sum the results.

  • Divide the summed result by the number of lines in the module and test if the ratio is greater than or equal to the allowed limit.

  • If the result of the test is True, a badsnakes.libs.containers.CodeText container is appended to the _d dangerous attribute with the relevant details.

Note

For an overview of how a module’s lines are counted, refer to the docstring for the _get_line_count() method.
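As a sketch, again with a hypothetical threshold in place of the config.toml value:

    MAX_PCT_SEMI_COLONS = 0.25  # hypothetical; defined in config.toml

    def high_pct_semi_colons(stream, n_lines: int) -> bool:
        """Return True if the semi-colon-to-line ratio meets or exceeds the limit."""
        stream.seek(0)  # rewind the code stream to the beginning
        n_semis = sum(line.count(';') for line in stream)
        return (n_semis / n_lines) >= MAX_PCT_SEMI_COLONS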

_test_no_function_defs_no_imports()[source]

Test if the code has no function definitions or imports.

Logic:

_test_single_line_module()[source]

Test if a module only contains a single line of code.

Generally, this is an indication of a runner script, or a long statement (or string) used to obfuscate malicious code.

Logic:

Rationale:

For rationale on how lines of code are counted, refer to the docstring for the _get_line_count() method.

_dangerous()

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This is a generalised dangerous search method which looks at the node.value attribute only.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This is a generalised quick search method which looks at the node.value attribute only.

_suspect()

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This is a generalised suspect search method which looks at the node.value attribute only.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.ConstantAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the Constant node class.

Note

All function/method docstrings have been excluded from the containers, enabling the search for simple strings such as ';' and '()' in constants without the presence of these strings in a docstring flagging a false positive.

The docstring extraction is performed by the extractor._extract_docstrings() method.

_dangerous()[source]

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This method is specific to the node type, which looks at the node.value attribute.

The constants tests do not rely on the quick search, as we are searching for blacklisted strings within constants (strings). Therefore, as each constant is iterated, a full blacklist search is performed.

Granted, it’s not efficient - but we must be thorough.
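Purely as an illustration, each constant is scanned for blacklisted substrings along these lines; the function and parameter names are not part of the library.

    def scan_constants(constants: list, blacklist: set[str]) -> list:
        """Return the constant containers whose string value contains a blacklisted term."""
        flagged = []
        for node in constants:
            if not isinstance(node.value, str):
                continue
            # A full blacklist scan per constant; no quick-search short-circuit.
            if any(term in node.value for term in blacklist):
                flagged.append(node)
        return flagged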

_suspect()[source]

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This method is specific to the node type, which looks at the node.value attribute.

The constants tests do not rely on the quick search, as we are searching for blacklisted strings within constants (strings). Therefore, as each constant is iterated, a full blacklist search is performed.

Granted, it’s not efficient - but we must be thorough.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This is a generalised quick search method which looks at the node.value attribute only.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.FunctionDefAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the FunctionDef node class.

_dangerous()[source]

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This method is specific to the node type, which looks at the node.name attribute.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This method is specific to the node type, which looks at the node.name attribute.

_suspect()[source]

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This method is specific to the node type, which looks at the node.name attribute.

As all function names are tested (specifically, in search of single-character function names), the quick search is bypassed.

Additionally, any function name containing '0x' is flagged, as this suggests an attempt at obfuscation.
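A sketch of those two checks, with the node.name attribute assumed from the container descriptions:

    def is_suspect_function_name(node) -> bool:
        """Flag single-character names and names containing '0x'."""
        return len(node.name) == 1 or '0x' in node.name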

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.

class badsnakes.libs.analysers.ImportAnalyser[source]

Bases: _BaseAnalyser

Specialised analyser class for the Import node class.

_dangerous()[source]

Search for dangerous code elements for this node type.

Any statements flagged as dangerous are written to the _d attribute.

Note

This method is specific to the node type, which looks at the node.module and [node.module].[node.name] attributes.

Search through extracted attributes for blacklisted items.

A set of blacklisted items for this node type is compared against a set of extracted attributes for this node type. Further analysis is only carried out if there is an intersection in the two sets.

Note

This method is specific to the node type, which looks at the node.module and [node.module].[node.name] attributes.

_suspect()[source]

Search for suspect code elements for this node type.

Any statements flagged as suspect are written to the _s attribute.

Note

This method is specific to the node type, which looks at the node.module and [node.module].[node.name] attributes.

_find_long_strings(items: list, category: Severity = Severity.SUSPECT) list

Search for long strings in the node values.

Parameters:
  • items (list) – A list of badsnakes.libs.containers objects containing the node classes to be analysed.

  • category (Severity, optional) – The Severity class enum to be assigned to flagged strings. Defaults to Severity.SUSPECT.

Returns:

A list of badsnakes container objects containing long string values.

Return type:

list

_no_dangerous_items_found()

Report that no dangerous items were found in the quick search.

_no_suspect_items_found()

Report that no suspect items were found in the quick search.

analyse(nodes: list)

Run the analyser against this module.

For most AST nodes, the analyser first performs a ‘quick search’ to determine if any of the listed keywords from the config file are found in the AST extraction. If no keywords are present, no further analysis is performed. If keywords are present, the extracted statements are analysed further and flagged accordingly.

Parameters:

nodes (list) – A list of badsnakes.libs.containers objects containing the values extracted from the AST nodes.

Implementation note:

The call to _suspect() must be made before the call to _dangerous().

There is a check in some _dangerous() methods which tests whether the AST node has already been flagged as ‘suspect’. If so, a copy of the node class container is made so the original suspect statement is not re-classified; the copy of the container is classified as dangerous instead. Obviously, this results in double-reporting, but that’s OK.

property dangerous: list

Public accessor to the AST nodes flagged as dangerous.

property dangerous_longstring: list

Public accessor to long strings flagged as dangerous.

property suspect: list

Public accessor to the AST nodes flagged as suspect.

property suspect_longstring: list

Public accessor to long strings flagged as suspect.