intelmq.bots.parsers.shadowserver package

Submodules

intelmq.bots.parsers.shadowserver.config module

Copyright (c) 2016-2018 by Bundesamt für Sicherheit in der Informationstechnik (BSI)

Software engineering by BSI & Intevation GmbH

This is the configuration file for the shadowserver parser.

In the following, intelmqkey is an arbitrary key from intelmq's harmonization and shadowkey is a column name from shadowserver's data.

Every bot-type is defined by a dictionary with three values:

  • required_fields: A list of tuples containing intelmq's field name, the field name from the data, and an optional conversion function. An error is raised if the field does not exist in the data.

  • optional_fields: Same format as above, but no error is raised if the field does not exist. If there's no mapping to an intelmq field, you can set the intelmqkey to extra. and the field will be added to the extra field using the original field name. See the section below for possible tuple values.

  • constant_fields: A dictionary with a static mapping of field name to data, e.g. to set classifications or protocols.

The tuples can be of following format:

  • ('intelmqkey', 'shadowkey'): the data from the column shadowkey will be saved in the event's field intelmqkey. Logically equivalent to: event['intelmqkey'] = row['shadowkey'].

  • ('intelmqkey', 'shadowkey', conversion_function): the given function will be used to convert and/or validate the data. Logically equivalent to: event['intelmqkey'] = conversion_function(row['shadowkey']).

  • ('intelmqkey', 'shadowkey', conversion_function, True): the function gets two parameters here; the second one is the full row (as a dictionary). Logically equivalent to: event['intelmqkey'] = conversion_function(row['shadowkey'], row).

  • ('extra.', 'shadowkey', conversion_function): the data will be added to extra in this case; the resulting name is extra.[shadowkey]. The conversion_function is optional. Logically equivalent to: event['extra.shadowkey'] = conversion_function(row['shadowkey']).

  • (False, 'shadowkey'): the column will be ignored.
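Putting the tuple formats together, a bot-type definition looks roughly like the sketch below. The feed name, column names, and helper implementations here are invented for illustration; they are stand-ins, not an actual entry from config.py.

```python
# Minimal stand-ins for two of the module's conversion helpers, so this
# sketch runs on its own; the real functions live in
# intelmq.bots.parsers.shadowserver.config.
def add_UTC_to_timestamp(value):
    return value + ' UTC'

def convert_int(value):
    return int(value) if value else None

# Hypothetical bot-type definition in the format described above; the feed
# and column names are invented for illustration only.
example_http_scan = {
    'required_fields': [
        ('source.ip', 'ip'),                                 # plain copy
        ('time.source', 'timestamp', add_UTC_to_timestamp),  # with conversion
    ],
    'optional_fields': [
        ('source.port', 'port', convert_int),
        ('extra.', 'tag'),            # stored as extra.tag
        (False, 'internal_id'),       # column is ignored
    ],
    'constant_fields': {
        'classification.taxonomy': 'vulnerable',
        'protocol.transport': 'tcp',
    },
}
```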

Mappings are straightforward: each mapping is a dict of at least three keys:

  1. required fields: the parser will process these keys first.

  2. optional fields: the parser will try to interpret these values. If it fails, the value is written to the extra field.

  3. constant fields: Some information about an event may not be explicitly stated in a feed because it is implicit in the nature of the feed. For instance, a feed that is exclusively about HTTP may not have a field for the transport protocol because it's always TCP.

The first value is the IntelMQ key, the second value is the column name in the shadowserver CSV.

Reference material:
TODOs:

There are a bunch of inline TODOs. Most of them mark lines of code where the mapping has to be validated.

@ Check-Implementation tags for parser configs: dmth thinks it's not sufficient. Some CERT expertise is needed to check whether the mappings are correct.

feed_idx is not complete.

intelmq.bots.parsers.shadowserver.config.add_UTC_to_timestamp(value)
intelmq.bots.parsers.shadowserver.config.convert_bool(value)
intelmq.bots.parsers.shadowserver.config.convert_date(value)
intelmq.bots.parsers.shadowserver.config.convert_date_utc(value)

Parses a datetime from the value and assumes UTC by appending the timezone to the value. Not the same as add_UTC_to_timestamp, as convert_date_utc also does the sanitation.

intelmq.bots.parsers.shadowserver.config.convert_float(value)

Returns a float or None for empty strings.

intelmq.bots.parsers.shadowserver.config.convert_http_host_and_url(value, row)

URLs are split into hostname and path. The column names differ between reports:

  • Compromised-Website: http_host, url
  • Drone: cc_dns, url
  • IPv6-Sinkhole-HTTP-Drone: http_host, http_url
  • Microsoft-Sinkhole: http_host, url
  • Sinkhole-HTTP-Drone: http_host, url

With some reports, url/http_url holds only the path; with others, the full HTTP request.
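The splitting logic can be sketched as follows. This is a simplified, hypothetical stand-in for convert_http_host_and_url, assuming the host and URL columns have already been looked up for the report type; the real function's signature and edge-case handling differ.

```python
from urllib.parse import urlparse

def split_host_and_path(url_value, host_value=''):
    """Sketch: separate a url/http_url column into hostname and path.

    Some reports put only the path in the URL column, others the full
    HTTP request; a full URL wins over the separate host column.
    """
    parsed = urlparse(url_value)
    if parsed.netloc:                       # full URL, e.g. http://example.com/x
        return parsed.netloc, parsed.path or '/'
    return host_value, url_value or '/'     # path only; host from its own column
```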

intelmq.bots.parsers.shadowserver.config.convert_int(value)

Returns an int or None for empty strings.

intelmq.bots.parsers.shadowserver.config.get_feed_by_feedname(given_feedname)
intelmq.bots.parsers.shadowserver.config.get_feed_by_filename(given_filename)
intelmq.bots.parsers.shadowserver.config.invalidate_zero(value)

Returns an int or None for empty strings or ‘0’.
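Based on the documented behavior, these two small converters can be sketched as below; the actual implementations in config.py may differ in error handling.

```python
def convert_int(value):
    """Returns an int or None for empty strings."""
    return int(value) if value else None

def invalidate_zero(value):
    """Returns an int or None for empty strings or '0'."""
    return int(value) if value and int(value) != 0 else None
```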

intelmq.bots.parsers.shadowserver.config.scan_exchange_identifier(field)
intelmq.bots.parsers.shadowserver.config.scan_exchange_taxonomy(field)
intelmq.bots.parsers.shadowserver.config.scan_exchange_type(field)
intelmq.bots.parsers.shadowserver.config.set_tor_node(value)
intelmq.bots.parsers.shadowserver.config.validate_fqdn(value)
intelmq.bots.parsers.shadowserver.config.validate_ip(value)

Remove “invalid” IP.

intelmq.bots.parsers.shadowserver.config.validate_network(value)
intelmq.bots.parsers.shadowserver.config.validate_to_none(value)

intelmq.bots.parsers.shadowserver.parser module

Copyright (C) 2016 by Bundesamt für Sicherheit in der Informationstechnik Software engineering by Intevation GmbH

This is an "all-in-one" parser for a lot of shadowserver feeds. It depends on the configuration in the file config.py, which holds information on how to treat certain shadowserver feeds. It uses the report field extra.file_name to determine which config should apply, so this field is required.

This parser will only work with CSV files named like 2019-01-01-scan_http-country-geo.csv.
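The naming scheme (date prefix, feed name, optional suffixes) can be matched with a pattern along these lines. This regular expression is a hypothetical illustration, not the one the parser actually uses.

```python
import re

# Hypothetical pattern for names like 2019-01-01-scan_http-country-geo.csv:
# a date prefix, the feed name, then optional hyphen-separated segments.
FILENAME_PATTERN = re.compile(
    r'^\d{4}-\d{2}-\d{2}-(?P<feed>\w+)(?:-\w+)*\.csv$')

match = FILENAME_PATTERN.match('2019-01-01-scan_http-country-geo.csv')
feed = match.group('feed') if match else None   # 'scan_http'
```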

Optional parameters:
overwrite: Bool, default False. If True, it keeps the report's feed.name and does not override it with the corresponding feed name.

feedname: The fixed feed name to use if it should not be automatically detected.

intelmq.bots.parsers.shadowserver.parser.BOT

alias of intelmq.bots.parsers.shadowserver.parser.ShadowserverParserBot

class intelmq.bots.parsers.shadowserver.parser.ShadowserverParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)

Bases: intelmq.lib.bot.ParserBot

csv_params = {'dialect': 'unix'}
feedname = None
init()
mode = None
parse(report)

A generator yielding the single elements of the data.

Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).

Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:

parse = ParserBot.parse_csv
You should do that for recovering lines too.

recover_line = ParserBot.recover_line_csv
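As a generic illustration of the parse_csv pattern referenced above (not the actual intelmq implementation, which also handles encoding, dialect parameters, and line recovery), a CSV-based parse can be built on csv.DictReader:

```python
import csv
import io

def parse_csv(raw_report):
    """Sketch of a parse_csv-style generator: yields each CSV row as a dict.

    A simplified stand-in for ParserBot.parse_csv, operating on a plain
    string instead of a report object.
    """
    reader = csv.DictReader(io.StringIO(raw_report))
    for row in reader:
        yield row

rows = list(parse_csv('ip,port\n192.0.2.1,80\n'))
```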

parse_line(row, report)

A generator which can yield one or more messages contained in a line.

Report has the full message, thus you can access some metadata. Override for your use.

recover_line(line: str)

Converts dictionaries to CSV. self.csv_fieldnames must be a list of fields.

sparser_config = None

Module contents