daisy.data_sources.network_traffic package

Submodules

daisy.data_sources.network_traffic.demo_202303 module

Content used for the Dataset Demo from March 6th 2023.

daisy.data_sources.network_traffic.demo_202303.demo_202303_label_data_point(client_id: int, d_point: dict) dict[source]

Labels the data points according to the events for the demo 202303.

Parameters:
  • client_id – Client ID.

  • d_point – Data point as dictionary.

Returns:

Labeled data point.
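
A minimal usage sketch follows; the dictionary keys shown and the exact label field added by the function are illustrative assumptions, since the event tags are defined inside this module.

    from daisy.data_sources.network_traffic.demo_202303 import (
        demo_202303_label_data_point,
    )

    # Hypothetical data point already converted into a dictionary; real
    # feature names come from the pyshark processing steps of this package.
    d_point = {"meta.time_epoch": "1678104000.0", "ip.addr": "10.0.0.1"}

    # Returns the same data point extended with a label derived from the
    # demo 202303 events for this client.
    labeled = demo_202303_label_data_point(client_id=1, d_point=d_point)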

daisy.data_sources.network_traffic.pyshark_handler module

Implementations of the data source interface that allows the processing and provisioning of pyshark packets, either via file inputs, live capture, or a remote source that generates packets in either fashion.

class daisy.data_sources.network_traffic.pyshark_handler.LivePysharkDataSource(name: str = '', interfaces: list = 'any', bpf_filter: str = '')[source]

Bases: DataSource

The wrapper implementation to support and handle pyshark live captures as data sources. Considered infinite in nature, as it generates pyshark packets until the capture is stopped. Beware that you might have to use root privileges to obtain data from this data source; if privileges are missing, pyshark might return neither data points nor warnings.

close()[source]

Stops the live capture, essentially disabling the generator. Note that the generator might block if one tries to retrieve an object from it after that point.

open()[source]

Starts the pyshark live capture, initializing the wrapped generator.
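
A minimal usage sketch, assuming the data source can be iterated like a generator (the open/close methods are documented above; highest_layer is a standard pyshark packet attribute):

    from daisy.data_sources.network_traffic import LivePysharkDataSource

    source = LivePysharkDataSource(name="live", interfaces=["eth0"], bpf_filter="tcp")
    source.open()  # starts the live capture; may require root privileges
    try:
        for i, packet in enumerate(source):  # assumed: the source is iterable
            print(packet.highest_layer)      # standard pyshark packet attribute
            if i >= 9:                       # infinite source: stop manually
                break
    finally:
        source.close()  # stops the capture; the generator may block afterwards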

class daisy.data_sources.network_traffic.pyshark_handler.PcapDataSource(*file_names: str, try_counter: int = 3, name: str = '')[source]

Bases: DataSource

The wrapper implementation to support and handle any number of pcap files as data sources. Finite: finishes after all files have been processed. Warning: not entirely compliant with the data source abstract class, as it is neither fully thread-safe, nor does its __iter__() method shut down after close() has been called. This is acceptable due to its finite nature, however, since this data source is nearly always closed only after all data points have been retrieved.

close()[source]

Closes any open file of the pcap file data source.

open()[source]

Opens and resets the pcap file data source to the very beginning of the file list.
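
A minimal usage sketch under the same iteration assumption; the file names are placeholders:

    from daisy.data_sources.network_traffic import PcapDataSource

    source = PcapDataSource("capture_a.pcap", "capture_b.pcap", name="pcaps")
    source.open()                 # resets to the beginning of the file list
    for packet in source:         # finite: ends after the last file
        print(packet.sniff_time)  # standard pyshark packet attribute
    source.close()                # closes any open pcap file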

daisy.data_sources.network_traffic.pyshark_processor module

Implementation of the data processor for supporting processing steps used for pyshark packets, i.e. a pre-packaged extension of the data processor base class for ease of use.

class daisy.data_sources.network_traffic.pyshark_processor.PysharkProcessor(name: str = '')[source]

Bases: DataProcessor

Extension of the data processor base class with pre-built processing steps specifically for pyshark packets.

classmethod create_simple_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>) Self[source]

Creates a simple pyshark processor that selects specific features from each data point (nan if not present) and transforms them into numpy vectors, ready to be further processed by detection models.

Parameters:
  • name – Name of processor for logging purposes.

  • f_features – Features to extract from the packets.

  • nn_aggregator – Aggregator, which should map non-numerical features to integers / floats.

packet_to_dict() Self[source]

Adds a function to the processor that takes a data point, i.e. a pyshark packet, and converts it into a dictionary.
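
A construction sketch based on the documented methods; chaining relies on the Self return type shown above:

    from daisy.data_sources.network_traffic import PysharkProcessor

    # Pre-built pipeline with the default feature list and aggregator.
    processor = PysharkProcessor.create_simple_processor(name="simple")

    # Manual alternative: start from an empty processor and add the
    # packet-to-dictionary step; further steps can be chained the same way.
    manual = PysharkProcessor(name="manual").packet_to_dict()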

daisy.data_sources.network_traffic.pyshark_processor.create_pyshark_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>)[source]

Creates a DataProcessor using functions specifically for pyshark packets, selecting specific features from each data point (nan if not present) and transforming them into numpy vectors, ready to be further processed by detection models.

Parameters:
  • name – The name for logging purposes.

  • f_features – The features to extract from the packets.

  • nn_aggregator – The aggregator, which should map non-numerical features to integers / floats.

daisy.data_sources.network_traffic.pyshark_processor.dict_to_json(dictionary: dict) str[source]

Takes a dictionary and returns a JSON object in the form of a string.

Parameters:

dictionary – The dictionary to convert to a JSON string.

Returns:

A JSON string from the dictionary.
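
A trivial usage sketch; the exact formatting of the resulting string is up to the implementation:

    from daisy.data_sources.network_traffic import dict_to_json

    # Returns a JSON string such as '{"ip.ttl": 64, "tcp.srcport": "443"}'.
    json_str = dict_to_json({"ip.ttl": 64, "tcp.srcport": "443"})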

daisy.data_sources.network_traffic.pyshark_processor.dict_to_numpy_array(d_point: dict, nn_aggregator: Callable[[str, object], object]) ndarray[source]

Transforms the pyshark data point directly into a numpy array without further processing, aggregating any value that is a list into a singular value.

Parameters:
  • d_point – Data point as dictionary.

  • nn_aggregator – Aggregator, which maps non-numerical features to integers or floats.

Returns:

Data point as vector.
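
A usage sketch with a hypothetical aggregator; whether the aggregator receives whole lists or single elements is an implementation detail not specified here:

    from daisy.data_sources.network_traffic import dict_to_numpy_array

    # Hypothetical aggregator: fold any non-numerical value into a 32-bit integer.
    def hash_aggregator(key: str, value: object) -> int:
        return hash(str(value)) % 2**32

    d_point = {"ip.ttl": 64, "meta.protocols": ["eth", "ip", "tcp"]}
    vector = dict_to_numpy_array(d_point, hash_aggregator)  # numpy ndarray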

daisy.data_sources.network_traffic.pyshark_processor.packet_to_dict(p: Packet) dict[source]

Takes a single pyshark packet and converts it into a dictionary.

Parameters:

p – Packet to convert.

Returns:

Dictionary generated from the packet.

daisy.data_sources.network_traffic.pyshark_processor.pcap_nn_aggregator(key: str, value: object) int | float[source]

Simple, exemplary value aggregator. Takes a non-numerical (i.e. string) key-value pair and attempts to convert it into an integer / float. This example does not take the key into account, but only checks the type of the value to proceed. Note that IPv6 addresses are lazily converted to 32 bits (collisions may occur).

Parameters:
  • key – Name of the pair, which is always a string.

  • value – Arbitrary non-numerical value to be converted.

Returns:

Converted numerical value.

Raises:

ValueError – If value cannot be converted.
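
An illustrative call sketch; the concrete conversion results depend on the implementation, but the documented contract is a numerical value or a ValueError:

    from daisy.data_sources.network_traffic import pcap_nn_aggregator

    try:
        port = pcap_nn_aggregator("tcp.srcport", "443")   # numeric string
        addr = pcap_nn_aggregator("ipv6.src", "fe80::1")  # folded into 32 bits
    except ValueError:
        pass  # raised when a value cannot be converted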

Module contents

Implementations of the data handler helper interface that allows the processing and provisioning of pyshark packets, either via file inputs, live capture, or a remote source that generates packets in either fashion.

  • LivePysharkDataSource - DataSource which simply yields captured packets from a list of interfaces.

  • PcapDataSource - DataSource which is able to load pcap files sequentially and yield their packets.

  • PysharkProcessor - Offers additional processing step options to process pyshark packet objects.

There is also a module specialized for the traffic of cohda boxes (V2X) that offers additional functionality:

  • demo_202303 - Event tags for labeling purposes for the March 2023 dataset.
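
A minimal end-to-end sketch combining the pieces above; the iteration protocol and the process() entry point of the resulting DataProcessor are assumptions, and the file name is a placeholder:

    from daisy.data_sources.network_traffic import (
        PcapDataSource,
        create_pyshark_processor,
    )

    source = PcapDataSource("march23_client_1.pcap")    # placeholder file
    processor = create_pyshark_processor(name="pipeline")

    source.open()
    for packet in source:                   # assumed iteration protocol
        vector = processor.process(packet)  # assumed DataProcessor entry point
        ...                                 # feed the vector to a detection model
    source.close()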

class daisy.data_sources.network_traffic.LivePysharkDataSource(name: str = '', interfaces: list = 'any', bpf_filter: str = '')[source]

Bases: DataSource

The wrapper implementation to support and handle pyshark live captures as data sources. Considered infinite in nature, as it generates pyshark packets until the capture is stopped. Beware that you might have to use root privileges to obtain data from this data source; if privileges are missing, pyshark might return neither data points nor warnings.

close()[source]

Stops the live capture, essentially disabling the generator. Note that the generator might block if one tries to retrieve an object from it after that point.

open()[source]

Starts the pyshark live capture, initializing the wrapped generator.

class daisy.data_sources.network_traffic.PcapDataSource(*file_names: str, try_counter: int = 3, name: str = '')[source]

Bases: DataSource

The wrapper implementation to support and handle any number of pcap files as data sources. Finite: finishes after all files have been processed. Warning: not entirely compliant with the data source abstract class, as it is neither fully thread-safe, nor does its __iter__() method shut down after close() has been called. This is acceptable due to its finite nature, however, since this data source is nearly always closed only after all data points have been retrieved.

close()[source]

Closes any open file of the pcap file data source.

open()[source]

Opens and resets the pcap file data source to the very beginning of the file list.

class daisy.data_sources.network_traffic.PysharkProcessor(name: str = '')[source]

Bases: DataProcessor

Extension of the data processor base class with pre-built processing steps specifically for pyshark packets.

classmethod create_simple_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>) Self[source]

Creates a simple pyshark processor that selects specific features from each data point (nan if not present) and transforms them into numpy vectors, ready to be further processed by detection models.

Parameters:
  • name – Name of processor for logging purposes.

  • f_features – Features to extract from the packets.

  • nn_aggregator – Aggregator, which should map non-numerical features to integers / floats.

packet_to_dict() Self[source]

Adds a function to the processor that takes a data point, i.e. a pyshark packet, and converts it into a dictionary.

daisy.data_sources.network_traffic.create_pyshark_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>)[source]

Creates a DataProcessor using functions specifically for pyshark packets, selecting specific features from each data point (nan if not present) and transforming them into numpy vectors, ready to be further processed by detection models.

Parameters:
  • name – The name for logging purposes.

  • f_features – The features to extract from the packets.

  • nn_aggregator – The aggregator, which should map non-numerical features to integers / floats.

daisy.data_sources.network_traffic.demo_202303_label_data_point(client_id: int, d_point: dict) dict[source]

Labels the data points according to the events for the demo 202303.

Parameters:
  • client_id – Client ID.

  • d_point – Data point as dictionary.

Returns:

Labeled data point.

daisy.data_sources.network_traffic.dict_to_json(dictionary: dict) str[source]

Takes a dictionary and returns a JSON object in the form of a string.

Parameters:

dictionary – The dictionary to convert to a JSON string.

Returns:

A JSON string from the dictionary.

daisy.data_sources.network_traffic.dict_to_numpy_array(d_point: dict, nn_aggregator: Callable[[str, object], object]) ndarray[source]

Transforms the pyshark data point directly into a numpy array without further processing, aggregating any value that is a list into a singular value.

Parameters:
  • d_point – Data point as dictionary.

  • nn_aggregator – Aggregator, which maps non-numerical features to integers or floats.

Returns:

Data point as vector.

daisy.data_sources.network_traffic.packet_to_dict(p: Packet) dict[source]

Takes a single pyshark packet and converts it into a dictionary.

Parameters:

p – Packet to convert.

Returns:

Dictionary generated from the packet.

daisy.data_sources.network_traffic.pcap_nn_aggregator(key: str, value: object) int | float[source]

Simple, exemplary value aggregator. Takes a non-numerical (i.e. string) key-value pair and attempts to convert it into an integer / float. This example does not take the key into account, but only checks the type of the value to proceed. Note that IPv6 addresses are lazily converted to 32 bits (collisions may occur).

Parameters:
  • key – Name of the pair, which is always a string.

  • value – Arbitrary non-numerical value to be converted.

Returns:

Converted numerical value.

Raises:

ValueError – If value cannot be converted.