daisy.data_sources.network_traffic package
Submodules
daisy.data_sources.network_traffic.demo_202303 module
Content used for the Dataset Demo from March 6th 2023.
daisy.data_sources.network_traffic.pyshark_handler module
Implementations of the data source interface that allow the processing and provisioning of pyshark packets, either via file input, live capture, or a remote source that generates packets in either fashion.
- class daisy.data_sources.network_traffic.pyshark_handler.LivePysharkDataSource(name: str = '', interfaces: list = 'any', bpf_filter: str = '')[source]
Bases:
DataSource
The wrapper implementation to support and handle pyshark live captures as data sources. Considered infinite in nature, as it generates pyshark packets until the capture is stopped. Beware that you might need root privileges to obtain data from this data source; if privileges are missing, pyshark may return neither data points nor warnings.
- class daisy.data_sources.network_traffic.pyshark_handler.PcapDataSource(*file_names: str, try_counter: int = 3, name: str = '')[source]
Bases:
DataSource
The wrapper implementation to support and handle any number of pcap files as data sources. Finite in nature: finishes after all files have been processed. Warning: not entirely compliant with the data source abstract class, as it is neither fully thread safe, nor does its __iter__() method shut down after close() has been called. This is acceptable due to its finite nature, however, as this data source is nearly always closed only once all data points have been retrieved.
daisy.data_sources.network_traffic.pyshark_processor module
Implementation of the data processor for supporting processing steps used for pyshark packets, i.e. a pre-packaged extension of the data processor base class for ease of use.
- class daisy.data_sources.network_traffic.pyshark_processor.PysharkProcessor(name: str = '')[source]
Bases:
DataProcessor
Extension of the data processor base class with pre-built processing steps specifically for pyshark packets.
- classmethod create_simple_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>) Self [source]
Creates a simple pyshark processor that selects specific features from each data point (NaN if not present) and transforms them into numpy vectors, ready to be further processed by detection models.
- Parameters:
name – Name of processor for logging purposes.
f_features – Features to extract from the packets.
nn_aggregator – Aggregator, which should map non-numerical features to integers / floats.
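The feature-selection step described above can be sketched with plain Python: pick the listed features from a packet dictionary and fill in NaN for anything missing. The helper name `select_features` is illustrative only and not part of the daisy API.

```python
import math

# Hedged sketch of the documented behavior: select the listed features
# from a flat packet dictionary, substituting NaN for missing ones.
def select_features(d_point: dict, f_features: tuple) -> list:
    return [d_point.get(f, math.nan) for f in f_features]

packet = {"ip.len": 60, "tcp.srcport": 443}
vector = select_features(packet, ("ip.len", "tcp.srcport", "ip.ttl"))
```

The resulting fixed-length list is what would then be aggregated and converted into a numpy vector for the detection models.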
- daisy.data_sources.network_traffic.pyshark_processor.create_pyshark_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>)[source]
Creates a DataProcessor using functions specifically for pyshark packets, which selects specific features from each data point (NaN if not present) and transforms them into numpy vectors, ready to be further processed by detection models.
- Parameters:
name – The name for logging purposes.
f_features – The features to extract from the packets.
nn_aggregator – The aggregator, which should map non-numerical features to integers / floats.
- daisy.data_sources.network_traffic.pyshark_processor.dict_to_json(dictionary: dict) str [source]
Takes a dictionary and returns a JSON object in the form of a string.
- Parameters:
dictionary – The dictionary to convert to json string.
- Returns:
A JSON string from the dictionary.
- daisy.data_sources.network_traffic.pyshark_processor.dict_to_numpy_array(d_point: dict, nn_aggregator: Callable[[str, object], object]) ndarray [source]
Transforms the pyshark data point directly into a numpy array without further processing, aggregating any value that is a list into a singular value.
- Parameters:
d_point – Data point as dictionary.
nn_aggregator – Aggregator, which maps non-numerical features to integers or floats.
- Returns:
Data point as vector.
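The described transformation can be sketched as follows. This is not the daisy implementation: the helper `to_vector` and the demo aggregator (numeric parse with a hash fallback) are illustrative assumptions; only the overall shape — collapse list values, route non-numerical values through the aggregator, emit a float vector — follows the documentation.

```python
import math
import numpy as np

# Illustrative sketch: turn a flat packet dict into a numeric vector,
# collapsing list values to a single value and passing non-numerical
# values through the aggregator callable.
def to_vector(d_point: dict, nn_aggregator) -> np.ndarray:
    values = []
    for key, value in d_point.items():
        if isinstance(value, list):  # aggregate list values into one value
            value = value[0] if value else math.nan
        if not isinstance(value, (int, float)):
            value = nn_aggregator(key, value)
        values.append(value)
    return np.asarray(values, dtype=float)

def demo_aggregator(key, value):
    # Hypothetical aggregator: parse numbers, hash everything else.
    try:
        return float(value)
    except ValueError:
        return float(abs(hash(value)) % 2**32)

vec = to_vector({"ip.len": "60", "ip.addr": ["10.0.0.1", "10.0.0.2"]},
                demo_aggregator)
```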
- daisy.data_sources.network_traffic.pyshark_processor.packet_to_dict(p: Packet) dict [source]
Takes a single pyshark packet and converts it into a dictionary.
- Parameters:
p – Packet to convert.
- Returns:
Dictionary generated from the packet.
- daisy.data_sources.network_traffic.pyshark_processor.pcap_nn_aggregator(key: str, value: object) int | float [source]
Simple, exemplary value aggregator. Takes a non-numerical (i.e. string) key-value pair and attempts to convert it into an integer / float. This example does not take the key into account, but only checks the type of the value to proceed. Note that IPv6 addresses are lazily converted to 32 bits (collisions may occur).
- Parameters:
key – Name of the pair, which is always a string.
value – Arbitrary non-numerical value to be converted.
- Returns:
Converted numerical value.
- Raises:
ValueError – If value cannot be converted.
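An aggregator in the spirit of pcap_nn_aggregator could look like the sketch below. It is not the real implementation, only a stdlib illustration of the documented behavior: numeric strings parse directly, IP addresses map to their integer form, and IPv6 is lazily truncated to 32 bits, so collisions are possible.

```python
import ipaddress

# Illustrative aggregator (not the daisy implementation): numbers parse
# directly; IP addresses map to integers, with IPv6 lazily truncated to
# 32 bits (collisions may occur); anything else raises ValueError.
def demo_nn_aggregator(key: str, value: object) -> "int | float":
    try:
        return float(value)  # plain numeric strings
    except (TypeError, ValueError):
        pass
    try:
        addr = ipaddress.ip_address(str(value))
        return int(addr) & 0xFFFFFFFF  # IPv6 truncated to 32 bits
    except ValueError:
        raise ValueError(f"cannot convert value for {key!r}: {value!r}")
```

As in the documented function, the key is accepted for interface compatibility but ignored by the conversion logic.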
Module contents
Implementations of the data handler helper interface that allow the processing and provisioning of pyshark packets, either via file input, live capture, or a remote source that generates packets in either fashion.
- LivePysharkDataSource - DataSource which simply yields captured packets from a list of interfaces.
- PcapDataSource - DataSource which is able to load pcap files sequentially and yield their packets.
- PysharkProcessor - Offers additional processing step options to process pyshark packet objects.
There is also a module specialized for traffic of cohda boxes (V2X), that offers additional functionalities:
demo_202303 - Event tags for labeling purposes for the March 2023 dataset.
- class daisy.data_sources.network_traffic.LivePysharkDataSource(name: str = '', interfaces: list = 'any', bpf_filter: str = '')[source]
Bases:
DataSource
The wrapper implementation to support and handle pyshark live captures as data sources. Considered infinite in nature, as it generates pyshark packets until the capture is stopped. Beware that you might need root privileges to obtain data from this data source; if privileges are missing, pyshark may return neither data points nor warnings.
- class daisy.data_sources.network_traffic.PcapDataSource(*file_names: str, try_counter: int = 3, name: str = '')[source]
Bases:
DataSource
The wrapper implementation to support and handle any number of pcap files as data sources. Finite in nature: finishes after all files have been processed. Warning: not entirely compliant with the data source abstract class, as it is neither fully thread safe, nor does its __iter__() method shut down after close() has been called. This is acceptable due to its finite nature, however, as this data source is nearly always closed only once all data points have been retrieved.
- class daisy.data_sources.network_traffic.PysharkProcessor(name: str = '')[source]
Bases:
DataProcessor
Extension of the data processor base class with pre-built processing steps specifically for pyshark packets.
- classmethod create_simple_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>) Self [source]
Creates a simple pyshark processor that selects specific features from each data point (NaN if not present) and transforms them into numpy vectors, ready to be further processed by detection models.
- Parameters:
name – Name of processor for logging purposes.
f_features – Features to extract from the packets.
nn_aggregator – Aggregator, which should map non-numerical features to integers / floats.
- daisy.data_sources.network_traffic.create_pyshark_processor(name: str = '', f_features: list[str, ...] = ('meta.len', 'meta.time', 'meta.time_epoch', 'meta.protocols', 'ip.addr', 'sll.halen', 'sll.pkttype', 'sll.eth', 'sll.hatype', 'sll.unused', 'ipv6.tclass', 'ipv6.flow', 'ipv6.nxt', 'ipv6.src_host', 'ipv6.host', 'ipv6.hlim', 'sll.ltype', 'cohda.Type', 'cohda.Ret', 'cohda.llc.MKxIFMsg.Ret', 'ipv6.addr', 'ipv6.dst', 'ipv6.plen', 'tcp.stream', 'tcp.payload', 'tcp.urgent_pointer', 'tcp.port', 'tcp.options.nop', 'tcp.options.timestamp', 'tcp.flags', 'tcp.window_size_scalefactor', 'tcp.dstport', 'tcp.len', 'tcp.checksum', 'tcp.window_size', 'tcp.srcport', 'tcp.checksum.status', 'tcp.nxtseq', 'tcp.status', 'tcp.analysis.bytes_in_flight', 'tcp.analysis.push_bytes_sent', 'tcp.ack', 'tcp.hdr_len', 'tcp.seq', 'tcp.window_size_value', 'data.data', 'data.len', 'tcp.analysis.acks_frame', 'tcp.analysis.ack_rtt', 'eth.src.addr', 'eth.src.eth.src_resolved', 'eth.src.ig', 'eth.src.src_resolved', 'eth.src.addr_resolved', 'ip.proto', 'ip.dst_host', 'ip.flags', 'ip.len', 'ip.checksum', 'ip.checksum.status', 'ip.version', 'ip.host', 'ip.status', 'ip.id', 'ip.hdr_len', 'ip.ttl'), nn_aggregator: ~typing.Callable[[str, object], object] = <function pcap_nn_aggregator>)[source]
Creates a DataProcessor using functions specifically for pyshark packets, which selects specific features from each data point (NaN if not present) and transforms them into numpy vectors, ready to be further processed by detection models.
- Parameters:
name – The name for logging purposes.
f_features – The features to extract from the packets.
nn_aggregator – The aggregator, which should map non-numerical features to integers / floats.
- daisy.data_sources.network_traffic.demo_202303_label_data_point(client_id: int, d_point: dict) dict [source]
Labels the data points according to the events for the demo 202303.
- Parameters:
client_id – Client ID.
d_point – Data point as dictionary.
- Returns:
Labeled data point.
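A labeling step with the same signature could be sketched as below. Everything specific here is hypothetical: the real event windows for the March 2023 demo live inside daisy, and the time span, field name usage, and `label` key below are invented purely to illustrate the shape of such a function.

```python
# Hypothetical sketch of a client/event-based labeling function; the
# event windows below are invented, NOT the real demo 202303 events.
EVENT_WINDOWS = {1: [(1678086000.0, 1678086600.0)]}  # client_id -> spans

def label_data_point(client_id: int, d_point: dict) -> dict:
    t = float(d_point.get("meta.time_epoch", 0.0))
    spans = EVENT_WINDOWS.get(client_id, [])
    # mark the point as anomalous (1) if it falls in any event span
    d_point["label"] = int(any(start <= t <= end for start, end in spans))
    return d_point

labeled = label_data_point(1, {"meta.time_epoch": "1678086300.0"})
```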
- daisy.data_sources.network_traffic.dict_to_json(dictionary: dict) str [source]
Takes a dictionary and returns a JSON object in the form of a string.
- Parameters:
dictionary – The dictionary to convert to json string.
- Returns:
A JSON string from the dictionary.
- daisy.data_sources.network_traffic.dict_to_numpy_array(d_point: dict, nn_aggregator: Callable[[str, object], object]) ndarray [source]
Transforms the pyshark data point directly into a numpy array without further processing, aggregating any value that is a list into a singular value.
- Parameters:
d_point – Data point as dictionary.
nn_aggregator – Aggregator, which maps non-numerical features to integers or floats.
- Returns:
Data point as vector.
- daisy.data_sources.network_traffic.packet_to_dict(p: Packet) dict [source]
Takes a single pyshark packet and converts it into a dictionary.
- Parameters:
p – Packet to convert.
- Returns:
Dictionary generated from the packet.
- daisy.data_sources.network_traffic.pcap_nn_aggregator(key: str, value: object) int | float [source]
Simple, exemplary value aggregator. Takes a non-numerical (i.e. string) key-value pair and attempts to convert it into an integer / float. This example does not take the key into account, but only checks the type of the value to proceed. Note that IPv6 addresses are lazily converted to 32 bits (collisions may occur).
- Parameters:
key – Name of the pair, which is always a string.
value – Arbitrary non-numerical value to be converted.
- Returns:
Converted numerical value.
- Raises:
ValueError – If value cannot be converted.