Tools & Dataset

UPDATE (18/11/2022): For the most recent version of CICIDS2017 (improved ground-truth labelling and additional features), as well as a fixed version of CSE-CIC-IDS2018, please see our latest work here.

When using the fixed CICFlowMeter tool, the improved regenerated CICIDS2017 dataset and/or our labelling and benchmarking code, please cite our paper:

@inproceedings{engelen2021troubleshooting,
title={Troubleshooting an Intrusion Detection Dataset: the CICIDS2017 Case Study},
author={Engelen, Gints and Rimmer, Vera and Joosen, Wouter},
booktitle={2021 IEEE Security and Privacy Workshops (SPW)},
pages={7--12},
year={2021},
organization={IEEE}
}

CICFlowMeter tool

Our fixed version of the CICFlowMeter tool can be found at https://github.com/GintsEngelen/CICFlowMeter.

Important changes

Note that a flow can still terminate by timing out (the total duration of the flow exceeds X seconds). This has been left unchanged.
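
The timeout rule in the note above can be sketched as a single comparison. This is only an illustration, not the tool's actual code: the function name and its parameters are hypothetical, and the timeout value (the "X" above) must be supplied by the caller.

```python
def has_timed_out(flow_start_s: float, packet_time_s: float, timeout_s: float) -> bool:
    """Return True when the flow's total duration exceeds the timeout.

    All three arguments are in seconds. The names are hypothetical and
    chosen for this example only.
    """
    return (packet_time_s - flow_start_s) > timeout_s
```

Under this sketch, a packet arriving exactly at the timeout boundary still belongs to the flow, because the duration has to exceed the timeout rather than merely reach it.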

Improved CICIDS2017 dataset for flow-based network intrusion detection

The latest version of our improved CICIDS2017 flow-based dataset can be downloaded here.

20/10/2021 UPDATE: Dataset files reuploaded (fixed error in Idle Time features). Note that the fixed version of the CICFlowMeter tool is not affected.
22/10/2021 UPDATE: CICFlowMeter fixed as per this GitHub pull request (Fwd and Bwd Bulk features affected). Dataset CSV files regenerated and reuploaded.
24/11/2021 UPDATE: CICFlowMeter fixed as per this and this GitHub pull requests (Down/Up ratio fixed and several features related to flow length affected). Dataset CSV files regenerated and reuploaded.

Important changes

Dataset composition

The final regenerated dataset is composed of the following flows:

Label                        Effective flow count   "Attempted" flow count
Benign                       1657069                N/A
FTP-Patator                  3973                   11
SSH-Patator                  2980                   8
DoS GoldenEye                7567                   80
DoS Hulk                     158469                 579
DoS SlowHttpTest             1742                   3367
DoS Slowloris                4001                   1706
Heartbleed                   11                     0
Web Attack - Brute Force     151                    1214
Web Attack - XSS             27                     652
Web Attack - SQL Injection   12                     0
Infiltration                 32                     16
Bot                          738                    1470
Portscan                     159023                 N/A
DDoS                         95123                  0

These flow counts (as well as the numbers reported in the paper) were obtained after removing all corrupted entries as well as all entries whose numerical features contained NaN values.
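
A minimal sketch of this kind of filtering is shown below, using only the Python standard library. It is not our actual preprocessing code: the column names and sample rows are made up for illustration, and treating a missing or unparseable numerical field as "corrupted" is an assumption of this example.

```python
import csv
import io
import math


def is_clean(row, numeric_columns):
    """Return True if every numeric column parses as a float that is not NaN."""
    for name in numeric_columns:
        try:
            value = float(row[name])
        except (KeyError, TypeError, ValueError):
            # Assumption: a missing or unparseable field counts as corrupted.
            return False
        if math.isnan(value):
            return False
    return True


def load_clean_rows(csv_text, numeric_columns):
    """Parse CSV text and keep only the rows that pass is_clean."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_clean(row, numeric_columns)]


# Illustrative sample only; these are not rows from the dataset.
SAMPLE = """Flow Duration,Flow Bytes/s,Label
1200,350.5,Benign
800,NaN,Benign
,10.0,DoS Hulk
45,12.25,DoS Hulk
"""

clean = load_clean_rows(SAMPLE, ["Flow Duration", "Flow Bytes/s"])
```

In this sample, the second row is dropped because of its NaN value and the third because of its empty numerical field.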

Note that the table in the paper has an error: the correct total number of Attempted labels is 9103, and the correct number of Benign flows for both the Intermediate and Final dataset versions is 1657069.
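
The corrected Attempted total can be checked against the table above by summing its "Attempted" column. In the short sketch below, the dictionary simply transcribes that column; the variable names are illustrative, and `None` stands for the "N/A" entries.

```python
# "Attempted" flow counts transcribed from the table above.
# None marks the labels listed as N/A.
ATTEMPTED = {
    "Benign": None,
    "FTP-Patator": 11,
    "SSH-Patator": 8,
    "DoS GoldenEye": 80,
    "DoS Hulk": 579,
    "DoS SlowHttpTest": 3367,
    "DoS Slowloris": 1706,
    "Heartbleed": 0,
    "Web Attack - Brute Force": 1214,
    "Web Attack - XSS": 652,
    "Web Attack - SQL Injection": 0,
    "Infiltration": 16,
    "Bot": 1470,
    "Portscan": None,
    "DDoS": 0,
}

# Sum only the labels that have an Attempted count.
attempted_total = sum(count for count in ATTEMPTED.values() if count is not None)
print(attempted_total)  # 9103
```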

Labelling and Benchmarking code

We describe the labelling logic in the Extended Documentation and the ML experiments in the paper. Our labelling and benchmarking code can be found in the GitHub repository.