Datasets

Open datasets released by ByteDefend Cyber Lab & the Ariel Cyber Innovation Center for advancing cybersecurity research. We encourage the community to build on these resources.

API Traffic Research Dataset Framework (ATRDF)

Ariel Cyber Innovation Center <> Cisco Competition • Released 2023 • Shmuel Lavian & Ariel University ACIC

A multi-level benchmark dataset of HTTP API traffic for binary and multi-class classification of malicious vs. benign API requests. Contains four progressively harder datasets covering diverse attack types, endpoint complexities, and parameter structures. Includes training, test, and held-out validation splits with a baseline Jupyter notebook.

4 Datasets: progressive difficulty
7 Attack Types: SQLi, XSS, RCE, Log4J…
HTTP API Traffic: real-world requests
Baseline Notebook: Python / scikit-learn

Datasets Overview

Dataset | Description | Task | Baseline F1
Dataset 1 | Basic API traffic, fewest attacks & endpoints | Binary (Benign / Malicious) | 0.968
Dataset 2 | More attacks, ~2× endpoints, higher randomization | Binary | 0.978
Dataset 3 | Complex parameters, authentic traffic patterns | Binary + Attack Type (7 classes) | 0.940 / 0.940
Dataset 4 | Advanced: API redirection, deeper access, more types | Binary + Attack Type (7 classes) | 0.837 / 0.866
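
For readers unfamiliar with the metric in the last column, the following is a minimal sketch of how an F1 score can be computed from true and predicted labels. The page does not state which averaging the baseline notebook uses for the attack-type task, so the macro average shown here is an assumption, and the toy labels are illustrative only, not taken from the datasets.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Toy labels for illustration only.
y_true = ["Benign", "Malicious", "Malicious", "Benign", "Malicious"]
y_pred = ["Benign", "Malicious", "Benign", "Benign", "Malicious"]

binary_f1 = f1_for_class(y_true, y_pred, positive="Malicious")
macro_f1 = f1_macro(y_true, y_pred)
print(f"binary F1 = {binary_f1:.3f}, macro F1 = {macro_f1:.3f}")
```

If scikit-learn is available, `sklearn.metrics.f1_score` covers the same computation, with the averaging mode chosen through its `average` parameter.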

Attack Types Covered

SQL Injection, Directory Traversal, Remote Code Execution (RCE), Cookie Injection, Cross-Site Scripting (XSS), Log4J, Log Forging
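
Datasets 3 and 4 pair a binary task with an attack-type task, so one label can feed both targets. The sketch below shows one possible encoding. The label strings, the "Benign" marker, the integer ids, and the `encode` helper are all hypothetical; the actual field names and values in the released files may differ.

```python
# Attack types as listed on this page; the exact strings in the data files
# are an assumption.
ATTACK_TYPES = [
    "SQL Injection",
    "Directory Traversal",
    "Remote Code Execution (RCE)",
    "Cookie Injection",
    "Cross-Site Scripting (XSS)",
    "Log4J",
    "Log Forging",
]
BENIGN = "Benign"  # hypothetical marker for non-malicious requests

# Stable integer id per attack type, in the order listed above.
ATTACK_TO_ID = {name: idx for idx, name in enumerate(ATTACK_TYPES)}


def encode(label):
    """Return (binary_target, attack_type_id); the id is None for benign traffic."""
    if label == BENIGN:
        return 0, None
    return 1, ATTACK_TO_ID[label]  # raises KeyError on an unknown label


for example in (BENIGN, "SQL Injection", "Log4J"):
    print(example, "->", encode(example))
```

Raising on unknown labels rather than silently mapping them to a default keeps labeling mismatches visible when the real files use different strings.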

Citation

@misc{Lavian_ATRDF_2023,
  author = {Lavian, Shmuel and {Ariel University, Ariel Cyber Innovation Center (ACIC)}},
  title = {{API Traffic Research Dataset Framework (ATRDF)}},
  url = {https://github.com/ArielCyber/Cisco_Ariel_Uni_API_security_challenge},
  year = {2023}
}

Also used in: Aharon, Dubin, Dvir & Hajaj — Computers & Security, 2025