This framework generates fuzz targets for real-world C/C++/Java/Python projects with various Large Language Models (LLMs) and benchmarks them via the OSS-Fuzz platform.
More details are available in "AI-Powered Fuzzing: Breaking the Bug Hunting Barrier".
Currently supported models are:
- Vertex AI code-bison
- Vertex AI code-bison-32k
- Gemini Pro
- Gemini Ultra
- Gemini Experimental
- Gemini 1.5
- OpenAI GPT-3.5-turbo
- OpenAI GPT-4
- OpenAI GPT-4o
- OpenAI GPT-4o-mini
- OpenAI GPT-4-turbo
- OpenAI GPT-3.5-turbo (Azure)
- OpenAI GPT-4 (Azure)
- OpenAI GPT-4o (Azure)
Generated fuzz targets are evaluated with four metrics against the most up-to-date data from the production environment:
- Compilability
- Runtime crashes
- Runtime coverage
- Runtime line coverage diff against existing human-written fuzz targets in OSS-Fuzz.
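
As a rough illustration of the last metric, the sketch below computes a line coverage diff from two sets of covered lines. The function name, the `(file, line)` representation, and the choice of denominator are assumptions made for this example; the framework's own computation may differ.

```python
# Hypothetical sketch: not the framework's actual implementation.
from typing import Set, Tuple

Line = Tuple[str, int]  # (source file, line number)


def line_coverage_diff(new_target: Set[Line], existing_targets: Set[Line],
                       total_lines: int) -> float:
    """Fraction of project lines covered by the new target only."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    # Lines the generated target reaches that no existing target reaches.
    newly_covered = new_target - existing_targets
    return len(newly_covered) / total_lines


# Made-up coverage data for a 10-line project.
existing = {("parser.c", 10), ("parser.c", 11)}
generated = {("parser.c", 11), ("parser.c", 12), ("lexer.c", 3)}
diff = line_coverage_diff(generated, existing, total_lines=10)
print(f"{diff:.2%}")  # 2 of the 10 lines are newly covered
```

A target that only re-covers lines already reached by existing targets scores zero under this definition, which is why the diff is a stricter signal than raw runtime coverage.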
Here is a sample experiment result from 2024 Jan 31. The experiment included 1300+ benchmarks from 297 open-source projects.
Overall, this framework successfully used LLMs to generate valid fuzz targets (targets that produce a non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% over the existing human-written targets.
Note that these reports are not public as they may contain undisclosed vulnerabilities.
Check our detailed usage guide for instructions on how to run this framework and generate reports based on the results.
Interested in research or open-source community collaborations? Please feel free to create an issue or email us: [email protected].
So far, we have reported 30 new bugs/vulnerabilities found by automatically generated targets built by this framework:
Project | Bug | LLM | Prompt Builder | Target oracle |
---|---|---|---|---|
cJSON | OOB read | Vertex AI | Default | Far reach, low coverage |
libplist | OOB read | Vertex AI | Default | Far reach, low coverage |
hunspell | OOB read | Vertex AI | Default | Far reach, low coverage |
zstd | OOB write | Vertex AI | Default | Far reach, low coverage |
gdbm | Stack buffer underflow | Vertex AI | Default | Far reach, low coverage |
hoextdown | Use of uninitialised memory | Vertex AI | Default | Far reach, low coverage |
pjsip | OOB read | Vertex AI | Default | Low coverage with fuzz keyword + easy params far reach |
pjsip | OOB read | Vertex AI | Default | Low coverage with fuzz keyword + easy params far reach |
gpac | OOB read | Vertex AI | Default | Low coverage with fuzz keyword + easy params far reach |
gpac | OOB read/write | Vertex AI | Default | All |
gpac | OOB read | Vertex AI | Default | All |
gpac | OOB read | Vertex AI | Default | All |
sqlite3 | OOB read | Vertex AI | Default | All |
htslib | OOB read | Vertex AI | Default | All |
libical | OOB read | Vertex AI | Default | All |
croaring | OOB read | Vertex AI | Test-to-harness | All |
openssl | CVE-2024-9143 - OOB read/write | Vertex AI | Default | All |
liblouis | Use of uninitialised memory | Vertex AI | Test-to-harness | Test identifier |
libucl | OOB read | Vertex AI | Default | Low coverage with fuzz keyword + easy params far reach |
openbabel | Use after free | Vertex AI | Default | Low coverage with fuzz keyword + easy params far reach |
libyang | OOB read | Vertex AI | Default | All |
openbabel | OOB read | Vertex AI | Default | All |
exiv2 | OOB read | Vertex AI | Default | All |
Undisclosed | Java RCE (pending maintainer triage) | Vertex AI | Default | Far reach, low coverage |
Undisclosed | Regexp DoS (pending maintainer triage) | Vertex AI | Default | Far reach, low coverage |
Undisclosed | OOB read | Vertex AI | Default | All |
Undisclosed | OOB write | Vertex AI | Default | All |
Undisclosed | OOB read | Vertex AI | Default | All |
Undisclosed | OOB read | Vertex AI | Default | All |
Undisclosed | Use after free | Vertex AI | Agent prompt | All |
These bugs could only have been discovered with newly generated targets. They were not reachable with existing OSS-Fuzz targets.
Project | Total coverage gain | Total relative gain | OSS-Fuzz-gen total covered lines | OSS-Fuzz-gen new covered lines | Existing covered lines | Total project lines |
---|---|---|---|---|---|---|
phmap | 98.42% | 205.75% | 1601 | 1181 | 574 | 1120 |
usbguard | 97.62% | 26.04% | 24550 | 5463 | 20979 | 3564 |
onednn | 96.67% | 7057.14% | 5434 | 5434 | 77 | 210 |
avahi | 82.06% | 155.90% | 3358 | 2814 | 1805 | 3046 |
pugixml | 72.98% | 194.95% | 9015 | 6646 | 3409 | 7662 |
librdkafka | 66.88% | 845.57% | 5019 | 4490 | 531 | 1169 |
casync | 66.75% | 903.23% | 1171 | 1120 | 124 | 1678 |
tomlplusplus | 61.06% | 331.10% | 4755 | 3652 | 1103 | 5981 |
astc-encoder | 59.35% | 177.88% | 2726 | 1745 | 981 | 2940 |
mruby | 48.56% | 0.00% | 34493 | 34493 | 0 | 71038 |
arduinojson | 42.10% | 85.80% | 3344 | 1800 | 2098 | 4276 |
json | 41.13% | 66.51% | 5051 | 3339 | 5020 | 8119 |
double-conversion | 40.40% | 88.12% | 1663 | 779 | 884 | 1928 |
tinyobjloader | 38.26% | 77.01% | 1157 | 717 | 931 | 1874 |
glog | 38.18% | 58.69% | 895 | 331 | 564 | 867 |
cppitertools | 35.78% | 45.07% | 253 | 151 | 335 | 422 |
eigen | 35.38% | 190.70% | 2643 | 1947 | 1021 | 5503 |
glaze | 34.55% | 30.06% | 2920 | 2416 | 8036 | 6993 |
rapidjson | 31.83% | 148.07% | 1585 | 958 | 647 | 3010 |
libunwind | 30.58% | 83.25% | 2899 | 1342 | 1612 | 4388 |
openh264 | 30.07% | 50.14% | 6607 | 5751 | 11470 | 19123 |
* "Total project lines" measures the source code of the project-under-test compiled and linked by the preexisting human-written fuzz targets from OSS-Fuzz.
* "Total coverage gain" is calculated using a denominator of the "Total project lines". "Total relative gain" is the increase in coverage compared to the old number of covered lines.
* Additional code from the project-under-test may be included when compiling the new fuzz targets, which can result in high percentage gains.
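
The two percentages can be checked with a few lines of arithmetic. The sketch below is one reading of the definitions in the notes above, verified against the tomlplusplus row only; the function names are made up for illustration. Some rows do not reproduce from the listed columns with this formula, possibly for the reason given in the last note.

```python
# Hypothetical helpers: an interpretation of the table notes,
# not code taken from the framework.

def total_coverage_gain(new_covered: int, total_project_lines: int) -> float:
    """New covered lines divided by "Total project lines"."""
    return new_covered / total_project_lines


def total_relative_gain(new_covered: int, existing_covered: int) -> float:
    """New covered lines relative to the previously covered lines."""
    # Assumption: mirror the table, which reports 0.00% when nothing
    # was covered before (see the mruby row).
    if existing_covered == 0:
        return 0.0
    return new_covered / existing_covered


# tomlplusplus row: 3652 new covered lines, 1103 existing covered lines,
# 5981 total project lines.
gain = total_coverage_gain(3652, 5981)
relative = total_relative_gain(3652, 1103)
print(f"{gain:.2%} {relative:.2%}")  # prints "61.06% 331.10%"
```

Because the two metrics use different denominators, a project with very little existing coverage can show a relative gain far above 100% while its total coverage gain stays below 100%.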
Please click on the 'Cite this repository' button located on the right-hand side of this GitHub page for citation details.