Back to the Roots: Assessing Mining Techniques for Java Vulnerability-Contributing Commits

Context: Vulnerability-contributing commits (VCCs) are code changes that introduce vulnerabilities. Mining historical VCCs relies on SZZ-based algorithms that trace from known vulnerability-fixing commits. Objective: Although these techniques have been used, e.g., to train just-in-time vulnerabili...

Bibliographic details
Authors: Hinrichs Torge
Iannone Emanuele
Aladics Tamás
Hegedűs Péter
De Lucia Andrea
Palomba Fabio
Scandariato Riccardo
Document type: Article
Published: 2025
Series: ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY 1
Subjects:
doi:10.1145/3769105

mtmt:36362772
Online Access: http://publicatio.bibl.u-szeged.hu/38520
LEADER 02373nab a2200277 i 4500
001 publ38520
005 20251216083840.0
008 251216s2025 hu o 000 eng d
022 |a 1049-331X 
024 7 |a 10.1145/3769105  |2 doi 
024 7 |a 36362772  |2 mtmt 
040 |a SZTE Publicatio Repozitórium  |b hun 
041 |a eng 
100 1 |a Hinrichs Torge 
245 1 0 |a Back to the Roots  |h [elektronikus dokumentum] :  |b Assessing Mining Techniques for Java Vulnerability-Contributing Commits /  |c  Hinrichs Torge 
260 |c 2025 
490 0 |a ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY  |v 1 
520 3 |a Context : Vulnerability-contributing commits (VCCs) are code changes that introduce vulnerabilities. Mining historical VCCs relies on SZZ-based algorithms that trace from known vulnerability-fixing commits. Objective : Although these techniques have been used, e.g., to train just-in-time vulnerability predictors, they lack systematic benchmarking to evaluate their precision, recall, and error sources. Method : We empirically assessed 12 VCC mining techniques in Java repositories using two benchmark datasets (one from the literature and one newly curated). We also explored combinations of techniques, through intersections, voting schemes, and machine learning, to improve performance. Results : Individual techniques achieved at most 0.60 precision but up to 0.89 recall. The precision rose to 0.75 when the outputs were combined with the logical AND, at the expense of recall. Machine learning ensembles reached 0.80 precision with a better precision–recall balance. Performance varied significantly by dataset. Analyzing “fixing commits” showed that certain fix types (e.g., filtering or sanitization) affect retrieval accuracy, and failure patterns highlighted weaknesses when fixes involve external data handling. Conclusion : Such results help software security researchers select the most suitable mining technique for their studies and understand new ways to design more accurate solutions. 
650 4 |a Számítás- és információtudomány 
700 0 1 |a Iannone Emanuele  |e aut 
700 0 1 |a Aladics Tamás  |e aut 
700 0 1 |a Hegedűs Péter  |e aut 
700 0 2 |a De Lucia Andrea  |e aut 
700 0 2 |a Palomba Fabio  |e aut 
700 0 2 |a Scandariato Riccardo  |e aut 
856 4 0 |u http://publicatio.bibl.u-szeged.hu/38520/1/j6.pdf  |z Dokumentum-elérés