Performance bugs are parts of program source code that are unnecessarily inefficient and that affect perceived software quality similarly to functional bugs. However, compared to functional bugs, there are (as of 2019) fewer empirical studies on performance bugs, and they cover significantly fewer subjects. As a consequence, while many approaches for detecting and localizing a variety of performance bugs have been developed in recent years, their efficacy has usually been evaluated on a relatively small set of bug instances. We therefore investigated more than 700 commits across 13 C/C++ projects to provide a dataset of real-world performance bugs, grouped by project here and by pattern here. The patterns provide an abstract semantic classification of how performance bugs are fixed. A detailed discussion of this classification can be found in our paper, which will shortly appear in the conference proceedings of ISSRE 2019.
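To give a flavor of what such a fix pattern looks like, here is a hypothetical example (in Python, for brevity; the dataset itself covers C/C++ projects, and this example is illustrative only, not taken from the dataset): a loop-invariant computation is hoisted out of the loop.

```python
# Hypothetical illustration of one common performance-bug fix pattern:
# hoisting a loop-invariant computation out of a loop.
# (Illustrative only; not taken from the dataset.)

def total_discounted_slow(prices, rates):
    # Performance bug: the discount factor is recomputed on every
    # iteration, although it does not depend on the loop variable.
    total = 0.0
    for p in prices:
        factor = 1.0
        for r in rates:
            factor *= (1.0 - r)
        total += p * factor
    return total

def total_discounted_fast(prices, rates):
    # Fixed version: the invariant factor is computed once, before the loop.
    factor = 1.0
    for r in rates:
        factor *= (1.0 - r)
    return sum(p * factor for p in prices)
```

Both functions return the same result; the fix only changes the cost from O(|prices| x |rates|) to O(|prices| + |rates|), which is exactly the kind of semantics-preserving change the patterns classify.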
The dataset on these pages can be used 1) to assess how well the current state of the art in performance bug detection and localization aligns with performance bugs that get fixed in practice, 2) as a larger corpus against which to evaluate performance bug detection and localization approaches, and 3) as the basis for further research, such as the simulation of performance bugs via code mutation.
More details can be found in our paper, which we will link from here as soon as it is published.
The 13 projects investigated for our study are:
1: The investigation was started when the LLVM community had not yet migrated from SVN to GitHub and still used an unofficial mirror on GitHub. The up-to-date official repository is at https://github.com/llvm/llvm-project.
The total number of commits matched by each keyword (as discussed in our paper) is:
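To sketch how such keyword-based matching over commit messages can work, the following minimal Python example counts, per keyword, the commit messages that mention it. Both the keyword list and the messages below are made-up placeholders; the actual keywords are those discussed in the paper.

```python
# Sketch of keyword-based commit matching. The keywords and commit
# messages here are illustrative assumptions, not the study's actual data.

def count_keyword_matches(commit_messages, keywords):
    """Return, for each keyword, how many commit messages mention it
    (case-insensitive substring match)."""
    counts = {kw: 0 for kw in keywords}
    for msg in commit_messages:
        lower = msg.lower()
        for kw in keywords:
            if kw in lower:
                counts[kw] += 1
    return counts

if __name__ == "__main__":
    messages = [
        "Fix slow lookup in symbol table",
        "Optimize memory allocation in parser",
        "Add unit tests for lexer",
        "perf: avoid redundant hash computation",
    ]
    keywords = ["slow", "optimiz", "perf"]
    print(count_keyword_matches(messages, keywords))
```

In practice one would feed in real commit messages, e.g. obtained via `git log --format=%s`, and then manually inspect the matched commits, since keyword matching alone yields many false positives.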
Threats to validity
The categorization of performance bugs into semantic patterns was derived during a continuous work period spanning 3 months, during which the concept of each category was repeatedly refined. While the categorization had stabilized towards the end of that period and we hope it remains stable well beyond it, we cannot rule out errors, both in our system of patterns and in the classification of bugs according to these patterns. If you encounter any issues with the provided dataset, such as categorization or other errors, please feel free to contact us. Similarly, if you use our dataset, please let us know so that we can refer back to your work. We also welcome community contributions to our dataset of any kind. If you would like to add more performance bugs, report on or investigate the reproducibility of our results, or add information on the effects that the bugs in our dataset have, please get in touch with us. Our contact data can be found here.