Finding and understanding bugs in C compilers
Proc. PLDI, pages 283–294, June 2011
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input.
- The theory of compilation is well developed, and there are compiler frameworks in which many optimizations have been proved correct.
- Miscompilations often happen because optimization safety checks are inadequate, static analyses are unsound, or transformations are flawed.
- These bugs are out of reach for current and future automated program-verification tools because the specifications that need to be checked were never written down in a precise way, if they were written down at all.
- Other methods for improving compiler quality can succeed.
- This paper reports the authors' experience in using testing to make C compilers better
- Our first experiment was uncontrolled and unstructured: over a three-year period, we opportunistically found and reported bugs in a variety of C compilers
- Delta debugging automates test-case reduction, but all existing variants intended for reducing C programs, such as hierarchical delta debugging and Wilkerson's Delta implementation, introduce undefined behavior
- We found and reported hundreds of previously unknown bugs in widely used C compilers, both commercial and open source
- Most of our reported defects have been fixed, meaning that compiler implementers found them important enough to track down, and 25 of the bugs we reported against GCC were classified as release-blocking. All of this evidence suggests that there is substantial room for improvement in the state of the art for compiler quality assurance
- To create a random program generator with high bug-finding power, the key problem we solved was the expressive generation of C programs that are free of undefined behavior and independent of unspecified behavior
- Design Goals
Csmith has two main design goals. First and most important, every generated program must be well formed and have a single meaning according to the C standard. Second, Csmith should be expressive, exercising a large subset of C's features, since bugs hide in the parts of a compiler that handle complex code.
- The principal side effect of a Csmith-generated program is to print a value summarizing the computation performed by the program.
- This value is a checksum of the program’s non-pointer global variables at the end of the program’s execution.
- Allow implementation-defined behavior: an ideally portable test program would be “strictly conforming” to the C language standard
- Using Csmith, the authors can perform differential testing within an equivalence class of compilers (those that make the same implementation-defined choices) but not across classes
- The authors conducted five experiments using Csmith, the random program generator. This section summarizes the findings.
The authors' first experiment was uncontrolled and unstructured: over a three-year period, the authors opportunistically found and reported bugs in a variety of C compilers.
- The authors compiled and ran one million random programs using several years’ worth of versions of GCC and LLVM, to understand how their robustness is evolving over time.
- As measured by the tests over the programs that Csmith produces, the quality of both compilers is generally improving.
- The authors evaluated Csmith’s bug-finding power as a function of the size of the generated C programs.
- The largest number of bugs is found at a surprisingly large program size: about 81 KB (§3.3)
- Are the authors finding bugs that matter? One might suspect that random testing finds bugs that do not matter in practice.
- Most of the reported defects have been fixed, meaning that compiler implementers found them important enough to track down, and 25 of the bugs the authors reported against GCC were classified as release-blocking
- All of this evidence suggests that there is substantial room for improvement in the state of the art for compiler quality assurance.
- The authors' program generator uses both static analysis and dynamic checks to avoid these hazards
- Table 1: Summary of Csmith's strategies for avoiding undefined and unspecified behaviors. When both a code-generation-time and a code-execution-time solution are listed, Csmith uses both
- Table 2: Crash and wrong-code bugs found by Csmith that manifest when compiler optimizations are disabled (i.e., with the -O0 command-line option)
- Table 3: Augmenting the GCC and LLVM test suites with 10,000 randomly generated programs did not improve code coverage much
- Each random-program generator ran on otherwise-idle machines, using one CPU on each host; each generator repeatedly produced programs that were compiled and tested using the same compilers and optimization options as in the experiments of Section 3.2
- Table 4: Distribution of bugs across compiler stages. A bug is unclassified either because it has not yet been fixed or because the developer who fixed it did not indicate which files were changed
- Table 5: Top ten buggy files in GCC
- Table 6: Top ten buggy files in LLVM
- We also thank Hans Boehm, Xavier Leroy, Michael Norrish, Bryan Turner, and the GCC and LLVM development teams for their technical assistance in various aspects of our work. This research was primarily supported by an award from DARPA's Computer Science Study Group.
- ACE Associated Computer Experts. SuperTest C/C++ compiler test and validation suite. http://www.ace.nl/compiler/supertest.html.
- F. Bellard. TCC: Tiny C compiler, ver. 0.9.25, May 2009. http://bellard.org/tcc/.
- C. L. Biffle. Undefined behavior in Google NaCl, Jan. 2010. http://code.google.com/p/nativeclient/issues/detail?id=245.
- A. S. Boujarwah and K. Saleh. Compiler test case generation methods: a survey and assessment. Information and Software Technology, 39(9):617–625, 1997.
- C. J. Burgess and M. Saidi. The automatic generation of test cases for optimizing Fortran compilers. Information and Software Technology, 38(2):111–119, 1996.
- E. Eide and J. Regehr. Volatiles are miscompiled, and what to do about it. In Proc. EMSOFT, pages 255–264, Oct. 2008.
- X. Feng and A. J. Hu. Cutpoints for formal equivalence verification of embedded software. In Proc. EMSOFT, pages 307–316, Sept. 2005.
- P. Godefroid, A. Kiezun, and M. Y. Levin. Grammar-based whitebox fuzzing. In Proc. PLDI, pages 206–215, June 2008.
- R. Hamlet. Random testing. In J. Marciniak, editor, Encyclopedia of Software Engineering. Wiley, second edition, 2001.
- K. V. Hanford. Automatic generation of test cases. IBM Systems Journal, 9(4):242–257, Dec. 1970.
- International Organization for Standardization. ISO/IEC 9899:TC2: Programming Languages—C, May 2005. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf.
- G. Klein et al. seL4: Formal verification of an OS kernel. In Proc. SOSP, pages 207–220, Oct. 2009.
- J. C. Knight and N. G. Leveson. An experimental evaluation of the assumption of independence in multiversion programming. IEEE Trans. Software Eng., 12(1):96–109, Jan. 1986.
- X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, July 2009.
- C. Lindig. Random testing of C calling conventions. In Proc. AADEBUG, pages 3–12, Sept. 2005.
- W. M. McKeeman. Differential testing for software. Digital Technical Journal, 10(1):100–107, Dec. 1998.
- B. P. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Commun. ACM, 33(12):32–44, Dec. 1990.
- G. Misherghi and Z. Su. HDD: Hierarchical delta debugging. In Proc. ICSE, pages 142–151, May 2006.
- Perennial, Inc. ACVS ANSI/ISO/FIPS-160 C validation suite, ver. 4.5, Jan. 1998. http://www.peren.com/pages/acvs_set.htm.
- Plum Hall, Inc. The Plum Hall validation suite for C. http://www.plumhall.com/stec.html.
- P. Purdom. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3):366–375, 1972.
- R. L. Sauder. A general test data generator for COBOL. In AFIPS Joint Computer Conferences, pages 317–323, May 1962.
- F. Sheridan. Practical testing of a C99 compiler using output comparison. Software—Practice and Experience, 37(14):1475–1488, Nov. 2007.
- J. Souyris, V. Wiels, D. Delmas, and H. Delseny. Formal verification of avionics software products. In Proc. FM, pages 532–546, Nov. 2009.
- S. Summit. comp.lang.c frequently asked questions. http://c-faq.com/.
- Z. Tatlock and S. Lerner. Bringing extensibility to verified compilers. In Proc. PLDI, pages 111–121, June 2010.
- B. Turner. Random Program Generator, Jan. 2007. http://sites.google.com/site/brturn2/randomcprogramgenerator.
- B. White et al. An integrated experimental environment for distributed systems and networks. In Proc. OSDI, pages 255–270, Dec. 2002.
- D. S. Wilkerson. Delta ver. 2006.08.03, Aug. 2006. http://delta.tigris.org/.
- M. Wolfe. How compilers and tools differ for embedded systems. In Proc. CASES, Sept. 2005. Keynote address. http://www.pgroup.com/lit/articles/pgi_article_cases.pdf.
- A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Trans. Software Eng., 28(2):183–200, Feb. 2002.
- C. Zhao et al. Automated test program generation for an industrial optimizing compiler. In Proc. ICSE Workshop on Automation of Software Test, pages 36–43, May 2009.