
Finding and understanding bugs in C compilers

Proc. PLDI 2011 (ACM SIGPLAN Notices 46(6)), pages 283–294

Cited by: 532 | Views: 167 | EI

Abstract

Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input.


Introduction
  • The theory of compilation is well developed, and there are compiler frameworks in which many optimizations have been proved correct.
  • Miscompilations often happen because optimization safety checks are inadequate, static analyses are unsound, or transformations are flawed.
  • These bugs are out of reach for current and future automated program-verification tools because the specifications that need to be checked were never written down in a precise way, if they were written down at all.
  • Other methods for improving compiler quality, such as testing, are therefore needed in practice.
  • This paper reports the authors' experience in using testing to make C compilers better.
Highlights
  • The theory of compilation is well developed, and there are compiler frameworks in which many optimizations have been proved correct
  • Our first experiment was uncontrolled and unstructured: over a three-year period, we opportunistically found and reported bugs in a variety of C compilers
  • Delta debugging [31] automates test-case reduction, but all existing variants intended for reducing C programs (such as hierarchical delta debugging [18] and Wilkerson's implementation [29]) introduce undefined behavior; a validity-checking reduction loop is sketched after this list.
  • We found and reported hundreds of previously unknown bugs in widely used C compilers, both commercial and open source
  • Most of our reported defects have been fixed, meaning that compiler implementers found them important enough to track down, and 25 of the bugs we reported against GCC were classified as release-blocking. All of this evidence suggests that there is substantial room for improvement in the state of the art for compiler quality assurance
  • To create a random program generator with high bug-finding power, the key problem we solved was the expressive generation of C programs that are free of undefined behavior and independent of unspecified behavior (a safe-arithmetic sketch follows this list).
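
To make the last point concrete, here is a minimal sketch of the kind of execution-time "safe math" wrapper implied by the avoidance strategies summarized in Table 1. The wrapper names and the fallback values are illustrative assumptions for this summary, not Csmith's actual emitted code.

    #include <stdint.h>
    #include <stdio.h>

    /* Division screened against both undefined cases: division by zero
     * and the INT32_MIN / -1 overflow. The fallback value (the dividend)
     * is arbitrary but well defined. */
    static inline int32_t safe_div_int32(int32_t a, int32_t b) {
        if (b == 0 || (a == INT32_MIN && b == -1))
            return a;
        return a / b;
    }

    /* Signed left shift guarded against negative operands, oversized
     * shift counts, and shifting a bit into or past the sign bit. */
    static inline int32_t safe_shl_int32(int32_t a, int32_t b) {
        if (a < 0 || b < 0 || b >= 32 || a > (INT32_MAX >> b))
            return a;
        return a << b;
    }

    int main(void) {
        /* A raw INT32_MIN / -1 overflows (and traps on x86); the wrapper
         * instead yields its well-defined fallback value. */
        printf("%d %d\n", safe_div_int32(INT32_MIN, -1), safe_shl_int32(3, 4));
        return 0;
    }

Because every operation that could be undefined is routed through a guarded wrapper like this, a generated program has exactly one meaning under the C standard, which is what makes its output comparable across compilers.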
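The delta-debugging hazard mentioned above can be countered by making the reduction loop's "interestingness test" reject any candidate that is not a well-defined C program. The sketch below is a minimal illustration under stated assumptions: buggy-cc stands for a hypothetical compiler under test, and the GCC sanitizer flags are one plausible way to screen for undefined behavior, not the paper's actual harness.

    #include <stdio.h>
    #include <stdlib.h>

    /* Build and run a shell command containing one %s for the file path. */
    static int run(const char *fmt, const char *path) {
        char cmd[512];
        snprintf(cmd, sizeof cmd, fmt, path);
        return system(cmd);
    }

    /* A reduced candidate is worth keeping only if it is still a
     * well-defined C program AND still demonstrates the miscompilation. */
    static int interesting(const char *path) {
        /* 1. Reject candidates that a strict, sanitized build refuses. */
        if (run("gcc -std=c99 -pedantic -Wall -Werror -fsanitize=undefined "
                "-fno-sanitize-recover=undefined -O0 -o /tmp/ref %s", path))
            return 0;
        /* 2. Reject candidates that execute undefined behavior at run
         *    time (the sanitizer aborts, so the exit status is nonzero). */
        if (system("/tmp/ref > /tmp/out.ref 2>/dev/null"))
            return 0;
        /* 3. Keep only candidates on which the compiler under test still
         *    produces output that differs from the reference build. */
        if (run("buggy-cc -O2 -o /tmp/bug %s", path))  /* hypothetical */
            return 0;
        system("/tmp/bug > /tmp/out.bug 2>/dev/null");
        return system("cmp -s /tmp/out.ref /tmp/out.bug") != 0;
    }

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s file.c\n", argv[0]);
            return 2;
        }
        return interesting(argv[1]) ? 0 : 1;  /* 0 = keep the candidate */
    }

A reducer such as ddmin would invoke this test on each smaller candidate and keep only those for which it succeeds, so undefined behavior can never creep into the reduced test case.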
Methods
  • Design Goals

    Csmith has two main design goals. First and most important, every generated program must be well formed and have a single meaning according to the C standard. Second, subject to the first goal, generated programs should be expressive, exercising a large subset of C's features, since expressiveness drives bug-finding power.
  • The principal side effect of a Csmith-generated program is to print a value summarizing the computation performed by the program: a checksum of the program's non-pointer global variables at the end of the program's execution (a schematic program tail follows this list).
  • Allow implementation-defined behavior: an ideally portable test program would be "strictly conforming" to the C language standard, but Csmith deliberately permits implementation-defined behavior.
  • Using Csmith, the authors can therefore perform differential testing within an equivalence class of compilers (those making the same implementation-defined choices, such as integer widths) but not across classes.
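
As a concrete illustration of the checksum mechanism, here is a schematic of what the tail of a generated program might look like. The globals g_1 and g_2 and the mixing step are simplified stand-ins assumed for this sketch; Csmith's real programs use a CRC and far more state.

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-ins for the generator's non-pointer global variables. */
    static uint32_t g_1 = 5;
    static int16_t  g_2 = -7;

    static uint32_t checksum_ctx;

    /* Fold one value into the running checksum. Csmith's real hash is
     * a CRC; this mixing step is a simplified stand-in. */
    static void transparent_crc(uint64_t v) {
        checksum_ctx = checksum_ctx * 31u + (uint32_t)(v ^ (v >> 32));
    }

    int main(void) {
        g_1 += 3;  /* placeholder for the generated computation */
        transparent_crc(g_1);
        transparent_crc((uint64_t)(uint16_t)g_2);
        /* The printed checksum is the program's single observable
         * result, directly comparable across compilers. */
        printf("checksum = %X\n", (unsigned)checksum_ctx);
        return 0;
    }

Any two correct compilers in the same equivalence class must produce binaries that print the same checksum, so a mismatch is direct evidence of a wrong-code bug in at least one of them.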
Results
  • The authors conducted five experiments using Csmith, the random program generator; this section summarizes the findings.
  • The first experiment was uncontrolled and unstructured: over a three-year period, the authors opportunistically found and reported bugs in a variety of C compilers.
  • The authors compiled and ran one million random programs using several years' worth of versions of GCC and LLVM, to understand how their robustness is evolving over time (a minimal differential-testing driver is sketched after this list).
  • As measured by the tests over the programs that Csmith produces, the quality of both compilers is generally improving.
  • The authors evaluated Csmith's bug-finding power as a function of the size of the generated C programs; the largest number of bugs is found at a surprisingly large program size, about 81 KB. (§3.3)
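
A minimal sketch of such a differential-testing driver appears below, assuming POSIX popen and a test program that prints a single checksum line. The compiler command lines form one hypothetical equivalence class and are illustrative, not the paper's actual configurations.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Compilers assumed to share the same implementation-defined
     * choices, i.e., one equivalence class. */
    static const char *compilers[] = {
        "gcc -O0", "gcc -O2", "clang -O0", "clang -O2",
    };
    enum { N = sizeof compilers / sizeof compilers[0] };

    int main(int argc, char **argv) {
        const char *src = argc > 1 ? argv[1] : "random.c";
        char out[N][128];

        for (int i = 0; i < N; i++) {
            char cmd[512];
            /* Build each test binary; a compiler crash shows up here. */
            snprintf(cmd, sizeof cmd, "%s -o /tmp/dt%d %s",
                     compilers[i], i, src);
            if (system(cmd) != 0) {
                printf("compile failed: %s %s\n", compilers[i], src);
                return 1;
            }
            /* Run it and capture the checksum line it prints. */
            snprintf(cmd, sizeof cmd, "/tmp/dt%d", i);
            FILE *p = popen(cmd, "r");
            if (!p) return 1;
            if (!fgets(out[i], sizeof out[i], p))
                out[i][0] = '\0';
            pclose(p);
        }
        /* Within an equivalence class all checksums must agree; any
         * disagreement is a wrong-code bug in at least one compiler. */
        for (int i = 1; i < N; i++)
            if (strcmp(out[0], out[i]) != 0)
                printf("mismatch: \"%s\" vs \"%s\"\n",
                       compilers[0], compilers[i]);
        return 0;
    }

A real harness must also bound compilation and execution time, since random programs may fail to terminate; that detail is omitted here.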
Conclusion
  • Are the authors finding bugs that matter? One might suspect that random testing finds bugs that do not matter in practice.
  • Most of the reported defects have been fixed, meaning that compiler implementers found them important enough to track down, and 25 of the bugs the authors reported against GCC were classified as release-blocking.
  • All of this evidence suggests that there is substantial room for improvement in the state of the art for compiler quality assurance.
  • The authors' program generator, uses both static analysis and dynamic checks to avoid these hazards
Tables
  • Table 1: Summary of Csmith's strategies for avoiding undefined and unspecified behaviors. When both a code-generation-time and a code-execution-time solution are listed, Csmith uses both.
  • Table 2: Crash and wrong-code bugs found by Csmith that manifest even when compiler optimizations are disabled (i.e., with the -O0 command-line option).
  • Table 3: Augmenting the GCC and LLVM test suites with 10,000 randomly generated programs did not improve code coverage much.
  • Table 4: Distribution of bugs across compiler stages. A bug is unclassified either because it has not yet been fixed or because the developer who fixed it did not indicate which files were changed.
  • Table 5: Top ten buggy files in GCC.
  • Table 6: Top ten buggy files in LLVM.
Funding
  • We also thank Hans Boehm, Xavier Leroy, Michael Norrish, Bryan Turner, and the GCC and LLVM development teams for their technical assistance in various aspects of our work. This research was primarily supported by an award from DARPA's Computer Science Study Group.
References
  • [1] ACE Associated Computer Experts. SuperTest C/C++ compiler test and validation suite. http://www.ace.nl/compiler/supertest.html.
  • [2] F. Bellard. TCC: Tiny C compiler, ver. 0.9.25, May 2009. http://bellard.org/tcc/.
  • [3] C. L. Biffle. Undefined behavior in Google NaCl, Jan. 2010. http://code.google.com/p/nativeclient/issues/detail?id=245.
  • [4] A. S. Boujarwah and K. Saleh. Compiler test case generation methods: a survey and assessment. Information and Software Technology, 39(9):617–625, 1997.
  • [5] C. J. Burgess and M. Saidi. The automatic generation of test cases for optimizing Fortran compilers. Information and Software Technology, 38(2):111–119, 1996.
  • [6] E. Eide and J. Regehr. Volatiles are miscompiled, and what to do about it. In Proc. EMSOFT, pages 255–264, Oct. 2008.
  • [7] X. Feng and A. J. Hu. Cutpoints for formal equivalence verification of embedded software. In Proc. EMSOFT, pages 307–316, Sept. 2005.
  • [8] P. Godefroid, A. Kiezun, and M. Y. Levin. Grammar-based whitebox fuzzing. In Proc. PLDI, pages 206–215, June 2008.
  • [9] R. Hamlet. Random testing. In J. Marciniak, editor, Encyclopedia of Software Engineering. Wiley, second edition, 2001.
  • [10] K. V. Hanford. Automatic generation of test cases. IBM Systems Journal, 9(4):242–257, Dec. 1970.
  • [11] International Organization for Standardization. ISO/IEC 9899:TC2: Programming Languages—C, May 2005. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf.
  • [12] G. Klein et al. seL4: Formal verification of an OS kernel. In Proc. SOSP, pages 207–220, Oct. 2009.
  • [13] J. C. Knight and N. G. Leveson. An experimental evaluation of the assumption of independence in multiversion programming. IEEE Trans. Software Eng., 12(1):96–109, Jan. 1986.
  • [14] X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, July 2009.
  • [15] C. Lindig. Random testing of C calling conventions. In Proc. AADEBUG, pages 3–12, Sept. 2005.
  • [16] W. M. McKeeman. Differential testing for software. Digital Technical Journal, 10(1):100–107, Dec. 1998.
  • [17] B. P. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Commun. ACM, 33(12):32–44, Dec. 1990.
  • [18] G. Misherghi and Z. Su. HDD: Hierarchical delta debugging. In Proc. ICSE, pages 142–151, May 2006.
  • [19] Perennial, Inc. ACVS ANSI/ISO/FIPS-160 C validation suite, ver. 4.5, Jan. 1998. http://www.peren.com/pages/acvs_set.htm.
  • [20] Plum Hall, Inc. The Plum Hall validation suite for C. http://www.plumhall.com/stec.html.
  • [21] P. Purdom. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3):366–375, 1972.
  • [22] R. L. Sauder. A general test data generator for COBOL. In AFIPS Joint Computer Conferences, pages 317–323, May 1962.
  • [23] F. Sheridan. Practical testing of a C99 compiler using output comparison. Software—Practice and Experience, 37(14):1475–1488, Nov. 2007.
  • [24] J. Souyris, V. Wiels, D. Delmas, and H. Delseny. Formal verification of avionics software products. In Proc. FM, pages 532–546, Nov. 2009.
  • [25] S. Summit. comp.lang.c frequently asked questions. http://c-faq.com/.
  • [26] Z. Tatlock and S. Lerner. Bringing extensibility to verified compilers. In Proc. PLDI, pages 111–121, June 2010.
  • [27] B. Turner. Random Program Generator, Jan. 2007. http://sites.google.com/site/brturn2/randomcprogramgenerator.
  • [28] B. White et al. An integrated experimental environment for distributed systems and networks. In Proc. OSDI, pages 255–270, Dec. 2002.
  • [29] D. S. Wilkerson. Delta ver. 2006.08.03, Aug. 2006. http://delta.tigris.org/.
  • [30] M. Wolfe. How compilers and tools differ for embedded systems. In Proc. CASES, Sept. 2005. Keynote address. http://www.pgroup.com/lit/articles/pgi_article_cases.pdf.
  • [31] A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Trans. Software Eng., 28(2):183–200, Feb. 2002.
  • [32] C. Zhao et al. Automated test program generation for an industrial optimizing compiler. In Proc. ICSE Workshop on Automation of Software Test, pages 36–43, May 2009.
Authors
Xuejun Yang, Yang Chen, Eric Eide, John Regehr