This paper presents the first study attempting client-side temporal API mining with static analysis beyond trivial alias analysis and history abstractions
Static specification mining using automata-based abstractions
IEEE Transactions on Software Engineering, no. 5 (2008): 651-666
We present a novel approach to client-side mining of temporal API specifications based on static analysis. Specifically, we present an interprocedural analysis over a combined domain that abstracts both aliasing and event sequences for individual objects. The analysis uses a new family of automata-based abstractions to represent unbounded...
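To make the idea of a combined aliasing-and-history domain concrete, here is a minimal Python sketch, not the authors' analysis. It runs a hypothetical straight-line toy program, labels each abstract object with its allocation site, tracks the set of variables that must point to it, and appends every API call made through one of those variables to the object's event history. It assumes no branches and no may-aliasing (so every update is a strong update), and it records a history as a plain sequence rather than an automaton; the statement encoding and all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractObject:
    """An abstract object: allocation site, must-alias variables, event history."""
    site: str
    must: set = field(default_factory=set)
    history: list = field(default_factory=list)


def analyze(program):
    """Track per-object event histories through a straight-line toy program.

    Statements use a hypothetical encoding:
      ("new", var, site)     var = new T()   at allocation site `site`
      ("assign", dst, src)   dst = src
      ("call", var, event)   var.event()
    """
    objects = []
    for kind, a, b in program:
        if kind == "new":
            for obj in objects:        # var stops pointing to older objects
                obj.must.discard(a)
            objects.append(AbstractObject(site=b, must={a}))
        elif kind == "assign":
            targets = [obj for obj in objects if b in obj.must]
            for obj in objects:        # dst is overwritten
                obj.must.discard(a)
            for obj in targets:        # dst now aliases whatever src pointed to
                obj.must.add(a)
        elif kind == "call":
            for obj in objects:
                if a in obj.must:      # strong update: var surely denotes obj
                    obj.history.append(b)
    return objects


program = [
    ("new", "f", "A"),
    ("assign", "g", "f"),
    ("call", "g", "open"),
    ("call", "f", "read"),
    ("new", "f", "B"),
    ("call", "f", "open"),
    ("call", "g", "close"),
]
for obj in analyze(program):
    print(obj.site, obj.history)
# A ['open', 'read', 'close']
# B ['open']
```

Because the call through `g` is attributed to the object from site A even after `f` is reassigned, the sketch shows why event histories must be tracked together with alias information rather than per variable.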
- Specifications of program behavior play a central role in many software engineering technologies.
- Most research on specification mining addresses dynamic analysis, inferring specifications from the observed behavior of representative program runs.
- Dynamic analysis requires someone to build, deploy, and set up an appropriate environment for a program run.
- These tasks, difficult and time-consuming for a human, lie far beyond the reach of today's automated technologies.
- There is only one thing more painful than learning from experience and that is not learning from experience. – Archibald MacLeish
- The amount of code available for inspection vastly exceeds the amount of code amenable to automated dynamic analysis.
- We present a parameterized framework for history abstractions, based on intuition regarding the structure of API specifications.
- When using Total merge, we show results only for the Past history abstraction; results for Future would be similar under this aggressive merge criterion.
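One way to see how a history abstraction keeps unbounded event sequences finite is a quotient on automaton states. The Python sketch below assumes a Past-style abstraction with k = 1, in which two states are merged when they have the same set of incoming event labels. The edge-set representation, the choice to keep the initial state in its own class, and the function name are assumptions for illustration, not the paper's definitions; merge criteria such as Total or Ext, which decide which abstract histories are combined, are not modeled.

```python
from collections import defaultdict


def past_quotient(transitions, initial=0):
    """Merge automaton states that share the same set of incoming labels (k = 1).

    `transitions` is a set of (source, event, target) edges.
    """
    incoming = defaultdict(set)
    for _, event, target in transitions:
        incoming[target].add(event)

    def cls(state):
        # Assumption: the initial state always forms its own equivalence class.
        return "init" if state == initial else frozenset(incoming[state])

    return {(cls(src), event, cls(dst)) for src, event, dst in transitions}


# A history observed along one path: open, read, read, close.
history = {(0, "open", 1), (1, "read", 2), (2, "read", 3), (3, "close", 4)}
abstract = past_quotient(history)
read_cls = frozenset({"read"})
print(len(abstract))                              # 4
print((read_cls, "read", read_cls) in abstract)  # True
```

The two states reached by `read` fall into one class with a `read` self-loop, so the abstract automaton has four states and represents any number of reads, not just the two observed.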
- The naive approach outputs the union of all the input automata as the API specification, without any noise reduction.
- This approach treats all traces uniformly, regardless of their frequency.
- A better, still straightforward, statistical approach uses a weighted union of the input automata to identify and eliminate infrequent behaviors.
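The difference between the naive and the weighted union can be sketched in a few lines of Python. This sketch assumes every input automaton is a set of (source, event, target) edges over a shared state naming, so that edges from different automata are comparable, and that an edge's weight is the number of input automata containing it; the `threshold` parameter is a hypothetical knob, not a value from the paper.

```python
from collections import Counter


def naive_union(automata):
    """Union of all edges: every observed behavior is kept, however rare."""
    return set().union(*automata)


def weighted_union(automata, threshold):
    """Keep only edges that appear in at least `threshold` input automata."""
    weights = Counter(edge for automaton in automata for edge in set(automaton))
    return {edge for edge, weight in weights.items() if weight >= threshold}


a1 = {("init", "open", "opened"), ("opened", "close", "closed")}
a2 = {("init", "open", "opened"), ("opened", "close", "closed")}
a3 = a1 | {("init", "close", "closed")}  # a rare, probably spurious behavior

print(len(naive_union([a1, a2, a3])))        # 3
print(len(weighted_union([a1, a2, a3], 2)))  # 2
```

Dropping low-weight edges can leave some states unreachable; a fuller implementation would prune those afterward.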
- The authors have implemented a prototype of the analysis based on the WALA analysis framework and the typestate analysis framework of Fink et al.
- The authors' analysis builds on a general Reps-Horwitz-Sagiv (RHS) IFDS tabulation solver implementation.
- The authors extended the RHS solver to support dynamic changes and merges in the set of dataflow facts.
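The RHS algorithm phrases IFDS problems as reachability over an exploded supergraph whose nodes are (program point, dataflow fact) pairs. The Python sketch below shows only that core idea for a single procedure: it omits the procedure summaries and call/return handling that make RHS interprocedural, as well as the dynamic fact changes and merges described above. The control-flow graph, the fact name `f_open`, and the flow function are hypothetical.

```python
from collections import deque

ZERO = "0"  # the IFDS "zero" fact, which holds at every reachable point


def tabulate(cfg, flow, entry):
    """Return all (node, fact) pairs reachable from (entry, ZERO).

    cfg:  node -> list of successor nodes
    flow: (node, fact) -> set of facts that hold after executing node
    A pair (n, d) means fact d may hold on entry to node n.
    """
    reached = {(entry, ZERO)}
    worklist = deque(reached)
    while worklist:
        node, fact = worklist.popleft()
        for out_fact in flow(node, fact):
            for succ in cfg.get(node, []):
                pair = (succ, out_fact)
                if pair not in reached:
                    reached.add(pair)
                    worklist.append(pair)
    return reached


# Hypothetical procedure: a file is opened, and closed on only one branch.
cfg = {"entry": ["open"], "open": ["branch"], "branch": ["close", "exit"], "close": ["exit"]}


def flow(node, fact):
    if node == "open" and fact == ZERO:
        return {ZERO, "f_open"}  # gen: the file becomes open
    if node == "close" and fact == "f_open":
        return set()             # kill: the file is closed
    return {fact}                # identity elsewhere


print(("exit", "f_open") in tabulate(cfg, flow, "entry"))  # True
```

The fact `f_open` reaches `exit` along the branch that skips `close`, which is the kind of path-insensitive "may" result an IFDS solver reports.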
- Table (mined specification sizes): for each API, including Auth, Channel, ChannelMgr, Cipher, and Connection, the number of states, edges, and average degree under configurations including Base/Future/Ext, APF/Past/Total, APF/Past/Ext, and APF/Future/Ext.
- Some APIs appear in several separate benchmarks, while others appear in several programs contained within the same benchmark.
- The authors' experiments indicate that both a sufficiently precise heap abstraction and a sufficiently precise history abstraction are required to mine a reasonable specification.
Without such abstractions, the collected abstract histories might deteriorate to the point where no summarization algorithm can recover the lost information.
- The specification mined for the Photo API using the Base heap abstraction has a single state.
- This means that the specification does not contain any temporal information on the ordering of events.
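Why a single-state specification carries no ordering information can be checked directly: if the one state has a self-loop for every event, every sequence of those events is accepted. The Python sketch below assumes a deterministic automaton stored as a dict from (state, event) to state, with all states treated as accepting; the event names are hypothetical and are not taken from the Photo API.

```python
def accepts(transitions, initial, events):
    """Run a DFA given as {(state, event): next_state}; every state accepts."""
    state = initial
    for event in events:
        if (state, event) not in transitions:
            return False
        state = transitions[(state, event)]
    return True


EVENTS = ["open", "read", "close"]
single_state = {(0, e): 0 for e in EVENTS}  # one state, a self-loop per event
ordered = {(0, "open"): 1, (1, "read"): 1, (1, "close"): 2}

for dfa in (single_state, ordered):
    print(accepts(dfa, 0, ["open", "read", "close"]),
          accepts(dfa, 0, ["close", "read", "open"]))
# True True
# True False
```

Only the multi-state automaton rejects the out-of-order sequence, which is exactly the temporal information a single-state specification lacks.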
- It is possible to run the analysis with a predetermined timeout.
- In such cases, the specification obtained by the analysis will not over-approximate the behavior of the code base, but may still help in understanding some of its behaviors.
- The authors plan to conduct further research into modular analysis techniques and improved summarization heuristics, to move closer to practical application of this technology.
- Table 1: Results of mining the running example with varying heap abstractions and merge algorithms.
- Table 2: Benchmarks.
- Table 3: Characteristics of our mined specifications with varying data collectors. For every mined specification DFA, we show the number of states, edges, and the density of the DFA.
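The states, edges, and density figures described in the Table 3 caption can be computed from a mined DFA as follows. This sketch assumes the density is the average out-degree, that is, edges divided by states, and that the DFA is stored as a dict from (state, event) to next state with the states inferred from the transitions; the example DFA is hypothetical.

```python
def dfa_stats(transitions):
    """Return (states, edges, average out-degree) for {(state, event): next_state}."""
    states = {src for src, _ in transitions} | set(transitions.values())
    edges = len(transitions)
    avg_degree = edges / len(states) if states else 0.0
    return len(states), edges, avg_degree


dfa = {(0, "open"): 1, (1, "read"): 1, (1, "close"): 2, (2, "open"): 1}
print(dfa_stats(dfa))  # (3, 4, 1.3333333333333333)
```

A single-state specification with one self-loop per event has an average degree equal to its number of events, so a high density on very few states is a hint that ordering information has been lost.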
- Dynamic Analysis. When it is feasible to run a program with adequate coverage, dynamic analysis represents the most attractive option for specification mining, since dynamic analysis does not suffer from the difficulties inherent to abstraction.
Cook and Wolf consider the general problem of extracting an FSM model from an event trace, and reduce the problem to the well-known grammar inference problem. Cook and Wolf discuss algorithmic, statistical, and hybrid approaches, and present an excellent overview of the approaches and fundamental challenges. This work considers mining automata from uninterpreted event traces, attaching no semantic meaning to events.
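The k-tails algorithm, on which Cook and Wolf base their algorithmic method, illustrates the grammar-inference view: build a prefix tree from the traces, then merge states whose futures of length up to k coincide. The Python sketch below is a simple variant that treats every prefix as accepting and defines a state's k-tail as the set of event strings of length at most k that can follow it in some trace; it may produce a nondeterministic automaton, and the trace data is hypothetical.

```python
def k_tails(traces, k):
    """Infer an automaton from event traces by merging states with equal k-tails.

    Returns (initial_state, transitions), where states are frozensets of tails
    and transitions is a set of (source, event, target) triples.
    """
    # Prefix tree: every prefix of every trace is a state.
    prefixes = {tuple(t[:i]) for t in traces for i in range(len(t) + 1)}

    def tail(p):
        # Event strings of length <= k that can follow prefix p in some trace.
        return frozenset(q[len(p):] for q in prefixes
                         if q[:len(p)] == p and len(q) - len(p) <= k)

    # Each prefix-tree edge q[:-1] --q[-1]--> q becomes an edge between tails.
    transitions = {(tail(q[:-1]), q[-1], tail(q)) for q in prefixes if q}
    return tail(()), transitions


traces = [
    ("open", "read", "close"),
    ("open", "read", "read", "close"),
    ("open", "read", "read", "read", "close"),
]
initial, transitions = k_tails(traces, k=1)
loop = frozenset({(), ("close",), ("read",)})
print((loop, "read", loop) in transitions)  # True: repeated reads become a loop
```

With k = 1 the states after one and two reads share the same 1-tail, so they merge into a state with a `read` self-loop, generalizing beyond the three observed traces.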
Ammons et al. infer temporal and data dependence specifications based on dynamic trace data. This work applies sophisticated probabilistic learning techniques to boil traces down to collections of finite automata that characterize the behavior.
- R. Alur, P. Cerny, P. Madhusudan, and W. Nam. Synthesis of interface specifications for Java classes. SIGPLAN Not., 40(1):98–109, 2005.
- G. Ammons, R. Bodik, and J. R. Larus. Mining specifications. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 4–16, New York, NY, USA, 2002. ACM Press.
- L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, Univ. of Copenhagen, May 1994. (DIKU report 94/19).
- D. Chase, M. Wegman, and F. Zadeck. Analysis of pointers and structures. In Proc. ACM Conf. on Programming Language Design and Implementation, pages 296–310, New York, NY, 1990. ACM Press.
- J. E. Cook and A. L. Wolf. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol., 7(3):215–249, 1998.
- P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL ’77: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 238–252, New York, NY, USA, 1977. ACM Press.
- D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. In SOSP ’01: Proceedings of the eighteenth ACM symposium on Operating systems principles, pages 57–72, New York, NY, USA, 2001. ACM Press.
- M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Transactions on Software Engineering, 27(2):99–123, Feb. 2001.
- S. Fink, E. Yahav, N. Dor, G. Ramalingam, and E. Geay. Effective typestate verification in the presence of aliasing. In ISSTA ’06: Proceedings of the 2006 international symposium on Software testing and analysis, pages 133–144, New York, NY, USA, 2006. ACM Press.
- Gallery of mined specification. http://tinyurl.com/23qct8 or http://docs.google.com/View?docid=ddhtqgv6 10hbczjd.
- E. M. Gold. Language identification in the limit. Information and Control, 10:447–474, 1967.
- S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. May 2002.
- V. B. Livshits and T. Zimmermann. Dynamine: Finding common error patterns by mining software revision histories. In Proceedings of the 13th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE-13), pages 296–305, Sept. 2005.
- D. Mandelin, L. Xu, R. Bodik, and D. Kimelman. Jungloid mining: helping to navigate the API jungle. In PLDI ’05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 48–61, New York, NY, USA, 2005. ACM Press.
- M. G. Nanda, C. Grothoff, and S. Chandra. Deriving object typestates in the presence of inter-object references. In OOPSLA ’05: Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, pages 77–96, New York, NY, USA, 2005. ACM Press.
- M. Pistoia, D. Reller, D. Gupta, M. Nagnur, and A. K. Ramani. Java 2 Network Security. Prentice Hall PTR, Upper Saddle River, NJ, USA, second edition, August 1999.
- T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Proc. ACM Symp. on Principles of Programming Languages, pages 49–61, 1995.
- A. Salcianu and M. Rinard. Purity and side effect analysis for Java programs. In VMCAI’05: Proceedings of the 6th International Conference on Verification, Model Checking, and Abstract Interpretation, 2005.
- WALA: The T. J. Watson Libraries for Analysis. http://wala.sourceforge.net.
- W. Weimer and G. Necula. Mining temporal specifications for error detection. In TACAS, 2005.
- J. Whaley, M. C. Martin, and M. S. Lam. Automatic extraction of object-oriented component interfaces. In Proceedings of the International Symposium on Software Testing and Analysis, pages 218–228. ACM Press, July 2002.
- J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Perracotta: mining temporal API rules from imperfect traces. In ICSE ’06: Proceeding of the 28th international conference on Software engineering, pages 282–291, New York, NY, USA, 2006. ACM Press.
- G. Yorsh, E. Yahav, and S. Chandra. Symbolic summarization with applications to typestate verification. Technical report, Tel Aviv University, 2007. www.cs.tau.ac.il/∼gretay.