Object Intersection Captures on Interactive Apps to Drive a Crowd-sourced Replay-based Compiler Optimization

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION(2022)

引用 0|浏览35
暂无评分
摘要
Traditional offline optimization frameworks rely on representative hardware, software, and inputs to compare different optimizations on. With application-specific optimization for mobile systems though, the idea of a representative testbench is unrealistic while creating offline inputs is non-trivial. Online approaches partially overcome these problems but they might expose users to suboptimal or even erroneous code. Therefore, our mobile code is poorly optimized, resulting in wasted performance and energy and user frustration. In this article, we introduce a novel compiler optimization approach designed for mobile applications. It requires no developer effort, it tunes applications for the user's device and usage patterns, and it has no negative impact on the user experience. It is based on a lightweight capture and replay mechanism. Our previous work [46] captures the state accessed by any targeted code region during its online stage. By repurposing existing OS capabilities, it keeps the overhead low. In its offline stage, it replays the code region but under different optimization decisions to enable sound comparisons of different optimizations under realistic conditions. In this article, we propose a technique that further decreases the storage sizes without any additional overhead. It captures only the intersection of reachable objects and accessed heap pages. We compare this with another new approach that has minimal runtime overheads at the cost of higher capture sizes. Coupled with a search heuristic for the compiler optimization space, our capture and replay mechanism allows us to discover optimization decisions that improve performance without testing these decisions directly on the user. Finally, with crowd-sourcing we split this offline evaluation effort between several users, allowing us to discover better code in less time. We implemented a prototype system in Android based on LLVM combined with a genetic search engine and a crowd-sourcing architecture. We evaluated it on both benchmarks and real Android applications. Online captures are infrequent and introduce similar to 5 ms or 15 ms on average, depending on the approach used. For this negligible effect on user experience, we achieve speedups of 44% on average over the Android compiler and 35% over LLVM -O3. Our collaborative search is just 5% short of that speedup, which is impressive given the acceleration gains. The user with the highest workload concluded the search 7x faster.
更多
查看译文
关键词
Iterative compilation,capture,replay,interactive
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要