Focusing on line charts, the predominant form of time series visualization, our approach exploits the semantics of line rasterization to drive the data reduction of high-volume time series data

M4: a visualization-oriented time series data aggregation

PVLDB, no. 10 (2014): 797-808


Abstract

Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Exi...

Introduction
  • Enterprises are gathering petabytes of data in public and private clouds, with time series data originating from various sources, including sensor networks [15], smart grids, financial markets, and many more.
  • Data analysts interact with the visualizations and their actions are transformed by the visual data analysis tools into a series of queries that are issued against the relational database, holding the original time series data.
  • When reading data from high-volume data sources, result sets often contain millions of rows.
  • This leads to very high bandwidth consumption between the visualization system and the database.
Highlights
  • We marked the pixel errors for MinMax, RDP, and piece-wise aggregate approximation (PAA); black represents additional pixels and white the missing pixels compared to the base image
  • We introduced a visualization-driven query rewriting technique that facilitates a data-centric time series dimensionality reduction
  • Focusing on line charts, the predominant form of time series visualization, our approach exploits the semantics of line rasterization to drive the data reduction of high-volume time series data
  • We introduced the novel M4 aggregation that selects the min, max, first, and last tuples from the time spans corresponding to the pixel columns of a line chart
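The M4 selection named in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `m4_aggregate`, the `(timestamp, value)` tuple layout, and the half-open time range are assumptions made for this example. Each timestamp is mapped to one of `width` pixel columns, and only the first, last, minimum, and maximum tuple of every column is kept.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value); layout assumed for this sketch


def m4_aggregate(points: List[Point], t_start: float, t_end: float,
                 width: int) -> List[Point]:
    """Keep the first, last, min, and max tuple of each pixel column."""
    if width <= 0 or t_end <= t_start:
        raise ValueError("need width > 0 and t_end > t_start")

    # Group the tuples by the pixel column their timestamp falls into.
    groups: Dict[int, List[Point]] = {}
    for t, v in points:
        if not (t_start <= t < t_end):
            continue  # outside the visualized time range
        k = int(width * (t - t_start) / (t_end - t_start))
        k = min(k, width - 1)  # guard against floating-point round-up
        groups.setdefault(k, []).append((t, v))

    # A set removes tuples that play several roles (e.g. last and min).
    selected = set()
    for column in groups.values():
        selected.add(min(column, key=lambda p: p[0]))  # first tuple
        selected.add(max(column, key=lambda p: p[0]))  # last tuple
        selected.add(min(column, key=lambda p: p[1]))  # min-value tuple
        selected.add(max(column, key=lambda p: p[1]))  # max-value tuple
    return sorted(selected)
```

Under these assumptions the result holds at most four tuples per pixel column, so its size depends on the chart width rather than on the number of input rows.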
Results
  • The authors compare the data reduction efficiency of the M4 aggregation with state-of-the-art line simplification approaches and with commonly used naive approaches, such as averaging, sampling, and rounding.
  • The authors consider three different data sets: the price of a single share on the Frankfurt stock exchange over 6 weeks (700k tuples), 71 minutes from a speed sensor of a soccer ball [22] (ball number 8, 7M rows), and one week of sensor data from an electrical power sensor of a semiconductor manufacturing machine [15] (sensor MF03, 55M rows).
  • A visual result of M4, MinMax, RDP, and averaging (PAA), applied to 400 seconds (40k tuples) of the machine data set, is shown in Figure 15.
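To make two of the compared baselines concrete, the following Python sketch shows averaging (PAA) and MinMax over the same pixel-column grouping. The helper names `group_by_column`, `paa`, and `min_max` are hypothetical, and representing an averaged column by its first timestamp is an assumption of this sketch rather than something stated in the text above. RDP is not covered here.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value); layout assumed for this sketch


def group_by_column(points: List[Point], t_start: float, t_end: float,
                    width: int) -> Dict[int, List[Point]]:
    """Assign each point in [t_start, t_end) to one of `width` pixel columns."""
    groups: Dict[int, List[Point]] = {}
    for t, v in points:
        if t_start <= t < t_end:
            k = min(int(width * (t - t_start) / (t_end - t_start)), width - 1)
            groups.setdefault(k, []).append((t, v))
    return groups


def paa(groups: Dict[int, List[Point]]) -> List[Point]:
    """Averaging: one tuple per column (first timestamp, mean value)."""
    return [(min(t for t, _ in col), sum(v for _, v in col) / len(col))
            for _, col in sorted(groups.items())]


def min_max(groups: Dict[int, List[Point]]) -> List[Point]:
    """MinMax: the min-value and the max-value tuple of every column."""
    picked = set()
    for col in groups.values():
        picked.add(min(col, key=lambda p: p[1]))
        picked.add(max(col, key=lambda p: p[1]))
    return sorted(picked)
```

Averaging replaces each column by a value that may never occur in the data, so peaks are flattened. MinMax keeps the extremes but drops the first and last tuples of a column, which are the tuples M4 retains in addition.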
Conclusion
  • The authors introduced a visualization-driven query rewriting technique that facilitates a data-centric time series dimensionality reduction.
  • The authors considered aggregation-based data reduction techniques and described how they integrate with the proposed query-rewriting.
  • The authors introduced the novel M4 aggregation that selects the min, max, first, and last tuples from the time spans corresponding to the pixel columns of a line chart.
  • Using M4, the authors were able to reduce data volumes by two orders of magnitude and latencies by one order of magnitude, while ensuring pixel-perfect line visualizations.
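The idea of pushing the M4 selection into the database through a rewritten query can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table name `series`, its columns `t` and `v`, the helper names, and the use of SQLite are all hypothetical, and the SQL is a simplified form written for this example, not the query text from the paper. A real rewriter would wrap the client's original query instead of a fixed table.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Pixel-column key for a row; :w is the chart width, [:t1, :t2) the time range.
KEY = "CAST(:w * ({col} - :t1) / (:t2 - :t1) AS INTEGER)"

# Inner query: per-column aggregates. Outer join: keep only the rows that
# match one of the four aggregates of their own pixel column.
M4_SQL = f"""
SELECT DISTINCT d.t, d.v
FROM series AS d
JOIN (
    SELECT {KEY.format(col='t')} AS k,
           MIN(v) AS v_min, MAX(v) AS v_max,
           MIN(t) AS t_min, MAX(t) AS t_max
    FROM series
    WHERE t >= :t1 AND t < :t2
    GROUP BY {KEY.format(col='t')}
) AS g
  ON {KEY.format(col='d.t')} = g.k
 AND (d.v = g.v_min OR d.v = g.v_max OR d.t = g.t_min OR d.t = g.t_max)
WHERE d.t >= :t1 AND d.t < :t2
ORDER BY d.t
"""


def load_series(conn: sqlite3.Connection,
                points: Iterable[Tuple[float, float]]) -> None:
    """Create the hypothetical `series` table and fill it with (t, v) rows."""
    conn.execute("CREATE TABLE series (t REAL, v REAL)")
    conn.executemany("INSERT INTO series VALUES (?, ?)", points)


def run_m4(conn: sqlite3.Connection, t1: float, t2: float,
           width: int) -> List[Tuple[float, float]]:
    """Run the M4-style query; only the reduced rows leave the database."""
    params = {"w": width, "t1": float(t1), "t2": float(t2)}
    return conn.execute(M4_SQL, params).fetchall()
```

A caller would open a connection, call `load_series`, and then `run_m4(conn, t1, t2, width)` with the chart width in pixels. When several rows of one column share the minimum or maximum value, this simplified query returns all of them.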
Related work
  • In this section, we discuss existing visualization systems and provide an overview of related data reduction techniques, discussing the differences to our approach.

    7.1 Visualization Systems

    Regarding visualization-related data reduction, current state-of-the-art visualization systems and tools fall into three categories. They (A) do not use any data reduction, or (B) compute and send images instead of data to visualization clients, or (C) rely on additional data reduction outside of the database. In Figure 16, we compare these systems to our solution (D), showing how each type of system applies and reduces a relational query Q on a time series relation T. Note that thin arrows indicate low-volume data flow, and thick arrows indicate that raw data needs to be transferred between the system’s components or to the client.

    Visual Analytics Tools. Many visual analytics tools are systems of type A that do not apply any visualization-related data reduction, even though they often contain state-of-the-art (relational) data engines [28] that could be used for this purpose. For our visualization needs, we already evaluated four common candidates for such tools: Tableau Desktop 8.1 (tableausoftware.com), SAP Lumira 1.13 (saplumira.com), QlikView 11.20 (clickview.com), and Datawatch Desktop 12.2 (datawatch.com). But none of these tools was able to quickly and easily visualize high-volume time series data, having 1 million rows or more. Since all tools allow working on data from a database or provide a tool-internal data engine, we see a great opportunity for our approach to be implemented in such systems. For brevity, we cannot provide a more detailed evaluation of these tools.
Reference
  • [1] S. Agarwal, A. Panda, B. Mozafari, A. P. Iyer, S. Madden, and I. Stoica. Blink and it’s done: Interactive queries on very large data. PVLDB, 5(12):1902–1905, 2012.
  • [2] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1):25–30, 1965.
  • [3] G. Burtini, S. Fazackerley, and R. Lawrence. Time series compression for adaptive chart generation. In CCECE, pages 1–6. IEEE, 2013.
  • [4] J. X. Chen and X. Wang. Approximate line scan-conversion and antialiasing. In Computer Graphics Forum, pages 69–78.
  • [5] David Salomon. Data Compression. Springer, 2007.
  • [6] D. H. Douglas and T. K. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica Journal, 10(2):112–122, 1973.
  • [7] Q. Duan, P. Wang, M. Wu, W. Wang, and S. Huang. Approximate query on historical stream data. In DEXA, pages 128–135.
  • [8] S. G. Eick and A. F. Karr. Visual scalability. Journal of Computational and Graphical Statistics, 11(1):22–43, 2002.
  • [9] P. Esling and C. Agon. Time-series data mining. ACM Computing Surveys, 45(1):12–34, 2012.
  • [10] F. Farber, S. K. Cha, J. Primsch, C. Bornhovd, S. Sigg, and W. Lehner. SAP HANA Database-Data Management for Modern Business Applications. SIGMOD Record, 40(4):45–51, 2012.
  • [11] T. Fu. A review on time series data mining. EAAI Journal, 24(1):164–181, 2011.
  • [12] T. Fu, F. Chung, R. Luk, and C. Ng. Representing financial time series based on data point importance. EAAI Journal, 21(2):277–300, 2008.
  • [13] S. Gandhi, L. Foschini, and S. Suri. Space-efficient online approximation of time series data: Streams, amnesia, and out-of-order. In ICDE, pages 924–935. IEEE, 2010.
  • [14] J. Hershberger and J. Snoeyink. Speeding up the Douglas-Peucker line-simplification algorithm. University of British Columbia, Department of Computer Science, 1992.
  • [15] Z. Jerzak, T. Heinze, M. Fehr, D. Grober, R. Hartung, and N. Stojanovic. The DEBS 2012 Grand Challenge. In DEBS, pages 393–398. ACM, 2012.
  • [16] U. Jugel and V. Markl. Interactive visualization of high-velocity event streams. In VLDB PhD Workshop. VLDB Endowment, 2012.
  • [17] D. A. Keim, C. Panse, J. Schneidewind, M. Sips, M. C. Hao, and U. Dayal. Pushing the limit in visual data exploration: Techniques and applications. Lecture Notes in Artificial Intelligence, (2821):37–51, 2003.
  • [18] E. J. Keogh and M. J. Pazzani. A simple dimensionality reduction technique for fast similarity search in large time series databases. In PAKDD, pages 122–133.
  • [19] A. Kolesnikov. Efficient algorithms for vectorization and polygonal approximation. University of Joensuu, 2003.
  • [20] P. Lindstrom and M. Isenburg. Fast and efficient compression of floating-point data. In TVCG, volume 12, pages 1245–1250. IEEE, 2006.
  • [21] W.-Y. Ma, I. Bedner, G. Chang, A. Kuchinsky, and H. Zhang. A framework for adaptive content delivery in heterogeneous network environments. In Proc. SPIE, Multimedia Computing and Networking, volume 3969, pages 86–100. SPIE, 2000.
  • [22] C. Mutschler, H. Ziekow, and Z. Jerzak. The DEBS 2013 Grand Challenge. In DEBS, pages 289–294. ACM, 2013.
  • [23] P. Przymus, A. Boniewicz, M. Burzanska, and K. Stencel. Recursive query facilities in relational databases: a survey. In DTA and BSBT, pages 89–99.
  • [24] K. Reumann and A. P. M. Witkam. Optimizing curve segmentation in computer graphics. In Proceedings of the International Computing Symposium, pages 467–472. North-Holland Publishing Company, 1974.
  • [25] W. Shi and C. Cheung. Performance evaluation of line simplification algorithms for vector generalization. The Cartographic Journal, 43(1):27–44, 2006.
  • [26] M. Visvalingam and J. Whyatt. Line generalisation by repeated elimination of points. The Cartographic Journal, 30(1):46–51, 1993.
  • [27] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  • [28] R. Wesley, M. Eldridge, and P. Terlecki. An analytic data engine for visualization in Tableau. In SIGMOD, pages 1185–1194. ACM, 2011.
  • [29] Y. Wu, D. Agrawal, and A. El Abbadi. A comparison of DFT and DWT based similarity search in time-series databases. In CIKM, pages 488–495. ACM, 2000.