M4: a visualization-oriented time series data aggregation
PVLDB, no. 10 (2014): 797-808
Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Exi...
- Enterprises are gathering petabytes of data in public and private clouds, with time series data originating from various sources, including sensor networks, smart grids, financial markets, and many more.
- Data analysts interact with the visualizations, and the visual data analysis tools transform their actions into a series of queries that are issued against the relational database holding the original time series data.
- When reading data from high-volume data sources, result sets often contain millions of rows.
- This leads to very high bandwidth consumption between the visualization system and the database.
- We marked the pixel errors for MinMax, RDP, and piece-wise aggregate approximation (PAA); black represents additional pixels and white the missing pixels compared to the base image
- We introduced a visualization-driven query rewriting technique that facilitates a data-centric time series dimensionality reduction
- Focusing on line charts, the predominant form of time series visualization, our approach exploits the semantics of line rasterization to drive the data reduction of high-volume time series data
- We introduced the novel M4 aggregation that selects the min, max, first, and last tuples from the time spans corresponding to the pixel columns of a line chart
- The authors compare the data reduction efficiency of the M4 aggregation with state-of-the-art line simplification approaches and with commonly used naive approaches, such as averaging, sampling, and rounding.
- The authors consider three different data sets: the price of a single share on the Frankfurt stock exchange over 6 weeks (700k tuples), 71 minutes from a speed sensor of a soccer ball (ball number 8, 7M rows), and one week of sensor data from an electrical power sensor of a semiconductor manufacturing machine (sensor MF03, 55M rows).
- A visual result of M4, MinMax, RDP, and averaging (PAA), applied to 400 seconds (40k tuples) of the machine data set, is shown in Figure 15.
- The authors introduced a visualization-driven query rewriting technique that facilitates a data-centric time series dimensionality reduction.
- The authors considered aggregation-based data reduction techniques and described how they integrate with the proposed query-rewriting.
- The authors introduced the novel M4 aggregation that selects the min, max, first, and last tuples from the time spans corresponding to the pixel columns of a line chart.
- Using M4, the authors were able to reduce data volumes by two orders of magnitude and latencies by one order of magnitude, while ensuring pixel-perfect line visualizations.
- In this section, we discuss existing visualization systems, provide an overview of related data reduction techniques, and explain how they differ from our approach.
7.1 Visualization Systems
Regarding visualization-related data reduction, current state-of-the-art visualization systems and tools fall into three categories. They (A) do not use any data reduction, or (B) compute and send images instead of data to visualization clients, or (C) rely on additional data reduction outside of the database. In Figure 16, we compare these systems to our solution (D), showing how each type of system applies and reduces a relational query Q on a time series relation T. Note that thin arrows indicate low-volume data flow, and thick arrows indicate that raw data needs to be transferred between the system’s components or to the client.
Visual Analytics Tools. Many visual analytics tools are systems of type A that do not apply any visualization-related data reduction, even though they often contain state-of-the-art (relational) data engines that could be used for this purpose. For our visualization needs, we evaluated four common candidates for such tools: Tableau Desktop 8.1 (tableausoftware.com), SAP Lumira 1.13 (saplumira.com), QlikView 11.20 (clickview.com), and Datawatch Desktop 12.2 (datawatch.com). None of these tools was able to quickly and easily visualize high-volume time series data with 1 million rows or more. Since all of these tools can work on data from a database or provide a tool-internal data engine, we see a great opportunity for our approach to be implemented in such systems. For brevity, we cannot provide a more detailed evaluation of these tools.
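
As a rough illustration of the M4 aggregation summarized above (select the min, max, first, and last tuples from the time span of each pixel column), here is a minimal Python sketch. The function name `m4_aggregate`, its parameters, the half-open time range, and the floor-based mapping of timestamps to pixel columns are assumptions made for this example. The paper describes the reduction as a query rewriting technique applied to a relational query, so this client-side code only mirrors the selection logic and is not the authors' implementation.

```python
def m4_aggregate(points, t_start, t_end, width):
    """Keep at most four (t, v) tuples per pixel column.

    points  -- iterable of (timestamp, value) tuples, in any order
    t_start -- inclusive start of the visualized time range
    t_end   -- exclusive end of the visualized time range (assumption)
    width   -- chart width in pixel columns
    """
    if width <= 0 or t_end <= t_start:
        raise ValueError("need width > 0 and t_end > t_start")

    span = t_end - t_start
    columns = {}  # pixel column index -> the four selected tuples

    for t, v in points:
        if t < t_start or t >= t_end:
            continue  # outside the visualized range
        # Assumed binning: floor of the scaled timestamp, clamped to the
        # last column to guard against floating-point rounding.
        col = min(int((t - t_start) * width / span), width - 1)
        p = (t, v)
        g = columns.get(col)
        if g is None:
            columns[col] = {"first": p, "last": p, "min": p, "max": p}
            continue
        if t < g["first"][0]:
            g["first"] = p
        if t >= g["last"][0]:
            g["last"] = p
        if v < g["min"][1]:
            g["min"] = p
        if v > g["max"][1]:
            g["max"] = p

    # A tuple can play several roles (e.g. first and min), so deduplicate.
    selected = set()
    for g in columns.values():
        selected.update(g.values())
    return sorted(selected)


# Usage: 8 tuples rendered on a chart that is 2 pixel columns wide.
series = [(0, 1.0), (1, 5.0), (2, -2.0), (3, 3.0),
          (4, 0.0), (5, 7.0), (6, 7.5), (7, 2.0)]
reduced = m4_aggregate(series, t_start=0, t_end=8, width=2)
```

In this sketch the output never exceeds `4 * width` tuples, regardless of how many rows the input contains, which is where the data-volume reduction comes from when the raw series has far more rows than the chart has pixel columns.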