Pongo: Efficient Lossless Floating Point Compression

Authors

  • Yufeng Liu, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
  • Yao Shen, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China (https://orcid.org/0000-0002-6744-1498)
  • Fenghua Zhang, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China (https://orcid.org/0009-0002-4492-2683)
  • Feiteng Huang, Huawei Cloud Database Innovation Lab, China

DOI:

https://doi.org/10.37256/ccds.6220257102

Keywords:

floating point compression, lossless compression, time series data, decimal native numbers

Abstract

Time series data are being collected in ever larger volumes across many fields. Making good use of these data requires addressing the high storage costs and transmission bandwidth they incur. General-purpose compression algorithms reduce data size effectively, but at the cost of heavy computation. Because of this time cost and their batch processing mode, Time Series Management Systems (TSMSs) often use streaming compression algorithms instead to compress time series data. For floating-point data, the most prevalent streaming compression algorithms, such as those based on exclusive OR (XOR) operations, offer relatively fast processing and high compression ratios compared to conventional general-purpose compression algorithms. Among them, the Elf algorithm introduces the idea of erasing first and then compressing, and achieves the best compression ratio among existing streaming compression algorithms. This paper proposes Pongo, a new lossless streaming compression algorithm for floating-point numbers, which uses a carefully designed erasing method different from that of Elf. Pongo's erasing technique converts the binary representation of the fractional part to decimal, using a newly proposed algorithm that makes this conversion more efficient. To evaluate Pongo, we conducted extensive experiments comparing it with ten leading compression algorithms on twenty-two datasets. On average, Pongo achieves a compression ratio that is 14% better than Elf and 58% better than Gorilla, making it the top-performing algorithm among all those tested, as shown through both mathematical analysis and practical testing.
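The XOR-based streaming idea mentioned in the abstract can be illustrated with a minimal sketch. It is not Pongo's or Elf's actual method, whose details are not given on this page. It only shows why XORing consecutive IEEE 754 doubles helps: similar values produce a result with many leading and trailing zero bits, and only the bits in between need to be stored. The function names `double_to_bits` and `xor_stats` are hypothetical and chosen for illustration.

```python
import struct


def double_to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stats(prev: float, curr: float) -> tuple[int, int, int]:
    """XOR two consecutive values and count the zero bits at both ends.

    Returns (xor_value, leading_zeros, trailing_zeros). An XOR-based
    encoder would typically store only the bits between the two zero runs.
    """
    x = double_to_bits(prev) ^ double_to_bits(curr)
    if x == 0:
        # Identical values: every bit is zero.
        return 0, 64, 64
    leading = 64 - x.bit_length()
    # x & -x isolates the lowest set bit; its position is the trailing-zero count.
    trailing = (x & -x).bit_length() - 1
    return x, leading, trailing


# 3.0 and 3.5 differ in a single mantissa bit.
xor_value, leading, trailing = xor_stats(3.0, 3.5)
print(f"{xor_value:#018x} leading={leading} trailing={trailing}")
```

An erasing step in the sense described above aims to increase the number of trailing zeros before the XOR is taken, so that fewer meaningful bits remain. How Pongo does this is described in the full paper, not in this sketch.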

Published

2025-07-23

How to Cite

1. Liu Y, Shen Y, Zhang F, Huang F. Pongo: Efficient Lossless Floating Point Compression. Cloud Computing and Data Science [Internet]. 2025 Jul. 23 [cited 2025 Dec. 6];6(2):292-311. Available from: https://ojs.wiserpub.com/index.php/CCDS/article/view/7102