Metadata-Version: 2.1
Name: fapyc
Version: 0.3.4
Summary: A Python wrapper for the FAPEC data compressor.
Home-page: https://www.dapcom.es
Author: DAPCOM Data Services
Author-email: fapec@dapcom.es
License: UNKNOWN
Platform: UNKNOWN
Description-Content-Type: text/markdown

# FaPyc

A Python wrapper for the FAPEC data compressor.
(C) DAPCOM Data Services S.L. - https://www.dapcom.es

The full FAPEC compression and decompression library is included in this package, but a valid license file is required to use it fully.
Without a license, you can still use the decompressor, albeit with some limitations (for example, on the maximum number of threads and on the recovery of corrupted files).
You can get free evaluation licenses at https://www.dapcom.es/get-fapec/ to test the compressor. For full licenses, please contact us at fapec@dapcom.es.
Once you have a valid license (either full or evaluation), define a `FAPEC_HOME` environment variable pointing to the directory where you have stored the `fapeclic.dat` license file.
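
Before calling the compressor, it may help to check that this setup is in place. The sketch below uses a hypothetical helper, `find_fapec_license`, which is not part of Fapyc. It only looks for `fapeclic.dat` inside the `FAPEC_HOME` directory and does not validate the license itself.

```python
import os
from pathlib import Path


def find_fapec_license(env_var: str = "FAPEC_HOME") -> Path:
    """Return the path to fapeclic.dat, or raise if it cannot be found.

    Hypothetical helper (not part of Fapyc): it only checks the setup
    described above and does not validate the license contents.
    """
    home = os.environ.get(env_var)
    if not home:
        raise RuntimeError(f"{env_var} is not set; the compressor needs a license")
    lic = Path(home) / "fapeclic.dat"
    if not lic.is_file():
        raise FileNotFoundError(f"No fapeclic.dat found in {home}")
    return lic


if __name__ == "__main__":
    try:
        print("License file:", find_fapec_license())
    except (RuntimeError, FileNotFoundError) as err:
        # Without a license only limited decompression is available
        print("Warning:", err)
```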

## Usage

There are four main execution modes:
* File: When invoking Fapyc or Unfapyc on a filename, it (de)compresses that file directly into another file.
* Buffer: You can load the whole file to (de)compress into memory, e.g. as a byte array, and then invoke Fapyc/Unfapyc, which will leave the result in the output buffer. Be careful with large files, as this may use a lot of RAM!
* File-to-buffer decompression: You can decompress a file directly (without loading it beforehand) and get its decompressed output in a buffer, which you can use afterwards.
* Chunk: FAPEC internally works in 'chunks' of data, typically 1-8 MB each (and up to 384 MB each), which allows a huge file to be (de)compressed progressively while keeping memory usage under control. For now, this mode is only available in the FAPEC CLI, in WinFAPEC and in the C API, not yet in Fapyc/Unfapyc.
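
As a rule of thumb, buffer mode suits inputs that comfortably fit in RAM, while file mode keeps memory usage low. The following sketch encodes that choice; the `choose_mode` helper and its 512 MB threshold are illustrative assumptions, not part of Fapyc.

```python
import os

# Hypothetical threshold: above this size we avoid buffer mode, which
# keeps the whole input (and output) in RAM. Tune it for your machine.
MAX_BUFFER_BYTES = 512 * 1024 * 1024


def choose_mode(path: str, max_buffer_bytes: int = MAX_BUFFER_BYTES) -> str:
    """Return "buffer" for inputs small enough to hold in memory, else "file".

    Illustrative only: Fapyc does not pick a mode for you; this simply
    applies the rule of thumb that buffer mode is best kept for small files.
    """
    size = os.path.getsize(path)
    return "buffer" if size <= max_buffer_bytes else "file"
```

A `"file"` result would then map to `Fapyc(filename, ...)` and a `"buffer"` result to `Fapyc(buffer = data)`, as in the examples below.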

## Examples

### Compress and decompress a file

In this example we use the `kmall` option of FAPEC, which is suited to KMALL geomaritime data files from Kongsberg Maritime:

    from fapyc import Fapyc, Unfapyc

    filename = input("Path to KMALL file: ")

    print("Preparing to compress %s" % (filename))
    # Here we invoke FAPEC directly on files, so memory
    # usage stays small (around 10 MB), although this does
    # not give us direct access to the (de)compressed
    # buffers.
    f = Fapyc(filename, chunksize = 2048576, blen = 512)
    f.compress_kmall()

    print("Preparing to decompress %s" % (filename + ".fapec"))
    uf = Unfapyc(filename + ".fapec")
    uf.decompress(output=filename+".dec")
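
To check the round trip, one can compare the original file with the `.dec` output. Assuming FAPEC runs losslessly with these settings (not stated here), both files should be byte-identical. This sketch hashes each file in fixed-size blocks with `hashlib`, so memory usage stays small, in the same spirit as file mode; `sha256_of` and `same_contents` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so memory use stays small."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for block in iter(lambda: fh.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def same_contents(original: str, restored: str) -> bool:
    """True if both files have identical SHA-256 digests."""
    return sha256_of(original) == sha256_of(restored)
```

For instance, after the example above: `print(same_contents(filename, filename + ".dec"))`.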


### Compress and decompress a buffer

In this example we use the `tab` option of FAPEC, which typically outperforms `gzip` and `bzip2` on tabulated text/numerical data such as point clouds or certain scientific data files:

    from fapyc import Fapyc, Unfapyc

    filename = input("Path to file: ")
    # Beware - this loads the whole file into memory
    with open(filename, "rb") as fh:
        data = fh.read()
    f = Fapyc(buffer = data)
    # Invoke our tabulated-text compression algorithm
    # indicating a comma separator
    f.compress_tabtxt(sep1=',')
    print("Ratio =", round(float(len(data))/len(f.outputBuffer), 4))

    # Now we decompress the buffer
    uf = Unfapyc(buffer = f.outputBuffer)
    uf.decompress()
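
To put the ratio printed above in context, the same data can be compressed with the `gzip` and `bz2` modules from the Python standard library. The sketch below (a hypothetical `baseline_ratios` helper) computes their ratios with each module's default settings, using the same definition of ratio as the example: original size divided by compressed size.

```python
import bz2
import gzip


def baseline_ratios(data: bytes) -> dict:
    """Compression ratios (original size / compressed size) of stdlib codecs.

    Uses each module's default compression level; these are the baselines
    the tabulated-text option is said to outperform.
    """
    return {
        "gzip": len(data) / len(gzip.compress(data)),
        "bzip2": len(data) / len(bz2.compress(data)),
    }
```

Calling `print(baseline_ratios(data))` after the Fapyc call gives numbers directly comparable to the FAPEC ratio. Results depend heavily on the data, so treat this as a quick check rather than a benchmark.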


### Decompress a file into a buffer, and do some operations on it

Here is a rather specific use case, based on the ESA/DPAC Gaia (E)DR3 bulk catalogue (which is publicly available as FAPEC-compressed CSV files).
In this example, we decompress one of the files, load its CSV-formatted contents with Pandas, apply some filtering conditions, and generate histograms.

    from fapyc import Unfapyc
    from io import BytesIO
    import pandas as pd
    import matplotlib.pyplot as plt

    filename = input("Path to CSV-FAPEC file: ")

    ### Option 1: open the file, load it into memory (beware!), and decompress the buffer:
    #with open(filename, "rb") as fh:
    #    data = fh.read()
    #uf = Unfapyc(buffer = data)

    ### Option 2: directly decompress from the file into a buffer:
    uf = Unfapyc(filename = filename)

    # Actual decompressor invocation - same for both options
    uf.decompress()

    # Regenerate the CSV from the bytes buffer
    df = pd.read_csv(BytesIO(uf.outputBuffer), comment="#")

    print("Info from the full CSV:")
    df.info()
    # Prepare some nice histograms for all data
    plt.subplot(2,2,1)
    plt.title("Full CSV: skymap (%d sources)" % df.shape[0])
    plt.xlabel("RA")
    plt.ylabel("DEC")
    print("Getting 2D histogram...")
    plt.hist2d(df.ra, df.dec, bins=(100, 100), cmap=plt.cm.jet)
    plt.colorbar()
    plt.subplot(2,2,2)
    plt.title("Full CSV: G dist")
    plt.xlabel("G magnitude")
    plt.ylabel("Counts")
    plt.yscale("log")
    print("Getting histogram...")
    plt.hist(df.phot_g_mean_mag, bins=50)

    # Now repeat the histograms, using only the rows that fulfil
    # some conditions on certain CSV fields
    print("Loading+filtering CSV...")
    iter_csv = pd.read_csv(BytesIO(uf.outputBuffer), comment="#", iterator=True, chunksize=1000)
    df = pd.concat((x.query("ra_error < 0.1 & dec_error < 0.1 & ruwe > 0 & ruwe < 5") for x in iter_csv))
    print("Info from the filtered CSV:")
    df.info()
    plt.subplot(2,2,3)
    plt.title("Filtered CSV: skymap (%d sources)" % df.shape[0])
    plt.xlabel("RA")
    plt.ylabel("DEC")
    print("Getting 2D histogram...")
    plt.hist2d(df.ra, df.dec, bins=(100, 100), cmap=plt.cm.jet)
    plt.colorbar()
    plt.subplot(2,2,4)
    plt.title("Filtered CSV: G dist")
    plt.xlabel("G magnitude")
    plt.ylabel("Counts")
    plt.yscale("log")
    print("Getting histogram...")
    plt.hist(df.phot_g_mean_mag, bins=50)

    print("Plotting!")
    plt.show()
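
If Pandas is not available, or only the filtered rows are needed, the same cuts can be applied with the standard `csv` module while parsing the decompressed buffer row by row. This is a sketch with a hypothetical `filter_rows` helper; it assumes the header row follows any `#` comment lines and that the buffer is UTF-8 text.

```python
import csv
import io


def filter_rows(buffer: bytes, max_pos_error: float = 0.1, max_ruwe: float = 5.0):
    """Yield CSV rows (as dicts) that pass the same cuts as the pandas query.

    Lines starting with '#' are skipped (pandas' comment="#" is broader:
    it also cuts a line at a '#' appearing mid-line). Rows are parsed lazily.
    """
    text = io.TextIOWrapper(io.BytesIO(buffer), encoding="utf-8", newline="")
    lines = (line for line in text if not line.startswith("#"))
    for row in csv.DictReader(lines):
        try:
            ra_err = float(row["ra_error"])
            dec_err = float(row["dec_error"])
            ruwe = float(row["ruwe"])
        except (TypeError, ValueError):
            # Empty or missing fields fail the cuts, as NaN would in pandas
            continue
        if ra_err < max_pos_error and dec_err < max_pos_error and 0 < ruwe < max_ruwe:
            yield row
```

For example, `good = list(filter_rows(uf.outputBuffer))` followed by `ras = [float(r["ra"]) for r in good]` gives values that can be passed to `plt.hist2d` as above.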


