Metadata-Version: 2.1
Name: pybite
Version: 1.4.2
Summary: Chunk by chunk iteration made easier
Home-page: https://github.com/metadeta96/pybite
Author: metadeta96
Author-email: metadeta96@gmail.com
License: UNKNOWN
Download-URL: https://github.com/metadeta96/pybite
Keywords: iteration,iter,iterable,sequence,list,tuple,dict,array,chunk,block,processing,data,split,slice,file,buffer
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.5
Description-Content-Type: text/markdown

# PyBite

Chunk by chunk iteration made easier

## Installation

    pip install pybite

## Methods

### iterate_by

Return a iterator of chunks for the iterable

#### Parameters

**iterable**: *iter*

Any iterable data e.g. list, tuple, dict, iter, ...

**chunk_size**: *int*

The size of each chunk

**map**: *callable* optional, defaults *None*

A map function for transform the data before dividing in chunks

**persist_header**: *bool* optional, defaults *False*

Whether to persist a header on each chunk or not. 
If persist_header is True and header is None, 
the firt line will be read as the actual header.

**header**: *str* optional, defaults *None*

A header to be written at the start of each file.
If persist_header is True and header is None, 
the firt line will be read as the actual header.

#### Returns

**iter**

New iterable for the chunked data

#### Examples

```python
>>> iterate_by([1, 2, 3, 4, 5], 2)
iter([[1, 2], [3, 4], [5]])
>>> iterate_by([1, 2, 3, 4, 5], 2, map=lambda x: x * 2)
iter([[2, 4], [6, 8], [10]])
>>> iterate_by(["numbers", 1, 2, 3, 4, 5], 2, persist_header=True)
iter([["numbers", 1, 2], ["numbers", 3, 4], ["numbers", 5]])
```

### iterate_file_by_lines

Return a iterator of file lines.

Each line is read on-demand untill there is no more line to
be read.

#### Parameters

**iterable**: *iter*

Any iterable data e.g. list, tuple, dict, iter, ...

**file_stream** : *str*, *io.StringIO*

A file path or io.StringIO instance to the file to be read.

**encoding** : *str* optional, defaults *None*

The input file encoding. 
The chunks will be saved using the same encoding.

**strip_end** : *bool* optional, defaults *False*

Flag for strip the end of each line.
Same as calling `line.rstrip()`.

#### Returns

**iter**

New iterable for the chunked data

#### Examples

```python
with open("test.txt", "w", encoding="utf-8") as f:
    f.write("Symbols\nAyp\nBx\nCC\nDt")

>>> iterate_file_by_lines("test.txt", encoding="utf-8")
iter(["Symbols\n", "Ayp\n", "Bx\n", "CC\n", "Dt"])
>>> iterate_file_by_lines("test.txt", encoding="utf-8", strip_end=True)
iter(["Symbols", "Ayp", "Bx", "CC", "Dt"])
```

### split_file_by_lines

Split a file into multiple files and store them in the output path.

The file is divided b lines into chunk files that later can be read individually
or joined back.

#### Parameters

**file_stream**: *str*, *io.StringIO*

A file path or io.StringIO instance to the file you want to split.

**output_path**: *str*

The path to the directory where the chunks will be stored.

**lines**: *int*

The quantity of lines on each chunk file. 
If included the header the total will be lines + 1.

**encoding**: *str* optional, defaults *None*

The input file encoding. 
The chunks will be saved using the same encoding.

**persist_header**: *bool* optional, defaults *False*

Whether to persist a header on each chunk or not. 
If persist_header is True and header is None, 
the firt line will be read as the actual header.

**header**: *str* optional, defaults *None*

A header to be written at the start of each file.
If persist_header is True and header is None, 
the firt line will be read as the actual header.

**chunk_name_format**: *str* optional, defaults *"04d"*

The format string for the chunk numbers in the output file names.

#### Returns

**list**

List of paths to the written files.

#### Examples

```python
with open("test.txt", "w", encoding="utf-8") as f:
    f.write("Symbols\nAyp\nBx\nCC\nDt")

>>> split_by_lines("test.txt", "out", 2, encoding="utf-8")
["out/test.chunk0000.txt", "out/test.chunk0001.txt", 
"out/test.chunk0002.txt"]
>>> split_by_lines("test.txt", "out", 2, encoding="utf-8", persist_header=True)
["out/test.chunk0000.txt", "out/test.chunk0001.txt"]
```

### join_file_chunks

Join chunk files into a single line stream.

Join the files created by split_by_lines into a iterable of
str lines and read ordered by name. 
Chunks not found will throw an error if ignore_missing_chunks is not False.

Avoid saving different files chunks into the same directory.

#### Parameters

**files_path**: *str*

The path to a directory containing the chunk files or a list of the path to the files.

**encoding**: *str* optional, defaults *None*

The input file encoding. 
The chunks will be saved using the same encoding.

**persisted_header**: *bool* optional, defaults *False*

Whether a header was persisted on each chunk or not. 
If persisted_header is True and header is None, 
the firt line of teh first file will be read as the actual header.

**header**: *str* optional, defaults *None*

A header to be read at the start of the first file.
If persisted_header is True and header is None, 
the firt line of teh first file will be read as the actual header.

**ignore_missing_chunks**: *bool* optional, defaults *"False*

Flag to ignore missing chunks.

#### Returns

**iter**

An iterable to the lines of the files (read in the order).

### split_file_by_lines

Slice file by lines into an iterable of the sliced lines.

Returns only cut lines in iterable str format. 
It does not work with negative numbers.

#### Parameters

**files_stream**: *str*

The path and file to be cut.

***start***: *int*

Initial position of the cut.
Use positive Number.

***end***: *int*

End position of the cut.
Use Positive Number.

**encoding**: *str* optional, defaults *None*

The input file encoding. 
The chunks will be saved using the same encoding.

**persisted_header**: *bool* optional, defaults *False*

Whether a header was persisted on each chunk or not. 
If persisted_header is True and header is None, 
the firt line of teh first file will be read as the actual header.

**header**: *str* optional, defaults *None*

A header to be read at the start of the first file.
If persisted_header is True and header is None, 
the firt line of teh first file will be read as the actual header.

#### Returns

**iter**

An iterable with the cut portion of the file

### Examples

```python
test_data = [
    "Name;Age;",
    "Test;00;",
    "Test;11;",
    "John;22;",
    "Test;33;",
    "Test;44;",
    "Test;55;",
    "Test;66;",
    "Test;77;",
    "Test;88;",
]
with open("./text.txt", "w") as f:
    f.write("\n".join(test_data))

>>> slice_file_by_lines(input_path, 1,2, persist_header=True)
iter(["Name;Age;\n", "Test;11;\n", "John;22;\n"])
>>> slice_file_by_lines(input_path, 1,2)
iter(["Test;11;\n", "John;22;\n"])
```

## Test

Test are handled by [PyTest](https://pypi.org/project/pytest/) and are included inside the folder `test`.

To run the test execute the command:

    pytest


