streamio - reading, writing and sorting large files.
streamio is a simple library of functions designed to read, write and sort large files using iterators so that the operations will successfully complete on systems with limited RAM.
| copyright: | Copyright (C) 2013 by James Mills |
|---|---|
sort
Take a list of ordered iterables; return as a single ordered generator.
| Parameters: | key – a function that returns the sort key for each item |
|---|---|
Directly borrowed from: http://stackoverflow.com/questions/5023266/merge-join-two-generators-in-python
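The merging idea can be sketched with a heap that holds one pending item per input. This is only an illustration under stated assumptions: the name `merge_sorted` and its signature are hypothetical and are not taken from the streamio source.

```python
import heapq


def merge_sorted(iterables, key=None):
    """Lazily merge already-sorted iterables into one sorted stream."""
    if key is None:
        key = lambda item: item

    heap = []
    for index, iterable in enumerate(iterables):
        iterator = iter(iterable)
        for first in iterator:
            # The unique index breaks ties, so items are never compared directly.
            heap.append((key(first), index, first, iterator))
            break
    heapq.heapify(heap)

    while heap:
        _, index, item, iterator = heap[0]
        yield item
        for following in iterator:
            # Swap the yielded entry for the next item from the same input.
            heapq.heapreplace(heap, (key(following), index, following, iterator))
            break
        else:
            heapq.heappop(heap)  # this input is exhausted
```

Because only one item per input sits in the heap at any time, memory use grows with the number of inputs rather than with their length.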
Given an input file, sort it by performing a merge sort on disk.
This uses py._path.local.LocalPath.make_numbered_dir to create temporary scratch space for splitting the input file into sorted chunks. The chunks are then merged iteratively using the merge function, which is almost identical to heapq.merge but adds support for an optional key function.
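The split-then-merge approach can be sketched as follows. This is not the library's implementation: it swaps in the standard library's `tempfile.TemporaryDirectory` for the scratch space and `heapq.merge` for the merge step, and the name `external_sort` and the `chunk_size` parameter are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(in_path, out_path, chunk_size=100000, key=None):
    """Sort the lines of in_path into out_path using bounded memory.

    Assumption: every line, including the last, ends with a newline.
    """
    with tempfile.TemporaryDirectory() as scratch:
        chunk_paths = []
        with open(in_path) as source:
            while True:
                # At most chunk_size lines are held in memory at a time.
                chunk = sorted(islice(source, chunk_size), key=key)
                if not chunk:
                    break
                path = os.path.join(scratch, "chunk%d.txt" % len(chunk_paths))
                with open(path, "w") as handle:
                    handle.writelines(chunk)
                chunk_paths.append(path)

        handles = [open(path) for path in chunk_paths]
        try:
            with open(out_path, "w") as target:
                # heapq.merge is lazy; its key argument needs Python 3.5+.
                target.writelines(heapq.merge(*handles, key=key))
        finally:
            for handle in handles:
                handle.close()
```

A smaller `chunk_size` lowers peak memory at the cost of more scratch files to merge.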
stat
Return the min and max values for the given iterable.
| Parameters: | xs (any iterable of single numerical values) – An iterable of values |
|---|---|
This function computes the min and max together in a single pass, so the iterable is consumed exactly once.
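A single-pass min/max can be sketched like this. The name `min_and_max` is hypothetical, and raising `ValueError` on an empty iterable is an assumption rather than documented behaviour.

```python
def min_and_max(values):
    """Return (smallest, largest) while consuming values exactly once."""
    iterator = iter(values)
    try:
        smallest = largest = next(iterator)
    except StopIteration:
        # Assumption: an empty input is treated as an error.
        raise ValueError("min_and_max() requires a non-empty iterable")
    for value in iterator:
        if value < smallest:
            smallest = value
        elif value > largest:
            largest = value
    return smallest, largest
```

Calling the built-in `min` and `max` separately would need two passes, which fails for a generator that can only be consumed once.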
stream
Stream every line in the given file.
Each line in the file is read, stripped of surrounding whitespace and returned iteratively. Blank lines are ignored.
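The behaviour described above can be sketched as a small generator. The name `stream_lines` is hypothetical, and for brevity it takes an already open text file rather than a filename.

```python
import io


def stream_lines(handle):
    """Yield each non-blank line of an open text file, stripped."""
    for line in handle:
        line = line.strip()
        if line:  # skip lines that are empty after stripping
            yield line


sample = io.StringIO("  first \n\n second\n")
print(list(stream_lines(sample)))  # ['first', 'second']
```

Since the file is iterated line by line, only one line is held in memory at a time.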
Stream every line in the given file interpreting each line as CSV.
| Parameters: | filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance. |
|---|---|
This is a wrapper around stream where the stream is treated as CSV.
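One way to layer CSV parsing over a line stream is to hand the stripped lines to `csv.reader`, which accepts any iterable of strings. The name `csv_rows` is hypothetical, and the sketch takes an open text file for brevity.

```python
import csv
import io


def csv_rows(handle):
    """Yield each non-blank line of a text file as a list of CSV fields."""
    lines = (line.strip() for line in handle)
    # csv.reader pulls lines on demand, so the whole pipeline stays lazy.
    return csv.reader(line for line in lines if line)
```

Because each physical line is stripped on its own, this sketch does not support quoted fields that span several lines.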
Stream every line in the given file interpreting each line as a dictionary of fields to items.
| Parameters: | filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance. |
|---|---|
This is a wrapper around csvstream where each row is returned as a dict mapping field(s) to item(s).
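A dict-per-row stream can be sketched with `csv.DictReader`. The name `csv_dicts` is hypothetical, and taking the field names from the first non-blank line is an assumption; the text above does not say where they come from.

```python
import csv
import io


def csv_dicts(handle):
    """Yield one dict per data row, keyed by the header row's field names."""
    lines = (line.strip() for line in handle)
    # Assumption: the first non-blank line holds the field names.
    return csv.DictReader(line for line in lines if line)
```

All values come back as strings, so numeric fields need converting by the caller.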
Stream every line in the given file interpreting each line as JSON.
| Parameters: | filename (str, py._path.local.LocalPath or file) – A str filename, a py._path.local.LocalPath instance or an open file instance. |
|---|---|
This is a wrapper around stream, except that it wraps each line in a loads call (JSON decoding), essentially treating each line as a piece of valid JSON.
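Treating each line as a JSON document can be sketched as below. The name `json_records` is hypothetical, and decoding with `json.loads` is an assumption about what interpreting a line as JSON means here.

```python
import io
import json


def json_records(handle):
    """Yield one decoded JSON value per non-blank line."""
    for line in handle:
        line = line.strip()
        if line:
            # Raises json.JSONDecodeError (a ValueError) on invalid input.
            yield json.loads(line)
```

This matches the common one-document-per-line layout, where each line must be complete JSON on its own.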
Compress the given iterable of bytes using zlib compression.
| Returns: | An iterable compressed with zlib |
|---|---|
| Return type: | iterable stream of bytes |
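Incremental compression of an iterable of bytes can be sketched with `zlib.compressobj`. The name `compress_chunks` and the `level` parameter are hypothetical and not taken from the streamio source.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Yield zlib-compressed bytes for an iterable of bytes chunks."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:  # the compressor may buffer input and return b""
            yield data
    yield compressor.flush()  # emit whatever is still buffered
```

Joining the yielded pieces gives a single zlib stream that `zlib.decompress` can read back.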