Metadata-Version: 2.1
Name: csvlineparser
Version: 1.0.0
Summary: A BNF grammar based CSV Parser that cleans up with messy CSV files, with optional Excel export
Project-URL: Homepage, https://github.com/raoulsson/csvlineparser
Author-email: raoulsson <hello@raoulsson.com>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# csvlineparser

A CSV Parser that cleans a messy CSV file, with optional Excel export...
The CSV Parser is based on a BNF grammar, and the parser is built using a recursive descent parser.
Parsing line by line, character by character...

See in the example folder for example_parser.py and the files in the in and out folders.

### From the Example

```python
def main():
    """ Specify the input and output file paths """
    in_file_path = 'in/sample-raw-input.csv'
    out_file_path = 'out/sample-clean.csv'
    """ Optionally add an excel output file """
    out_excel_path = 'out/sample-clean.xlsx'

    """
    Build the sha256 of the header column to make sure the input file has the expected header. Especially useful if
    the input file is provided by a third party or different occasions, when the input might have changed over time.
    """
    expected_header_sha = "c551587bfca31449332989a377d59eb5f7bb6f1fb39dce862930094340299723"

    """
    Per column, specify the original name in the csvfile, the datatype, if the column is required, if the column should 
    be ignored, a possible rename and a column expander. The column expander is used to split a column into multiple
    new cells. 
    
    In the example, we rename the misspelled column 'stattus' to 'status', and we split the column 'created_on' into 
    three new columns 'created_on_year', 'created_on_month' and 'created_on_day'. We also split the column 'subtotal'
    into two new columns 'subtotal' and 'subtotal_ccy'. And the column 'tax' is ignored, because it is not needed.
    """
    column_definition = [
        ['order_id', 'string', True, False, None, None],
        ['stattus', 'string', True, False, 'status', None],
        ['created_on', 'datetime', True, False, None, ColumnExpander(['created_on', 'created_on_year', 'created_on_month', 'created_on_day'], YearMonthDaySplitter())],
        ['customer_name', 'string', True, False, None, None],
        ['items_count', 'int', True, False, None, None],
        ['subtotal', 'double', True, False, None, ColumnExpander(['subtotal', 'subtotal_ccy'], PriceCCYSplitter())],
        ['tax', 'double', True, True, None, None],
        ['discounts_total', 'double', True, False, None, ColumnExpander(['discounts_total', 'discounts_total_ccy'], PriceCCYSplitter())],
        ['customer_comment', 'string', False, False, None, None],
    ]

    """ At least the column_definitions are required to build the parser config """
    default_parser_config = DefaultParserConfig(column_definition, expected_header_sha)

    csv_line_parser = CsvLineParser(default_parser_config)
    csv_line_parser.parse_csv(in_file_path, out_file_path, out_excel_path)


if __name__ == '__main__':
    main()
```


### Background

Based on https://secondboyet.com/articles/csvparser.html by Julian M Bucknall, rewritten in Python by raoulsson.

BNF Grammar:

    csvFile ::= (csvRecord)* 'EOF'
    csvRecord ::= csvStringList ('\n' | 'EOF')
    csvStringList ::= rawString [',' csvStringList]
    rawString := optionalSpaces [rawField optionalSpaces)]
    optionalSpaces ::= whitespace*
    whitespace ::= ' ' | '\t'
    rawField ::= simpleField | quotedField 
    simpleField ::= (any char except \n, EOF, \t, space, comma or double quote)+
    quotedField ::= '"' escapedField '"'
    escapedField ::= subField ['"' '"' escapedField]
    subField ::= (any char except double quote or EOF)+

## Installation

    Clone and run 'make init'. Or, without cloning, to install it as a module, 
    'pip install csvlineparser' should work as well...