Metadata-Version: 2.1
Name: pii-preprocess
Version: 0.1.0
Summary: Document preprocessing for PII Management
Home-page: https://github.com/piisa/pii-preprocess
Download-URL: https://github.com/piisa/pii-preprocess/tarball/v0.1.0
Author: Paulo Villegas
Author-email: paulo.vllgs@gmail.com
License: Apache
Keywords: PIISA, PII
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: test
License-File: LICENSE

# pii-preprocess

This package is intended for the data/document preprocessing stage in the PII
Management flow designed by PIISA.

It will contain:
 * a Python API and command-line entry points to read a number of file formats
   and convert them to PII Source Documents, as defined by pii-data
 * utilities for document transformation (to ease PII processing)
 
 
## Contents

The current contents of the package are:
 * Classes and an API for reading some file types:
     - CSV files (into Table source documents)
     - Microsoft Word files (into Sequence or Tree source documents)
     - Raw text files (into Sequence source documents or, using indentation,
       into Tree source documents)
 * A configurable loader class that can load formats by dispatching to
   appropriate subclasses
 * Some command-line scripts:
     - a generic script that uses the loader class to convert any implemented
       format to a YAML or plain text file
     - scripts for specific formats:
         * a script to convert between CSV files and the YAML canonical
           representation for Source Documents
         * a script to convert between plain text files and the YAML
           canonical representation for Source Documents
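
To illustrate what reading a raw text file "using indentation" into a Tree
document could mean, here is a minimal standalone sketch. It does not use the
pii-preprocess API: the `Node` class and the `indent_to_tree` function are
hypothetical names invented for this example, and the rule it applies (a line
becomes a child of the closest preceding line with smaller indentation) is an
assumption, not necessarily the behaviour of the package's own reader.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One text chunk plus the chunks nested under it (hypothetical type)."""
    text: str
    children: List["Node"] = field(default_factory=list)


def indent_to_tree(raw: str) -> List[Node]:
    """Turn indented plain text into a list of top-level nodes.

    Assumption: a line is a child of the closest preceding line that has
    strictly smaller indentation. Blank lines are skipped.
    """
    roots: List[Node] = []
    # Stack of (indent, node) pairs for the currently open ancestors
    stack: List[Tuple[int, Node]] = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Expand tabs so that tabs and spaces are measured consistently
        expanded = line.expandtabs(4)
        indent = len(expanded) - len(expanded.lstrip())
        node = Node(expanded.strip())
        # Drop every open node that is not a strict ancestor of this line
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((indent, node))
    return roots


sample = "Intro\n    Detail A\n    Detail B\n        Sub-detail\nOutro\n"
tree = indent_to_tree(sample)
```

In this sketch `tree` holds two top-level nodes, and the nested lines hang from
the first one. A flat Sequence document would correspond to the case where every
line ends up at the top level.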



