Metadata-Version: 2.1
Name: TextSpitter
Version: 0.2.post0
Summary: Python package that spits out text from your document files!
Home-page: https://github.com/fsecada01/TextSpitter
Author: Francis Secada
Author-email: francis.secada@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: python-docx
Requires-Dist: PyMuPDF

# THANK YOU FOR USING TEXTSPITTER!! #

I created this little app to help me process documents from folder sets and batches.  Instead of trying to determine each file type and process accordingly, I thought it would be more prudent to read file names and then route text extraction functions accordingly.  Also, I was having a really difficult time getting textract/pdftotext to work **because of damn Poppler**.  So instead of troubleshooting that whole process after 6+ hours, I figured this was more time-efficient.

This is my first python module, so I hope I did this well!

## Directions ##
This module is designed to run as simply as possible.  Just provide the file location string data into the argument, and get your text returned to you.

```
from TextSpitter import TexSpitter as TS
import sqlite3


folder_loc = 'foo/bar/'

# doc_file = folder_loc + 'file_thing.doc'
docx_file = folder_loc + 'file_thing.docx'
pdf_file = folder_loc + 'file_thing.pdf'
text_file = folder_loc + 'file_thing.txt'

doc_tup = (docx_file, pdf_file, text_file)
# doc_tup = (doc_file, docx_file, pdf_file, text_file)

# SQL code to write to database
conn = sqlite3.connect('example_db')
c= conn.cursor()

STMNT = 'INSERT INTO doc_contents VALUE %s'

# For Loop code to insert doc content into db
for ele in doc_tup:
	text = TS(ele)
	c.executemany(STMNT, text)
	print('Done!  Wrote the following to db: %s', (text[:25]))
```

## TO DOs ##
* [x] push to github
* [x] Remove .doc support due to legacy format's extensive proprietary reqs 
* [ ] spruce up documentation
* [ ] solicit feedback
* [ ] expand functionality to other file types
* [ ] TDB

## WANT TO CONTRIBUTE!? ##
_*OH MY GOD, PLEASE DO.*_

Just make a pull request and add whatever you want (or fix whatever you want).  I'll review and approve if everything seems good.  

Thanks, everyone!

