Metadata-Version: 1.2
Name: pulling
Version: 1.2
Summary: Repository for parsing data from files and sites.
Home-page: https://github.com/ItYaS/pulling
Author: ItYaS
Author-email: ryaboshapkoseraph@gmail.com
License: Apache License 2.0
Description: Pulling
        =============
        Pulling is an open source Python repository for parsing data from files and web pages. English documentation is available at https://github.com/ItYaS/pulling/wiki.
        The repository currently supports the .txt, .rtf, .pdf, .docx, .csv, .avro and .json formats, as well as parsing data from web page tags (p, h, a, img, span).
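        
        As a rough illustration of reading a file according to its extension, here is a minimal sketch that covers only the formats Python's standard library can read on its own (.txt, .csv and .json). The other formats listed above need third-party packages and are left out. The helper name read_file is hypothetical and is not part of pulling's API.

```python
import csv
import json
from pathlib import Path


def read_file(path):
    """Return the contents of a .txt, .csv or .json file.

    Hypothetical helper for illustration only; it is not the pulling API
    and handles only formats supported by the standard library.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # Plain text: return the whole file as one string.
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".csv":
        with open(path, newline="", encoding="utf-8") as f:
            # Each row becomes a list of strings.
            return list(csv.reader(f))
    if suffix == ".json":
        with open(path, encoding="utf-8") as f:
            # JSON objects become dicts, arrays become lists.
            return json.load(f)
    raise ValueError("Unsupported extension: {}".format(suffix))
```

        The sketch uses str.format rather than f-strings so that it also runs on Python 3.5, in line with the package's Requires-Python setting.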
        
        Future
        ======================
        A repository like this can be extended indefinitely, and that is what I plan to do. However, the next version (which will add parsing of other formats) will not be released soon, because in 2020 and 2021 I am preparing for exams and admission to the institute. So, keep an eye on this repository and be patient.
        
        In the future, I want to add parsing of .orc, .rcf, .parquet and .feather (and eventually .doc and .odt), conversion to other extensions for all supported formats, new functions, and new formats.
        
        Creation idea
        ======================
        The idea for this repository came to me almost by accident.
        I was building my own website that would check for matches between a link and uploaded files. In the end, I ran into a bug that I never fixed, and because of it I never published the site. But the code I wrote to check files and links cost me a lot of effort: at the time, there was no material online covering all of these extensions, so I had to write almost everything myself, and I spent a long time searching the Internet for help. It would be a shame not to publish it.
        
        Besides, I don't think anyone else would want to spend so much time figuring out how to extract text from these extensions.
        Libraries that really do this, for example on Windows, simply do not exist (as of now).
        
        Version 1.1.1
        ======================
        - First, several features were added:
        	1. Link parsing from audio and video tags
        	2. Improved data parsing from the img tag
        	3. Parsing text from the tags b, big, small, strong, i, sub, sup, span, ins, del and th
        - Second, the parsing function now returns two dictionaries: the first contains data from tags that hold text, and the second contains links.
        - Code readability was also improved, and all unnecessary loops were removed.
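        
        The two-dictionary return value can be illustrated with a minimal sketch built on the standard library's html.parser module. The names TwoDictParser, parse_html, TEXT_TAGS and LINK_ATTRS are hypothetical and are not pulling's actual API; which attribute holds each link (href for a, src for img, audio, video and source) is an assumption made for the example.

```python
from html.parser import HTMLParser

# Hypothetical: tags whose text is collected (based on the tags listed above).
TEXT_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "span", "b", "big",
             "small", "strong", "i", "sub", "sup", "ins", "del", "th"}
# Hypothetical: tag -> attribute assumed to hold the link.
LINK_ATTRS = {"a": "href", "img": "src", "audio": "src", "video": "src",
              "source": "src"}


class TwoDictParser(HTMLParser):
    """Collect text and links into two separate dictionaries (a sketch)."""

    def __init__(self):
        super().__init__()
        self.texts = {}   # tag -> list of text fragments
        self.links = {}   # tag -> list of link values
        self._open = []   # stack of currently open text tags

    def handle_starttag(self, tag, attrs):
        attr = LINK_ATTRS.get(tag)
        if attr:
            value = dict(attrs).get(attr)
            if value:
                self.links.setdefault(tag, []).append(value)
        if tag in TEXT_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Pop back to the matching tag so unclosed inner tags do not leak.
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            # Attribute the text to the innermost open text tag.
            self.texts.setdefault(self._open[-1], []).append(text)


def parse_html(html):
    """Return (texts, links) for an HTML string."""
    parser = TwoDictParser()
    parser.feed(html)
    parser.close()
    return parser.texts, parser.links


texts, links = parse_html('<h1>Title</h1><p>Hello <b>world</b></p>'
                          '<a href="/x">link</a><img src="pic.png">')
```

        Here texts is {"h1": ["Title"], "p": ["Hello"], "b": ["world"]} and links is {"a": ["/x"], "img": ["pic.png"]}. In this sketch the visible text of an a tag is dropped because a is treated only as a link source; a fuller implementation might keep it as well.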
Platform: UNKNOWN
Requires-Python: >=3.5.1
