Metadata-Version: 2.1
Name: multilingual-pdf2text
Version: 1.0.0
Summary: A python library for extracting text from PDFs without losing the formatting of the PDF content.
Home-page: https://github.com/shahrukhx01/multilingual-pdf2text
Author: Shahrukh Khan
Author-email: sk28671@gmail.com
License: UNKNOWN
Description: # Multilingual PDF to Text.
        
        ## Install Package from Pypi
        1. Install it using pip.
        ```bash
        pip install linkedin-jobs-pyscraper
        ```
        The library uses Tesseract which can be installed by following instructions:
         
        [Tesseract Installation](https://tesseract-ocr.github.io/tessdoc/Installation.html)
        
        ## Example Usage
        2. Use it in your code
        ```python
        from multilingual_pdf2text.pdf2text import PDF2Text
        from multilingual_pdf2text.models.document_model.document import Document
        import logging
        logging.basicConfig(level=logging.INFO)
        
        def main():
            ## create document for extraction with configurations
            pdf_document = Document(
                document_path='/Users/shahrukh/Desktop/multilingual-pdf2text/example/example.pdf',
                language='spa'
                )
            pdf2text = PDF2Text(document=pdf_document)
            content = pdf2text.extract()
            print(content)
        
        if __name__ == "__main__":
            main()
        ```
        
        Tesseract supports the following languages:
        Code	Language
        * afr	Afrikaans	
        * amh	Amharic	
        * ara	Arabic	
        * asm	Assamese	
        * aze	Azerbaijani	
        * aze_cyrl	Azerbaijani - Cyrillic	aze_
        * bel	Belarusian	
        * ben	Bengali	
        * bod	Tibetan	
        * bos	Bosnian	
        * bul	Bulgarian	
        * cat	Catalan; Valencian	
        * ceb	Cebuano	
        * ces	Czech	
        * chi_sim	Chinese - Simplified	chi_
        * chi_tra	Chinese - Traditional	chi_
        * chr	Cherokee	
        * cym	Welsh	
        * dan	Danish	
        * deu	German	
        * dzo	Dzongkha	
        * ell	Greek, Modern (1453-)	
        * eng	English	
        * enm	English, Middle (1100-1500)	
        * epo	Esperanto	
        * est	Estonian	
        * eus	Basque	
        * fas	Persian	
        * fin	Finnish	
        * fra	French	
        * frk	German Fraktur	
        * frm	French, Middle (ca. 1400-1600)	
        * gle	Irish	
        * glg	Galician	
        * grc	Greek, Ancient (-1453)	
        * guj	Gujarati	
        * hat	Haitian; Haitian Creole	
        * heb	Hebrew	
        * hin	Hindi	
        * hrv	Croatian	
        * hun	Hungarian	
        * iku	Inuktitut	
        * ind	Indonesian	
        * isl	Icelandic	
        * ita	Italian	
        * ita_old	Italian - Old	ita_
        * jav	Javanese	
        * jpn	Japanese	
        * kan	Kannada	
        * kat	Georgian	
        * kat_old	Georgian - Old	kat_
        * kaz	Kazakh	
        * khm	Central Khmer	
        * kir	Kirghiz; Kyrgyz	
        * kor	Korean	
        * kur	Kurdish	
        * lao	Lao	
        * lat	Latin	
        * lav	Latvian	
        * lit	Lithuanian	
        * mal	Malayalam	
        * mar	Marathi	
        * mkd	Macedonian	
        * mlt	Maltese	
        * msa	Malay	
        * mya	Burmese	
        * nep	Nepali	
        * nld	Dutch; Flemish	
        * nor	Norwegian	
        * ori	Oriya	
        * pan	Panjabi; Punjabi	
        * pol	Polish	
        * por	Portuguese	
        * pus	Pushto; Pashto	
        * ron	Romanian; Moldavian; Moldovan	
        * rus	Russian	
        * san	Sanskrit	
        * sin	Sinhala; Sinhalese	
        * slk	Slovak	
        * slv	Slovenian	
        * spa	Spanish; Castilian	
        * spa_old	Spanish; Castilian - Old	spa_
        * sqi	Albanian	
        * srp	Serbian	
        * srp_latn	Serbian - Latin	srp_
        * swa	Swahili	
        * swe	Swedish	
        * syr	Syriac	
        * tam	Tamil	
        * tel	Telugu	
        * tgk	Tajik	
        * tgl	Tagalog	
        * tha	Thai	
        * tir	Tigrinya	
        * tur	Turkish	
        * uig	Uighur; Uyghur	
        * ukr	Ukrainian	
        * urd	Urdu	
        * uzb	Uzbek	
        * uzb_cyrl	Uzbek - Cyrillic	uzb_
        * vie	Vietnamese	
        * yid	Yiddish
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
