Metadata-Version: 2.1
Name: isaacto-html-table-extractor
Version: 1.4.0.1
Summary: A Python library for extracting data from HTML tables
Home-page: https://github.com/isaacto/html-table-extractor
Author: Isaac To
Author-email: isaac.to@gmail.com
License: MIT
Description: # HTML Table Extractor
        [![Build Status](https://travis-ci.org/yuanxu-li/html-table-extractor.svg?branch=master)](https://travis-ci.org/yuanxu-li/html-table-extractor)
        
        Note: This is a re-release of yuanxu-li's html-table-extractor. It
        exists only because I waited too long for an upstream release that
        fixes the incorrect dependency (with the original version 1.4.0,
        pipenv refuses to install newer versions of BeautifulSoup). I've kept
        changes to a minimum: adding this notice, fixing setup.py to make it
        PyPI friendly, and changing the PyPI package name.
        
        _HTML Table Extractor is a Python library that uses [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) to extract data from complicated and messy HTML tables_
        
        ## Important links
        * Repository: https://github.com/yuanxu-li/html-table-extractor
        * Issues: https://github.com/yuanxu-li/html-table-extractor/issues
        
        ## Installation
        
        ```bash
        pip install 'beautifulsoup4==4.5.3'
        pip install isaacto-html-table-extractor
        ```
        
        ## Usage
        
        ### Example 1 - Simple
        
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        
        ```python
        from html_table_extractor.extractor import Extractor
        table_doc = """
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        """
        extractor = Extractor(table_doc)
        extractor.parse()
        extractor.return_list()
        ```
        It will print out:
        ```python
        [[u'1', u'2'], [u'3', u'4']]
        ```
        
        ### Example 2 - Transformer
        
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        
        ```python
        from html_table_extractor.extractor import Extractor
        table_doc = """
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        """
        extractor = Extractor(table_doc, transformer=int)
        extractor.parse()
        extractor.return_list()
        ```
        It will print out:
        ```python
        [[1, 2], [3, 4]]
        ```
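        
        Real tables often mix numbers with padded or non-numeric cells, where `transformer=int` would raise an error. The sketch below shows one possible custom transformer. It assumes the transformer is called once per cell with the cell's text, as the `int` example above suggests; `clean_int` is a hypothetical name, not part of the library.
        
        ```python
        def clean_int(text):
            """Hypothetical transformer: strip whitespace, convert to int when possible."""
            stripped = text.strip()
            try:
                return int(stripped)
            except ValueError:
                # Leave non-numeric cells (e.g. "n/a") as cleaned-up strings.
                return stripped
        
        
        print([clean_int(cell) for cell in [" 1 ", "2", "n/a"]])  # [1, 2, 'n/a']
        ```
        
        Under that assumption, it would be passed the same way as `int`: `Extractor(table_doc, transformer=clean_int)`.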
        
        ### Example 3 - Pass BS4 Tag
        
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        
        ```python
        from html_table_extractor.extractor import Extractor
        from bs4 import BeautifulSoup
        table_doc = """
        <html><table id='wanted'><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table><table id='unwanted'><tr><td>not wanted</td></tr></table></html>
        """
        soup = BeautifulSoup(table_doc, 'html.parser')
        extractor = Extractor(soup, id_='wanted')
        extractor.parse()
        extractor.return_list()
        ```
        It will print out:
        ```python
        [[u'1', u'2'], [u'3', u'4']]
        ```
        
        ### Example 4 - Complex
        
        <table>
            <tr>
                <td rowspan=2>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td colspan=2>4</td>
            </tr>
            <tr>
                <td colspan=3>5</td>
            </tr>
        </table>
        
        ```python
        from html_table_extractor.extractor import Extractor
        table_doc = """
        <table>
          <tr>
            <td rowspan=2>1</td>
            <td>2</td>
            <td>3</td>
          </tr>
          <tr>
            <td colspan=2>4</td>
          </tr>
          <tr>
            <td colspan=3>5</td>
          </tr>
        </table>
        """
        extractor = Extractor(table_doc)
        extractor.parse()
        extractor.return_list()
        ```
        It will print out:
        ```python
        [[u'1', u'2', u'3'], [u'1', u'4', u'4'], [u'5', u'5', u'5']]
        ```
        
        ### Example 5 - Conflicted
        
        <table>
            <tr>
                <td rowspan=2>1</td>
                <td>2</td>
                <td rowspan=3>3</td>
            </tr>
            <tr>
                <td colspan=2>4</td>
            </tr>
            <tr>
                <td colspan=2>5</td>
            </tr>
        </table>
        
        ```python
        from html_table_extractor.extractor import Extractor
        table_doc = """
        <table>
            <tr>
                <td rowspan=2>1</td>
                <td>2</td>
                <td rowspan=3>3</td>
            </tr>
            <tr>
                <td colspan=2>4</td>
            </tr>
            <tr>
                <td colspan=2>5</td>
            </tr>
        </table>
        """
        extractor = Extractor(table_doc)
        extractor.parse()
        extractor.return_list()
        ```
        It will print out:
        ```python
        [[u'1', u'2', u'3'], [u'1', u'4', u'3'], [u'5', u'5', u'3']]
        ```
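        
        The outputs of Examples 4 and 5 are consistent with a simple rule: a cell takes the first free slot in its row, is copied across its `rowspan` and `colspan`, and never overwrites a slot that an earlier cell has already claimed. The following is a minimal standard-library sketch of that rule, not the library's actual implementation; `GridSketch` and `expand` are hypothetical names, and the code targets Python 3.
        
        ```python
        from html.parser import HTMLParser
        
        
        class GridSketch(HTMLParser):
            """Hypothetical helper: expand rowspan/colspan into a rectangular grid."""
        
            def __init__(self):
                super().__init__()
                self.grid = {}    # (row, col) -> text; the first writer wins
                self.row = -1
                self.col = 0
                self.span = None  # (rowspan, colspan) of the cell being read
                self.text = []
        
            def handle_starttag(self, tag, attrs):
                if tag == "tr":
                    self.row += 1
                    self.col = 0
                elif tag in ("td", "th"):
                    attributes = dict(attrs)
                    self.span = (int(attributes.get("rowspan") or 1),
                                 int(attributes.get("colspan") or 1))
                    self.text = []
        
            def handle_data(self, data):
                if self.span is not None:
                    self.text.append(data)
        
            def handle_endtag(self, tag):
                if tag in ("td", "th") and self.span is not None:
                    # Skip slots already claimed by an earlier rowspan.
                    while (self.row, self.col) in self.grid:
                        self.col += 1
                    rows, cols = self.span
                    value = "".join(self.text).strip()
                    for r in range(self.row, self.row + rows):
                        for c in range(self.col, self.col + cols):
                            self.grid.setdefault((r, c), value)  # never overwrite
                    self.col += cols
                    self.span = None
        
            def as_list(self):
                if not self.grid:
                    return []
                n_rows = max(r for r, _ in self.grid) + 1
                n_cols = max(c for _, c in self.grid) + 1
                return [[self.grid.get((r, c)) for c in range(n_cols)]
                        for r in range(n_rows)]
        
        
        def expand(html):
            parser = GridSketch()
            parser.feed(html)
            return parser.as_list()
        
        
        conflicted = """
        <table>
            <tr><td rowspan=2>1</td><td>2</td><td rowspan=3>3</td></tr>
            <tr><td colspan=2>4</td></tr>
            <tr><td colspan=2>5</td></tr>
        </table>
        """
        print(expand(conflicted))
        # [['1', '2', '3'], ['1', '4', '3'], ['5', '5', '3']]
        ```
        
        In the second row, `4` starts in the middle column because the first is taken by `1`, and its `colspan` cannot displace the `3` already placed in the last column.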
        
        ### Example 6 - Write to file
        
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        
        ```python
        from html_table_extractor.extractor import Extractor
        table_doc = """
        <table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
        """
        extractor = Extractor(table_doc).parse()
        extractor.write_to_csv(path='.')
        ```
        This creates a new CSV file called `output.csv` in the given directory:
        ```
        1,2
        3,4
        
        ```
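        
        If you need a different file name or CSV dialect, one option is to write the list from `return_list()` yourself with the standard `csv` module. This is a sketch for Python 3 that does not use the library at all; `write_rows` and `table.csv` are hypothetical names, and `rows` stands in for the list returned in Example 1.
        
        ```python
        import csv
        import os
        import tempfile
        
        
        def write_rows(rows, file_path):
            """Hypothetical helper: write a list of rows to a CSV file."""
            # newline="" lets the csv module control line endings.
            with open(file_path, "w", newline="") as handle:
                csv.writer(handle).writerows(rows)
        
        
        rows = [["1", "2"], ["3", "4"]]  # same shape as the return_list() output
        target = os.path.join(tempfile.mkdtemp(), "table.csv")
        write_rows(rows, target)
        
        with open(target, newline="") as handle:
            print(list(csv.reader(handle)))  # [['1', '2'], ['3', '4']]
        ```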
        
        ## Team
        
        * [@yuanxu-li](https://github.com/yuanxu-li)
        
        ## Errors/Bugs
        
        If something is not working correctly, or if you have suggestions for improvement, [report it here](https://github.com/yuanxu-li/table-extractor/issues).
        
        ## Copyright
        
        Copyright (c) 2017 Justin Li. Released under the [MIT License](https://github.com/yuanxu-li/html-table-extractor/blob/master/README.md)
        
        Third-party copyright in this distribution is noted where applicable.
        
        
Keywords: html table beautifulsoup crawler scrape
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Description-Content-Type: text/markdown
