Metadata-Version: 2.1
Name: refextract
Version: 1.1.2
Summary: Small library for extracting references used in scholarly communication.
Home-page: https://github.com/inspirehep/refextract
Author: CERN
Author-email: admin@inspirehep.net
License: GPLv2
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Dist: PyPDF2 (>=1.26.0,~=1.0)
Requires-Dist: autosemver (>=0.5.3,~=0.0)
Requires-Dist: python-magic (>=0.4.15,~=0.0)
Requires-Dist: requests (>=2.18.4,~=2.0)
Requires-Dist: six (>=1.10.0,~=1.0)
Requires-Dist: unidecode (>=1.0.22,~=1.0)
Provides-Extra: all
Requires-Dist: Sphinx (>=1.7.1,~=1.0) ; extra == 'all'
Requires-Dist: flake8-future-import (>=0.4.4,~=0.0) ; extra == 'all'
Requires-Dist: flake8 (>=3.5.0,~=3.0) ; extra == 'all'
Requires-Dist: pytest-cov (>=2.10,~=2.0) ; extra == 'all'
Requires-Dist: pytest (>=4.6,~=4.0) ; extra == 'all'
Requires-Dist: responses (>=0.8.1,~=0.0) ; extra == 'all'
Provides-Extra: docs
Requires-Dist: Sphinx (>=1.7.1,~=1.0) ; extra == 'docs'
Provides-Extra: tests
Requires-Dist: flake8-future-import (>=0.4.4,~=0.0) ; extra == 'tests'
Requires-Dist: flake8 (>=3.5.0,~=3.0) ; extra == 'tests'
Requires-Dist: pytest-cov (>=2.10,~=2.0) ; extra == 'tests'
Requires-Dist: pytest (>=4.6,~=4.0) ; extra == 'tests'
Requires-Dist: responses (>=0.8.1,~=0.0) ; extra == 'tests'
Requires-Dist: unicode-string-literal (>=1.1,~=1.0) ; (python_version=="2.7") and extra == 'tests'

..
   This file is part of refextract
   Copyright (C) 2015, 2016, 2018 CERN.

   refextract is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   refextract is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with refextract; if not, write to the Free Software Foundation, Inc.,
   59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

   In applying this license, CERN does not waive the privileges and immunities
   granted to it by virtue of its status as an Intergovernmental Organization
   or submit itself to any jurisdiction.


============
 refextract
============

.. image:: https://travis-ci.org/inspirehep/refextract.svg?branch=master
    :target: https://travis-ci.org/inspirehep/refextract

.. image:: https://coveralls.io/repos/github/inspirehep/refextract/badge.svg?branch=master
    :target: https://coveralls.io/github/inspirehep/refextract?branch=master


About
=====

A small library for extracting references used in scholarly communication.


Install
=======

.. code-block:: console

    $ pip install refextract


Usage
=====

To get structured information from a publication reference:

.. code-block:: python

    >>> from refextract import extract_journal_reference
    >>> reference = extract_journal_reference('J.Phys.,A39,13445')
    >>> print(reference)
    {
        'extra_ibids': [],
        'is_ibid': False,
        'misc_txt': u'',
        'page': u'13445',
        'title': u'J. Phys.',
        'type': 'JOURNAL',
        'volume': u'A39',
        'year': '',
    }
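
The result is a plain dictionary, so it can be post-processed like any other
mapping. As a minimal sketch, the helper ``format_journal_reference`` below is
hypothetical (it is not part of ``refextract``) and joins the fields shown
above into a compact citation string. It works on a literal copy of the
example output, so it runs without the library installed:

.. code-block:: python

    def format_journal_reference(reference):
        """Join title, volume and page; append the year only if present."""
        parts = [
            reference.get('title', ''),
            reference.get('volume', ''),
            reference.get('page', ''),
        ]
        # Skip empty fields so no stray spaces end up in the string.
        citation = ' '.join(part for part in parts if part)
        year = reference.get('year', '')
        if year:
            citation += ' ({})'.format(year)
        return citation

    # Literal copy of the example output above.
    sample = {
        'extra_ibids': [],
        'is_ibid': False,
        'misc_txt': '',
        'page': '13445',
        'title': 'J. Phys.',
        'type': 'JOURNAL',
        'volume': 'A39',
        'year': '',
    }

    citation = format_journal_reference(sample)
    print(citation)

Because ``year`` is empty in this example, the helper leaves it out instead of
emitting empty parentheses.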

To extract references from a PDF:

.. code-block:: python

    >>> from refextract import extract_references_from_file
    >>> references = extract_references_from_file('1503.07589.pdf')
    >>> print(references[0])
    {
        'author': [u'F. Englert and R. Brout'],
        'doi': [u'doi:10.1103/PhysRevLett.13.321'],
        'journal_page': [u'321'],
        'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
        'journal_title': [u'Phys. Rev. Lett.'],
        'journal_volume': [u'13'],
        'journal_year': [u'1964'],
        'linemarker': [u'1'],
        'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
        'texkey': [u'Englert:1964et'],
        'year': [u'1964'],
    }

To extract directly from a URL:

.. code-block:: python

    >>> from refextract import extract_references_from_url
    >>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
    >>> print(references[0])
    {
        'author': [u'F. Englert and R. Brout'],
        'doi': [u'doi:10.1103/PhysRevLett.13.321'],
        'journal_page': [u'321'],
        'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
        'journal_title': [u'Phys. Rev. Lett.'],
        'journal_volume': [u'13'],
        'journal_year': [u'1964'],
        'linemarker': [u'1'],
        'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
        'texkey': [u'Englert:1964et'],
        'year': [u'1964'],
    }
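
In the records shown above, every value is a list, even when it holds a single
item. The following sketch shows one way to read such a record. The helpers
``first_value`` and ``bare_doi`` are hypothetical names introduced for
illustration and are not part of ``refextract``; the sample record is a
subset of the example output:

.. code-block:: python

    def first_value(reference, key, default=None):
        """Return the first item stored under ``key``, or ``default``."""
        values = reference.get(key) or []
        return values[0] if values else default

    def bare_doi(reference):
        """Return the first DOI without its leading ``doi:`` prefix."""
        doi = first_value(reference, 'doi')
        if doi is None:
            return None
        prefix = 'doi:'
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
        return doi

    # Subset of the example record above.
    sample = {
        'author': ['F. Englert and R. Brout'],
        'doi': ['doi:10.1103/PhysRevLett.13.321'],
        'journal_title': ['Phys. Rev. Lett.'],
        'journal_volume': ['13'],
        'journal_year': ['1964'],
        'linemarker': ['1'],
    }

    print(first_value(sample, 'journal_title'))
    print(bare_doi(sample))

Returning a default for missing keys keeps the calling code free of
``KeyError`` handling. This assumes that a field may be absent from some
records, which the examples above do not confirm.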


Notes
=====

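``refextract`` depends on `pdftotext`_, an external command-line tool that
``pip`` does not install; it must be available on the system separately.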

.. _`pdftotext`: http://linux.die.net/man/1/pdftotext
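
Before extracting from PDFs, it can help to confirm that the tool is
reachable. The sketch below assumes the executable is named ``pdftotext`` and
is found through ``PATH``; ``refextract`` itself may locate it differently.
The helper ``find_pdftotext`` is a hypothetical name and uses only the
standard library:

.. code-block:: python

    import shutil

    def find_pdftotext(executable='pdftotext'):
        """Return the full path of the executable, or None if not on PATH."""
        return shutil.which(executable)

    path = find_pdftotext()
    if path is None:
        print('pdftotext was not found on PATH')
    else:
        print('pdftotext found at', path)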


Acknowledgments
===============

``refextract`` is based on code and ideas from the following people, who
contributed to the ``docextract`` module in Invenio:

- Alessio Deiana
- Federico Poli
- Gerrit Rindermann
- Graham R. Armstrong
- Grzegorz Szpura
- Jan Aage Lavik
- Javier Martin Montull
- Micha Moskovic
- Samuele Kaplun
- Thorsten Schwander
- Tibor Simko


License
=======

GPLv2 (GNU General Public License, version 2 or, at your option, any later
version)


