Metadata-Version: 2.1
Name: ratvec
Version: 0.1.2
Summary: Generating dense embeddings for proteins using kernel PCA
Home-page: https://github.com/ratvec/ratvec
Author: Eduardo Brito
Author-email: eduardo.alfredo.brito.chacon@iais.fraunhofer.de
Maintainer: Eduardo Brito
Maintainer-email: eduardo.alfredo.brito.chacon@iais.fraunhofer.de
License: Apache 2.0 License
Download-URL: https://github.com/ratvec/ratvec/releases
Project-URL: Bug Tracker, https://github.com/ratvec/ratvec/issues
Project-URL: Documentation, https://ratvec.readthedocs.io
Keywords: Representation Learning,Kernel PCA,Principle Component Analysis,PCA
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.6
Requires-Dist: click (>=7.0)
Requires-Dist: requests
Requires-Dist: distance
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: tqdm
Requires-Dist: sklearn
Requires-Dist: cython
Requires-Dist: click
Requires-Dist: biopython

RatVec
======
This tool generates low-dimensional, continuous, distributed vector representations for non-numeric entities such as
text or biological sequences (e.g. DNA or proteins) via kernel PCA with rational kernels.

The current implementation accepts any input dataset that can be read as a list of strings.

Installation |pypi_version| |python_versions| |pypi_license|
------------------------------------------------------------
RatVec can be installed on Python 3.6+ from `PyPI <https://pypi.python.org/pypi/ratvec>`_ with the following code in
your favorite terminal:

.. code-block:: sh

    $ pip install ratvec

or from the latest code on `GitHub <https://github.com/ratvec/ratvec>`_ with:

.. code-block:: sh

   $ pip install git+https://github.com/ratvec/ratvec.git

It can be installed in development mode with:

.. code-block:: sh

   $ git clone https://github.com/ratvec/ratvec.git
   $ cd ratvec
   $ pip install -e .

The ``-e`` dynamically links the code in the git repository to the Python site-packages so your changes get
reflected immediately.

How to Use
----------
``ratvec`` automatically installs a command line interface. Check it out with:

.. code-block:: sh

   $ ratvec --help

RatVec has three main commands: ``generate``, ``train``, and ``evaluate``:

1. **Generate**. Downloads and prepare the SwissProt data set that is showcased in the RatVec paper.

.. code-block:: sh

   $ ratvec generate

2. **Train**. Compute KPCA embeddings on a given data set. Please run the following command to see the arguments:

.. code-block:: sh

   $ ratvec train --help

3. **Evaluate**. Evaluate and optimize KPCA embeddings. Please run the following command to see the arguments:

.. code-block:: sh

   $ ratvec evaluate --help

Showcase Dataset
----------------
The application presented in the paper (SwissProt dataset [1]_ used by Boutet *et al.* [2]_) can be downloaded directly
from `here <https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JMFHTN>`_ or running the following
command:

.. code-block:: sh

   $ ratvec generate

References
----------
.. [1] Boutet, E. *et al.* (2016). `UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase:
   how to use the entry view. <https://doi.org/10.1007/978-1-4939-3167-5_2>`_. Plant Bioinformatics (pp. 23-54).

.. [2] Asgari, E., & Mofrad, M. R. (2015). `Continuous distributed representation of biological sequences for deep
   proteomics and genomics <https://doi.org/10.1371/journal.pone.0141287>`_. PloS one, 10(11), e0141287.


.. |python_versions| image:: https://img.shields.io/pypi/pyversions/ratvec.svg
    :alt: Python versions supported by RatVec

.. |pypi_version| image:: https://img.shields.io/pypi/v/ratvec.svg
    :alt: Current version of RatVec on PyPI

.. |pypi_license| image:: https://img.shields.io/pypi/l/ratvec.svg
    :alt: RatVec is distributed under the Apache 2.0 License


