Metadata-Version: 2.1
Name: miRe2e
Version: 0.17
Summary: An end-to-end deep neural network based on Transformers for pre-miRNA prediction
Home-page: https://github.com/sinc-lab/miRe2e
Author: Jonathan Raad
Author-email: jraad@sinc.unl.edu.ar
License: UNKNOWN
Project-URL: Webdemo, https://sinc.unl.edu.ar/web-demo/miRe2e/
Description: # miRe2e
        
        This package contains the original methods proposed in:
        
            [1] J. Raad, L. A. Bugnon, D. H. Milone and G. Stegmayer, "miRe2e: a full
            end-to-end deep model based on  Transformers for prediction
            of pre-miRNAs from raw genome-wide data", 2021.
        
        miRe2e is a novel deep learning model based on Transformers that allows
        finding  pre-miRNA sequences in raw genome-wide data. This model is a full
        end-to-end neural architecture, using only the raw sequences as inputs.
        This way, there is no need to use other libraries for preprocessing RNA sequences.
        
        The model has 3 stages, as depicted in the figure:
        
        1. Structure prediction model: predicts RNA secondary structure using only 
            the input  sequence.
        2. MFE estimation model: estimates the minimum free energy when folding (MFE) the secondary  structure.
        3. Pre-miRNA classifier: uses the input RNA sequence and the outputs of the two previous
          models to give a score to the input sequence in order to determine if it is a  pre-miRNA candidate.  
         
        ![Abstract](abstract.png)
        
        This repository provides a miRe2e model pre-trained with known pre-miRNAs
        from *H. sapiens*. It is open sourced and free to use. If you use any of
        the following, please cite them properly. 
        
        An easy to try online demo is available at [https://sinc.unl.edu.
        ar/web-demo/miRe2e/](https://sinc.unl.edu.ar/web-demo/miRe2e/). This demo runs 
        a pre-trained model on small RNA sequences. To use larger datasets, or 
        train your oun model, see the following instructions.  
        
        ## Installation
        
        You need a Python>=3.7 distribution to use this package. You can install
        the package from PyPI:
        
            pip install miRe2e
        
        depending on your system configuration. You can also clone this repository and install with:
        
            git clone git@github.com:sinc-lab/miRe2e.git
            cd miRe2e
            pip install .
        
        ## Using the trained models
        
        When using miRe2e, pre-trained weights will be automatically downloaded.
        The model receives a fasta file with a raw RNA sequence. The sequence is
        analyzed with a sliding window, and a pre-miRNA score is assigned to each part. 
        
        You can find a complete demonstration of usage in
        [miRe2e usage](https://colab.research.google.com/drive/1xeOrjaYP150War9R-LsPpukpJ7_TV0sh#scrollTo=Uan5dhSzegA2).
        
        The notebook is also in this repository: [miRe2e_usage.ipynb](miRe2e_usage.ipynb).
        
        ## Training the models
        
        Training the models may take several hours and requires GPU processing 
        capabilities beyond the ones provided freely by google colab.  In  the 
        following, there are instructions for training each stage of this 
        model. 
        
        Each one of the following steps will train a stage of the model, replacing 
        the current model during the rest of the program. New models are saved as 
        pickle files (*.pkl). These files can be loaded using 
        
        ```python
        from miRe2e import MiRe2e
        new_model = MiRe2E(mfe_model_file="trained_mfe_predictor.pkl",
                           structure_model_file="trained_structure_predictor.pkl",
                           predictor_model_file="trained_predictor.pkl")
        ```
        
          
        
        ### Structure prediction model
        
        To train the Structure prediction model, run:
        ```python
        from miRe2e import MiRe2e
        model = MiRe2e(device="cuda")
        model.fit_structure("hairpin_examples.fa")
        ```
        The fasta file should contain sequences of hairpins and it's secondary structure.
        
        ### MFE estimation model
        
        To train the Structure prediction model, run:
        ```python
        from miRe2e import MiRe2e
        model = MiRe2e(device="cuda")
        model.fit_mfe("mfe_examples.fa")
        ```
        
        The fasta file should contain sequences of pre-miRNAs, hairpin and flats with the target MFE. 
        
        
        ### Pre-miRNA classifier model
        
        To train the pre-miRNA classifier model, you need at least one set of 
        positive samples (known pre-miRNA sequences) and a set of negative samples. 
        Each sample must be a trimmed to 100 nt in length to use the current 
        model configuration. These should be stored in a single FASTA file, one sample 
        per row.  Furthermore, since the pre-miRNAs have an average length of less 
        than 100nt, it is  necessary to randomly trim negative training sequences 
        to match the positive distribution.  This prevents that training got  
        biased by 
        the length of the sequences.          
        
        To train this stage, run:
        
        ```python
        from miRe2e import MiRe2e
        model = MiRe2e(device="cuda")
        model.fit(pos_fname="positive_examples.fa", 
                  neg_fname="negative_examples.fa")
        ```
        
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
