Metadata-Version: 2.1
Name: selfies
Version: 0.2.3
Summary: Self-referencing embedded strings
Home-page: https://github.com/aspuru-guzik-group/selfies
Author: Mario Krenn
Author-email: mario.krenn@utoronto.ca, alan@aspuru.com
License: UNKNOWN
Description: ## SELFIES
        
        SELFIES (SELF-referencIng Embedded Strings) is a general-purpose, sequence-based,
        robust representation of semantically constrained graphs. It is based on a Chomsky
        type-2 grammar, augmented with two self-referencing functions. A main objective is
        to use SELFIES as direct input into machine learning models, in particular in
        generative models, for the generation of graphs with high semantical and syntactical
        validity.
        
        See the paper at arXiv: https://arxiv.org/abs/1905.13741
        
        The code presented here is a concrete application of SELFIES in chemistry, for
        the robust representation of molecule. We show the encoding and decoding of three
        molecules from various databases, and the generation of a new, random molecule
        with high semantical and syntactical validity.
        
        ### Installation
        You can install SELFIES via
        ```
        pip install selfies
        ```
        
        ### Examples
        Several examples can be seen in examples/selfies_example.py. Here is a simple encoding and decoding:
        
        ```python
        from selfies import encoder, decoder
            
        test_molecule1='CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1' # non-fullerene acceptors for organic solar cells
        selfies1=encoder(test_molecule1)
        smiles1=decoder(selfies1)
        
        print('test_molecule1: '+test_molecule1+'\n')
        print('selfies1: '+selfies1+'\n')
        print('smiles1: '+smiles1+'\n')
        print('equal: '+str(test_molecule1==smiles1)+'\n\n\n')
        ```
        
        - an example of SELFIES in a generative model can be seen in the directory 'VariationalAutoEncoder_with_SELFIES\'. There, SMILES datasets are automatically translated into SELFIES, and used for training of a variational autoencoder (VAE).
        
        ### Python version
        fully tested with Python 3.7.1 on
        - 134.000 molecules at QM9 database (https://www.nature.com/articles/sdata201422)
        - 250.000 molecues from the ZINC database (https://en.wikipedia.org/wiki/ZINC_database)
        - 72 million molecules from PubChem (https://pubchem.ncbi.nlm.nih.gov/)
        - 50.000 molecules for organic solar cells (https://www.sciencedirect.com/science/article/pii/S2542435117301307)
        - 1 million molecules from organic chemical reactions (https://pubs.rsc.org/en/content/articlehtml/2018/sc/c8sc02339e)
        
        supported:
        - Python 3.7.2, 3.7.1, 3.6.8, 3.6.7, 2.7.15
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
