Metadata-Version: 2.1
Name: codx
Version: 0.1.4
Summary: A package for retrieving exon data for protein sequences from the RefSeqGene database
License: MIT
Author: Toan Phung
Author-email: toan.phungkhoiquoctoan@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: biopython (>=1.81,<2.0)
Requires-Dist: click (>=8.1.3,<9.0.0)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: uniprotparser (>=1.0.9,<2.0.0)
Description-Content-Type: text/markdown

## CODX

`codx` is a Python package that allows retrieval of exon data from the NCBI RefSeqGene database.

## Installation

```bash
pip install codx
```

## Usage: Python Package

The package uses gene IDs to retrieve exon data from the NCBI RefSeqGene database. If you only have UniProt accession IDs, the `get_geneids_from_uniprot` function looks up the corresponding gene IDs through the UniProt database; those gene IDs can then be used to query RefSeqGene.

### Example Usage

#### Retrieve Gene IDs from UniProt

```python
from codx.components import get_geneids_from_uniprot

# Example UniProt accession IDs
accession_ids = ["P35568", "P05019", "Q99490", "Q8NEJ0", "Q13322", "Q15323"]
gene_ids = get_geneids_from_uniprot(accession_ids)
print(gene_ids)  # Output: Set of gene IDs
```

#### Create a Database and Retrieve Gene Data

```python
from codx.components import create_db

# Create a database with gene and exon data from NCBI
db = create_db(["120892"], entrez_email="your@email.com")  # Provide an email address for NCBI API

# Retrieve a gene object using its gene name
gene = db.get_gene("LRRK2")

# Retrieve exon data from the gene object
for exon in gene.blocks:
    print(exon.start, exon.end, exon.sequence)

# Generate all possible ordered combinations of exons
for exon_combination in gene.shuffle_blocks():
    print(exon_combination)
```
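
The term "ordered combinations" is not defined further here. One plausible reading, shown below as a standalone sketch, is every non-empty subset of exons joined in their original genomic order. This is an assumption for illustration, not the implementation of `shuffle_blocks`; the helper `ordered_exon_combinations` and the exon strings are hypothetical.

```python
from itertools import combinations


def ordered_exon_combinations(exons):
    """Yield each non-empty subset of exons, joined in original order.

    Assumption: "ordered combination" means exons may be skipped but
    never reordered. This is NOT taken from the codx source code.
    """
    for size in range(1, len(exons) + 1):
        # combinations() preserves the input order within each subset
        for subset in combinations(exons, size):
            yield "".join(subset)


# Made-up exon sequences, for illustration only
exons = ["ATGGCC", "GATTAC", "TGA"]
for sequence in ordered_exon_combinations(exons):
    print(sequence)
```

Under this reading, a gene with `n` exons yields `2**n - 1` combinations, so the number of sequences grows quickly with exon count.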

#### Six-Frame Translation of Sequences

```python
from codx.components import three_frame_translation

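# `gene` is the object returned by db.get_gene(...) in the previous example.
# Six-frame translation = three forward frames plus three reverse-complement frames.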
for exon_combination in gene.shuffle_blocks():
    three_frame = three_frame_translation(exon_combination.seq, only_start_with_codons=["ATG"])
    three_frame_complement = three_frame_translation(exon_combination.seq, only_start_with_codons=["ATG"], reverse=True)
    print(three_frame)
    print(three_frame_complement)
```
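
To make the idea concrete, here is a minimal pure-Python sketch of six-frame translation using the standard genetic code. It is independent of `codx`: the names `reverse_complement`, `translate`, and `six_frame_translation` are hypothetical, and the sketch does not reproduce the `only_start_with_codons` filtering shown above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

Stop codons appear as `*` in the output. Each frame simply shifts the reading start by zero, one, or two bases on the forward strand and on its reverse complement.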

## Usage: Command Line

In addition to the Python API, the package provides a command-line interface (CLI) that exposes the same functionality.

### CLI Usage

```text
Usage: codx [OPTIONS] IDS

Options:
  -o, --output TEXT              Output file
  -i, --include-intron           Include intron
  -u, --uniprot                  Input is UniProt accession IDs
  -t, --translate                Translate to protein
  -3, --three-frame-translation  Translate to protein in 3 frames
  -6, --six-frame-translation    Translate to protein in 6 frames (3 forward and 3 reverse complement)
  --help                         Show this message and exit.
```

### Example CLI Usage

Retrieve data using UniProt accession IDs:

```bash
codx -o output.fasta -u P35568,P05019,Q99490,Q8NEJ0,Q13322,Q15323
```

Retrieve data using gene IDs:

```bash
codx -o output.fasta 1190,120892
```
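
The `.fasta` extension in these examples suggests the output is a FASTA file, although the exact header layout is not described here. Assuming plain FASTA, the following sketch reads such a file back into `(header, sequence)` pairs using only the standard library. The function `parse_fasta` and the sample records are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Assumption: standard FASTA, where each record starts with a '>'
    header line followed by one or more sequence lines.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample records, for illustration only
sample = """>record_1
ATGGCC
GATTAC
>record_2
TGA
""".splitlines()

for header, sequence in parse_fasta(sample):
    print(header, len(sequence))
```

To read a real file, pass an open file handle instead of the sample lines, for example `with open("output.fasta") as handle: records = list(parse_fasta(handle))`.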
