Metadata-Version: 2.1
Name: code2seq
Version: 0.0.0
Summary: Set of pytorch modules and utils to train code2seq model
Home-page: https://github.com/JetBrains-Research/code2seq
Author: Egor Spirin
Author-email: spirin.egor@gmail.com
License: MIT
Download-URL: https://pypi.org/project/code2seq/
Keywords: code2seq,pytorch,pytorch-lightning,ml4code,ml4se
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: black (==20.8b1)
Requires-Dist: torch (==1.7.1)
Requires-Dist: tqdm (==4.58.0)
Requires-Dist: numpy (==1.20.1)
Requires-Dist: pytorch-lightning (==1.1.7)
Requires-Dist: wandb (==0.10.20)
Requires-Dist: mypy (==0.812)
Requires-Dist: hydra-core (==1.0.6)
Requires-Dist: omegaconf (==2.0.6)

# code2seq

[![JetBrains Research](https://jb.gg/badges/research.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)
[![Github action: build](https://github.com/SpirinEgor/code2seq/workflows/Build/badge.svg)](https://github.com/SpirinEgor/code2seq/actions?query=workflow%3ABuild)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


PyTorch implementation of the code2seq model.

## Configuration

Use the `yaml` files in the [configs](code2seq/configs) directory to configure all processes.
The `model` option defines which model to use; the repository currently supports:
- code2seq
- typed-code2seq
- code2class

`data_folder` is the path to the folder with the dataset.
For checkpoints saved with a predefined config, the data folder can be specified through an argument of the corresponding script.
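
As an illustration of the two options above, here is a minimal sketch of how a configuration could be checked before use. It is not the repository's own code: the `check_config` helper and the example path `data/java-small` are hypothetical, and only the model names and the `model` / `data_folder` keys come from this README.

```python
from pathlib import Path

# Model names listed in this README. The checking logic itself is a
# hypothetical illustration, not part of the code2seq package.
SUPPORTED_MODELS = ("code2seq", "typed-code2seq", "code2class")


def check_config(config: dict) -> dict:
    """Return a normalized copy of `config` or raise ValueError."""
    model = config.get("model")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model {model!r}, expected one of {SUPPORTED_MODELS}")
    if "data_folder" not in config:
        raise ValueError("`data_folder` is not set")
    # Store the dataset location as a Path for easier handling later on.
    return {"model": model, "data_folder": Path(config["data_folder"])}


if __name__ == "__main__":
    # `data/java-small` is a made-up example path.
    print(check_config({"model": "code2seq", "data_folder": "data/java-small"}))
```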

## Data

This implementation supports the same data format as the original [model](https://github.com/tech-srl/code2seq).
The only difference is how the vocabulary is stored. To rebuild the vocabulary, use
```shell
PYTHONPATH='.' python preprocessing/build_vocabulary.py
```

## Train model

To train a model, use the `train.py` script
```shell
python train.py model
```
Use [`main.yaml`](code2seq/configs/main.yaml) to set hyperparameters.
Use the corresponding configuration from [`configs/model`](code2seq/configs/model) to set up the dataset.

To resume training from a saved checkpoint, use the `--resume` argument
```shell
python train.py model --resume checkpoint.ckpt
```

## Evaluate model

To evaluate a trained model, use the `test.py` script
```shell
python test.py checkpoint.ckpt
```

To specify the folder with data (for example, when evaluating on a different machine than the one used for training), use the `--data-folder` argument
```shell
python test.py checkpoint.ckpt --data-folder path
```
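
The override behaviour described above can be sketched as follows. This is only an assumption about the intended logic, not the actual code of `test.py`: the `resolve_data_folder` helper is hypothetical, and it simply prefers the command-line value over the path saved with the checkpoint config.

```python
from pathlib import Path
from typing import Optional


def resolve_data_folder(stored: str, override: Optional[str] = None) -> Path:
    """Pick the dataset folder for evaluation.

    `stored` is the path saved with the checkpoint config; `override` is the
    value passed on the command line, if any. Both names are hypothetical.
    """
    # The command-line value wins; otherwise fall back to the stored path.
    return Path(override) if override is not None else Path(stored)


if __name__ == "__main__":
    # Without an override, the stored path is used.
    print(resolve_data_folder("data/train-machine"))
    # With an override, the stored path is ignored.
    print(resolve_data_folder("data/train-machine", "data/local-copy"))
```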


