Metadata-Version: 2.1
Name: libsa4py
Version: 0.2.0
Summary: LibSA4Py: Light-weight static analysis for extracting type hints and features
Home-page: https://github.com/saltudelft/libsa4py
Author: Amir M. Mir (TU Delft)
Author-email: mir-am@hotmail.com
License: UNKNOWN
Description: # Intro
        [![PyPI version](https://badge.fury.io/py/libsa4py.svg)](https://badge.fury.io/py/libsa4py) 
        ![GH Workflow](https://github.com/saltudelft/libsa4py/actions/workflows/libsa4py_test.yaml/badge.svg)
        
        `LibSA4Py` is a static analysis library for Python, which extracts type hints and features for training ML-based type inference models.
        
        - [Requirements](#requirements)
        - [Quick Installation](#quick-installation)
        - [Usage](#usage)
          - [Processing projects](#processing-projects)
          - [Merging projects](#merging-projects)
        - [JSON Output](#json-output)
        
        # Requirements
        
        - Python 3.6 or newer (Python 3.8 is recommended)
        - [Watchman](https://facebook.github.io/watchman/) (for running [pyre](https://pyre-check.org/)) [**Optional**]
        - MacOS or Linux systems
        
        # Quick Installation
        
        ```
        git clone https://github.com/saltudelft/libsa4py.git
        cd libsa4py && pip install .
        ```
        
        # Usage
        ## Processing projects
        Given Python repositories, run the following command to process source code files and generate JSON-formatted outputs:
        ```
        libsa4py process --p $REPOS_PATH --o $OUTPUT_PATH --d $DUPLICATE_PATH --j $WORKERS_COUNT --l $LIMIT --c --no-nlp --pyre
        ```
        
        Description:
        - `--p $REPOS_PATH`: The path to the Python corpus or dataset.
        - `--o $OUTPUT_PATH`: Path to store processed projects.
        - `--d $DUPLICATE_PATH`: Path to duplicate files of the given dataset (i.e. jsonl.gz file produced by the [CD4Py](https://github.com/saltudelft/CD4Py) tool). [**Optional**]
        - `--s`: Path to the CSV file for splitting the given dataset. [**Optional**]
        - `--j $WORKERS_COUNT`: Number of workers for processing projects. [**Optional**, default=no. of available CPU cores]
        - `--l $LIMIT`: Number of projects to be processed. [**Optional**]
        - `--c`: Whether to ignore processed projects. [**Optional**, default=False]
        - `--no-nlp`: Whether to apply standard NLP techniques to extracted identifiers. [**Optional**, default=True]
        - `--pyre`: Whether to run `pyre` to infer the types of variables for given projects. [**Optional**, default=False]
        
        ## Merging projects
        To merge all the processed JSON-formatted projects into a single dataframe, run the following command:
        ```
        libsa4py merge --o $OUTPUT_PATH --l $LIMIT
        ```
        
        Description:
        - `--o $OUTPUT_PATH`: Path to the processed projects, used in the previous processing step.
        - `--l $LIMIT`: Number of projects to be merged. [**Optional**]
        
        ## Applying types
        To apply Pyre's inferred types to projects, run the following command:
        ```
        libsa4py apply --p $REPOS_PATH --o $OUTPUT_PATH
        ```
        
        Description:
        - `--p $REPOS_PATH`: The path to the Python corpus or dataset.
        - `--o $OUTPUT_PATH`: Path to the processed projects, used in the previous processing step.
        
        # JSON Output
        After processing each project, a JSON-formatted file is produced, which is described [here](https://github.com/saltudelft/light-sa-type-inf/blob/master/JSONOutput.md).
Keywords: libsa4py static analysis features type hints type inference machine learning python pipeline light-weight
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Environment :: Console
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Operating System :: Unix
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
