Metadata-Version: 2.1
Name: sclassifier
Version: 1.0.0
Summary: Source classification using supervised and self-supervised learning
Home-page: https://github.com/SKA-INAF/sclassifier
Author: Simone Riggi
Author-email: simone.riggi@gmail.com
License: GPL3
Download-URL: https://github.com/SKA-INAF/sclassifier/archive/refs/tags/v1.0.0.tar.gz
Description: # sclassifier
        This python module allows to perform radio source classification analysis using different ML methods in a supervised/self-supervised or unsupervised way: 
        * convolutional neural networks (CNNs)    
        * convolutional autoencoders (CAEs)   
        * decision trees & LightGBM  
        * HDBSCAN clustering algorithm   
        * UMAP dimensionality reduction   
        * SimCLR & BYOL self-supervised frameworks   
        
        ## **Status**
        This software is under development. It requires python3 + tensorflow 2.x. 
        
        ## **Credit**
        This software is distributed with GPLv3 license. If you use it for your research, please add repository link or acknowledge authors in your papers.   
        
        ## **Installation**  
        
        To build and install the package:    
        
        * Clone this repository in a local directory (e.g. $SRC_DIR):   
          ```git clone https://github.com/SKA-INAF/sclassifier.git```
        * Create a virtual environment with your preferred python version (e.g. python3.6) in a local install directory (e.g. INSTALL_DIR):   
          ``` python3.6 -m venv $INSTALL_DIR```   
        * Activate your virtual environment:   
          ```source $INSTALL_DIR/bin/activate```
        * Install module dependencies listed in ```requirements.txt```:    
          ``` pip install -r requirements.txt```  
        * Build and install package:   
          ``` python setup build```   
          ``` python setup install```   
        * If required (e.g. outside virtual env), add installation path to your ```PYTHONPATH``` environment variable:   
          ``` export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python3.6/site-packages ```
        
        ## **Usage**
        Several python scripts are provided in the ```scripts``` directory to run desired tasks, described below.  
        
        ### **Image supervised classification with CNNs**
        The script `run_classifier_nn.py` allows to perform binary and multi-class (single or multi-label) radio image (single- or multi-channel, FITS format) classification using customized or predefined CNN architectures (resnet18/resnet34/resnet50/resnet101). Customized networks can be built by user through input options, piling up stacks of Conv2D/MaxPool/BatchNorm/Dropout layers, enabled or disabled when desired. Several user options are provided to customize network architecture, data pre-processing and augmentation. A list if available with: ```python run_classifier_nn.py --help```.    
        
        Input data (train/validation) must be given in json format with the following structure:    
        
        ```
        {  
          "data": [    
            {    
              "filepaths": [     
                "G340.743+00.313_ch1.fits",    
                "G340.743+00.313_ch2.fits",    
                "G340.743+00.313_ch3.fits"   
              ],    
              "sname": "G340.743+00.313",   
              "id": 6,   
              "label": "HII"    
            },    
            ...
            ...
          ]   
        }   
        ```    
        
        For multilabel classification the ```id``` and ```label``` keys must be lists.    
        
        Two run modes are supported: training, inference. To perform inference you need to specify the ```--predict``` option. To perform binary or multi-class classification you must specify the options ```--binary_class``` and ```--multilabel```, respectively. 
        
        To customize the desired class id/label names and relative targets, eventually remapping them with respect to values given in the input data list, you must specify the following options:   
        
        ```
        --nclasses=$NCLASSES     
        --classid_remap=$CLASSID_REMAP    
        --target_label_map=$TARGET_LABEL_MAP      
        --classid_label_map=$CLASSID_LABEL_MAP     
        --target_names=$TARGET_NAMES     
        ```
        
        For example:   
             
        ```
        NCLASSES=4     
        CLASS_PROBS='{"BACKGROUND":1.0,"COMPACT":0.1,"EXTENDED":1.0,"DIFFUSE":1.0}'    
        CLASSID_REMAP='{0:-1,1:0,2:1,3:2,4:3}'    
        TARGET_LABEL_MAP='{-1:"UNKNOWN",0:"BACKGROUND",1:"COMPACT",2:"EXTENDED",3:"DIFFUSE"}'    
        CLASSID_LABEL_MAP='{0:"UNKNOWN",1:"BACKGROUND",2:"COMPACT",3:"EXTENDED",4:"DIFFUSE"}'    
        TARGET_NAMES="BACKGROUND,COMPACT,EXTENDED,DIFFUSE"
        ```    
        
        Below we report some run examples:
        
        * To train a custom model (2 conv layers + 1 dense layer) from scratch:   
          ```
          python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV --nepochs=10 \    
            --nfilters_cnn=16,32 --kernsizes_cnn=3,3 --strides_cnn=1,1 --add_maxpooling_layer \    
            --add_dense_layer --dense_layer_sizes=16 \    
            --add_dropout --dropout_rate=0.4 --add_conv_dropout --conv_dropout_rate=0.2 \  
            --batch_size=64 --optimizer=adam --learning_rate=1e-4 \    
            --augment --augmenter=cnn --augment_scale_factor=5 \    
            --resize_size=64 --scale_to_abs_max
          ```   
          
        * To train a predefined model (resnet18) using pre-trained backbone .h5 weights (e.g. $WEIGHTFILE):    
          ```
          python run_classifier_nn.py --datalist=$DATALIST_TRAIN --datalist_cv=$DATALIST_CV [OPTIONS] \    
            --use_predefined_arch --predefined_arch=resnet18 --weightfile_backbone=$WEIGHTFILE 
          ```    
        
        * To perform inference with a saved .h5 model (e.g. $WEIGHTFILE) and weights (e.g. $WEIGHTFILE):     
          ```
          python run_classifier_nn.py --datalist=$DATALIST_TEST [OPTIONS] \    
            --modelfile=$MODELFILE --weightfile=$WEIGHTFILE [OPTIONS] \    
            --predict
          ```    
        
        ### **Image feature extraction with CAE**
        WRITE ME
        
        ### **Image feature extraction with SimCLR**
        WRITE ME
        
        ### **Feature reduction with UMAP**
        WRITE ME
        
        ### **Clustering feature data with HDBSCAN**
        WRITE ME
        
Keywords: radio,source,classification
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Astronomy
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/markdown
