Metadata-Version: 1.1
Name: DecisionTree
Version: 3.3.0
Summary: A Python module for decision-tree based classification of multidimensional data
Home-page: https://engineering.purdue.edu/kak/distDT/DecisionTree-3.3.0.html
Author: Avinash Kak
Author-email: kak@purdue.edu
License: Python Software Foundation License
Download-URL: https://engineering.purdue.edu/kak/distDT/DecisionTree-3.3.0.tar.gz#md5=0dd4d0d3c9fcb9306b2e4ff35b62c87a
Description:  
        
        
        
        Consult the module API page at 
        
              https://engineering.purdue.edu/kak/distDT/DecisionTree-3.3.0.html
        
        for all information related to this module, including
        information regarding the latest changes to the code. The
        page at the URL shown above lists all of the module
        functionality you can invoke in your own code.  That page
        also describes in great detail how you can use the boosting
        and the bagging capabilities of the module.  The latest
        changes to the module allow you to tackle 
        needle-in-a-haystack sort of data classification problems; these are problems
        when your training data is dominated excessively by just one
        class.
        
        With regard to the basic purpose of the module, assuming you
        have placed your training data in a CSV file, all you have
        to do is to supply the name of the file to this module and
        it does the rest for you without much effort on your part
        for classifying a new data sample.  A decision tree
        classifier consists of feature tests that are arranged in
        the form of a tree. The feature test associated with the
        root node is one that can be expected to maximally
        disambiguate the different possible class labels for a new
        data record.  From the root node hangs a child node for each
        possible outcome of the feature test at the root. This
        maximal class-label disambiguation rule is applied at the
        child nodes recursively until you reach the leaf nodes.  A
        leaf node may correspond either to the maximum depth desired
        for the decision tree or to the case when there is nothing
        further to gain by a feature test at the node.
        
        Typical usage syntax:
        
        ::
        
              training_datafile = "stage3cancer.csv"
              dt = DecisionTree.DecisionTree( 
                              training_datafile = training_datafile,
                              csv_class_column_index = 2,
                              csv_columns_for_features = [3,4,5,6,7,8],
                              entropy_threshold = 0.01,
                              max_depth_desired = 8,
                              symbolic_to_numeric_cardinality_threshold = 10,
                   )
        
                dt.get_training_data()
                dt.calculate_first_order_probabilities()
                dt.calculate_class_priors()
                dt.show_training_data()
                root_node = dt.construct_decision_tree_classifier()
                root_node.display_decision_tree("   ")
        
                test_sample  = ['g2 = 4.2',
                                'grade = 2.3',
                                'gleason = 4',
                                'eet = 1.7',
                                'age = 55.0',
                                'ploidy = diploid']
                classification = dt.classify(root_node, test_sample)
                print "Classification: ", classification
        
                  
Keywords: data classification,decision trees,information analysis
Platform: All platforms
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
