Metadata-Version: 2.1
Name: mrputils
Version: 0.3.0
Summary: This is a util module to help with movie revenue prediction
Home-page: https://github.com/scienclick/pde_cap_mrp_zagros/tree/main/mrputils
Author: Amir Shamsa
Author-email: amirshamsa@gmail.com
License: Apache License 2.0
Description: # Movie Revenue Prediction 🎬
        
        Mission: Given the characteristics of a movie (director, actors, budget…), predict the revenue it will generate.
        
        Dataset: IMDb (link).
        
        #### 🚩Data
        - 350k+ movies
        - Multiple countries and languages
        - Data fetched from www.themoviedb.org
        
        
        
        Technology: TensorFlow, Streamlit, Python, NLP
        
        #### 🚩 Zagros PDE Team 🌄
        - [Amir Shamsa](https://www.eureka.slb.com/CNP.cfm?uid=amir-20111016a)
        - [Syed Aaquib Hussain](https://www.eureka.slb.com/CNP.cfm?uid=syed-20160505)
        - [Mehdi Paydayesh](https://www.eureka.slb.com/CNP.cfm?uid=mehdi-20120402)
        - [Abdurraouf Aljaber](https://eur.delve.office.com/?u=9c7ac147-2739-4a06-899d-ff302ba9de0a&v=work)
        
        #### 🚩Learn more
        [Link to the detailed documentation](https://slb001-my.sharepoint.com/:p:/g/personal/mpaydayesh_slb_com/Ec0pxL9AxSJOpv8jozuAhxYBrP4yZTm9_R2MNUCdyu8uvw?e=mNZBBv)
        
        [Link to the final presentation](https://slb001-my.sharepoint.com/:p:/g/personal/mpaydayesh_slb_com/EdngxR73sKtFvvKSnq7EI4gBcvekNsW04VWBla1r_g9GTA?e=0pD2A0)
        
        [Trello project management](https://trello.com/invite/b/YjawKwro/ATTI7c94ea9cf3b681ea13ca96182052b4ccCD950991/project-management)
        
        💖This has been a cool project 😆 in this bootcamp!
        
        
        ## Requirements
        
        The major libraries used in this project are:
        1. numpy
        2. pandas
        3. seaborn
        4. matplotlib
        5. missingno
        6. random (standard library)
        7. re (standard library)
        8. nltk
        9. sklearn (scikit-learn)
        10. tensorflow
        11. xgboost
        12. lightgbm
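        
        A quick way to see which of these modules are importable in the current environment is sketched below. It is a hypothetical helper, not part of the project: it relies only on `importlib.util.find_spec` from the standard library, checks the import names exactly as listed (so scikit-learn is checked as `sklearn`), and does not check versions.
        
        ```python
        import importlib.util
        
        # Import names taken from the requirements list above.
        REQUIRED_MODULES = (
            "numpy", "pandas", "seaborn", "matplotlib", "missingno",
            "random", "re", "nltk", "sklearn", "tensorflow", "xgboost", "lightgbm",
        )
        
        def check_requirements(modules=REQUIRED_MODULES):
            """Return {module_name: True/False}; modules are located, not imported."""
            return {name: importlib.util.find_spec(name) is not None for name in modules}
        
        if __name__ == "__main__":
            missing = [name for name, found in check_requirements().items() if not found]
            print("Missing modules:", ", ".join(missing) if missing else "none")
        ```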
        
        
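        The scripts use the following constants and global settings:
        
        ```python
        import random
        import warnings
        
        import pandas as pd
        
        #region constants
        rand_state = 100   # fixed seed value for reproducibility
        RANDOMSEED = 100   # seed for Python's `random` module
        DISPLAY_WIDTH = 400
        DISPLAYMAX_COLUMNS = 25
        #endregion
        
        #region settings
        random.seed(RANDOMSEED)
        pd.set_option('display.width', DISPLAY_WIDTH)
        pd.set_option('display.max_columns', DISPLAYMAX_COLUMNS)
        warnings.filterwarnings('ignore')  # suppress warnings to keep notebook output readable
        #endregion
        ```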
        
        
        
        ## File structure
        
        **Part 0: importing libraries**
        
        **Part 1: define functions (methods)**
        
        **Part 2: define processing functions (methods)**
        
        **Part 3: QCs**
        
        **Part 4: defining the features and targets**
        
        **Part 5: making the pipeline**
        
        **Part 6: cross-validation and bagging regressor**
        
        **Part 7: model selection**
        
        **Part 8: grid search and hyperparameter testing**
        
        **Part 9: TPOT testing**
        
        **Part 10: stacking**
        
        **Part 11: model performance and learning curve**
        
        **Part 12: movie revenue prediction**
        
        **Part 13: model B - creating a model to find similar movies using KNN**
        
        **Part 14: model C - creating a model to predict the movie popularity**
        
        **Part 15: model D - scraping new movie data for testing the model**
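        
        Parts 5 and 6 build a pipeline and evaluate a bagging regressor with cross-validation. A minimal scikit-learn sketch of that idea follows. The columns (`budget`, `runtime`, `genre`), the synthetic data, the log-transformed target, and the decision-tree base estimator are all hypothetical choices, not taken from the project code.
        
        ```python
        import numpy as np
        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.ensemble import BaggingRegressor
        from sklearn.model_selection import KFold, cross_val_score
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import OneHotEncoder, StandardScaler
        from sklearn.tree import DecisionTreeRegressor
        
        RANDOMSEED = 100  # same seed value as in the settings above
        
        def make_toy_movies(n=200, seed=RANDOMSEED):
            """Synthetic stand-in for the movie table; the columns are hypothetical."""
            rng = np.random.default_rng(seed)
            X = pd.DataFrame({
                "budget": rng.uniform(1e6, 2e8, n),
                "runtime": rng.uniform(80, 180, n),
                "genre": rng.choice(["Drama", "Comedy", "Action"], n),
            })
            # Toy target: revenue scales with budget, with multiplicative noise.
            revenue = 2.5 * X["budget"].to_numpy() * rng.lognormal(0.0, 0.3, n)
            return X, revenue
        
        def build_pipeline(seed=RANDOMSEED):
            """Preprocessing + bagged decision trees in a single estimator."""
            preprocess = ColumnTransformer([
                ("num", StandardScaler(), ["budget", "runtime"]),
                ("cat", OneHotEncoder(handle_unknown="ignore"), ["genre"]),
            ])
            model = BaggingRegressor(
                DecisionTreeRegressor(max_depth=6),  # base estimator, passed positionally
                n_estimators=50,
                random_state=seed,
            )
            return Pipeline([("preprocess", preprocess), ("model", model)])
        
        X, revenue = make_toy_movies()
        y = np.log1p(revenue)  # model log-revenue to tame the heavy right tail
        cv = KFold(n_splits=5, shuffle=True, random_state=RANDOMSEED)
        scores = cross_val_score(build_pipeline(), X, y, cv=cv, scoring="r2")
        print("R^2 per fold:", np.round(scores, 3))
        
        # Fit on all rows, then map predictions back to the revenue scale.
        pipeline = build_pipeline().fit(X, y)
        predicted_revenue = np.expm1(pipeline.predict(X.head(3)))
        ```
        
        Wrapping preprocessing and model in one `Pipeline` means the scaler and encoder are refit inside each cross-validation fold, so no statistics from the validation fold leak into training.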
        
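        Part 13 uses KNN to find similar movies. One way to sketch this uses scikit-learn's `NearestNeighbors`, assuming numeric features that are standardized first so that no single column (such as budget) dominates the distance. The titles, features, and helper names below are hypothetical.
        
        ```python
        import numpy as np
        from sklearn.neighbors import NearestNeighbors
        from sklearn.preprocessing import StandardScaler
        
        def build_knn(features, n_neighbors):
            """Scale the features, then index them for nearest-neighbor lookups."""
            scaler = StandardScaler().fit(features)
            # Ask for one extra neighbor: each movie is its own closest match.
            knn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(scaler.transform(features))
            return scaler, knn
        
        def similar_movies(titles, features, query_index, n_neighbors=2):
            """Return the titles of the movies closest to titles[query_index]."""
            scaler, knn = build_knn(features, n_neighbors)
            query = scaler.transform(features[query_index:query_index + 1])
            _, indices = knn.kneighbors(query)  # sorted by increasing distance
            return [titles[i] for i in indices[0] if i != query_index][:n_neighbors]
        
        # Hypothetical toy data: [budget in $M, runtime in minutes].
        titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
        features = np.array([[100, 120], [105, 118], [10, 90], [12, 95], [98, 125]], dtype=float)
        print(similar_movies(titles, features, query_index=0))  # → ['Movie B', 'Movie E']
        ```
        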
Platform: UNKNOWN
Description-Content-Type: text/markdown
