Metadata-Version: 2.1
Name: nba_analytics
Version: 0.2.0
Summary: A package for collecting and analyzing NBA player data.
Home-page: https://github.com/ahernandezjr/nba_analytics
Author: Alexander Hernandez
Author-email: ahernandezjr0@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp==3.9.5
Requires-Dist: aiosignal==1.3.1
Requires-Dist: asttokens==2.4.1
Requires-Dist: async-timeout==4.0.3
Requires-Dist: attrs==23.2.0
Requires-Dist: azure-core==1.30.2
Requires-Dist: azure-storage-blob==12.20.0
Requires-Dist: azure-storage-file-datalake==12.15.0
Requires-Dist: backcall==0.2.0
Requires-Dist: basketball_reference_web_scraper==4.14.0
Requires-Dist: beautifulsoup4==4.12.3
Requires-Dist: bleach==6.1.0
Requires-Dist: certifi==2024.2.2
Requires-Dist: cffi==1.16.0
Requires-Dist: chardet==5.2.0
Requires-Dist: charset-normalizer==3.3.2
Requires-Dist: colorama==0.4.6
Requires-Dist: colorlog==6.8.2
Requires-Dist: comm==0.2.2
Requires-Dist: contourpy==1.2.1
Requires-Dist: cryptography==42.0.8
Requires-Dist: cssutils==2.11.1
Requires-Dist: cycler==0.12.1
Requires-Dist: dataframe-image==0.2.3
Requires-Dist: debugpy==1.8.1
Requires-Dist: decorator==5.1.1
Requires-Dist: defusedxml==0.7.1
Requires-Dist: docopt==0.6.2
Requires-Dist: dynaconf==3.2.5
Requires-Dist: exceptiongroup==1.2.1
Requires-Dist: executing==2.0.1
Requires-Dist: fastjsonschema==2.19.1
Requires-Dist: filelock==3.13.1
Requires-Dist: fonttools==4.51.0
Requires-Dist: fpdf==1.7.2
Requires-Dist: frozenlist==1.4.1
Requires-Dist: fsspec==2024.2.0
Requires-Dist: gitdb==4.0.11
Requires-Dist: GitPython==3.1.43
Requires-Dist: html2image==2.0.4.3
Requires-Dist: idna==3.7
Requires-Dist: intel-openmp==2021.4.0
Requires-Dist: ipykernel==6.29.4
Requires-Dist: ipython==8.12.3
Requires-Dist: isodate==0.6.1
Requires-Dist: jedi==0.19.1
Requires-Dist: Jinja2==3.1.3
Requires-Dist: joblib==1.4.2
Requires-Dist: jsonschema==4.22.0
Requires-Dist: jsonschema-specifications==2023.12.1
Requires-Dist: jupyter_client==8.6.1
Requires-Dist: jupyter_core==5.7.2
Requires-Dist: jupyterlab_pygments==0.3.0
Requires-Dist: kaleido==0.2.1
Requires-Dist: kiwisolver==1.4.5
Requires-Dist: lxml==5.2.2
Requires-Dist: MarkupSafe==2.1.5
Requires-Dist: matplotlib==3.9.0
Requires-Dist: matplotlib-inline==0.1.7
Requires-Dist: mistune==3.0.2
Requires-Dist: mkl==2021.4.0
Requires-Dist: more-itertools==10.3.0
Requires-Dist: mpmath==1.3.0
Requires-Dist: multidict==6.0.5
Requires-Dist: nbclient==0.10.0
Requires-Dist: nbconvert==7.16.4
Requires-Dist: nbformat==5.10.4
Requires-Dist: nest-asyncio==1.6.0
Requires-Dist: networkx==3.2.1
Requires-Dist: numpy==1.26.3
Requires-Dist: packaging==24.0
Requires-Dist: pandas==2.2.2
Requires-Dist: pandocfilters==1.5.1
Requires-Dist: parso==0.8.4
Requires-Dist: pickleshare==0.7.5
Requires-Dist: pillow==10.2.0
Requires-Dist: pipreqs==0.5.0
Requires-Dist: platformdirs==4.2.2
Requires-Dist: prompt-toolkit==3.0.43
Requires-Dist: psutil==5.9.8
Requires-Dist: pure-eval==0.2.2
Requires-Dist: pycparser==2.22
Requires-Dist: Pygments==2.18.0
Requires-Dist: pyparsing==3.1.2
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: pytz==2024.1
Requires-Dist: pywin32==306
Requires-Dist: pyzmq==26.0.3
Requires-Dist: referencing==0.35.1
Requires-Dist: reportlab==4.2.2
Requires-Dist: requests==2.31.0
Requires-Dist: rpds-py==0.18.1
Requires-Dist: scikit-learn==1.5.0
Requires-Dist: scipy==1.13.1
Requires-Dist: six==1.16.0
Requires-Dist: smmap==5.0.1
Requires-Dist: soupsieve==2.5
Requires-Dist: stack-data==0.6.3
Requires-Dist: sympy==1.12
Requires-Dist: tbb==2021.11.0
Requires-Dist: threadpoolctl==3.5.0
Requires-Dist: tinycss2==1.3.0
Requires-Dist: torch
Requires-Dist: torchaudio
Requires-Dist: torchvision
Requires-Dist: tornado==6.4
Requires-Dist: tqdm==4.66.4
Requires-Dist: traitlets==5.14.3
Requires-Dist: typing_extensions==4.9.0
Requires-Dist: tzdata==2024.1
Requires-Dist: urllib3==2.2.1
Requires-Dist: wcwidth==0.2.13
Requires-Dist: webencodings==0.5.1
Requires-Dist: websocket-client==1.8.0
Requires-Dist: yarg==0.1.9
Requires-Dist: yarl==1.9.4

# nba_pistons

**The primary goal of this project is to compile a report, located in the [`REPORT.md`](REPORT.md) file.**
This project organizes data collection, data processing, and machine learning tasks related to NBA player statistics, with the specific aim of identifying valuable players on the Detroit Pistons.


## Usage

To use this project, clone the repository and set up the necessary dependencies.
Create a virtual environment (in VS Code: Ctrl+Shift+P → "Python: Create Environment") using `requirements.txt`.
You can then run the notebook `main_ipynb.ipynb` for easy use, or run the scripts in the `src` directory directly for data collection, processing, and machine learning tasks.
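As a rough illustration of the kind of analysis the pipeline supports, here is a minimal sketch using a small synthetic stand-in for `data/datasets/nba_player_stats.csv` (the column names are assumptions, not the dataset's actual schema):

```python
import pandas as pd

# Synthetic stand-in for data/datasets/nba_player_stats.csv;
# the real dataset's columns may differ.
stats = pd.DataFrame({
    "player": ["A. Example", "A. Example", "B. Sample", "B. Sample"],
    "season": [2022, 2023, 2022, 2023],
    "points_per_game": [14.2, 16.8, 8.5, 11.1],
})

# Average scoring per player across seasons -- a simple proxy for
# the "valuable player" question the report addresses.
avg_points = (
    stats.groupby("player")["points_per_game"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_points)
```

The same pattern applies to any of the CSVs under `data/datasets/` once they have been generated by the `src/dataset` modules.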


## Directory Structure

The project directory is organized as follows:

- **data/**: Contains datasets used in the project.
  - **datasets/**
    - `nba_players.csv`: Dataset containing information about NBA players.
    - `nba_player_stats_5years_overlap.csv`: Dataset containing every run of 5 consecutive years of NBA player statistics (derived from `nba_player_stats_5years.csv`).
    - `nba_player_stats_5years_tensor_ready.csv`: PyTorch-ready version of `nba_player_stats_5years.csv`.
    - `nba_player_stats_5years.csv`: Dataset (CSV) containing the first 5 years of NBA player statistics.
    - `nba_player_stats_5years.json`: JSON version of `nba_player_stats_5years.csv`.
    - `nba_players_advanced.csv`: Dataset containing advanced NBA player statistics.
    - `nba_players_basic.csv`: Dataset containing basic NBA player statistics.
    - `nba_player_stats.csv`: Dataset containing combined NBA player statistics.
  - **graphs/**: Contains data analytics graphs produced by `analytics/`.
  - **models/**: Contains machine learning models produced by `machine_learning/`.
  - **reports/**: Power BI and locally generated PDF reports created by `src/utils/reporting.py`.

- **logs/**: Contains log files generated during the project.
  - `nba_player_stats.log`: Log file for NBA player statistics data processing.

- **src/**: Contains the source code for data collection, data processing, and machine learning tasks.

  - **dataset/**: Contains scripts for processing and cleaning data.
    - `creation.py`: Module for creating datasets from the NBA API using `basketball_reference_web_scraper`.
    - `processing.py`: Module for processing raw datasets into a useful dataset.
    - `torch.py`: Module for preparing datasets for PyTorch/machine learning evaluation.
    - `filtering.py`: Module for further dataset filtering (possibly to be used by `dataset_processing.py`).
  - **machine_learning/**: Contains scripts for machine learning tasks.
    - **models/**: Contains the models used for the machine learning tasks.
      - `arima.py`: ARIMA model (to do: better step evaluation).
      - `lstm.py`: LSTM neural networks (custom and PyTorch built-in) for many-to-many prediction.
      - `neuralnet.py`: Basic neural net for one-to-one prediction.
    - `train_models.py`: Module for training the models in `models/`.
    - `use_models.py`: Module for using the models in `models/`.
  - **utils/**: Contains utility scripts used across the project.
    - `logger.py`: Utility script for logging messages.
    - `config.py`: Utility for settings shared among files.

- **generate_requirements.bat**: Batch file to generate the requirements.txt file.
- **requirements.txt**: File containing project dependencies.
- **reference/**: Any other project-related files used for reference.
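The `nba_player_stats_5years_overlap.csv` dataset described above collects every run of 5 consecutive seasons per player. The idea can be sketched with a simple sliding window; this is an illustration with synthetic data and assumed column names, not the actual `processing.py` implementation:

```python
import pandas as pd

# Stand-in for one player's season-by-season table; real columns may differ.
seasons = pd.DataFrame({
    "player": ["C. Demo"] * 7,
    "season": [2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "points_per_game": [5.0, 7.5, 9.0, 12.0, 14.5, 13.0, 15.0],
})

WINDOW = 5  # 5 consecutive seasons, as in the overlap dataset


def five_year_windows(group):
    """Return every run of WINDOW consecutive seasons for one player."""
    group = group.sort_values("season").reset_index(drop=True)
    return [
        group.iloc[i : i + WINDOW]
        for i in range(len(group) - WINDOW + 1)
    ]


windows = five_year_windows(seasons)
print(len(windows))  # 7 seasons -> 3 overlapping 5-year windows
```

A 7-season career yields 3 overlapping windows (2017–2021, 2018–2022, 2019–2023), which is why the overlap dataset is larger than the first-5-years dataset.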


# Work Schedule

<details open>
  <summary>Week of 7/8</summary>

  | Day       | Task | Status |
  | --------- | --------- | --------- |
  | Monday    | Set up Azure Resource Group. <br> Create Python Azure Function for data collection. | &#x2718; |
  | Tuesday   | Create Azure Data Factory. <br> Set up linked services and define ETL pipelines. | &#x2718; |
  | Wednesday | Create Azure Machine Learning Workspace. <br> Set up machine learning environment and upload datasets. | &#x2718; |
  | Thursday  | Train models using Azure Machine Learning. <br> Deploy models as web services. | &#x2718; |
  | Friday    | Integrate with Azure Blob Storage for data storage. <br> Update scripts to use Azure Blob Storage SDK. | &#x2718; |
  | Saturday  | **N/A: No Progress on Saturdays.** | --- |
  | Sunday    | Integrate with Power BI for visualization and reporting. <br> Automate workflow using Azure Logic Apps or Azure DevOps. | &#x2718; |

</details>

<details>
  <summary>Week of 7/1</summary>

  | Task | Result | Status |
  | --------- | --------- | --------- |
  | Explore Power BI, Azure, and Fabric | Decided on adapting project into Azure workflow with analytics into Fabric | &#x2714; |

</details>

<details>
  <summary>Week of 6/24</summary>

  | Day       | Task | Status |
  | --------- | --------- | --------- |
  | Monday    | Complete [`lstm`](src/machine_learning/models/lstm.py). <br> Look into [`REPORT.md`](REPORT.md) automation. | &#x2714; |
  | Tuesday   | Complete automation of [`reports`](reports/). | &#x2714; |
  | Wednesday | Look into Databricks implementation. Begin Power BI testing. | &#x2714; |
  | Thursday  | Modify [`use_models.py`](src/machine_learning/use_models.py) use_model() for model prediction output. | &#x2714; |
  | Friday    | Complete prediction graphs and create average prediction bar graph in [`analytics`](src/dataset/analytics.py). <br> Look into Power BI use cases over the weekend and plan the report. | &#x2714; |
  | Saturday  | **N/A: No Progress on Saturdays.** | --- |
  | Sunday    | Begin including Azure/Fabric/PowerBI for data organization, engineering, and reports. | &#x2714; |

</details>

<details>
  <summary>Week of 6/17</summary>

  | Day       | Task | Status |
  | --------- | --------- | --------- |
  | Monday    | Look into ARIMA and complete LSTM. | &#x2714; |
  | Tuesday   | Perform analytics for tasks and update `REPORT.md`. | &#x2714; |
  | Wednesday | Complete dataset expansion for any 5-year length players. | &#x2714; |
  | Thursday  | Complete `torch_overlap` to merge custom dataset. | &#x2714; |
  | Friday    | Create many(4)-to-one and one-to-one neural networks. | &#x2714; |
  | Saturday  | No Progress on Saturdays. <br> Meanwhile: Re-think dataset names for dataset. | --- |
  | Sunday    | Re-check and complete neural networks and start ARIMA preparation in `use_models`. <br> Perform analytics for tasks and update `REPORT.md`. | &#x2714; |

</details>


<details>
  <summary>Extra Tasks</summary>

  | By When | Task | Status |
  | --------- | --------- | --------- |
  | Before Azure Machine Learning Tasks | Refactor/modify dataset [`processing`]() to use numpy savez for saving with dictionary or label row. | &#x2718; |

</details>
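The `numpy` `savez` refactor mentioned in the extra tasks above can be sketched as follows; the array names, column labels, and file name are illustrative assumptions:

```python
import numpy as np

# Sketch of the savez idea: store the feature matrix together with its
# column labels in one .npz archive instead of a bare CSV.
features = np.array([[14.2, 5.1], [16.8, 4.9]])
columns = np.array(["points_per_game", "assists_per_game"])

np.savez("player_stats.npz", features=features, columns=columns)

# Loading restores both arrays by keyword, like a small dictionary.
restored = np.load("player_stats.npz")
print(sorted(restored.files))  # ['columns', 'features']
```

Keeping the labels alongside the data means downstream modules (e.g. the PyTorch preparation step) can recover column meanings without a separate schema file.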


## Contributors

- [Alexander Hernandez](https://github.com/ahernandezjr)

Feel free to contribute to this project by submitting pull requests or opening issues.
