Metadata-Version: 2.1
Name: spark-dataframe-tools
Version: 0.6.6
Summary: spark_dataframe_tools
Home-page: https://github.com/jonaqp/spark_dataframe_tools/
Author: Jonathan Quiza
Author-email: jony327@gmail.com
License: UNKNOWN
Download-URL: https://github.com/jonaqp/spark_dataframe_tools/archive/main.zip
Keywords: spark,dataframe
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas (==2.0.1)
Requires-Dist: numpy (==1.24.3)
Requires-Dist: findspark (==2.0.1)
Requires-Dist: pyspark (==3.1.1)
Requires-Dist: pyarrow (==12.0.0)
Requires-Dist: setuptools (==58.2.0)
Requires-Dist: jinja2 (==3.1.2)
Requires-Dist: humanize (==4.6.0)
Requires-Dist: wheel (==0.40.0)
Requires-Dist: sizebytes-tools (==0.0.3)
Requires-Dist: color-tools (>=0.0.2)
Requires-Dist: faker (==19.6.2)

# spark_dataframe_tools

[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

spark_dataframe_tools is a Python library that adds styled table display to Spark and pandas DataFrames.

## Installation

The package is published on PyPI, so installation consists of running:

```sh
pip install spark-dataframe-tools --user --upgrade
```
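
To confirm that the install succeeded, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of spark_dataframe_tools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed
print(installed_version("spark-dataframe-tools"))
```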

## Usage
```python
import spark_dataframe_tools
```

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Reuse the active session if one exists (e.g. in a notebook), otherwise start one
spark = SparkSession.builder.getOrCreate()

# Sample rows: (firstname, middlename, lastname, id, gender, salary)
data2 = [
    ("James", "", "Smith", "36636", "M", 3000),
    ("Michael", "Rose", "", "40288", "M", 4000),
    ("Robert", "", "Williams", "42114", "M", 4000),
    ("Maria", "Anne", "Jones", "39192", "F", 4000),
    ("Jen", "Mary", "Brown", "", "F", -1),
]

# All columns are nullable; only salary is an integer
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data2, schema=schema)
```

## Pandas

```python
df_pandas = df.toPandas()
df_pandas.show2()
```

## Spark

```python
# Show the DataFrame as a templated table
df.show2()

# Report the DataFrame's memory usage
df.size()
```
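
For comparison, memory usage can also be estimated with plain pandas. The sketch below is an independent illustration and does not claim to reproduce what `df.size()` computes; the helper names `format_bytes` and `pandas_memory_bytes` are hypothetical. It relies only on the pandas method `DataFrame.memory_usage(deep=True)`.

```python
def format_bytes(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def pandas_memory_bytes(pdf) -> int:
    """Total in-memory size of a pandas DataFrame, in bytes."""
    # deep=True measures the contents of object (string) columns
    # instead of counting only the pointers to them.
    return int(pdf.memory_usage(deep=True).sum())
```

With the `df` defined above, `format_bytes(pandas_memory_bytes(df.toPandas()))` would give a readable estimate. Note that `toPandas()` collects every row onto the driver, so this is only suitable for DataFrames that fit in driver memory.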



## License

[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).

## New features v1.0

## BugFix

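- On Windows, if a dependency fails to build because the Visual C++ build tools are missing, installing them with Chocolatey may help: `choco install visualcpp-build-tools`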

## Reference

- Jonathan Quiza [GitHub](https://github.com/jonaqp).
- Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).


