Metadata-Version: 2.1
Name: media-parser
Version: 2.0.0
Summary: API for parsing media from social networks
Home-page: https://github.com/jag-k/media-parser#readme
License: MIT
Keywords: media-parser,poetry,tiktok,youtube,reddit,twitter
Author: Jag_k
Author-email: me@jagk.dev
Maintainer: Jag_k
Maintainer-email: me@jagk.dev
Requires-Python: >=3.12,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: FastAPI
Classifier: Framework :: Pydantic
Classifier: Framework :: Pydantic :: 1
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Natural Language :: Russian
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Typing :: Typed
Requires-Dist: aiohttp (>=3.9.5,<4.0.0)
Requires-Dist: async-lru (>=2.0.2,<3.0.0)
Requires-Dist: contextvars (>=2.4,<3.0)
Requires-Dist: httpx (>=0.27.0,<0.28.0)
Requires-Dist: jsonref (>=1.1.0,<2.0.0)
Requires-Dist: motor (>=3.3.1,<4.0.0)
Requires-Dist: pydantic (>=2,<3)
Requires-Dist: pydantic-settings (>=2.2.1,<3.0.0)
Requires-Dist: pytube (>=15.0.0,<16.0.0)
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Project-URL: Documentation, https://media-parser.rtfd.io
Project-URL: Repository, https://github.com/jag-k/media-parser
Description-Content-Type: text/markdown

# Media Parser

[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
[![Documentation Status](https://readthedocs.org/projects/media-parser/badge/?version=latest)](https://media-parser.readthedocs.io/?badge=latest)
[![Build Docker image](https://github.com/jag-k/media-parser/actions/workflows/docker-image.yml/badge.svg)](https://github.com/jag-k/media-parser/actions/workflows/docker-image.yml)
[![PyPI publish](https://github.com/jag-k/media-parser/actions/workflows/pypi-publish.yml/badge.svg)](https://github.com/jag-k/media-parser/actions/workflows/pypi-publish.yml)
[![PyPI version](https://img.shields.io/pypi/v/media-parser?logo=pypi&label=media-parser)](https://pypi.org/project/media-parsers/)

Server for parse Media by URL.

## Supported medias

- [x] Youtube
- [x] Tiktok
- [x] Instagram
- [x] Twitter
- [x] Reddit
- [ ] Pinterest

## Installation and Configuration Server

Use the `docker-compose.yml` file to run the server.

```yaml
version: "3.8"

service:
    media-parser:
        image: ghcr.io/jag-k/media-parser:latest
        ports:
            - 8000:8000
        environment:
            # Sentry integration (optional)
            SENTRY_DSN: "https://abcabc@sentry.io/2"
            SENTRY_ENVIRONMENT: "dev"

            # Enable sentry user feedback (optional)
            SENTRY_ORGANISATION_SLUG: "sentry"
            SENTRY_PROJECT_SLUG: "media-parser"
            SENTRY_AUTH_TOKEN: "..."  # with scope project:write
            SENTRY_API_HOST: "https://api.sentry.io/"

            # Database
            MONGO_URL: "mongodb://mongodb:27017"
            MONGO_DATABASE: "test"

        volumes:
            - ./config:/config

    mongodb:
        image: mongo:latest
        volumes:
            - ./data:/data/db
```

### Parsers Configuration

All configs for parsers stored in `config/parsers.json`. JSON Schema for
this: [schemas/parser_schema.json](https://github.com/jag-k/media-parser/blob/main/schemas/parser_schema.json).

To enable parser, you need to add config for this parser.
If parser hasn't config, like `tiktok` set an empty object (`{}`) to enable it.

Example:

```json5
// config/parsers.json
{
    "$schema": "https://raw.github.com/jag-k/media-parsers/blob/main/schemas/parser_schema.json",
    "instagram": {
        // Optional
        "instagram_saas_token": "asdasd"
    },
    "reddit": {
        "client_id": "",
        "client_secret": "",
        // Optional
        "user_agent": "video downloader (by u/Jag_k)"
    },
    "tiktok": {},
    "twitter": {
        "twitter_bearer_token": "asdasd"
    },
    "youtube": {}
}
```

Or you can use YAML file like `config/parsers.yaml` or `config/parsers.yml`:

```yaml
# config/parsers.yml
$schema: "https://raw.github.com/jag-k/media-parsers/blob/main/schemas/parser_schema.json"
instagram:
    # Optional
    instagram_saas_token: "asdasd"
reddit:
    client_id: ""
    client_secret: ""
    # Optional
    user_agent: "video downloader (by u/Jag_k)"
tiktok: {}
twitter:
    twitter_bearer_token: "asdasd"
youtube: {}
```

## Usage

API documentation available on `/docs` endpoint.

## Clients

### Installation

```bash
poetry add media-parser  # or pip install media-parser
```

### Usage

```python
from media_parser import Client, FeedbackTypes

client = Client(url="http://localhost:8000")


async def main():
    # Get all media
    media = await client.parse("https://www.youtube.com/watch?v=9bZkp7q19f0", user="jag-k")
    print(media)

    # If media is incorrect, you can send feedback
    await client.send_feedback(media, "jag-k", FeedbackTypes.wrong_media)


if __name__ == '__main__':
    import asyncio

    asyncio.run(main())
```

## License

[MIT](https://github.com/jag-k/media-parser/blob/main/LICENSE)

