Metadata-Version: 2.1
Name: fswalk
Version: 1.3.2
Summary: An efficient multiprocessing directory walk and search tool
Home-page: https://github.com/bzizou/fs_walk
Author: Bruno Bzeznik
Author-email: Bruno.Bzeznik@univ-grenoble-alpes.fr
License: GNU GPL v3
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Clustering
Classifier: Topic :: System :: Filesystems
Classifier: Topic :: Utilities
Requires-Python: >=3.5
Description-Content-Type: text/markdown
Requires-Dist: requests
Requires-Dist: pyjson5
Requires-Dist: elasticsearch

# fswalk
An efficient multiprocessing directory walk and search tool

## Introduction
fswalk is a simple Python script that recursively walks a filesystem
directory to gather file meta-data and store it in a **json file** or
an **Elasticsearch** database.
It runs several processes, each responsible for listing the files
contained in a subdirectory.
The collected meta-data are `filename, path, uid, gid, size` and `atime`.
The output is either a json file written on the fly to stdout, or an Elasticsearch
index. A simple search option retrieves files by their owner,
group or part of their path name.
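
The multiprocessing idea described above can be sketched with the standard library alone. This is an illustration, not fswalk's actual code: the function names (`stat_record`, `scan_subtree`, `parallel_walk`) are hypothetical, and splitting the work by top-level subdirectory is an assumption about how the load might be distributed.

```python
import os
from multiprocessing import Pool


def stat_record(path):
    """Return the meta-data fields listed above for one file (symlinks are not followed)."""
    st = os.lstat(path)
    return {
        "filename": os.path.basename(path),
        "path": path,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "atime": int(st.st_atime),
    }


def scan_subtree(top):
    """Worker: collect a record for every file found below one subdirectory."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            try:
                records.append(stat_record(os.path.join(dirpath, name)))
            except OSError:
                pass  # the file vanished or is unreadable: skip it
    return records


def parallel_walk(root, nproc=4):
    """Hand each top-level subdirectory of root to a pool of worker processes."""
    subdirs, records = [], []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                records.append(stat_record(entry.path))
    with Pool(nproc) as pool:
        # Results arrive in whatever order the workers finish.
        for chunk in pool.imap_unordered(scan_subtree, subdirs):
            records.extend(chunk)
    return records
```

A call such as `parallel_walk("/home/bzizou", nproc=8)` should be placed under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.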

The script also provides an option to quickly analyze the resulting
output file.

*Warning*: when the results are sent to stdout, the json output is printed with
an extra `,` sign that might break json compatibility. This is a side effect of
multiprocessing, accepted so that the walk is not slowed down.
The `pyjson5` Python library can read such non-standard json files.
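
As a rough sketch of reading such output, the snippet below uses `pyjson5.loads` when the library is installed and otherwise falls back to a naive standard-library workaround. The helper name `load_lenient` and the sample data are made up for illustration, and the exact position of the extra comma in real fswalk output is assumed to be just before a closing bracket.

```python
import json
import re

try:
    import pyjson5  # the library recommended above
except ImportError:
    pyjson5 = None


def load_lenient(text):
    """Parse JSON text that may contain a trailing comma before ] or }."""
    if pyjson5 is not None:
        return pyjson5.loads(text)  # JSON5 allows trailing commas
    # Fallback: drop commas that directly precede a closing bracket.
    # Naive: it would also alter such a sequence inside a string value.
    cleaned = re.sub(r",\s*([\]}])", r"\1", text)
    return json.loads(cleaned)


# Hypothetical sample, not real fswalk output.
sample = '[{"path": "/tmp/a", "size": 3}, {"path": "/tmp/b", "size": 5},]'
records = load_lenient(sample)
print(sum(rec["size"] for rec in records))
```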

## Installation

Requirements:
  - python >= 3.5
  - python packages: requests, pyjson5, elasticsearch

Installing the current stable release:

```
$ pip install fswalk
```

Installing the latest devel snapshot:

```
$ pip install git+https://github.com/bzizou/fs_walk.git
```

## Example

Start a walk of the `/home/bzizou` directory with 8 processes, excluding
the `.snapshot` subdirectory and getting the result as a gzipped json file:

```
bzizou@f-dahu:~/git/fs_walk$ fswalk -p /home/bzizou -x '^/home/bzizou/\.snapshot/' -n 8 |gzip > /tmp/out.gz    
```
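
To see what the exclusion expression from the command above selects, it can be tried with Python's `re` module. Whether fswalk applies the pattern with `re.search` or `re.match` is an assumption here (for a pattern anchored with `^` both behave the same), and `is_excluded` is a hypothetical helper.

```python
import re

# Same expression as the -x argument above.
exclude = re.compile(r"^/home/bzizou/\.snapshot/")


def is_excluded(path):
    """Return True when the path matches the exclusion expression."""
    return exclude.search(path) is not None


for candidate in (
    "/home/bzizou/.snapshot/hourly.0/notes.txt",  # inside .snapshot: excluded
    "/home/bzizou/snapshot/notes.txt",            # no leading dot: kept
    "/data/home/bzizou/.snapshot/notes.txt",      # does not start at ^: kept
):
    print(candidate, is_excluded(candidate))
```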

Analyze the output from the resulting file:

```
bzizou@f-dahu:~/git/fs_walk$ fswalk -a /tmp/out.gz
User                                       Size            Count
=================================================================
bzizou                               2749804131            11125
root                                 1030651826             1351
1000                                  390705282              476
11610                                    726417                7

Group                                      Size            Count
=================================================================
realuser                             2749795275            11119
root                                 1030660332             1356
1000                                  390705282              476
2222                                     726417                7
staff                                       350                1

TOTAL SIZE: 4171887656
TOTAL FILES: 12959
```
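
A summary like the one above boils down to grouping records by owner or group and adding up sizes and counts. The following is a minimal sketch of that aggregation, assuming each record is a dict with `uid`, `gid` and `size` keys; the function names and the column widths are illustrative and not taken from fswalk.

```python
from collections import defaultdict


def summarize(records, key):
    """Return (value, total_size, file_count) rows, largest total size first."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        entry = totals[rec[key]]
        entry[0] += rec["size"]
        entry[1] += 1
    rows = [(value, size, count) for value, (size, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def print_table(title, rows):
    """Print rows in a layout loosely imitating the output shown above."""
    print(f"{title:<32}{'Size':>16}{'Count':>17}")
    print("=" * 65)
    for value, size, count in rows:
        print(f"{str(value):<32}{size:>16}{count:>17}")


# Hypothetical records for illustration.
demo = [
    {"uid": 1000, "gid": 100, "size": 10},
    {"uid": 0, "gid": 0, "size": 50},
    {"uid": 1000, "gid": 100, "size": 5},
]
print_table("User", summarize(demo, "uid"))
```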

Same directory scan, but this time the results are indexed into an Elasticsearch database:

```
bzizou@f-dahu:~/git/fs_walk$ fswalk -p /home/bzizou -x '^/home/bzizou/\.snapshot/' -n 8 --elastic-host=http://localhost:9200 --elastic-index=fs_walk_home -g
```

Search for all files with the "povray" string in their path name that belong to the user whose uid is 10000:

```
bzizou@f-dahu:~/git/fs_walk$ fswalk --elastic-host=http://localhost:9200 --elastic-index=fs_walk_home --search="10000:*:povray:*"
/home/bzizou/povray/OAR.cigri.14068.1251218.stderr
/home/bzizou/povray/OAR.cigri.14068.1251220.stderr
/home/bzizou/povray/OAR.cigri.14068.1251224.stderr
/home/bzizou/povray/OAR.cigri.14068.1251231.stderr
/home/bzizou/povray/OAR.cigri.14068.1251231.stdout
/home/bzizou/povray/OAR.cigri.14068.1251233.stderr
/home/bzizou/povray/OAR.cigri.14068.1251233.stdout
/home/bzizou/povray/OAR.cigri.14068.1251234.stderr
/home/bzizou/povray/OAR.cigri.14068.1251234.stdout
/home/bzizou/povray/OAR.cigri.14068.1251237.stderr
/home/bzizou/povray/OAR.cigri.14068.1251237.stdout
/home/bzizou/povray/OAR.cigri.14068.1251238.stderr
```
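
The four-field search string used above can be pictured as follows. This is only a sketch of the syntax: it assumes that `*` (or an empty field) means "any value", that the path field matches anywhere inside the path, and that records carry a `hostname` key. The helpers `parse_search` and `matches` are hypothetical, and the real matching is presumably done by Elasticsearch.

```python
from fnmatch import fnmatchcase

FIELDS = ("uid", "gid", "path_glob", "hostname")


def parse_search(search):
    """Split 'uid:gid:path_glob:hostname' into a dict; '*' or '' becomes None."""
    parts = search.split(":")
    if len(parts) != len(FIELDS):
        raise ValueError("expected 4 fields separated by ':'")
    return {
        name: (None if value in ("", "*") else value)
        for name, value in zip(FIELDS, parts)
    }


def matches(record, query):
    """Check one record against a parsed query (assumed semantics)."""
    for name in ("uid", "gid", "hostname"):
        wanted = query[name]
        if wanted is not None and str(record.get(name)) != wanted:
            return False
    glob = query["path_glob"]
    # Assumption: the glob may match anywhere inside the path.
    return glob is None or fnmatchcase(record["path"], f"*{glob}*")


query = parse_search("10000:*:povray:*")
record = {"uid": 10000, "gid": 2222, "hostname": "f-dahu",
          "path": "/home/bzizou/povray/OAR.cigri.14068.1251218.stderr"}
print(matches(record, query))
```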

## Usage
```
Usage: fswalk [options]

Options:
  -h, --help            show this help message and exit
  -p PATH, --path=PATH  Path to scan
  -n NPROC, --nproc=NPROC
                        Number of process to launch
  -x EXCLUDE_EXPR, --exclude=EXCLUDE_EXPR
                        Regular expression for path exclusion
  -a ANALYZE_FILE, --analyze=ANALYZE_FILE
                        Creates a summary based on a previously generated json
                        file
  -s SEARCH_STRING, --search=SEARCH_STRING
                        Search a subset of files with syntax:
                        [uid]:[gid]:[path_glob]:[hostname] (--analyze or
                        --elastic-host needed)
  --numeric             Output numeric uid/gid instead of names
  --hostname=HOSTNAME   Overwrite the value of the hostname string. Defaults
                        to local hostname.
  -e ELASTIC_HOST, --elastic-host=ELASTIC_HOST
                        Use an elasticsearch server for output. 'Ex:
                        http://localhost:9200'
  --elastic-index=ELASTIC_INDEX
                        Name of the elasticsearch index
  --elastic-bulk-size=MAX_BULK_SIZE
                        Size of the elastic indexing bulks
  -g, --elastic-purge-index
                        Purge the elasticsearch index before indexing
```

The `ANALYZE_FILE` parameter may be a gzip compressed json file or a plain-text json file.
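
One common way to accept both kinds of file is to look at the first two bytes, which are `1f 8b` for gzip data. The sketch below shows that technique; `open_text` is a hypothetical helper and may differ from how fswalk actually detects compression.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

With this helper, `open_text("/tmp/out.gz")` and `open_text("/tmp/out.json")` can be read through the same code path.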


