Metadata-Version: 2.1
Name: pcas
Version: 0.0.3
Summary: pcas provides an interface to PCAS microservices.
Home-page: https://bitbucket.org/pcas/python-interface
Author: Tom Coates, Alexander Kasprzyk
Author-email: t.coates@imperial.ac.uk
License: CC0
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: System :: Logging
Classifier: Programming Language :: SQL
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: COPYING.txt
License-File: AUTHORS.txt

# PCAS Python interface

The `pcas` Python package provides an interface to services running on [PCAS](https://www.pcas.xyz) infrastructure. It consists of several submodules:
* [keyvalue](#keyvalue-interface)
* [logger](#logger-interface)
* [sql](#sql-submodule)

## Keyvalue interface

The `keyvalue` submodule provides an interface to key-value databases hosted on servers running the PCAS database `kvdbd`. To connect to such a database, use a `Connection` object.

### Connection objects

A `Connection` represents a connection to a PCAS key-value database. You can create a `Connection` like this:

```python
from pcas import keyvalue

db = '...your database name...'
ssl_cert = '...your SSL certificate...'
conn = keyvalue.Connection(db, address='localhost:12356', certificate=ssl_cert)
```

The `address` parameter is optional; if omitted, its value will be read from the environment variable `PCAS_KVDB_ADDRESS`. Similarly if the `certificate` parameter is omitted, its value will be read from the environment variable `PCAS_SSL_CERT`. 

A `Connection` object has methods to connect to a table, create a new table, delete a table, describe a table, and list the tables in the database.

#### Connect to a table

To connect to a table, use the `connect_to_table` method. This takes the name of the table as an argument, and returns a `Table` object.
```python
t = conn.connect_to_table('table_name')
```
`Table` objects are documented [below](#table-objects).

#### Create a table
To create a new table, use the `create_table` method. This takes the name of the table and a dictionary as an argument, and returns `None`. The dictionary represents a template key-value record: it should have keys that are strings and values that are strings, integers, booleans, floating-point numbers, or `bytes` objects.
```python
conn.create_table('new_table', template) 
```

#### Delete a table
To delete a table from the database, use the `delete_table` method. This takes the name of the table as an argument, and returns `None`.
```python
conn.delete_table('unwanted')
```

#### Describe a table
To get a description of a table in the database, use the `describe_table` method. This takes the name of the table as an argument, and returns a dictionary describing a typical entry in the table.
```python
d = conn.describe_table('table_name`)
```
The return value is a dictionary such as 
```python
{'id': 0, 'coefficients': ''}
```
describing a typical key-value record in the table; here we see that the value of `'id'` in the records is an integer, and the value of `'coefficients'` is a string. If the database backing `kvdbd` has a schema (e.g. PostgreSQL) then the table description uses this schema; if the backing database is schemaless (e.g. MongoDB) then the table description is a "best guess" based on a sample of records from the table, and may not be correct for all records.


#### List tables
To list the tables in the database, use the `list_tables` method. The takes no arguments, and returns a list of the names of the tables in the database.

```python
table_list = conn.list_tables()  
```



### Table objects

A `Table` object represents a table in a PCAS keyvalue database. As discussed, you create a `Table` by calling the `connect_to_table` method on the `Connection` that represents the database. 

`Table` objects have methods to count, delete, insert, select, and update key-value records in the table, as well as methods to add, list, and delete indexes on keys of the key-value records.

#### Count records
To count records in the table, use the `count` method. It optionally takes a selector as argument, and returns the number of records in the table that match that selector.
```python
n = t.count(selector)
```
The selector can be omitted, in which case the total number of records in the table is returned; it can be a dictionary, representing a key-value record, in which case the number of records matching those keys and values is returned; or it can be a string in [SQL format](#sql-submodule) that specifies the condition. Examples:
```python
# Count all the records in the table t
num_records = t.count()

# Count the records with the specified keys and values
selector = {'rank': 3, 'name': 'exactly this'}
n = t.count(selector)

# Count the records matching the specified condition
n = t.count('WHERE rank BETWEEN 2 and 4 AND id >= 100')
```


#### Delete records
To delete records from the table, use the `delete` method. It optionally takes a selector as argument, and returns the number of records deleted.
```python
num_deleted = t.delete(selector)
```
The selector can be omitted, in which case all records in the table are deleted; it can be a dictionary, representing a key-value record, in which case all records matching those keys and values are deleted; or it can be a string in [SQL format](#sql-submodule) that specifies the condition. Examples:
```python
# Delete all the records in the table t
num_deleted = t.delete()

# Delete all records with the specified keys and values
selector = {'id': 3, 'rank': 5}
num_deleted = t.delete(selector)

# Delete all records that match the specified condition
num_deleted = t.delete('WHERE id = 3 AND rank <= 6')
```

#### Insert records
To insert records into the table, use the `insert` method. It takes an iterable of dictionaries as argument, and returns `None`. The dictionaries represent key-value records; they should have keys that are strings and values that are strings, integers, booleans, floating-point numbers, or `bytes` objects.
```python
t.insert(recs)
```

#### Select records
To select records from the table, use the `select` method. It takes a dictionary, an optional selector, an optional sort order, and an optional limit as arguments, and returns an iterator containing the matching records. 
```python
itr = t.select(template, selector, order, limit)
```
The records returned will be in the form specified by the key-value record `template`. If an `order` is provided, the records will be returned in that order. If a `limit` is provided, it must be a positive integer and at most that many records will be returned. The selector can be omitted, in which case all records in the table are returned; it can be a dictionary, representing a key-value record, in which case all records matching those keys and values are returned; or it can be a string in [SQL format](#sql-submodule) that specifies the condition.

If the selector is provided as a string in [SQL format](#sql-submodule) then it may also specify the order and/or the limit. If it does so then order, and/or respectively limit, must either be unspecified or specified as `None`. Examples:
```python
# We return the id and the rank, which are both integers
template = {'id': 0, 'rank': 0}

# Select all the records in the table t
itr = t.select(template)

# Select at most 10 records
itr = t.select(template, limit=10)

# Select records with specified keys and values
selector = {'rank':2, 'dimension': 4}
itr = t.select(template, selector)

# Select records that match a specified condition
itr = t.select(template, 'WHERE id > 40 AND dimension IN (1,3,5,7)')

# Select with the records returned in a specified order
itr = t.select(template, 'WHERE id > 40 SORT BY id ASC, rank DESC')

# An equivalent select, with the sort order specified directly. 
from pcas.sql import sort
my_order = [sort.ascending('id'), sort.descending('rank')]
itr = t.select(template, 'WHERE id > 40', order = my_order)
```

#### Update records
To update records in the table, use the `update` method. It takes a dictionary and an optional selector as arguments, and returns the number of records updated. 
```python
num_updated = t.update(replacement, selector)
```
All records in the table that match the selector are updated by setting the keys present in the dictionary `replacement` to the corresponding values. The selector can be omitted, in which case all records in the table are updated; it can be a dictionary, representing a key-value record, in which case all records matching those keys and values are updated; or it can be a string in [SQL format](#sql-submodule) that specifies the condition. Examples:
```python
# We will set the values 'id' and 'dimension' in the updated records
replacement = {'id': 98, 'dimension':3}

# Update all the records in the table t
num_updated = t.update()

# Update all records with the specified keys and values
selector = {'id': 98, 'rank': 12}
num_updated = t.update(selector)

# Update all records that match the specified condition
num_updated = t.update('WHERE id = 3 AND rank <= 6')
```

#### Add, delete, and list indexes
To add an index to the table, use the `add_index` method. It takes the key to be indexed as an argument, and returns `None`.
```python
t.add_index('the_key_to_be_indexed')
```
A unique index is an index together with the constraint that, for each value of the indexed key, there is at most one record with that value. To add a unique index to the table, use the `add_unique_index` method. It takes the key to be indexed as an argument, and returns `None`.
```python
t.add_unique_index('the_key_to_be_indexed')
```
To delete an index from the table, use the `delete_index` method. It takes the key for the index as an argument, and returns `None`. If there is no index on the specified key, this operation succeeds but does nothing.
```python
t.delete_index('the_key_to_be_indexed')
```
To list the indexes present on the table, use the `list_indices` method. It takes no arguments, and returns a list of the indexed keys.
```python
key_list = t.list_indices()
```

#### Describe the table

To get a description of the table, use the `describe` method. It takes no arguments, and has return value exactly as for the `describe_table` method of a `Connection` documented [above](#describe-a-table).
```python
d = t.describe()
```

### Key-value records

Key-value records are represented by dictionaries. These have keys that are strings and values that are strings, integers, booleans, floating-point numbers, or `bytes` objects.

## Logger interface

The `logger` submodule provides a log handler that writes to a server running the PCAS logging service `logd`. It uses the standard Python [logging framework](https://docs.python.org/3/library/logging.html). Example usage:
```python
import multiprocessing
import logging
from logging.handlers import QueueListener, QueueHandler
from pcas import logger

# Set up the log handler
ssl_cert = '...your SSL certificate...'
handler = logger.LogHandler(address='localhost:12354', certificate=ssl_cert)

# Set up a queue that passes log messages to our handler
queue = multiprocessing.Queue(-1)
queue_listener = QueueListener(queue, logger.LogHandler())
queue_listener.start()

# Create the logger
lg = logging.getLogger('your_log_name')
lg.addHandler(QueueHandler(queue))

# Do stuff...

# Write to the logger
lg.warning('hello world')

# Clean up
handler.close()
queue_listener.stop()
```
The `address` argument for `logger.LogHandler` specifies the address of the PCAS `logd` server and is optional; if omitted the address will be read from the environment variable `PCAS_LOG_ADDRESS`. The `certificate` option specifies the SSL certificate to use and is also optional; if omitted the certificate will be read from the environment variable `PCAS_SSL_CERT`. The log handler blocks on logging if the `logd` server is unavailable. This is why we used a queue in the code above: it ensures that calls to the logger `lg` are non-blocking.

## SQL submodule

The `sql` submodule defines a parser for SQL-formatted queries. The SQL should be formatted as follows:

```sql
[[WHERE] <where condition>] [ORDER BY <sort order>] [LIMIT <limit>]
```

See below for [examples](#examples). Note that prefixing the WHERE condition with "WHERE" is currently optional, although this might change in the future.

### `WHERE` conditions

The following types are supported:

* string -	surrounded by matching double- (") or single-quotes (')
* integer -	must fit in a 64-bit signed or unsigned integer
* float -		must fit in a 64-bit `float`
* boolean -	`TRUE` or `FALSE`

The following standard SQL operators are supported:
* `=`, `!=`
* `<`, `>`, `<=`, `>=`
* `IS`, `IS NOT`
* `IN`, `NOT IN`
* `BETWEEN`, `NOT BETWEEN`
* `AND`
* `OR`

### `SORT` orders

A `SORT` order should be formatted

```sql
key1 [ASC | DESC], key2 [ASC | DESC], ..., keyn [ASC | DESC]
```

where `ASC` and `DESC` denote increasing and decreasing order, respectively. Precisely what this means is determined by the underlying storage engine and data type. If `ASC` or `DESC` is omitted, then `ASC` is assumed by default.

### `LIMIT` specifications

A non-negative integer (that must fit in 64-bit signed integer) must be provided.

### Examples

```sql
WHERE is_reflexive IS FALSE AND number_of_points >= 12 ORDER BY id DESC LIMIT 5

WHERE id BETWEEN 4 and 9

WHERE TRUE

ORDER BY rank, id 

ORDER BY rank DESC, id ASC

LIMIT 50

WHERE rank <= 6 LIMIT 10
```


