=====================================
 Schevo Internal Database Structures
=====================================


This document explains the data structures a Schevo database uses
internally.


Structural overview
===================

At the root of the Durus connection a key called 'SCHEVO' contains the
following structures::

  'SCHEVO': PersistentDict{
      'label': <db-label>,          [*]
      'format': <db-format>,
      'version': <schema-version>,
      'schema_source': <schema-module-source>,
      'extent_name_id': PersistentDict{
          <extent-name>: <extent-id>,
          ...,
          },
      'extents': PersistentDict{
          <extent-id>: PersistentDict{
              'entities': BTree{
                  <entity-oid>: PersistentDict{
                      'rev': <entity-rev>,
                      'fields': PersistentDict{
                          <field-id>: <stored-value>,
                          ...,
                          },
                      'link_count': <count-of-links>,
                      'links': PersistentDict{
                          (<referrer-extent-id>, <referrer-field-id>): BTree{
                              <referrer-oid>: None,
                              ...,
                              },
                          },
                      'related_entities': PersistentDict{       [**]
                          <field-id>: <related-entity-set>,
                          ...,
                          },
                      },
                  ...,
                  },
              'entity_field_ids': (<field-id>, ...),
              'field_id_name': PersistentDict{
                  <field-id>: <field-name>,
                  ...,
                  },
              'field_name_id': PersistentDict{
                  <field-name>: <field-id>,
                  ...,
                  },
              'id': <id-of-extent>,
              'indices': <indices>,
              'index_map': <index-map>,
              'len': <length-of-extent>,
              'name': <name-of-extent>,
              'normalized_index_map': <normalized-index-map>,
              'next_oid': <next-oid-of-extent>,
              },
          ...,
          },
      }

``[*]``: Labels are not present in older databases, and a default
label is assumed if a persistent label is not assigned.

``[**]``: The related entities structure is only present in databases
of format 2 or higher.


Indices
=======

Specification in schema
-----------------------

The distinction between a "key" and an "index" is that a key is the
minimal amount of information that will uniquely identify an entity,
whereas an index is a structure used to increase the efficiency of
lookup and sorting operations.

Implementation-wise, the specification of a key results in a unique
index.

Keys are specified in the same way as they always have been, using the
following syntax::

    _key(field1, [..., fieldn])

Indices are specified in a similar manner, using the following
syntax to create a non-unique index::

    _index(field1, [..., fieldn])

Of note is that the field order now becomes important, since each type
of index will also be used for obtaining ordered lists of entity OIDs,
not just for find operations and key collision checking.  Therefore,
the following becomes valid and creates two separate index structures,
although for key collision checking in effect either one may be used::

    _key(field1, field2)
    _index(field2, field1)

If an index is specified that is a superset of a key, it becomes
unique.  Therefore, in the above example, the index specified is a
unique index.

Additionally, the index specified in this example becomes a unique
index::

    _key(field1, field2)
    _index(field3, field2, field1)


Internal structures
-------------------

The main top-level structure of an extent index is `indices`, which
is structured as follows::

    dict{
      index-spec: (unique, index-tree),
      ...,
      }

`unique` is a boolean that is True if each leaf in the index must have
only zero or one `oid` keys as described below.

`index-spec` is a tuple of field IDs in the order given by the
`_key` and `_index` specifications in the schema.

`index-tree` is the following structure, where `field-value` are the
values of the first field specified in the `index-spec`::

    BTree{
      field-value: index-tree | oid-tree,
    }

If the `index-tree` itself is a branch, then for each `field-value`
there will be another `index-tree` for the next field in the index
spec.

If the `index-tree` is instead a leaf, then for each `field-value`
there will be an `oid-tree`, which is the following structure::

    BTree{
      oid: None,
      ...,
      }

Each `oid` in the `oid-tree` corresponds to an entity whose fields
specified in the `index-spec` have values that match all of the
`field-value` keys in each traversed `index-tree`.

The next top-level structure of an extent index is `index-map`, which
maps several `partial-index-spec` to lists of actual `index-spec` that
are stored in `indices`::

    dict{
      partial-index-spec: [index-spec-1, ..., index-spec-n],
    }

`partial-index-spec` is a tuple of field IDs, sorted by the order
defined in its corresponding `index-spec`.  For instance, if you have
two `index-spec` of (123, 456, 789) and (123, 456, 234), then a
`partial-index-spec` of (123, 456) would map to both of those
`index-spec`.

Of note is that `partial-index-spec` could actually be a full
`index-spec` that exists in `indices`.  For instance, given two
`index-spec` of (123, 456, 789) and (123, 456), a `partial-index-spec`
of (123, 456) would map to both of those `index-spec`.

The final top-level structure of an extent index is
`normalized-index-map`, which is the same as `index-map` except that
each `partial-index-spec` is normalized by sorting by field ID.


Traversal of indices (`find` method)
------------------------------------

If no arguments, just return a list of all OIDs.

The arguments to a `find` operation are a dictionary of field name and
field value.

Transform those arguments by converting field names to field IDs.

Sort a tuple of field IDs.  This becomes the normalized
`partial-index-spec`.

Get the most specific `index-spec` from `normalized-index-map` using
that `partial-index-spec` as the key.

If `partial-index-spec` was not found, revert to brute-force find. An
optimization of this would be to trim that `partial-index-spec` in
some way and look for an index that matches that.  In this manner, you
can use the index to obtain a subset of entities in which to perform a
brute-force find.

Access the top-level `index-tree` from `indices`.

In a loop:

  Pop the first field ID off of the `index-spec`.  Pop the value for
  that field off of the arguments to `find`.

  Look up that field value in the current `index-tree`; if there are
  no more field IDs in `index-spec` this will be an `oid-tree`,
  otherwise it will be another `index-tree`.

  If not found, return an empty list as the result of `find`.

  If there are more arguments to `find`, continue the loop.

  If there are no more arguments to `find`, and there are no more
  fields in `index-spec`, we are at a leaf; return the keys of the
  `oid-tree` as the result of `find`.

  If there are no more arguments to `find`, but there are more fields
  in `index-spec`, we are at a branch; traverse each branch
  recursively until leaves are reached.  Return the resulting set of
  matching OIDs.


Obtaining ordered OID lists (`by` method)
-----------------------------------------

The arguments to a `by` operation are a list of field names,
optionally with a hyphen prefix to indicate reverse sort order.

Transform the field names to field IDs, ignoring sort order for the
time being.

Create an `index-spec` based on these field IDs.

Access the appropriate `index-tree` in `indices`:

  First, look for a direct match in `indices` for the `index-spec`.

  If that fails, look for `index-spec` in `index-map`.  If no match is
  found, raise an `IndexNotFound` exception.  Otherwise, use the first
  spec in the resulting list as `index-spec` and use that to get the
  `index-tree`.

Create an empty list to store results.

Recursively, starting with the first field:

  If we are at a branch (more fields need to be traversed):

    If the current field being traversed is to be sorted ascending,
    iterate over the keys of the `index-tree` and recurse to the next
    level.

    If the current field being traversed is to be sorted descending,
    iterate over the keys of the `index-tree` in reverse order and
    recurse to the next level.

  If we are at a leaf (no more fields to traverse):

    Get the list of keys in the `oid-tree`.  Append those to the
    results list.

Return the results list.


Temporarily relaxing uniqueness constraints
-------------------------------------------

A transaction can relax the uniqueness constraint of a unique index by
calling `db.relax_index(*spec)`.

While a unique index is relaxed, the database will keep track of
changes made to that index by the transaction and its subtransactions.

When the transaction calls the corresponding
`db.enforce_index(*spec)`, the database will look at all the changes
made to that index.  If any non-unique values are found as a result of
those changes, the transaction will fail with an `KeyCollision`
exception and the changes made by that transaction will be rolled
back.

If a transaction calls `relax_index` without later explicitly calling
`enforce_index`, the database will automatically call `enforce_index`
at the end of the transaction that called `relax_index`.

If a transaction calls `relax_index` on an index, and a subtransaction
calls `relax_index` on the same index, the call to `enforce_index`
will be a no-op in the subtransaction as the outer transaction still
expects the index to be relaxed.
