Hurry Query
===========

The hurry query system for the Zope 3 catalog builds on the catalog
indexes defined in the Zope 3 core, as well as the indexes in
zc.catalog. It is in part inspired by AdvancedQuery for Zope 2 by
Dieter Maurer, though it has an independent origin.

Setup
-----

Let's define a simple content object. First its interface::

  >>> from zope.interface import Interface, Attribute, implements
  >>> class IContent(Interface):
  ...     f1 = Attribute('f1')
  ...     f2 = Attribute('f2')
  ...     f3 = Attribute('f3')
  ...     f4 = Attribute('f4')
  ...     t1 = Attribute('t1')
  ...     t2 = Attribute('t2')

And its implementation::

  >>> from zope.app.container.contained import Contained
  >>> class Content(Contained):
  ...     implements(IContent)
  ...     def __init__(self, id, f1='', f2='', f3='', f4='', t1='', t2=''):
  ...         self.id = id
  ...         self.f1 = f1
  ...         self.f2 = f2
  ...         self.f3 = f3
  ...         self.f4 = f4
  ...         self.t1 = t1
  ...         self.t2 = t2
  ...     def __cmp__(self, other):
  ...         return cmp(self.id, other.id)

The id attribute is there only so that we can easily identify the
objects we find. The __cmp__ method orders objects by id, which makes
sure search results can be stably sorted.
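
Note that ``__cmp__`` and ``cmp`` exist only in Python 2. As a minimal
sketch, assuming the same id-based ordering is wanted under Python 3,
``functools.total_ordering`` can derive the remaining comparison methods
from ``__eq__`` and ``__lt__``. The class name ``OrderedContent`` is
hypothetical and not part of hurry.query:

```python
import functools


@functools.total_ordering
class OrderedContent(object):
    """Hypothetical stand-in for Content, ordered by its id."""

    def __init__(self, id):
        self.id = id

    def __eq__(self, other):
        return self.id == other.id

    def __lt__(self, other):
        return self.id < other.id

    def __hash__(self):
        # In Python 3, defining __eq__ removes the default __hash__,
        # so provide one that is consistent with equality.
        return hash(self.id)


items = [OrderedContent(3), OrderedContent(1), OrderedContent(2)]
sorted_ids = [item.id for item in sorted(items)]
```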

We use a fake int id utility here so we can test independently of
the full-blown Zope environment::

  >>> from zope import interface
  >>> import zope.app.intid.interfaces
  >>> from zope.app.testing import ztapi
  >>> class DummyIntId(object):
  ...     interface.implements(zope.app.intid.interfaces.IIntIds)
  ...     MARKER = '__dummy_int_id__'
  ...     def __init__(self):
  ...         self.counter = 0
  ...         self.data = {}
  ...     def register(self, obj):
  ...         intid = getattr(obj, self.MARKER, None)
  ...         if intid is None:
  ...             setattr(obj, self.MARKER, self.counter)
  ...             self.data[self.counter] = obj
  ...             intid = self.counter
  ...             self.counter += 1
  ...         return intid
  ...     def getObject(self, intid):
  ...         return self.data[intid]
  ...     def __iter__(self):
  ...         return iter(self.data)
  >>> intid = DummyIntId()
  >>> ztapi.provideUtility(
  ...     zope.app.intid.interfaces.IIntIds, intid)

Now let's register a catalog::

  >>> from zope.app.catalog.interfaces import ICatalog
  >>> from zope.app.catalog.catalog import Catalog
  >>> catalog = Catalog()
  >>> ztapi.provideUtility(ICatalog, catalog, 'catalog1')

And set it up with various indexes::

  >>> from zope.app.catalog.field import FieldIndex
  >>> from zope.app.catalog.text import TextIndex
  >>> catalog['f1'] = FieldIndex('f1', IContent)
  >>> catalog['f2'] = FieldIndex('f2', IContent)
  >>> catalog['f3'] = FieldIndex('f3', IContent)
  >>> catalog['f4'] = FieldIndex('f4', IContent)
  >>> catalog['t1'] = TextIndex('t1', IContent)
  >>> catalog['t2'] = TextIndex('t2', IContent)

Now let's create some objects to catalog::

  >>> content = [
  ... Content(1, 'a', 'b', 'd'),
  ... Content(2, 'a', 'c'),
  ... Content(3, 'X', 'c'),
  ... Content(4, 'a', 'b', 'e'),
  ... Content(5, 'X', 'b', 'e'),
  ... Content(6, 'Y', 'Z')]

And catalog them now::

  >>> for entry in content:
  ...     catalog.index_doc(intid.register(entry), entry)

Now let's register a query utility::

  >>> from hurry.query.query import Query
  >>> from hurry.query.interfaces import IQuery
  >>> ztapi.provideUtility(IQuery, Query())

Set up some code to make querying and displaying the results
easy::

  >>> from zope.app import zapi
  >>> def displayQuery(q):
  ...     query = zapi.getUtility(IQuery)
  ...     r = query.searchResults(q)
  ...     return [e.id for e in sorted(r)]

FieldIndex Queries
------------------

Now for a query where f1 equals a::

  >>> from hurry.query import Eq
  >>> f1 = ('catalog1', 'f1')
  >>> displayQuery(Eq(f1, 'a'))
  [1, 2, 4]

Not equals (this is more efficient than the generic ~ operator)::

  >>> from hurry.query import NotEq
  >>> displayQuery(NotEq(f1, 'a'))
  [3, 5, 6]

Testing whether a field is in a set::

  >>> from hurry.query import In
  >>> displayQuery(In(f1, ['a', 'X']))
  [1, 2, 3, 4, 5]

Testing whether a field is within a specified range (both ends inclusive)::

  >>> from hurry.query import Between
  >>> displayQuery(Between(f1, 'X', 'Y'))
  [3, 5, 6]

You can leave out one end of the range::

  >>> displayQuery(Between(f1, 'X', None)) # 'X' < 'a'
  [1, 2, 3, 4, 5, 6]
  >>> displayQuery(Between(f1, None, 'X'))
  [3, 5]

You can also use greater-than-or-equal (Ge) and less-than-or-equal (Le)
queries for the same purpose::

  >>> from hurry.query import Ge, Le
  >>> displayQuery(Ge(f1, 'X'))
  [1, 2, 3, 4, 5, 6]
  >>> displayQuery(Le(f1, 'X'))
  [3, 5]
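
These results follow from plain string comparison, in which uppercase
ASCII letters sort before lowercase ones. A minimal pure-Python
illustration using the ``f1`` values of the sample objects (the variable
names are hypothetical and nothing here touches the catalog):

```python
# f1 values of the six sample objects, in id order.
f1_values = ['a', 'a', 'X', 'a', 'X', 'Y']

# Distinct values in ascending order: uppercase sorts before lowercase.
ordered = sorted(set(f1_values))

# Every value is >= 'X', while only 'X' itself is <= 'X'.
at_least_x = [v for v in ordered if v >= 'X']
at_most_x = [v for v in ordered if v <= 'X']
```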

Any query can also be negated with the generic ~ operator::

  >>> displayQuery(~Eq(f1, 'a'))
  [3, 5, 6]

Using and (&)::

  >>> f2 = ('catalog1', 'f2')
  >>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b'))
  [1, 4]

Using or (|)::

  >>> displayQuery(Eq(f1, 'a') | Eq(f2, 'b'))
  [1, 2, 4, 5]

These can be chained::

  >>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b') & Between(f1, 'a', 'b'))
  [1, 4]
  >>> displayQuery(Eq(f1, 'a') | Eq(f1, 'X') | Eq(f2, 'b'))
  [1, 2, 3, 4, 5]

And nested::

  >>> displayQuery((Eq(f1, 'a') | Eq(f1, 'X')) & (Eq(f2, 'b') | Eq(f2, 'c')))
  [1, 2, 3, 4, 5]

"and" and "or" can also be spelled differently::

  >>> from hurry.query import And, Or
  >>> displayQuery(And(Eq(f1, 'a'), Eq(f2, 'b')))
  [1, 4]
  >>> displayQuery(Or(Eq(f1, 'a'), Eq(f2, 'b')))
  [1, 2, 4, 5]
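
One way to picture these operators is as set algebra over the ids each
term matches: ``&`` intersects, ``|`` unions, and ``~`` complements
against all cataloged ids. The sketch below models that with plain Python
sets over the sample data. The ``matches`` helper is hypothetical, and
this only illustrates the semantics; it does not describe how hurry.query
is implemented:

```python
# (f1, f2) values of the six sample objects, keyed by object id.
data = {1: ('a', 'b'), 2: ('a', 'c'), 3: ('X', 'c'),
        4: ('a', 'b'), 5: ('X', 'b'), 6: ('Y', 'Z')}
all_ids = set(data)


def matches(field, value):
    """Return the ids whose field (0 for f1, 1 for f2) equals value."""
    return set(i for i, fields in data.items() if fields[field] == value)


f1_is_a = matches(0, 'a')
f2_is_b = matches(1, 'b')

and_result = sorted(f1_is_a & f2_is_b)   # like Eq(f1, 'a') & Eq(f2, 'b')
or_result = sorted(f1_is_a | f2_is_b)    # like Eq(f1, 'a') | Eq(f2, 'b')
not_result = sorted(all_ids - f1_is_a)   # like ~Eq(f1, 'a')
```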

Combination of In and &
-----------------------

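In queries can be combined with '&'. The result is the intersection of
both, which is empty here because no object has f1 equal to 'Z'::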

  >>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']))
  [1, 2, 3, 4, 5, 6]
  >>> displayQuery(In(f1, ['Z']))
  []
  >>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']) & In(f1, ['Z']))
  []


SetIndex queries
----------------

The SetIndex is defined in zc.catalog. Let's make a catalog which uses
it::

  >>> intid = DummyIntId()
  >>> ztapi.provideUtility(
  ...     zope.app.intid.interfaces.IIntIds, intid)
  >>> from zope.app.catalog.interfaces import ICatalog
  >>> from zope.app.catalog.catalog import Catalog
  >>> catalog = Catalog()
  >>> ztapi.provideUtility(ICatalog, catalog, 'catalog1')
  >>> from zc.catalog.catalogindex import SetIndex
  >>> catalog['f1'] = SetIndex('f1', IContent)
  >>> catalog['f2'] = FieldIndex('f2', IContent)

First let's set up some new data::

  >>> content = [
  ... Content(1, ['a', 'b', 'c'], 1),
  ... Content(2, ['a'], 1),
  ... Content(3, ['b'], 1),
  ... Content(4, ['c', 'd'], 2),
  ... Content(5, ['b', 'c'], 2),
  ... Content(6, ['a', 'c'], 2)]

And catalog them now::

  >>> for entry in content:
  ...     catalog.index_doc(intid.register(entry), entry)

Now do an 'any of' query, which returns all documents that
contain any of the values listed::

  >>> from hurry.query.set import AnyOf
  >>> displayQuery(AnyOf(f1, ['a', 'c']))
  [1, 2, 4, 5, 6]
  >>> displayQuery(AnyOf(f1, ['c', 'b']))
  [1, 3, 4, 5, 6]
  >>> displayQuery(AnyOf(f1, ['a']))
  [1, 2, 6]

Do an 'all of' query, which returns all documents that
contain all of the values listed::

  >>> from hurry.query.set import AllOf
  >>> displayQuery(AllOf(f1, ['a']))
  [1, 2, 6]
  >>> displayQuery(AllOf(f1, ['a', 'b']))
  [1]
  >>> displayQuery(AllOf(f1, ['a', 'c']))
  [1, 6]
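
The difference between the two queries can be modelled with ordinary
Python sets: 'any of' asks for a non-empty intersection, 'all of' asks
for a subset relation. The helpers ``any_of`` and ``all_of`` below are
hypothetical stand-ins operating on the sample data, not the zc.catalog
implementation:

```python
# f1 sets of the six sample objects, keyed by object id.
f1_sets = {1: {'a', 'b', 'c'}, 2: {'a'}, 3: {'b'},
           4: {'c', 'd'}, 5: {'b', 'c'}, 6: {'a', 'c'}}


def any_of(values):
    """Ids whose set shares at least one element with values."""
    wanted = set(values)
    return sorted(i for i, s in f1_sets.items() if s & wanted)


def all_of(values):
    """Ids whose set contains every element of values."""
    wanted = set(values)
    return sorted(i for i, s in f1_sets.items() if wanted <= s)
```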

We can combine this with other queries::

  >>> displayQuery(AnyOf(f1, ['a']) & Eq(f2, 1))
  [1, 2]


ValueIndex queries
------------------

The ``ValueIndex`` is defined in ``zc.catalog`` and provides a generalization
of the standard field index.

  >>> from hurry.query import value

Let's set up a catalog that uses this index:

  >>> intid = DummyIntId()
  >>> ztapi.provideUtility(zope.app.intid.interfaces.IIntIds, intid)

  >>> from zope.app.catalog.interfaces import ICatalog
  >>> from zope.app.catalog.catalog import Catalog
  >>> catalog = Catalog()
  >>> ztapi.provideUtility(ICatalog, catalog, 'catalog1')

  >>> from zc.catalog.catalogindex import ValueIndex
  >>> catalog['f1'] = ValueIndex('f1', IContent)

Next we set up some content data to fill the indices:

  >>> content = [
  ... Content(1, 'a'),
  ... Content(2, 'b'),
  ... Content(3, 'c'),
  ... Content(4, 'd'),
  ... Content(5, 'c'),
  ... Content(6, 'a')]

And catalog them now:

  >>> for entry in content:
  ...     catalog.index_doc(intid.register(entry), entry)


Let's now query for all objects where ``f1`` equals 'a':

  >>> f1 = ('catalog1', 'f1')
  >>> displayQuery(value.Eq(f1, 'a'))
  [1, 6]

Next, let's find all objects where ``f1`` does not equal 'a'; this is more
efficient than the generic ``~`` operator:

  >>> displayQuery(value.NotEq(f1, 'a'))
  [2, 3, 4, 5]

You can also query for all objects where the value of ``f1`` is in a set of
values:

  >>> displayQuery(value.In(f1, ['a', 'd']))
  [1, 4, 6]

The next set of queries compares the indexed values. For example, you can
ask for all objects whose value lies between two bounds. Both bounds are
inclusive by default; ``exclude_min`` and ``exclude_max`` exclude them:

  >>> displayQuery(value.Between(f1, 'a', 'c'))
  [1, 2, 3, 5, 6]

  >>> displayQuery(value.Between(f1, 'a', 'c', exclude_min=True))
  [2, 3, 5]

  >>> displayQuery(value.Between(f1, 'a', 'c', exclude_max=True))
  [1, 2, 6]

  >>> displayQuery(value.Between(f1, 'a', 'c',
  ...                            exclude_min=True, exclude_max=True))
  [2]

You can also leave out one end of the range:

  >>> displayQuery(value.Between(f1, 'c', None))
  [3, 4, 5]
  >>> displayQuery(value.Between(f1, None, 'c'))
  [1, 2, 3, 5, 6]

You can also use greater-than-or-equal (``Ge``) and less-than-or-equal
(``Le``) queries for the same purpose:

  >>> displayQuery(value.Ge(f1, 'c'))
  [3, 4, 5]
  >>> displayQuery(value.Le(f1, 'c'))
  [1, 2, 3, 5, 6]
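
The effect of the bounds and the two exclusion flags can be reproduced
with a small pure-Python model of the sample data. The ``between``
function below is a hypothetical sketch that mirrors the keyword names
used above; it is not the index's own code:

```python
# f1 values of the six sample objects, keyed by object id.
f1_values = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'c', 6: 'a'}


def between(minimum, maximum, exclude_min=False, exclude_max=False):
    """Ids whose value lies in the range; None leaves that end open."""
    result = []
    for i, v in f1_values.items():
        if minimum is not None:
            if v < minimum or (exclude_min and v == minimum):
                continue
        if maximum is not None:
            if v > maximum or (exclude_max and v == maximum):
                continue
        result.append(i)
    return sorted(result)
```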

Of course, you can chain those queries with the others as demonstrated before.

The ``value`` module also supports ``zc.catalog`` extents. The first query is
``ExtentAny``, which returns all documents matching the extent. If the
extent is ``None``, all document ids are returned:

  >>> displayQuery(value.ExtentAny(f1, None))
  [1, 2, 3, 4, 5, 6]

If we now create an extent that covers only the first four
documents,

  >>> from zc.catalog.extentcatalog import FilterExtent
  >>> extent = FilterExtent(lambda extent, uid, obj: True)
  >>> for i in range(4):
  ...     extent.add(i, i)

then only the first four are returned:

  >>> displayQuery(value.ExtentAny(f1, extent))
  [1, 2, 3, 4]

The opposite query is the ``ExtentNone`` query, which returns all ids in the
extent that are *not* in the index. Here we register three more objects
without cataloging them, so only those three are returned:

  >>> id = intid.register(Content(7, 'b'))
  >>> id = intid.register(Content(8, 'c'))
  >>> id = intid.register(Content(9, 'a'))

  >>> extent = FilterExtent(lambda extent, uid, obj: True)
  >>> for i in range(9):
  ...     extent.add(i, i)

  >>> displayQuery(value.ExtentNone(f1, extent))
  [7, 8, 9]
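
Both extent queries can be pictured as set operations between the ids in
the index and the ids in the extent. The sketch below is an illustrative
model only, with hypothetical variable names. It relies on the fact that
``DummyIntId`` hands out int ids starting at 0, so in this example an
object's ``id`` attribute is its int id plus one:

```python
indexed_ids = set(range(6))   # int ids 0-5: objects 1-6 were cataloged
first_four = set(range(4))    # extent used for the ExtentAny example
all_nine = set(range(9))      # extent used for the ExtentNone example


def object_ids(int_ids):
    """Map int ids back to the sample objects' id attributes."""
    return sorted(i + 1 for i in int_ids)


# Like ExtentAny: ids that are both indexed and in the extent.
extent_any = object_ids(indexed_ids & first_four)

# Like ExtentNone: ids in the extent that are missing from the index.
extent_none = object_ids(all_nine - indexed_ids)
```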
