cr.cube package

Submodules

cr.cube.dimension module

Contains the implementation of the Dimension class for Crunch Cubes.

class cr.cube.dimension.Dimension(dim, selections=None)

Bases: object

Implementation of the Dimension class for Crunch Cubes.

This class contains all the utility functions for working with Crunch Cube dimensions. It also hides some of the internal implementation details from the user, especially for Multiple Response variables.

elements

Get elements of the crunch Dimension.

For categorical variables, the elements are represented by categories internally. For other variable types, actual ‘elements’ of the Crunch Cube JSON response are returned.

labels(include_missing=False)

Get labels of the Crunch Dimension.

type

Get type of the Crunch Dimension.

valid_indices(include_missing)

Gets valid indices of Crunch Cube Dimension’s elements.

The CrunchCube class uses this function to correctly calculate the indices of the result that is returned to the user. In most cases, the non-valid indices are those of the missing values.
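
The idea of filtering out non-valid indices can be sketched in plain Python. This is only an illustration, not the library's implementation: the `elements` list, its `missing` flag, and the standalone `valid_indices` helper below are hypothetical stand-ins for a dimension's elements.

```python
import numpy as np

# Hypothetical stand-in for a dimension's elements; the 'missing' flag is an
# assumption made for this sketch, not a documented part of the cr.cube API.
elements = [
    {"id": 1, "name": "Yes", "missing": False},
    {"id": 2, "name": "No", "missing": False},
    {"id": -1, "name": "No Data", "missing": True},
]


def valid_indices(elements, include_missing):
    """Return the positions of the elements to keep in the result."""
    if include_missing:
        return list(range(len(elements)))
    return [i for i, el in enumerate(elements) if not el["missing"]]


# Raw counts with one entry per element, including the missing one.
counts = np.array([5, 3, 2])

# Keep only the entries that belong to non-missing elements.
kept = valid_indices(elements, include_missing=False)
valid_counts = counts[kept]
```

With `include_missing=True` the helper keeps every position, so the counts are returned unchanged.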

cr.cube.crunch_cube module

Home of the CrunchCube class.

This module contains the definition of the CrunchCube class. It is part of the open-source library used for manipulating crunch cubes (JSON responses from the Crunch.io platform).

class cr.cube.crunch_cube.CrunchCube(response)

Bases: object

Implementation of the CrunchCube API class.

This class implements the main API functions needed for seamless integration with crunch cube responses (from the Crunch.io platform).

Main API functions are:
  • as_array
  • margin
  • proportions
  • percentages

These functions are used to retrieve statistical information of interest from the JSON-like crunch cubes. Complete usage of each API function is described within the appropriate docstring.

Crunch Cubes contain richer metadata than standard Python objects, and they also conceal certain complexity in the data structures from the user. In particular, Multiple Response variables are generally represented as single dimensions in result tables, but in the actual data, they may comprise two dimensions. These methods (API) understand the subtleties in the Crunch data types, and correctly compute margins and percentages from them.
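
To illustrate the Multiple Response point, here is a minimal NumPy sketch. It assumes a hypothetical layout in which each subvariable carries a "selected" and a "not selected" count; the array names and numbers are invented for illustration and do not come from a real cube or from the library's internals.

```python
import numpy as np

# Hypothetical MR data: one row per subvariable (response option),
# columns are [selected, not selected]. Invented for illustration only.
mr_counts = np.array([
    [3, 7],
    [6, 4],
    [2, 8],
])

# The result table shows a single dimension: the selected counts.
selected = mr_counts[:, 0]

# The base for each subvariable is the sum of its selected and
# non-selected counts, not the sum over all subvariables.
mr_margin = mr_counts.sum(axis=1)

# Share of respondents who selected each option.
mr_proportions = selected / mr_margin
```

The two-dimensional `mr_counts` array collapses to a one-dimensional result, which is the kind of detail the API methods are described as handling for the user.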

as_array(include_missing=False, weighted=True)

Get crunch cube as ndarray.

Returns the tabular representation of the crunch cube. The returned value has as many dimensions as the crunch cube itself. For example, for a cross-tab of a categorical and a numerical variable, the resulting array has two dimensions.

Args
include_missing (bool): Include rows/cols for missing values
Returns
(ndarray): Tabular representation of the crunch cube
Example 1 (Categorical x Categorical):
>>> cube = CrunchCube(response)
>>> cube.as_array()
np.array([
    [5, 2],
    [5, 3],
])
Example 2 (Categorical x Categorical, include missing values):
>>> cube = CrunchCube(response)
>>> cube.as_array(include_missing=True)
np.array([
    [5, 3, 2, 0],
    [5, 2, 3, 0],
    [0, 0, 0, 0],
])
dimensions

Dimensions of the crunch cube.

labels(include_missing=False)

Gets labels for each of the cube’s dimensions.

Args
include_missing (bool): Include labels for missing values
Returns
labels (list of lists): Labels for each dimension
margin(axis=None, weighted=True)

Get margin for the selected axis.

For most variable types, the margin is the sum of the values along the selected axis. For MR variables, it is the sum of the selected and non-selected slices.

Args
axis (int): Axis across which the margin is calculated. If no axis is provided, the margin is calculated across all axes. For Categoricals, Num, Datetime, and Text, this translates to summation of all elements.
Returns
Calculated margin for the selected axis
Example 1:
>>> cube = CrunchCube(fixt_cat_x_cat)
>>> cube.as_array()
np.array([
    [5, 2],
    [5, 3],
])
>>> cube.margin(axis=0)
np.array([10, 5])
Example 2:
>>> cube = CrunchCube(fixt_cat_x_num_x_datetime)
>>> cube.as_array()
np.array([
    [[1, 1],
     [0, 0],
     [0, 0],
     [0, 0]],
    [[2, 1],
     [1, 1],
     [0, 0],
     [0, 0]],
    [[0, 0],
     [2, 3],
     [0, 0],
     [0, 0]],
    [[0, 0],
     [0, 0],
     [3, 2],
     [0, 0]],
    [[0, 0],
     [0, 0],
     [1, 1],
     [0, 1]]
])
>>> cube.margin(axis=0)
np.array([
    [3, 2],
    [3, 4],
    [4, 3],
    [0, 1],
])
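
The margins in the two examples above can be reproduced with plain NumPy sums. This is a sketch of the arithmetic only, assuming unweighted counts and no Multiple Response dimensions; it does not show how the library computes margins internally, and the variable names are chosen for illustration.

```python
import numpy as np

# Counts from Example 1 (Categorical x Categorical).
counts_2d = np.array([
    [5, 2],
    [5, 3],
])

# Margin along axis 0: add up the rows, leaving one total per column.
margin_axis0 = counts_2d.sum(axis=0)

# No axis given: add up every element, giving the table total.
margin_total = counts_2d.sum()

# Counts from Example 2 (Categorical x Numeric x Datetime), shape (5, 4, 2).
counts_3d = np.array([
    [[1, 1], [0, 0], [0, 0], [0, 0]],
    [[2, 1], [1, 1], [0, 0], [0, 0]],
    [[0, 0], [2, 3], [0, 0], [0, 0]],
    [[0, 0], [0, 0], [3, 2], [0, 0]],
    [[0, 0], [0, 0], [1, 1], [0, 1]],
])

# Summing over axis 0 collapses the first dimension, leaving shape (4, 2).
margin_3d_axis0 = counts_3d.sum(axis=0)
```

The result of summing along an axis always has one dimension fewer than the input, which is why the three-dimensional cube yields a two-dimensional margin.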
percentages(axis=None)

Get the percentages for crunch cube values.

This function calculates the percentages for crunch cube values. The percentages are the values of ‘proportions’ expressed on a scale from 0 to 100.

Args
axis (int): Base axis of percentages calculation. If no axis is provided, calculations are done across the entire table.
Returns
(ndarray): Calculated array of crunch cube percentages.
Example 1:
>>> cube = CrunchCube(fixt_cat_x_cat)
>>> cube.as_array()
np.array([
    [5, 2],
    [5, 3],
])
>>> cube.percentages()
np.array([
    [33.33333, 13.33333],
    [33.33333, 20.00000],
])
Example 2:
>>> cube = CrunchCube(fixt_cat_x_cat)
>>> cube.as_array()
np.array([
    [5, 2],
    [5, 3],
])
>>> cube.percentages(axis=0)
np.array([
    [50., 40.],
    [50., 60.],
])
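
The numbers in both percentage examples follow from scaling proportions by 100. The NumPy sketch below checks that arithmetic for the Categorical x Categorical counts; it assumes unweighted counts and is not the library's own code.

```python
import numpy as np

counts = np.array([
    [5, 2],
    [5, 3],
])

# Whole-table base (no axis): each cell as a share of the grand total.
pct_table = counts / counts.sum() * 100

# axis=0 base: each cell as a share of its column total.
pct_axis0 = counts / counts.sum(axis=0) * 100

# Rounded to five decimals for comparison with the documented output.
pct_table_rounded = np.round(pct_table, 5)
```

With the whole-table base all cells together add up to 100, while with `axis=0` each column adds up to 100.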
proportions(axis=None)

Get proportions of a crunch cube.

This function calculates the proportions across the selected axis of a crunch cube. For most variable types, this means the value divided by the margin value. For Multiple Response types, the value is divided by the sum of the selected and non-selected slices.

Args
axis (int): Base axis of proportions calculation. If no axis is provided, calculations are done across the entire table.
Returns
(ndarray): Calculated array of crunch cube proportions.
Example 1:
>>> cube = CrunchCube(fixt_cat_x_cat)
>>> cube.as_array()
np.array([
    [5, 2],
    [5, 3],
])
>>> cube.proportions()
np.array([
    [0.3333333, 0.1333333],
    [0.3333333, 0.2000000],
])
Example 2:
>>> cube = CrunchCube(fixt_cat_x_cat)
>>> cube.as_array()
np.array([
    [5, 2],
    [5, 3],
])
>>> cube.proportions(axis=0)
np.array([
    [0.5, 0.4],
    [0.5, 0.6],
])
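
The "value divided by the margin" rule for non-MR variables can be written as a small NumPy helper. The function `proportions_sketch` below is a hypothetical illustration that assumes unweighted counts; it is not the CrunchCube method and does not handle Multiple Response dimensions.

```python
import numpy as np


def proportions_sketch(counts, axis=None):
    """Divide each cell by the margin taken along `axis` (hypothetical helper).

    keepdims=True keeps the summed axis with length 1, so the margin
    broadcasts against the original array for any choice of axis.
    """
    margin = counts.sum(axis=axis, keepdims=True)
    return counts / margin


counts = np.array([
    [5, 2],
    [5, 3],
])

# Whole-table proportions: every cell divided by the grand total (15).
props_table = proportions_sketch(counts)

# axis=0: every cell divided by its column total (10 and 5).
props_axis0 = proportions_sketch(counts, axis=0)

# axis=1: in this sketch, every cell divided by its row total (7 and 8).
props_axis1 = proportions_sketch(counts, axis=1)
```

Using `keepdims=True` avoids writing a separate branch per axis, because the margin keeps a shape that NumPy can broadcast against the counts.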

Module contents