223 lines
6.7 KiB
ReStructuredText
223 lines
6.7 KiB
ReStructuredText
Bi-Joe
|
|
~~~~~~
|
|
|
|
BI Joe is a library and a Django application to simplify querying a postgresql
|
|
database containing a star schema in order to easily produce BI dashboards
|
|
Schema must be described in a schema file.
|
|
|
|
It's inspired by the Cubes project.
|
|
|
|
Feature
|
|
~~~~~~~
|
|
|
|
* use PostgreSQL database as datastore,
|
|
* declare joins to define star schemas of your cubes
|
|
* declare dimensions as SQL query or exepression defining, label, group by,
|
|
ordering or member list.
|
|
* declare measure as SQL expression using aggregate functions
|
|
|
|
Missing features
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
* hierarchical dimensions
|
|
* measure necessiting a rollup (percentage based on count() of facts at an upper
|
|
level)
|
|
|
|
Schema
|
|
~~~~~~
|
|
|
|
You declare your schema using JSON files, those JSON files are targeted by a list
|
|
of glob patterns in the Djagno setting BIJOE_SCHEMAS.
|
|
|
|
.. code:: python
|
|
|
|
BIJOE_SCHEMAS = ['/var/lib/bijoe/cubes/*.model']
|
|
|
|
The JSON model files must conform to this schema:
|
|
|
|
* name: technical identifier of the mode, a short string without space if
|
|
preferred
|
|
|
|
* label: any string describing the model
|
|
|
|
* pg_dsn: string describing the connection to PostgreSQL, as expected by
|
|
psycopg2, ex.: `"dbname=olap_db user=admin password=xxx"`
|
|
|
|
* search_path: search path to set set if relations are not all in the public
|
|
schema,
|
|
|
|
* cubes: the list of cube descriptors,
|
|
|
|
* name: technical identifier of the cube, same remark thant for models,
|
|
|
|
* label: as for model,
|
|
|
|
* fact_table: name of the table storing facts
|
|
|
|
* key: column of the table identifying individual facts
|
|
|
|
* joins: list of equality joins, all joins are RIGHT OUTER JOINS, table are
|
|
cross joined when drill involve dimensions using multiple joins.
|
|
|
|
* name: SQL identifier for naming the join,
|
|
|
|
* table: name of the relation being joined,
|
|
|
|
* master: table and column indicating the left part of the equality
|
|
condition for the join, you can use `mydim_id` to reference the fact
|
|
table or `otherjoin.mydim_id` to reference another join.
|
|
|
|
* detail: name of the column on the joined table for the equality condition,
|
|
|
|
* kind: type of join, must be `inner`, `left` or `right`, default is right.
|
|
|
|
* dimensions: list of dimension descriptors,
|
|
|
|
* name: technical identifier of the dimension, it will be used to name the
|
|
dimension in the API,
|
|
|
|
* label: human description for the dimension, used in UIs,
|
|
|
|
* join: list of join names, indicate that some joins must be used when using
|
|
this dimension,
|
|
|
|
* type: indicate the type of the dimension: integer, string, bool, date
|
|
|
|
* value: SQL expression giving the value for the dimension,
|
|
it can be different than the value used for filtering or grouping,
|
|
* sql_filter: SQL expression that will be used in the SQL template
|
|
`<sql_filter> IN (...)` when filtering along a dimension,
|
|
|
|
* value_label: SQL expression giving the shown value
|
|
|
|
* group_by: SQL expression to group facts along a dimension, default is to
|
|
use value
|
|
|
|
* order_by: SQL expression to order dimension values, default is to use
|
|
value
|
|
|
|
* measures: list of measure descriptors
|
|
|
|
* name: as for models,
|
|
|
|
* label: as for models,
|
|
|
|
* type: type of the measure: integer or duration,
|
|
|
|
* expression: SQL expression indicating how to compute the aggregate,
|
|
ex.: `avg(delay)`, `count(product_type)`.
|
|
|
|
Example
|
|
+++++++
|
|
|
|
.. code:: json
|
|
|
|
{
|
|
"name" : "cam",
|
|
"label" : "cam",
|
|
"pg_dsn" : "dbname=wcs-olap",
|
|
"search_path" : [
|
|
"cam",
|
|
"public"
|
|
],
|
|
"cubes" : [
|
|
{
|
|
"name" : "all_formdata",
|
|
"label" : "Tous les formulaires",
|
|
"fact_table" : "formdata",
|
|
"key" : "id",
|
|
"joins" : [
|
|
{
|
|
"name" : "formdef",
|
|
"master" : "formdef_id",
|
|
"detail" : "id",
|
|
"table" : "formdef"
|
|
},
|
|
{
|
|
"name" : "dates",
|
|
"master" : "receipt_time",
|
|
"detail" : "date",
|
|
"table" : "dates"
|
|
}
|
|
],
|
|
"dimensions" : [
|
|
{
|
|
"name" : "formdef",
|
|
"label" : "formulaire",
|
|
"type" : "integer",
|
|
"join" : ["formdef"],
|
|
"value" : "formdef.id",
|
|
"value_label" : "formdef.label"
|
|
},
|
|
{
|
|
"join": [
|
|
"receipt_time"
|
|
],
|
|
"label": "date de la demande",
|
|
"name": "receipt_time",
|
|
"type": "date",
|
|
"value": "receipt_time.date"
|
|
},
|
|
],
|
|
"measures" : [
|
|
{
|
|
"name": "count",
|
|
"label": "nombre de demandes",
|
|
"type": "integer",
|
|
"expression": "count({fact_table}.id)",
|
|
},
|
|
{
|
|
"name" : "avg_endpoint_delay",
|
|
"label" : "Delai de traitement",
|
|
"type" : "duration",
|
|
"expression" : "avg(endpoint_delay)"
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
|
|
API
|
|
~~~
|
|
|
|
Model description is handled by `bijoe.schema` and model querying by
|
|
`bijoe.engine`.
|
|
|
|
bijoe.schema.Warehouse
|
|
++++++++++++++++++++++
|
|
|
|
`Warehouse` is the main class to manipulate models, it has two class methods:
|
|
|
|
* `from_json(d)` which transform a dictionnary obtained by eventually parsing
|
|
some JSON file to a Warehouse object.
|
|
|
|
* `to_json()` which transform a Warehouse object into a JSON compatible
|
|
dictionnary.
|
|
|
|
bijoe.engine.Engine
|
|
+++++++++++++++++++
|
|
|
|
`Engine(warehouse)` is the entry-point for querying a model, you get an
|
|
`EngineCube` object by indexing the engine with the name of a cube.
|
|
|
|
.. code:: python
|
|
|
|
cube = Engine(warehouse)['mycube']
|
|
|
|
|
|
You can query your cube using the `query` method.
|
|
|
|
.. code:: python
|
|
|
|
cube.query(filters=[('year', [2013, 2014, 2015])],
|
|
drilldown=['year', 'product'],
|
|
measures=['count', 'avg_sale'])
|
|
|
|
It returns a sequence of rows whose elements are the values of the drilldown
|
|
dimensions in the same order as in the query followed by the values of the
|
|
measures also in the same ordre as the query.
|
|
|
|
The `count` measure is a special measure which is always present whose
|
|
expression is always `count({fact_table}.{key})` where `fact_table` and `key`
|
|
are the corresponding attributes of the cube.
|