summaryrefslogtreecommitdiffstats
path: root/README.rst
blob: c9c23816cce2e9d9e3f678bb4ac8165005368eda (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
BI for Publik
=============

w.c.s. OLAP
~~~~~~~~~~~

Tool to export w.c.s. data in a database with star schema for making an OLAP
cube.

::

        usage: wcs-olap [--no-feed] [-a | --url URL] [-h] [--orig ORIG] [--key KEY]
                        [--pg-dsn PG_DSN] [--schema SCHEMA]
                        [config_path]

        Export W.C.S. data as a star schema in a postgresql DB

        positional arguments:
          config_path

        optional arguments:
          --no-feed        only produce the model
          -a, --all        synchronize all wcs
          --url URL        url of the w.c.s. instance
          -h, --help       show this help message and exit
          --orig ORIG      origin of the request for signatures
          --key KEY        HMAC key for signatures
          --pg-dsn PG_DSN  Psycopg2 DB DSN
          --schema SCHEMA  schema name

Bi-Joe
~~~~~~

BI Joe is a library and a Django application to simplify querying a postgresql
database containing a star schema with BI annotations in order to
easily produce BI dashboards

It's inspired by the Cubes project.

Feature
~~~~~~~

* use PostgreSQL database as datastore,
* declare joins to define star schemas of your cubes
* declare dimensions as SQL query or exepression defining, label, group by,
  ordering or member list.
* declare measure as SQL expression using aggregate functions

Missing features
~~~~~~~~~~~~~~~~

* hierarchical dimensions
* measure necessiting a rollup (percentage based on count() of facts at an upper
  level)

Model
~~~~~

You declare your model using JSON files, those JSON files are targeted by a list
of glob patterns in the Djagno setting BIJOE_SCHEMAS.

.. code:: python

        BIJOE_SCHEMAS = ['/var/lib/bijoe/cubes/*.model']

The JSON model files must conform to this schema:

* name: technical identifier of the mode, a short string without space if
  preferred

* label: any string describing the model

* pg_dsn: string describing the connection to PostgreSQL, as expected by
  psycopg2, ex.: `"dbname=olap_db user=admin password=xxx"`

* search_path: search path to set set if relations are not all in the public
  schema,

* cubes: the list of cube descriptors,

  * name: technical identifier of the cube, same remark thant for models,

  * label: as for model,

  * fact_table: name of the table storing facts

  * key: column of the table identifying individual facts

  * joins: list of equality joins, all joins are RIGHT OUTER JOINS, table are
    cross joined when drill involve dimensions using multiple joins.

    * name: SQL identifier for naming the join,

    * table: name of the relation being joined,

    * master: table and column indicating the left part of the equality
      condition for the join, you can use `mydim_id` to reference the fact
      table or `otherjoin.mydim_id` to reference another join.

    * detail: name of the column on the joined table for the equality condition,

    * kind: type of join, must be `inner`, `left` or `right`, default is right. 

  * dimensions: list of dimension descriptors,

    * name: technical identifier of the dimension, it will be used to name the
      dimension in the API,

    * label: human description for the dimension, used in UIs,

    * join: list of join names, indicate that some joins must be used when using
      this dimension,

    * type: indicate the type of the dimension, numerical, time-like,
      geographical, duration, etc..

    * value: SQL expression giving the value for the dimension,
      it can be different than the value used for filtering or grouping,
      * sql_filter: SQL expression that will be used in the SQL template
      `<sql_filter> IN (...)` when filtering along a dimension,

    * value_label: SQL expression giving the shown value

    * group_by: SQL expression to group facts along a dimension, default is to
      use value

    * order_by: SQL expression to order dimension values, default is to use
      value

  * measures: list of measure descriptors

    * name: as for models,

    * label: as for models,

    * type: type of the measure: integer or duration,

    * expression: SQL expression indicating how to compute the aggregate,
      ex.: `avg(delay)`, `count(product_type)`.

Example
+++++++

.. code:: json

        {
           "name" : "cam",
           "label" : "cam",
           "pg_dsn" : "dbname=wcs-olap",
           "search_path" : [
              "cam",
              "public"
           ],
           "cubes" : [
              {
                 "name" : "all_formdata",
                 "label" : "Tous les formulaires",
                 "fact_table" : "formdata",
                 "key" : "id",
                 "joins" : [
                    {
                       "name" : "formdef",
                       "master" : "formdef_id",
                       "detail" : "id",
                       "table" : "formdef"
                    },
                    {
                       "name" : "dates",
                       "master" : "receipt_time",
                       "detail" : "date",
                       "table" : "dates"
                    }
                 ],
                 "dimensions" : [
                    {
                       "name" : "formdef",
                       "label" : "formulaire",
                       "type" : "integer",
                       "join" : ["formdef"],
                       "value" : "formdef.id",
                       "value_label" : "formdef.label"
                    },
                    {
                       "join": [
                         "receipt_time"
                       ], 
                       "label": "date de la demande", 
                       "name": "receipt_time", 
                       "type": "date", 
                       "value": "receipt_time.date"
                    }, 
                 ],
                 "measures" : [
                    {
                      "name": "count", 
                      "label": "nombre de demandes", 
                      "type": "integer",
                      "expression": "count({fact_table}.id)",
                    }, 
                    {
                       "name" : "avg_endpoint_delay",
                       "label" : "Délai de traitement",
                       "type" : "duration",
                       "expression" : "avg(endpoint_delay)"
                    }
                 ]
              }
           ]
        }

API
~~~

Model description is handled by `bijoe.schema` and model querying by
`bijoe.engine`.

bijoe.schema.Warehouse
++++++++++++++++++++++

`Warehouse` is the main class to manipulate models, it has two class methods:

* `from_json(d)` which transform a dictionnary obtained by eventually parsing
  some JSON file to a Warehouse object.

* `to_json()` which transform a Warehouse object into a JSON compatible
  dictionnary.

bijoe.engine.Engine
+++++++++++++++++++

`Engine(warehouse)` is the entry-point for querying a model, you get an
`EngineCube` object by indexing the engine with the name of a cube.

.. code:: python

        cube = Engine(warehouse)['mycube']


You can query your cube using the `query` method.

.. code:: python

        cube.query(filters=[('year', [2013, 2014, 2015])],
                   drilldown=['year', 'product'],
                   measures=['count', 'avg_sale'])

It returns a sequence of rows whose elements are the values of the drilldown
dimensions in the same order as in the query followed by the values of the
measures also in the same ordre as the query.

The `count` measure is a special measure which is always present whose
expression is always `count({fact_table}.{key})` where `fact_table` and `key`
are the corresponding attributes of the cube.