Bi-Joe ~~~~~~ BI Joe is a library and a Django application to simplify querying a postgresql database containing a star schema in order to easily produce BI dashboards Schema must be described in a schema file. It's inspired by the Cubes project. Feature ~~~~~~~ * use PostgreSQL database as datastore, * declare joins to define star schemas of your cubes * declare dimensions as SQL query or exepression defining, label, group by, ordering or member list. * declare measure as SQL expression using aggregate functions Missing features ~~~~~~~~~~~~~~~~ * hierarchical dimensions * measure necessiting a rollup (percentage based on count() of facts at an upper level) Schema ~~~~~~ You declare your schema using JSON files, those JSON files are targeted by a list of glob patterns in the Djagno setting BIJOE_SCHEMAS. .. code:: python BIJOE_SCHEMAS = ['/var/lib/bijoe/cubes/*.model'] The JSON model files must conform to this schema: * name: technical identifier of the mode, a short string without space if preferred * label: any string describing the model * pg_dsn: string describing the connection to PostgreSQL, as expected by psycopg2, ex.: `"dbname=olap_db user=admin password=xxx"` * search_path: search path to set set if relations are not all in the public schema, * cubes: the list of cube descriptors, * name: technical identifier of the cube, same remark thant for models, * label: as for model, * fact_table: name of the table storing facts * key: column of the table identifying individual facts * joins: list of equality joins, all joins are RIGHT OUTER JOINS, table are cross joined when drill involve dimensions using multiple joins. * name: SQL identifier for naming the join, * table: name of the relation being joined, * master: table and column indicating the left part of the equality condition for the join, you can use `mydim_id` to reference the fact table or `otherjoin.mydim_id` to reference another join. * detail: name of the column on the joined table for the equality condition, * kind: type of join, must be `inner`, `left` or `right`, default is right. * dimensions: list of dimension descriptors, * name: technical identifier of the dimension, it will be used to name the dimension in the API, * label: human description for the dimension, used in UIs, * join: list of join names, indicate that some joins must be used when using this dimension, * type: indicate the type of the dimension: integer, string, bool, date * value: SQL expression giving the value for the dimension, it can be different than the value used for filtering or grouping, * sql_filter: SQL expression that will be used in the SQL template ` IN (...)` when filtering along a dimension, * value_label: SQL expression giving the shown value * group_by: SQL expression to group facts along a dimension, default is to use value * order_by: SQL expression to order dimension values, default is to use value * measures: list of measure descriptors * name: as for models, * label: as for models, * type: type of the measure: integer or duration, * expression: SQL expression indicating how to compute the aggregate, ex.: `avg(delay)`, `count(product_type)`. Example +++++++ .. code:: json { "name" : "cam", "label" : "cam", "pg_dsn" : "dbname=wcs-olap", "search_path" : [ "cam", "public" ], "cubes" : [ { "name" : "all_formdata", "label" : "Tous les formulaires", "fact_table" : "formdata", "key" : "id", "joins" : [ { "name" : "formdef", "master" : "formdef_id", "detail" : "id", "table" : "formdef" }, { "name" : "dates", "master" : "receipt_time", "detail" : "date", "table" : "dates" } ], "dimensions" : [ { "name" : "formdef", "label" : "formulaire", "type" : "integer", "join" : ["formdef"], "value" : "formdef.id", "value_label" : "formdef.label" }, { "join": [ "receipt_time" ], "label": "date de la demande", "name": "receipt_time", "type": "date", "value": "receipt_time.date" }, ], "measures" : [ { "name": "count", "label": "nombre de demandes", "type": "integer", "expression": "count({fact_table}.id)", }, { "name" : "avg_endpoint_delay", "label" : "Delai de traitement", "type" : "duration", "expression" : "avg(endpoint_delay)" } ] } ] } API ~~~ Model description is handled by `bijoe.schema` and model querying by `bijoe.engine`. bijoe.schema.Warehouse ++++++++++++++++++++++ `Warehouse` is the main class to manipulate models, it has two class methods: * `from_json(d)` which transform a dictionnary obtained by eventually parsing some JSON file to a Warehouse object. * `to_json()` which transform a Warehouse object into a JSON compatible dictionnary. bijoe.engine.Engine +++++++++++++++++++ `Engine(warehouse)` is the entry-point for querying a model, you get an `EngineCube` object by indexing the engine with the name of a cube. .. code:: python cube = Engine(warehouse)['mycube'] You can query your cube using the `query` method. .. code:: python cube.query(filters=[('year', [2013, 2014, 2015])], drilldown=['year', 'product'], measures=['count', 'avg_sale']) It returns a sequence of rows whose elements are the values of the drilldown dimensions in the same order as in the query followed by the values of the measures also in the same ordre as the query. The `count` measure is a special measure which is always present whose expression is always `count({fact_table}.{key})` where `fact_table` and `key` are the corresponding attributes of the cube. Code Style ========== black is used to format the code, using thoses parameters: black --target-version py37 --skip-string-normalization --line-length 110 There is .pre-commit-config.yaml to use pre-commit to automatically run black before commits. (execute `pre-commit install` to install the git hook.) isort is used to format the imports, using those parameter: isort --profile black --line-length 110 pyupgrade is used to automatically upgrade syntax, using those parameters: pyupgrade --keep-percent-format --py37-plus There is .pre-commit-config.yaml to use pre-commit to automatically run black, isort and pyupgrade before commits. (execute `pre-commit install` to install the git hook.)