SQL Support in TinyOlap¶
Currently TinyOlap only comes with very little support for SQL queries. But although far from complete, the available SQL support is quite useful for many purposes. Let’s take a look at some SQL queries using the tiny sample database.
The tiny sample database (‘…tinyolap/samples/tiny.py’) contains a Sales cube which consists of the following 5 dimensions, bases members and aggregated members:
years: 2021, 2022, 2023, All years
months: Jan, Feb, …, Dec, Q1, …, Q4, Year. In addition, the months dimension has a member subset called summer that contains the members Jun, Jul, Aug and Sep.
regions: Total, North, South, West, East. The regions dimension also has a member attribute called manager that contains the following attribute values:
Total = “Peter Parker”
North = “Ingmar Ice”
South =”Carlo Carulli”
West:=Heinz Erhardt”
East=”Pyotr Tchaikovsky”
products: Total, cars, coupe, sedan, sports, van, trucks, motorcycles, best sellers)
measures: Sales, Cost, Profit, Profit in %. Two of these measures are defined through rules:
Profit := Sales - Cost
Profit in % := Profit / Sales
Simple SQL queries¶
The common structure for all SQL queries in TinyOlap is as follows:
SELECT [fields to be returned] FROM [cube] WHERE [filters to be applied]
To return the value for a certain cell in the cube Sales, you need to define a member for each of the cube dimensions:
SELECT * FROM Sales WHERE '2021', 'Jan', 'North', 'motorcycles', 'Sales'
Although the order of the members in the above sample query is not important, to keep the order helps to maintain the consistency over multiple queries. The query engine automatically tries to resolve the members against all dimensions of the cube. You can also explicitly define to dimension for each individual dimension.
SELECT * FROM Sales WHERE
years:'2021', months:'Jan', 'North',
products:'motorcycles', measures:'Sales'
Although is some more code to write, it greatly supports the readability of your SQL statements and also overcomes certain issue, e.g., as a member named Total exists in two dimensions (regions and products) the following will fail:
SELECT * FROM Sales WHERE
'2021', 'Jan', 'Total', 'Total', 'Sales'
But this query would work just fine:
SELECT * FROM Sales WHERE
'2021', 'Jan', regions:'Total', products:'Total', 'Sales'
The query resultset¶
All of the above queries use the wildcard character * for the SELECT statement. This will cause the query engine to return the member name for all dimension of the cube and the value of the cell for all returned records as a Python list (rows) of lists (column values), e.g., like this:
2021 |
Jan |
North |
motorcycles |
Sales |
525.34 |
query = Query(db)
query.execute("SELECT * FROM Sales WHERE '2021', 'Jan', 'North', 'motorcycles', 'Sales'")
data = query.records
So, the data
variable will actually contain something like this.
data = [['2021', 'Jan', 'North', 'motorcycles', 'Sales', 123.45]]
As you can see, the names of the name of columns are not contained in the returned record
array.
This behaviour can be changed through the property query.include_column_names = True
.
query = Query(db, include_column_names=True)
query.include_column_names = True # alternative approach to include column names
query.execute("SELECT * FROM Sales WHERE '2021', 'Jan', 'North', 'motorcycles', 'Sales'")
data = query.records
years |
months |
regions |
products |
measures |
value |
---|---|---|---|---|---|
2021 |
Jan |
North |
motorcycles |
Sales |
525.34 |
Defining the fields to be returned¶
As alternative to the wildcard character * for the SELECT statement, you can also specify the columns that should be returned. As of to today only the member name for specific dimensions and/or (if defined) the attribute value of a member from a specific dimension can be returned. e.g., the following SQL statements requests the member names for the dimensions months and regions, but also the manager attribute from the regions dimension.
SELECT months, regions, regions.manager FROM Sales WHERE
'2021', 'Jan', 'East', 'vans', 'Sales'
The resulting records would look something like this:
months |
regions |
regions.manager |
value |
---|---|---|---|
Jan |
East |
Pyotr Tchaikovsky |
525.34 |
Although not explicitly requested, the query engine automatically adds a value column to the resultset containing the current cell value. But you can also specific value as an explicit column name. The following statement shows an example.
SELECT months, value, regions, regions.manager FROM Sales WHERE
'2021', 'Jan', 'East', 'vans', 'Sales'
The resulting records would look something like this:
months |
value |
regions |
regions.manager |
---|---|---|---|
Jan |
525.34 |
East |
Pyotr Tchaikovsky |
Underdefined WHERE statements¶
Although recommended, is not necessary to define a member for all dimensions of the cube in the WHERE statement. For every dimension that is not defined, the query engine automatically determines the topmost members for each dimension and adds these to the query. So the following statements are valid and return
More advanced SQL queries¶
Usage: Open and/or start the script …tinyolap/samples/tiny.py
- class tinyolap.query.Query(db: database.Database, sql: Optional[str] = None, include_column_names: bool = False)
Basic implementation of a SQL query interface for TinyOlap cubes.
- execute(sql: Optional[str] = None) tinyolap.query.Query
Executes an SQL statement against the defined database. :param sql: The SQL statement to execute. :return: True, if the execution was successful, False otherwise. :raises NotImplementedError: Raised if a certain SQL language capability is not supported. :raises KeyError: If a certain name (e.g. cube, dimension, member, subset) does not exists in the database. :raises SyntaxError: If a SQL statement contains a syntax error or is incomplete.
- property records
The records returned by the query. Records contain the address and the current value of the queried cells. :return: An array of arrays, the outer array represents the rows