Bad Modelling 1: A Single Flat File Table (e.g. Salesextract)
When people first start out using Power BI as their analytics platform, there is a tendency to say, "Let's import all the data in one big flat file, like an Excel worksheet."
This way of working is just not well organised, and it doesn't give you an analytics-friendly structure.
Avoid Wide Tables
Narrow tables are much better to work with in Power BI. As data volumes grow, a wide table will hurt performance, bloat your model and become inefficient. Then, when you create measures, things start getting overly complex in one long, wide table.
Not to mention the point when you have to add another table and create relationships: because of your wide table, you may be faced with a many-to-many relationship.
Star schemas are the recommended approach to modelling in Power BI.
Stars with a few snowflaked dimensions are also OK.
If you have a wide, flat file table, it's always important to convert it to a model like this: narrow dimension tables around a fact table in the middle that holds all your measures.
Remember, Chaos is a flat file.
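To make the conversion concrete, here is a minimal Python sketch of what splitting a flat extract into a star schema involves. In Power BI you would normally do this in Power Query or upstream in the source; the column names, the sample rows and the integer surrogate keys below are illustrative assumptions, not taken from a real Salesextract.

```python
# Hypothetical flat extract: every row repeats the product and customer attributes.
flat_rows = [
    {"OrderID": 1, "ProductName": "Bike", "Colour": "Red", "CustomerName": "Ann", "City": "Leeds", "Quantity": 2, "LineTotal": 400.0},
    {"OrderID": 2, "ProductName": "Helmet", "Colour": "Black", "CustomerName": "Ann", "City": "Leeds", "Quantity": 1, "LineTotal": 50.0},
    {"OrderID": 3, "ProductName": "Bike", "Colour": "Red", "CustomerName": "Bob", "City": "York", "Quantity": 1, "LineTotal": 200.0},
]


def build_dimension(rows, attributes, key_name):
    """Return (dimension rows, lookup from attribute values to surrogate key)."""
    lookup = {}
    dimension = []
    for row in rows:
        natural = tuple(row[a] for a in attributes)
        if natural not in lookup:
            lookup[natural] = len(lookup) + 1  # simple integer surrogate key
            dimension.append({key_name: lookup[natural], **dict(zip(attributes, natural))})
    return dimension, lookup


def to_star(rows):
    products, product_keys = build_dimension(rows, ["ProductName", "Colour"], "ProductKey")
    customers, customer_keys = build_dimension(rows, ["CustomerName", "City"], "CustomerKey")
    # The fact table keeps only the keys plus the numeric values to summarise.
    sales = [
        {
            "OrderID": r["OrderID"],
            "ProductKey": product_keys[(r["ProductName"], r["Colour"])],
            "CustomerKey": customer_keys[(r["CustomerName"], r["City"])],
            "Quantity": r["Quantity"],
            "LineTotal": r["LineTotal"],
        }
        for r in rows
    ]
    return products, customers, sales


products, customers, sales = to_star(flat_rows)
print(len(products), len(customers), len(sales))  # 2 2 3
```

Each repeated product or customer now lives in one narrow dimension row, and the fact table shrinks to keys plus numbers.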
Model Relationships propagate filters to other tables.
In this example, a filter on ProductID propagates down to the Sales table. One product can be sold many times (one-to-many).
With a snowflake, you can add another level.
A filter on Category A propagates through the product table down to the Sales fact table.
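The way a filter flows down one-to-many relationships, one level at a time, can be sketched in Python. This is only a simulation of the behaviour under assumed table and column names (category, product, sales, CategoryID, ProductID); Power BI does this for you through the model relationships.

```python
# Hypothetical snowflaked model: Category (1) -> Product (many), Product (1) -> Sales (many).
category = [{"CategoryID": 1, "Category": "A"}, {"CategoryID": 2, "Category": "B"}]
product = [
    {"ProductID": 10, "CategoryID": 1},
    {"ProductID": 11, "CategoryID": 1},
    {"ProductID": 12, "CategoryID": 2},
]
sales = [
    {"ProductID": 10, "Amount": 100.0},
    {"ProductID": 10, "Amount": 50.0},
    {"ProductID": 11, "Amount": 25.0},
    {"ProductID": 12, "Amount": 80.0},
]


def propagate(filtered_one_side, one_key, many_side, many_key):
    """Keep only many-side rows whose foreign key matches a filtered one-side row."""
    allowed = {row[one_key] for row in filtered_one_side}
    return [row for row in many_side if row[many_key] in allowed]


# Filter the top of the snowflake, then let the filter flow down level by level.
selected_categories = [c for c in category if c["Category"] == "A"]
selected_products = propagate(selected_categories, "CategoryID", product, "CategoryID")
selected_sales = propagate(selected_products, "ProductID", sales, "ProductID")

print(sum(s["Amount"] for s in selected_sales))  # 175.0
```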
Deliver the right number of tables with the right relationships in place.
Power BI was designed for people who never had to think about the design of data warehouses. Originally, this self-service tool would allow anyone with little or no knowledge of best practice to import data from their own sources (Excel spreadsheets, databases, etc.) without any knowledge of how those sources were set up.
This becomes an issue because the recommended Power BI model is the fact and dimension schema described above.
Understanding OLAP models goes a long way to helping you set up Power BI:
- Dimensions: filter and group
- Facts: summarise (measures)
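A small Python sketch of those two roles, using hypothetical products and sales tables: the dimension attribute (Colour) filters and groups, while the fact column (Quantity) is what gets summarised. It illustrates the idea only; in Power BI the visual and the measure do this work.

```python
from collections import defaultdict

# Hypothetical dimension (descriptive attributes) and fact (numeric values) tables.
products = {1: {"Colour": "Red"}, 2: {"Colour": "Blue"}, 3: {"Colour": "Red"}}
sales = [
    {"ProductKey": 1, "Quantity": 3},
    {"ProductKey": 2, "Quantity": 5},
    {"ProductKey": 3, "Quantity": 2},
    {"ProductKey": 1, "Quantity": 1},
]


def total_quantity_by(attribute, only=None):
    """Group fact rows by a dimension attribute and sum Quantity.

    `only` optionally restricts the attribute values before grouping."""
    totals = defaultdict(int)
    for row in sales:
        value = products[row["ProductKey"]][attribute]  # the dimension groups...
        if only is None or value in only:               # ...and filters
            totals[value] += row["Quantity"]            # the fact is summarised
    return dict(totals)


print(total_quantity_by("Colour"))                 # {'Red': 6, 'Blue': 5}
print(total_quantity_by("Colour", only={"Blue"}))  # {'Blue': 5}
```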
Bad Modelling 2: Direct Query your Transactional Database
When you connect to an OLTP (transactional) database and drag in all your tables (there may be hundreds of them) using Direct Query, there are lots of things to consider.
The overall performance depends on the underlying data source.
When you have lots of users opening shared reports, lots of visuals are refreshed, and each one sends queries to the underlying source. This means the source MUST handle the query load from all your users AND maintain reasonable performance for the people using the OLTP system to enter data.
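A back-of-the-envelope estimate shows how quickly this adds up. The sketch assumes one query per visual per user and that everyone opens the page at the same moment; real visuals can send more than one query, so treat the numbers as illustrative only.

```python
def estimated_source_queries(concurrent_users, visuals_per_page, queries_per_visual=1):
    """Rough number of queries sent to the source when every user opens the page at once.

    Assumption: each visual sends `queries_per_visual` queries when it renders.
    This is an illustrative simplification, not how Power BI counts queries."""
    return concurrent_users * visuals_per_page * queries_per_visual


# Hypothetical example: 50 users opening a page with 12 visuals,
# all on top of the normal data-entry workload.
print(estimated_source_queries(50, 12))  # 600
```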
You are not the most important person in this scenario. The people using the database to enter data are.
OLTP is designed for fast data input. OLAP is designed for fast retrieval of data for analytics. These are two very different things.
With OLTP, you have rowstore indexes (clustered and non-clustered indexes). They are perfect for OLTP-style workloads but slow for data analysis. Data warehouse queries scan huge amounts of data, which is another reason why using an OLTP database as your Direct Query data source isn't the best approach.
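A simplified Python picture of the two layouts, using made-up data: summing one column from row storage visits every row in full, while column storage only reads the one list it needs. Real database engines add indexing and compression on top, so this shows only the shape of the idea.

```python
# Hypothetical three-row table, (OrderID, OrderDate, Amount), stored two ways.
row_store = [
    (1, "2024-01-01", 10.0),
    (2, "2024-01-01", 20.0),
    (3, "2024-01-02", 5.0),
]  # one tuple per row: easy to insert or update a single order

column_store = {
    "OrderID": [1, 2, 3],
    "OrderDate": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "Amount": [10.0, 20.0, 5.0],
}  # one list per column: easy to scan a single attribute


def total_amount_from_rows(rows):
    # Visits every row in full, even though only Amount is needed.
    return sum(amount for _order_id, _order_date, amount in rows)


def total_amount_from_columns(columns):
    # Reads only the Amount list; OrderID and OrderDate are never touched.
    return sum(columns["Amount"])


print(total_amount_from_rows(row_store), total_amount_from_columns(column_store))  # 35.0 35.0
```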
Also, Direct Query means you lose a fair amount of DAX functionality, such as some time-based DAX calculations, What-if parameters, etc.
I was chatting to someone about this on the forums, and they gave me a fantastic analogy.
When you connect to a transactional database with Direct Query, it's like being in a busy restaurant and getting all the customers to go and fetch their food from the kitchen.
It slows down the customers because of the layout of the kitchen. They don’t know where anything is, and other customers are also milling around trying to find where their starter is.
The kitchen staff, who are trying to prepare the food, now have to fight for physical space. Look at the pastry chef, trying to work around 10 customers all asking where their desserts are!
So instead you set up a reporting area. This is where the food gets placed; someone shouts "Service!" and a waiter speedily delivers the course to the correct table.
No one needs to go into the kitchen unless they are in food prep. Everything works in the most efficient way.
Model Relationship Dos
- Relationships join one ID column to one ID column. If you have composite keys, they need to be merged into a single column first.
- No recursive relationships (relationships that point back to the same table). The example always used for this is the ManagerID in the Employee table.
- Cardinality can be one-to-many, one-to-one or many-to-one (many-to-many needs a specific approach in Power BI).
- Cardinality determines which table has filter-and-group behaviour and which has summarise behaviour: the one side filters and groups, and the many side is summarised.
- There can only be one active path (relationship) between two tables. All your other paths will be inactive, but you can use them in DAX.
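Merging a composite key is usually done in Power Query (for example with Merge Columns). The Python sketch below shows the same idea with hypothetical OrderNumber and LineNumber columns and a hypothetical OrderLineKey column; the separator is a choice made here to help avoid accidental collisions between different key pairs.

```python
# Hypothetical tables whose natural key spans two columns (OrderNumber + LineNumber).
order_lines = [
    {"OrderNumber": "SO1", "LineNumber": 1, "Product": "Bike"},
    {"OrderNumber": "SO1", "LineNumber": 2, "Product": "Helmet"},
]
shipments = [
    {"OrderNumber": "SO1", "LineNumber": 2, "ShippedQty": 1},
]


def merged_key(row, separator="-"):
    """Combine the composite key columns into a single key column.

    The separator keeps pairs such as ("SO1", 12) and ("SO11", 2) distinct."""
    return f'{row["OrderNumber"]}{separator}{row["LineNumber"]}'


# Add the same single-column key to both tables.
for table in (order_lines, shipments):
    for row in table:
        row["OrderLineKey"] = merged_key(row)

# The relationship can now use one column on each side.
lines_by_key = {row["OrderLineKey"]: row for row in order_lines}
print([lines_by_key[s["OrderLineKey"]]["Product"] for s in shipments])  # ['Helmet']
```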
In this example, OrderDateKey joins to DateKey and is the active relationship, because it is the one we use the most.
ShipDateKey and DueDateKey also join to DateKey in the date table, but their relationships are inactive.
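A Python simulation of this role-playing date setup, with hypothetical sales rows: totals follow OrderDateKey by default, and passing a different key column imitates switching to an inactive relationship for one calculation, which is what USERELATIONSHIP does in DAX.

```python
from collections import defaultdict

# Hypothetical fact rows with three keys that all point at the same Date table.
sales = [
    {"OrderDateKey": 20240101, "ShipDateKey": 20240103, "DueDateKey": 20240110, "Amount": 100.0},
    {"OrderDateKey": 20240101, "ShipDateKey": 20240102, "DueDateKey": 20240108, "Amount": 40.0},
    {"OrderDateKey": 20240102, "ShipDateKey": 20240103, "DueDateKey": 20240109, "Amount": 60.0},
]

ACTIVE = "OrderDateKey"  # only one relationship to DateKey can be active


def sales_by_date(rows, relationship=ACTIVE):
    """Total Amount per DateKey, following the active relationship by default.

    Passing another key column mimics using an inactive relationship for one calculation."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[relationship]] += row["Amount"]
    return dict(totals)


print(sales_by_date(sales))                 # {20240101: 140.0, 20240102: 60.0}
print(sales_by_date(sales, "ShipDateKey"))  # {20240103: 160.0, 20240102: 40.0}
```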
DAX Functions for Relationships to help with modelling decisions
When creating calculated columns, you can only reference columns from the same table, unless you use RELATED.
For example, I'm adding the column Colour to the SalesOrderDetail table, which has a many-to-one relationship to Products:
Colour = RELATED(Products[Colour])
RELATED lets you use data from the one side of the relationship on the many side.
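What RELATED does in a calculated column can be pictured as a lookup from each many-side row to its single one-side row. This Python sketch uses made-up product IDs and returns None where DAX would return BLANK for a missing match.

```python
# Hypothetical one-side table (Products) keyed by ProductID.
products = {
    707: {"Colour": "Red"},
    708: {"Colour": "Black"},
}
# Hypothetical many-side table (SalesOrderDetail); 999 has no matching product.
sales_order_detail = [
    {"SalesOrderDetailID": 1, "ProductID": 707, "LineTotal": 20.0},
    {"SalesOrderDetailID": 2, "ProductID": 708, "LineTotal": 40.0},
    {"SalesOrderDetailID": 3, "ProductID": 999, "LineTotal": 15.0},
]


def related(one_side, foreign_key_value, column):
    # Each many-side row matches at most one one-side row, so a single value comes back.
    match = one_side.get(foreign_key_value)
    return None if match is None else match[column]  # None stands in for DAX BLANK


# Equivalent in spirit to the calculated column Colour = RELATED(Products[Colour]).
for row in sales_order_detail:
    row["Colour"] = related(products, row["ProductID"], "Colour")

print([row["Colour"] for row in sales_order_detail])  # ['Red', 'Black', None]
```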
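RELATEDTABLE uses data from the many side of the relationship: it returns the related rows as a table, so it is usually wrapped in an iterator such as SUMX. For example, as a calculated column on the Products table: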
TotalSales = SUMX(RELATEDTABLE(SalesOrderDetail),SalesOrderDetail[LineTotal])
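RELATEDTABLE goes the other way: from each one-side row to all of its many-side rows. Here is a Python sketch with hypothetical data, mirroring the TotalSales column above; a product with no sales gets None, standing in for the BLANK that SUMX returns over an empty table.

```python
# Hypothetical tables: Products is the one side, SalesOrderDetail the many side.
products = [{"ProductID": 707}, {"ProductID": 708}, {"ProductID": 709}]
sales_order_detail = [
    {"ProductID": 707, "LineTotal": 20.0},
    {"ProductID": 708, "LineTotal": 40.0},
    {"ProductID": 707, "LineTotal": 25.0},
]


def related_table(many_side, key_column, key_value):
    # All many-side rows that belong to the current one-side row.
    return [row for row in many_side if row[key_column] == key_value]


for product in products:
    rows = related_table(sales_order_detail, "ProductID", product["ProductID"])
    # SUMX-style: iterate the related rows and add up LineTotal (None if there are no rows).
    product["TotalSales"] = sum(row["LineTotal"] for row in rows) if rows else None

print([p["TotalSales"] for p in products])  # [45.0, 40.0, None]
```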
USERELATIONSHIP forces a calculation to use a specific (inactive) relationship instead of the active relationship:
=CALCULATE(SUM(InternetSales[SalesAmount]), USERELATIONSHIP(InternetSales[DueDate], DateTime[Date]))
CROSSFILTER modifies the filter direction of a relationship, or disables filter propagation altogether. You can change the direction permanently in the model by setting the relationship's cross filter direction to Both instead of Single, OR you can do it for one specific DAX calculation using CROSSFILTER.
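A Python sketch of how the direction changes what a filter on the many side can reach, using a hypothetical Customers (one side) to Sales (many side) relationship. "Single" and "Both" mirror the model's cross filter direction setting and "None" mirrors CROSSFILTER's option to switch propagation off; the function itself is only an illustration.

```python
# Hypothetical model: Customers (one side) -> Sales (many side) on CustomerID.
customers = [
    {"CustomerID": 1, "Name": "Ann"},
    {"CustomerID": 2, "Name": "Bob"},
    {"CustomerID": 3, "Name": "Cat"},
]
sales = [
    {"CustomerID": 1, "Product": "Bike"},
    {"CustomerID": 2, "Product": "Helmet"},
    {"CustomerID": 1, "Product": "Helmet"},
]


def customers_in_filter_context(product_filter, direction="Single"):
    """Customer names left in filter context after filtering Sales[Product].

    Single: filters flow only from Customers down to Sales, so a filter on
    Sales does not reach the Customers table. None: propagation is switched
    off entirely. Both: the Sales filter also flows back up to Customers."""
    if direction in ("Single", "None"):
        return [c["Name"] for c in customers]
    if direction == "Both":
        ids = {s["CustomerID"] for s in sales if s["Product"] == product_filter}
        return [c["Name"] for c in customers if c["CustomerID"] in ids]
    raise ValueError(f"unknown direction: {direction}")


print(customers_in_filter_context("Helmet"))          # ['Ann', 'Bob', 'Cat']
print(customers_in_filter_context("Helmet", "Both"))  # ['Ann', 'Bob']
```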
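TREATAS creates a virtual relationship between tables by applying the values of one column as a filter on another column (typically as a filter argument inside CALCULATE):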
TREATAS(VALUES(Dates[Year]), 'Unconnected Budgeted Data'[Year])
Our Unconnected Budgeted Data is at Year level only, and it's not joined to our main model.
Here we connect it to Year in the Date table. Then we can create a visual with Date from the Date dimension, Total Sales from our connected data (which is at daily level) and Total Budget from our unconnected budgeted data at a different level of granularity.
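The effect can be sketched in Python with made-up figures: daily sales are rolled up to Year, and the same Year values are then used to pick rows from the unconnected budget table, roughly what TREATAS does virtually inside a measure.

```python
from collections import defaultdict

# Hypothetical data: sales at daily grain, budget at year grain, no relationship between them.
daily_sales = [
    ("2023-03-01", 100.0),
    ("2023-07-15", 150.0),
    ("2024-01-10", 200.0),
]
budget_by_year = {2023: 300.0, 2024: 250.0}  # the "unconnected" budget table


def year_of(iso_date):
    return int(iso_date[:4])


# Group the connected data by Year, as the Date dimension would...
sales_by_year = defaultdict(float)
for day, amount in daily_sales:
    sales_by_year[year_of(day)] += amount

# ...then use the same Year values as a filter on the budget table.
report = {year: (sales_by_year[year], budget_by_year.get(year)) for year in sorted(sales_by_year)}
print(report)  # {2023: (250.0, 300.0), 2024: (200.0, 250.0)}
```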
Flatten a recursive (parent-child) relationship with the PATH function.
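PATH returns a pipe-delimited list of IDs from the top of the hierarchy down to the current row (for example 1|2|3), and PATHLENGTH and PATHITEM can then turn it into ordinary columns. The Python sketch below imitates that on a hypothetical Employee table with a ManagerID column; the cycle check is an extra safeguard added here, not a description of DAX behaviour.

```python
# Hypothetical Employee table with a recursive ManagerID column (None = top of the hierarchy).
employees = {
    1: {"Name": "CEO", "ManagerID": None},
    2: {"Name": "Sales Director", "ManagerID": 1},
    3: {"Name": "Account Manager", "ManagerID": 2},
    4: {"Name": "Finance Director", "ManagerID": 1},
}


def path(employee_id, table):
    """Imitate PATH: pipe-delimited IDs from the top ancestor down to this row."""
    chain = []
    seen = set()
    current = employee_id
    while current is not None:
        if current in seen:  # safeguard against circular ManagerID data
            raise ValueError(f"circular reference at {current}")
        seen.add(current)
        chain.append(current)
        current = table[current]["ManagerID"]
    return "|".join(str(i) for i in reversed(chain))


for emp_id, row in employees.items():
    row["Path"] = path(emp_id, employees)
    # Flatten into ordinary columns, in the spirit of PATHLENGTH and PATHITEM(path, 1).
    row["Depth"] = len(row["Path"].split("|"))
    row["Level1"] = employees[int(row["Path"].split("|")[0])]["Name"]

print(employees[3]["Path"], employees[3]["Depth"])  # 1|2|3 3
```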
Getting your model right and understanding your data sources are the most important things in Power BI. Make sure you don't end up with lots of headaches six months into your project: it's better to spend the time now than to have to start again later.