Bad Modelling 1: Single flat file table e.g. Salesextract
When people first start out using Power BI as their analytics platform, there is a tendency to say, let’s import all the data in one big flat file, like an Excel worksheet.
This way of working is just not well organised and doesn’t give you a friendly analytics structure.
Avoid Wide Tables
Narrow tables are much better to work with in Power BI. As data volumes grow, a wide table bloats your model and performance suffers. Then, when you create measures, things get even more complex in the one long and wide table.
Not to mention the point when you have to add another table and create joins. You may be faced with a many to many join because of your wide table.
Star schemas are the recommended approach to modelling in Power BI
Stars with a few Snowflaked dimensions are also ok.
If you have a wide flat file table, it’s always important to convert it to a data model like the one above, with narrow dimension tables and a fact table in the middle holding all your measures.
Remember, Chaos is a flat file.
Model Relationships propagate filters to other tables.
In this example the ProductID propagates down to the sales table. 1 Product can be sold many times. (1 to many)
With a snowflake you can add another level
CategoryA Propagates down to the Sales Fact table
Deliver the right number of tables with the right relationships in place.
Power BI was designed for people who never had to think about the design of data warehouses. Originally, this self-service tool would allow anyone with little or no knowledge of best practice to import data from their own sources (Excel spreadsheets, databases etc.) without any knowledge of how they were set up.
This becomes an issue when the recommended Power BI model is the fact and dimension schemas as above.
Understanding OLAP models goes a long way to helping you set up Power BI
Dimensions: filter and group
Facts: summarise measures
Bad Modelling 2: Direct Query your Transactional Database
When you connect up to an OLTP database and drag in all your tables (there may be hundreds of them) using Direct Query, there are lots of things to consider.
The overall performance depends on the underlying data source.
When you have lots of users opening shared reports, lots of visuals are refreshed and queries are sent to the underlying source. This means that the source MUST handle these query loads for all your users AND maintain reasonable performance for those using the OLTP as they enter data.
You are not the most important person in this scenario. The people using the database to add data are the most important.
OLTP is designed for speedy data input. OLAP is designed for speedy retrieval of data for analytics. These are two very different things.
With OLTP, you have rowstore indexes (clustered index, non-clustered index). These are perfect for OLTP style workloads but slow for data analysis. Data warehouse queries consume a huge amount of data, which is another reason why using OLTP as your Direct Query data source isn’t the best approach.
Direct Query also means you lose a fair amount of DAX functionality: time-based DAX calculations, What if parameters, etc.
I was chatting to someone about this on the forums and they gave me a fantastic analogy.
When you connect into a transactional database with Direct Query, it’s like being in a busy restaurant and getting all the customers to go and get their food from the kitchen.
It slows down the customers because of the layout of the kitchen. They don’t know where anything is, and other customers are also milling around trying to find where their starter is.
The kitchen staff who are now trying to prepare the food are having to fight for physical space. Look at the pastry chef, trying to work around 10 customers asking where their various desserts are.
So you set up a reporting area. This is where the food gets placed, someone shouts service and a waiter will go and speedily deliver the course to the correct table.
No one needs to go into the kitchen unless they are in food prep. Everything works in the most efficient way.
Model relationships Dos
Only one ID to one ID. If you have composite keys, they need to be merged into a single column.
No recursive relationships (relationships that go back to the same table). The example always used for this is the ManagerID in the employee table.
The cardinality is 1 to many, 1 to 1 or many to 1. (Many to many needs a specific approach in Power BI.)
Cardinality determines whether it has filter group behavior or summarise behavior
There can only be one active path (relationship) between two tables. All your other paths will be inactive (but you can set up DAX to use them).
In this example OrderDateKey is the active relationship because it is the one we use the most. It joins to DateKey in the date table.
ShipDateKey and DueDateKey also join to DateKey in the date table and are inactive.
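As a hedged sketch of how DAX can use one of the inactive paths, USERELATIONSHIP switches a relationship on for a single calculation. The table name Sales and the column SalesAmount are assumptions made for illustration; only the key column names come from the example above.

```dax
Sales by Ship Date =
CALCULATE (
    SUM ( Sales[SalesAmount] ),   -- Sales and SalesAmount are hypothetical names
    -- activate the inactive ShipDateKey to DateKey relationship for this measure only
    USERELATIONSHIP ( Sales[ShipDateKey], 'Date'[DateKey] )
)
```

The active OrderDateKey relationship is left untouched for every other measure, so the model keeps a single default path.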
DAX Functions for Relationships to help with modelling decisions
When creating calculated columns you can only include fields from the same table, unless you use RELATED.
For example, I’m adding the column Colour into the SalesOrderDetail table, which has a many to one join to Products:
Colour = RELATED(Products[Color])
RELATED allows you to use data from the one side in the many side of the join
RELATEDTABLE Uses data from the Many side of the Join
CROSSFILTER modifies the filter direction or disables propagation. You can do this in the model itself by changing the cross filter direction to Both instead of Single, or you can do it for a specific DAX calculation using CROSSFILTER.
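A minimal sketch of CROSSFILTER inside a measure, assuming the Products and SalesOrderDetail tables are related on ProductID. The measure name is hypothetical.

```dax
Colours Sold =
CALCULATE (
    DISTINCTCOUNT ( Products[Color] ),
    -- for this measure only, let filters on SalesOrderDetail flow back to Products
    CROSSFILTER ( SalesOrderDetail[ProductID], Products[ProductID], BOTH )
)
```

Swapping BOTH for NONE would disable propagation across that relationship for the calculation instead.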
Our unconnected budget data is at Year level only and it’s not joined to our main model.
Here we connect it up to Year in Date. Then we can create a visual with Date from the Date dimension, Total Sales from our connected data (which is at daily level) and Total Budget from our unconnected budget data (which is at a different level of granularity).
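One way to make this kind of virtual connection in DAX is TREATAS. The following is only a sketch: the table name Budget and the columns Budget[Year] and Budget[BudgetAmount] are assumptions, as is 'Date'[Year].

```dax
Total Budget =
CALCULATE (
    SUM ( Budget[BudgetAmount] ),
    -- treat the years visible in the Date dimension as a filter on the unconnected Budget table
    TREATAS ( VALUES ( 'Date'[Year] ), Budget[Year] )
)
```

No physical relationship is needed, because the filter is applied only while this measure is evaluated.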
Naturalise a recursive relationship with the PATH function
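As a rough illustration, the calculated columns below flatten a manager hierarchy with PATH. The Employee table and its EmployeeID and ManagerID columns are assumed names, and each definition would be added as its own calculated column.

```dax
Manager Path =
-- delimited chain of IDs from the top of the hierarchy down to this employee
PATH ( Employee[EmployeeID], Employee[ManagerID] )

Hierarchy Depth =
PATHLENGTH ( Employee[Manager Path] )

Level 1 Manager ID =
-- first item in the path, i.e. the top of the hierarchy
PATHITEM ( Employee[Manager Path], 1, INTEGER )
```

Repeating the PATHITEM column for positions 2, 3 and so on gives one column per level, which can then be used as an ordinary hierarchy.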
Getting your model right and understanding your data sources is the most important thing to get right with Power BI. Make sure you don’t have lots of headaches six months into your project. It’s better to spend the time now than to have to start again later.
I have gone through a fair few training courses on DAX now because it’s an important part of Microsoft Analytics. I thought it would be nice to include all my notes from one of the first DAX courses I attempted.
You don’t have to be an expert on DAX to get started with Power BI. You can start with a few of the basics and get lots of help along the way in the forums.
DAX (Data Analysis Expressions) is an expression language for slicing and dicing analytical data
Where is DAX Used?
Power Pivot (Excel)
SSAS Tabular Mode (You can’t use it in Multidimensional mode)
Azure Analysis Services (This is only available in Tabular Mode)
DAX IS USED HEAVILY IN THE MICROSOFT BI STACK
Where Does DAX Shine?
Aggregations and filtering. It’s optimized for both.
What you need
An easy way of defining key metrics.
You need to be able to slice and dice
You need to be able to do historical analysis
DAX is an easy way of defining key metrics.
What is DAX not good at?
Operational Reporting – Detail heavy used for day to day operating. Line by Line Reports
Wide tables – Tables with a lot of columns
Many to Many relationships. There are ways around this, but it can be difficult to resolve
It’s worth noting that this visual was provided over a year ago, when Azure Analysis Services was the only way of creating the back end centralised model. We are now at December 2020 and Power BI has moved on to become not only self service, but a really great way to implement a standard centralised model for enterprise reporting. You do this by creating dataflows in the Power BI Service, which establishes your transformations in one specific area for reuse. Then, over these, datasets can be created and, once promoted, reused by other developers.
Thinking in columns not rows
OLTP – Online Transactional Processing. (Normalised schemas for frequent updates)
OLAP – Online analytical Processing (Fact and Dimensions. Star Schemas etc.)
Single column aggregations – Sales by area etc. (OLAP)
Large number of rows – Sales by year, we have 10 years of data. Lots of rows (OLAP)
Repeated values – Data is flattened unlike in an OLTP
Need to quickly apply filters By Area, Postcode, Year etc.
We may only want 3 columns in a table that has 50 columns. We just want the slice of those 3 columns and we don’t want to read the other columns
For this we store data as columns Not Rows
This is the VertiPaq engine, also known as xVelocity. It is the engine used to store data as columns. The data is imported into the Power BI data center as columns, not rows.
If you set up your connection as Direct Query, Power BI has to translate DAX formulas into relational SQL queries, so you lose a lot of functionality, such as time-based DAX, because it’s too complex for SQL.
Compression and Encoding
This is how Power BI Compresses your data. Each column is compressed separately
Let’s take the lowest value in the range and then store everything else as the delta (difference) from that value.
This makes the code for each value shorter in length, compressing the data.
The value gets assigned a number which is then used in place of the actual data item
This takes up far less space.
The data is sorted
This just keeps the colour and the number of repeats. Repeating values creates excellent compression. Unique values, not so much.
Adding Business Logic with Calculated Columns and Measures
The following examples have been set up using the old faithful AdventureWorks database.
I’m initially bringing through two tables and renaming them to Products and SalesOrderDetail.
We are now ready to try some examples.
Where should you be creating calculated columns? In Power Query using M Language or in DAX.
The speedy answer is, if you can, always create them in Power Query. However, you do need to understand how the calculated column works in DAX in order to understand measures.
Expands a table by adding a new column.
Operational, Detailed information. Can only look at the specific row
It’s an expression that produces a column. Computes at the time of data refresh and is stored within the table
Limited by a row context. Price-Discount. They don’t take advantage of the columnar database
In Power BI we are going to set up a Line Total in SalesOrderDetail. Where possible, put the new column in the table that holds the data being used to create it.
Click on … against the table under Fields in Power BI and choose New Column.
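The formula for the new column could look something like the sketch below. It assumes UnitPriceDiscount is stored as a fraction (0.15 for 15%), and the exact expression is an assumption rather than the one used in the original screenshots.

```dax
LineTotal =
-- quantity times unit price, reduced by the discount fraction for that row
SalesOrderDetail[OrderQty]
    * SalesOrderDetail[UnitPrice]
    * ( 1 - SalesOrderDetail[UnitPriceDiscount] )
```

Every column reference comes from the current row of SalesOrderDetail, which is the row context at work.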
The new column now sits in the model with the other fields (Go to Data Tab)
And you can use this like any other metric. In this table we are filtering by product Colour. Line Total has already been calculated within the row context.
You can now use Line Total in a table visualization because we can treat this column like any other column, and it’s now being evaluated in the implicit filter context, which in this case is Color.
Implicit context filters are when you add a description column which aggregates your metric within the visual. Or if you add a filter to the visual. We will look at Explicit filters later.
When you create a column you can only create it from data in the same table unless you add RELATED or RELATEDTABLE into your DAX
Adding Color into the SalesOrderDetail table from Products
1 Product can be sold Many times.
Colour = RELATED(Products[Color])
I have added data from another table into a table. RELATED allows you to pull data across from another related table. Note that in the previous DAX I only used data from one table, so I didn’t need to use RELATED.
RELATED is for the one to many relationship, using data from the one side in the many side. Colour (from the 1 side) now sits in SalesOrderDetail (the M side).
RELATEDTABLE uses the data from the many in the one side
SUMX is an iterator. It takes a table and an expression to evaluate. This means we are going through each row in the Products table and running the evaluation for it. It’s using RELATEDTABLE to go through all of the LineTotal values that we created in SalesOrderDetail.
So for example
Row for ProductID 707. Sum up the Line Total in SalesOrderDetail where ProductID is 707 and add the result into Products.
Next iteration. Row for ProductID 708. Sum up the Line Total in SalesOrderDetail where ProductID is 708 and add the result into Products.
And repeat until you have iterated through the entire table for each row
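The iteration described above could be written as a calculated column on Products along these lines. The column name Total Sales is hypothetical, and the sketch assumes the LineTotal column already exists in SalesOrderDetail.

```dax
Total Sales =
-- for the current Products row, sum LineTotal over its related SalesOrderDetail rows
SUMX ( RELATEDTABLE ( SalesOrderDetail ), SalesOrderDetail[LineTotal] )
```

RELATEDTABLE returns only the SalesOrderDetail rows that belong to the product on the current row, which is why each product ends up with its own total.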
Again, a quick rule of thumb is, if you are creating a calculated column using data from the same table, do it in Power Query Editor. If you have to use data from other tables, use DAX.
Your data will be compressed like any other field if it’s been created with Power Query Editor. A DAX calculated column is computed after the imported columns have been compressed, so it tends to compress less efficiently.
Summarises all the data into a single value. Not stored on the table.
Analytical. Takes a column and brings back a summary
They are computed at runtime so stored temporarily
Every time you open a report, your Measures are computed
Limited by a filter context (Rather than a row context)
A measure looks at the data minus what has been filtered out by the user at that time.
They are more loosely associated with a table, so you don’t need to RELATE tables
Again, In Power BI on the SalesOrderDetail Table: Create Measure
Minimum Price = MIN(SalesOrderDetail[UnitPrice])
You can’t add your measure to a slicer or use it as a filter because it’s created after the filters have been set.
Implicit Measure – Underneath the hood is a measure for anything that you aggregate that is created by the DAX engine. For example your Total column. Here we are implicitly filtering UnitPrice by Colour to get the minimum value.
If you go into the Data tab, you won’t see this measure because it only exists after you create your visual.
Filter data Using Calculate
Implicit filtering: applied by the user or the layout of the report. The visuals implicitly filter the data, for example when you drill up and down on a visual.
Explicit filtering: coded into the DAX expression. Explicit filtering overrides implicit filtering.
Calculate changes the current evaluation context
In Power BI set up a Table containing Products Colour and Unit Price (Set to Average rather than Sum)
So for Full Finger Gloves L, 44 were bought at around 17 dollars each with a discount of 15%.
The average of UnitPriceDiscount is the same as the summed UnitPriceDiscount because the products are all identical with the same discount applied.
44 gloves at around $17 each would bring in $752. Applying the discount of 15% (0.15) shows us that $112.83 has been discounted from the total price.
This is above 10 so we can simply display the average Unit Price
Historical Sales = CALCULATE (SUM ([Total]), ALL ('Date') )
ALL clears any filters in the entire date table
For this example, we want the sum of Order Quantity in the current filter context divided by the sum of all Order Quantities. We need CALCULATE for the denominator because we are manipulating filters.
ALL takes in just the product table because we want to undo any filters from the Product table (New Measure in the Products Table)
% of total sales = SUM(SalesOrderDetail[OrderQty]) / CALCULATE(SUM(SalesOrderDetail[OrderQty]), ALL(Products))
So for the Black filter context the Order Quantity is 81937, divided by the total of 274914 (we have removed the filters from the entire Products table), so we can see that Black is 29.80% of the order total.
With a slicer on the report, we choose one product and, even though it’s the only row in the visual, the % of total sales shows 0.13% instead of 100%, which is just what we want.
We still have 18% of total sales, rather than 100%, even though that’s the only row left in the visual. Perfect. This is what would have happened without ALL:
% of total sales = SUM(SalesOrderDetail[OrderQty]) / CALCULATE(SUM(SalesOrderDetail[OrderQty]))
Looping over the Data with Iterators
Allows you to use multiple columns in an aggregation
What is an iterator?
Power BI’s DAX engine is optimized for working with columns not rows.
Iterators process the data row by row. This is less performant because it’s working with rows.
Takes in a table parameter (What is it looping through)
Then takes in an expression (For each row, evaluate this expression)
We have used something before that takes in the table and then an expression. The Filter function. The Filter function is an iterator.
Many of them end in X: SUMX, MAXX, CONCATENATEX etc.
Average Gross Sales =
AVERAGEX(Category, [Quantity] * [Price] )
You could create Quantity * Price as a calculated column so each row gets this evaluated first, and then average this column separately. The iterator does both in one formula.
Add a new measure into SalesOrderDetail and start typing in the following (type it in so you bring up IntelliSense).
AverageGrossSales = AVERAGE(SalesOrderDetail[OrderQty] * Uni
If you type in the above, you will notice that IntelliSense stops working when you get to UnitPrice. This doesn’t work because the AVERAGE function only accepts a single column.
This is where you could create this part as a calculated column but we want everything in one measure
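A sketch of the single-measure version using an iterator, built from the same SalesOrderDetail columns as the attempt above:

```dax
AverageGrossSales =
AVERAGEX (
    SalesOrderDetail,                                           -- table to loop through
    SalesOrderDetail[OrderQty] * SalesOrderDetail[UnitPrice]    -- evaluated for each row, then averaged
)
```

AVERAGEX accepts the multiplication because the expression is evaluated once per row of the table in the first argument, so no calculated column is needed.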
It’s only called Incorrect because this is the wrong way to do it.
This measure should show us what colours we have filtered on but, as you can see below, now that I have added a card, we get the colours for every row and we don’t want that. CONCATENATEX is iterating through every row and adding every colour together.
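One possible way around this, offered as an assumption rather than as the fix from the original course, is to iterate over the distinct colours instead of every row. The measure name is hypothetical.

```dax
Selected Colours =
-- VALUES returns each colour in the current filter context once, so nothing is repeated
CONCATENATEX ( VALUES ( Products[Color] ), Products[Color], ", " )
```

With a slicer applied, the card then lists each selected colour a single time, separated by commas.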