Introduction to DAX

I have gone through a fair few training courses on DAX now because its an important part of Microsoft Analytics. I thought It would be nice to include all my notes from one of the first DAX courses I attempted.

You don’t have to be an expert on DAX to get started with Power BI. You can start with a few of the basics and get lots of help along the way in the forums.

DAX (Data Analysis Expressions)  is an expression language for slicing and dicing analytical data

Where is DAX Used?

  • Power BI
  • Power Pivot (Excel)           
  • SSAS Tabular Mode (You can’t use it in Multidimensional mode)
  • Azure Analysis Services (This is only available in Tabular Mode)

DAX IS USED HEAVILY IN THE MICROSOFT BI STACK

Where Does DAX Shine?

  • Aggregations and filtering. Its optimized for both

What you need

  • An easy way of defining key metrics.
  • You need to be able to slice and dice
  • You need to be able to do historical analysis
  • DAX is an easy way of defining key metrics.

What is DAX not good at?

  • Operational Reporting – Detail heavy used for day to day operating. Line by Line Reports
  • Wide tables – Tables with a lot of columns
  • Many to Many relationships. There are ways around this, but it can be difficult to resolve

Its worth noting that this visual was provided over a year ago when Azure Analysis Services was the only way of creating the back end centralised model. we are now at December 2020 and Power BI Has moved on to become not only Self Service, but a really great way to implement a Standard centralised model for Enterprise reporting. You do this by creating data flows in Service, which establishes your transformations in one specific area for reuse. Then over this, data sets can be created that can be reused by other developers once you have promoted them.

Thinking in columns not rows

  • OLTP – Online Transactional Processing. (Normalised schemas for frequent updates)
  • OLAP – Online analytical Processing (Fact and Dimensions. Star Schemas etc.)
  • Single column aggregations – Sales by area etc. (OLAP)
  • Large number of rows – Sales by year, we have 10 years of data. Lots of rows (OLAP)
  • Repeated values – Data is flattened unlike in an OLTP
  • Need to quickly apply filters By Area, Postcode, Year etc.

We may only want 3 columns in a table that has 50 columns. We just want the slice of those 3 columns and we don’t want to read the other columns

For this we store data as columns Not Rows

Vertipaq

Also known as xVelocity. It is the engine used to store data as columns. The data is imported into Power BI data center as Columns not rows

Direct Query

If you set up your connection as a direct Query, Power BI has to translates DAX formulas into relational SQL queries so you lose a lot of the functionality like Time DAX because its too complex for SQL.

Compression and Encoding

This is how Power BI Compresses your data. Each column is compressed separately

  • Value Encoding
  • Dictionary Encoding
  • Run-Length Encoding

Value Encoding

Let’s take the lowest value in the range

and then store everything else as the delta (Difference) of that value C. 

This makes the Code shorter in length, compressing the data

Dictionary Encoding

Actual Column

The value gets assigned a number which is then used in place of the actual data item

This takes up far less space.

Run-length encoding

The data is sorted

This just keeps the colour and the number of repeats. Repeating values creates excellent compression. Unique values, not so much.

Adding Business Logic with Calculated Columns and Measures

The following examples have been set up using the old faithful Adventureworks data base.

Im initially bringing through two tables

  • Sales.SalesOrderDetail
  • Production.Product

and renaming them to Products and SalesOrderDetail

We are now ready to try some examples

Calculated Columns

Where should you be creating calculated columns? In Power Query using M Language or in DAX.

the speedy answer is, if you can, always create them in Power Query. However, you do need to understand how the calculated column works in DAX in order to understand measures

  • Expands a table by adding a new column.
  • Operational, Detailed information. Can only look at the specific row
  • It’s an expression that produces a column. Computes at the time of data refresh and is stored within the table
  • Limited by a row context. Price-Discount. They don’t take advantage of the columnar database

In Power BI we are going to set up a Line total in SalesOrderDetail because you need where possible to put the new column in the right table with the data that is being used to create it

Click on … Against the table under fields in power BI and New Column

LineTotal = SalesOrderDetail[UnitPrice] * SalesOrderDetail[OrderQty] * (1- SalesOrderDetail[UnitPriceDiscount])

It calculates the values for each row

The new column now sits in the model with the other fields (Go to Data Tab)

And you can use this like any other metric. In this table we are filtering by product Colour. Line Total has already been calculated within the row context.

You can now use Line Total in a table visualization because we can treat this column like any other column and its now being set in the implicit filter context which in this case is Color.

Implicit context filters are when you add a description column which aggregates your metric within the visual. Or if you add a filter to the visual. We will look at Explicit filters later.

Related

When you create a column you can only create it from data in the same table unless you add RELATED or RELATEDTABLE into your DAX

Adding Color into the SalesOrderDetailTable from Products

1 Product can be sold Many times.

Colour = RELATED(Products[Color])

I have added data from another table into a table. RELATED allows you to pull data across from another related table. Note that in the previous DAX I only used data from one table, so I didn’t need to use RELATED.

RELATED is for the One to Many Relationship. Using data from the one side in the many side. Colour (from 1) now sits in salesOrderDetail (M)

RELATEDTABLE

RELATEDTABLE uses the data from the many in the one side

We are in Product

TotalSales = SUMX(RELATEDTABLE(SalesOrderDetail),SalesOrderDetail[LineTotal])

The SUMX is an Iterator. Takes a table and an expression to evaluate. This means We are going through each row in the Products table and its running the evaluation. It’s using the related table to go through all of the LineTotal’s that we created in SalesOrderDetail

So for example

Row for Product ID 707. Sum up the Line total in salesOrder Detail where ProductID 707 and add the line total into Product

Next iteration. Row for ProductID 708. Sum up the Line total in salesOrder Detail where ProductID 708 and add the line total into Product

And repeat until you have iterated through the entire table for each row

Again, a quick rule of thumb is, if you are creating a calculated column using data from the same table, do it in Power Query Editor. If you have to use data from other tables, use DAX.

Your data will be compressed like any other field if its been created with Power Query Editor but it wont be if its a calculated column because it happens after compression.

Measures

  • Summarises all the data into a single value. Not stored on the table.
  • Analytical. Takes a column and brings back a summary
  • They are computed at runtime so stored temporarily
  • Every time you open a report, your Measures are computed
  • Limited by a filter context (Rather than a row context)
  • A measure looks at the data minus what has been filtered out by the user at that time.
  • They are more loosely associated with a table, so you don’t need to RELATE tables

Again, In Power BI on the SalesOrderDetail Table:  Create Measure

Minimum Price = Min(SalesOrderDetail[UnitPrice])

You can’t add your measure to a slicer or use as a filter because its created after the filters have been set

Implicit Measure – Underneath the hood is a measure for anything that you aggregate that is created by the DAX engine. For example your Total column. Here we are implicitly filtering UnitPrice by Colour to get the minimum value.

If you go into the Data tab, you wont see this measure because it only exists after you create your visual.

Filter data Using Calculate

Implicit

Applied by the user or the layout of the report. The visuals implicitly filter the data. When you drill up and down on a visual for example.

Explicit

Coded into the DAX Expression. Explicit filtering overrides Implicit filtering

Calculate

Calculate changes the current evaluation context

In Power BI set up a Table containing Products Colour and Unit Price (Set to Average rather than Sum)

BlackAveragePrice = CALCULATE(AVERAGE(SalesOrderDetail[UnitPrice]),Products[Color]=”Black”)

This Calculates the Sales Total Only on the colour Black. This overrides the implicit filters of colour. It changes the Current Evaluation context, e.g. White or yellow

Add a filter and notice that this is also ignored because the Explicit context filter ignores the Implicit Context Filter of red

Next create a measure

IsColorFiltered = ISFILTERED(Products[Color])

ISFILTERED Return value. TRUE when columnName is being filtered directly. 

Colour is filtering each of the line items but is not filtered for the total because it’s the average of everything

Now the Grand total is also filtered by Grey once we have filtered on grey using the slicer so both are now true

Calculate is designed to be fast. For more complicated examples we need to use Filter.

Calculate only lets you compare a single column to a fixed value

Filter

CALCULATE( SUM ( [Total] ), FILTER (Sales, QTY * Price < 100 ))

Filter takes in two values. The Quantity and Price and then applies the filter of less than 100

Now, We want to see The Average Unit Price Discount Where Order Quantity * Unit Price * Unit Price Discount Any time the total discount is greater than 10 dollars, we want that to apply to the average

As an example, In Power BI we need another table of Color and UnitPriceDiscount (Set as Average)

Create a measure in SalesOrderDetail

DAX is a functional language, its created from functions like SUMX etc.

DAX is executed from the innermost parameter so don’t create all your DAX on one line.

AverageDiscountGreaterthanTen =
CALCULATE(AVERAGE
(SalesOrderDetail[UnitPriceDiscount]),
FILTER(SalesOrderDetail,
SalesOrderDetail[OrderQty] *
SalesOrderDetail[UnitPrice] *
SalesOrderDetail[UnitPriceDiscount]>10
)
)

Filter takes in two parameters. We need to tell it the table and then the expression we want.

Filter is creating a table of values to work with because it is a table function. Its always embedded in other functions, in this case CALCULATE,

You cant have measures inside a filter.

to start to understand this I’m going to add details for a specific order and product. Im going in at the lowest level of granularity

I wanted to see exactly what was going on in the filter so I created a calculated column working on the row context

Qty*Price*Discount = SalesOrderDetail[OrderQty]*SalesOrderDetail[UnitPrice]*SalesOrderDetail[UnitPriceDiscount]

So for Full Finger Gloves L.44 were bought, Around 17 dollars each with a discount of 15%

the Average of UnitPriceDiscount is the same as the Summed UnitPriceDiscount because the products are all identical with the same discount applied

44 $17 gloves would bring in $752. Applying the discount of 15% (0.15) shows is that £112.83 has been discounted from the total price.

This is above 10 so we can simply display the average Unit Price

ALL

Historical Sales = CALCULATE (SUM ([Total]), ALL (‘Date’) )

ALL clears any filters in the entire date table

For this example, we want the Sum of Order Quantity for the specific context filter divided by (We now need calculate because we are manipulating filters) Sum of all Order Quantities.

ALL takes in just the product table because we want to undo any filters from the Product table (New Measure in the Products Table)

% of total sales = SUM(SalesOrderDetail[OrderQty]) / CALCULATE(SUM(SalesOrderDetail[OrderQty]), ALL(Products))

So for Black context filter the Order Quantity is 81937 / the total (We have removed the expression filter for the entire product table) of 274914 so we can see that Black is 29.80% or the order total.

With a slicer on the report. We choose One Product and even though its the only row in the visual Total Sales shows -.13% instead of 100% which is just what we want.

We still have 18% of Total Sales instead of it showing 100% because that’s the only row left in the visual. Perfect. This is what would have happened without ALL

% of total sales = SUM(SalesOrderDetail[OrderQty]) / CALCULATE(SUM(SalesOrderDetail[OrderQty]))

Looping over the Data with Iterators

Allows you to use multiple columns in an aggregation

What is an iterator?

Power BI’s DAX engine is optimized for working with columns not rows.

Iterators can process on a row by row manner. Its less performant because its working with rows. 

Takes in a table parameter (What is it looping through)

Then takes in an expression (For each row, evaluate this expression)

We have used something before that takes in the table and then an expression. The Filter function. The Filter function is an iterator.

Many of them end in X. SUMX MAXX CONCTENATEX etc

Average Gross Sales =

AVERAGEX(Category, [Quantity] * [Price] )

You could create Quantity* Price as a calculated column so each row gets this valuated first, and then average this column separately. The iterator does both in one formula

Add a new Measure into SalesOrderDetail and start typing in the following (Type it in so you bring up the intellisense)

AverageGrossSales = AVERAGE(SalesOrderDetail[OrderQty] * Uni

If you type in the above you notice intellisense stops working when you get to Unit price. This doesn’t work. The average function only accepts a single column.

This is where you could create this part as a calculated column but we want everything in one measure

AverageGrossSales with Discount =

AVERAGEX(SalesOrderDetail,

SalesOrderDetail[OrderQty] *SalesOrderDetail[UnitPrice]*

(1 – SalesOrderDetail[UnitPriceDiscount])

)

Also, you cant use AVERAGE because Average only accepts one value. AVERAGEX can be used in a measure because it iterates through each row and then aggregates

CONCATENATEX example

Add the following measure to Colours

Colours Incorrect = CONCATENATEX(Products, Products[Color],” “)

Its only called Incorrect because this is the wrong way to do it

This measure should show us what colours we have filtered on but as you can see below, now I have added a Card. We get the filtered colours for every row and we don’t want that. The Concatenate is iterating through every row and adding every colour together

Colours = CONCATENATEX(DISTINCT(Products[Color]),Products[Color],” “)

If you change the Table part of the expression to Distinct Colour, this gives you the distinct colour filtered on

This time we added a Distinct and if we choose more than one colour a space will be between each colour.

Evaluation Context

Defines what values an expression can see

Nesting Evaluations

If the above is a little confusing (and it will be) here it is again

RANKX is an iterator

Its iterating through products so we are going to start at the first product. Tulip

Its passing everything from Internet Sales related to Tulips into basically a temporary table

We are passing this information into the SUM of X and we repeat for every Product

And now the RANKX has information to work with

Analysing History with Time intelligence

We need to be able to compare periods. We can make use of DAX time intelligence

  • Single date functions – Return a single date First date
  • Date table functions – Returns a table as the output. Year to date. Takes in a date and takes in a column filtered year to date
  • Date aggregation functions Total Year to Date

Create a Date Table

Why can’t I use the date table in sales?  Dates need to be contiguous

How to 1:

  • Get Data > Blank Query
  • Next click on Advanced Editor

let

    Source = #date(2011,1,1),

    Dates = List.Dates(Source,3650,#duration(1,0,0,0))

in

    Dates

Create the source date which is the 1st Jan 2011. Then take the source and create Dates. I’m creating 3650 rows with a duration of 1 day between each

Then click on ‘To Table’ to convert to table. Then I renamed the column to Date,

I can then add more dates using M query by duplicating the date and changing to Year, Month and Day. Make sure Date is a Date

How to 2: You can also make this table using DAX

  • Modelling tab > New Table

DatesDaxCALENDAR = CALENDAR(“01/01/2011″,”01/01/2021”)

This is your starting point, but we can amend this to include other date columns

DatesDaxCALENDAR = ADDCOLUMNS(CALENDAR(“01/01/2011″,”01/01/2021″),”Year”,YEAR([Date]),”Month”,MONTH([Date]),”Day”,DAY([Date]))

If you used CALENDARAUTO it looks at the date that you used and it automatically calculates the start and end based on your data set

Connect your new date table up to the Sales table

For the Power BI exercise, I am going to use the Date table created with M.

Dates as Surrogate Keys

Sometimes when you work with data warehouses you have an integer as a date for the surrogate key. 20190206 for example

If you have to join to a surrogate key.

  • Modelling > Mark as Date table. Say which is the date and now this will work. The only reason to mark as date table is if you are joining to a surrogate key.

If you move everything to an Analysis Service, you MUST have a date table marked as date

Year over Year and Year to Date Analysis

First, Add Sales Header to your Power BI Report from the Database

  • Edit Queries > Recent Source > Choose the Adventureworks Source > Sales.SalesOrderHeader > Import
  • Rename to Sales Header and close and apply
  • Go into relationships and set as follows

Sales Header OrderDate to Date in DateM (Ensuring they are both dates)

Disconnect the other Date table. This is for reference only

Now create this year to date measure in the DateM table which uses date from the date table

YTD DatesM = TOTALYTD(SUM(‘Sales Header’[TotalDue]),DatesM[Date].Date)

As you can see this works. Year to date resets when we get to the next year.

Get the Sales from the Previous Year (Added measure to the Sales Header Table)

Prev Year = CALCULATE(SUM(‘Sales Header'[TotalDue]),PREVIOUSYEAR(DatesM[Date]))

YoY Growth = ([Prev Year] – Sum(‘Sales Header'[TotalDue])) / Sum(‘Sales Header'[TotalDue])

So take the current total Due from the Previous years Total Due. Then divide by the total Due

This is only a starting point really but gives you a good idea of some basic DAX queries, How to write it and why.

Create your website with WordPress.com
Get started