Introduction to DAX

I have gone through a fair few training courses on DAX now because it's an important part of Microsoft analytics. I thought it would be nice to include all my notes from one of the first DAX courses I attempted.

You don’t have to be an expert on DAX to get started with Power BI. You can start with a few of the basics and get lots of help along the way in the forums.

DAX (Data Analysis Expressions) is an expression language for slicing and dicing analytical data.

Where is DAX Used?

  • Power BI
  • Power Pivot (Excel)           
  • SSAS Tabular Mode (You can’t use it in Multidimensional mode)
  • Azure Analysis Services (This is only available in Tabular Mode)

DAX IS USED HEAVILY IN THE MICROSOFT BI STACK

Where Does DAX Shine?

  • Aggregations and filtering. It's optimized for both

What you need

  • An easy way of defining key metrics
  • You need to be able to slice and dice
  • You need to be able to do historical analysis

DAX gives you all of these, starting with an easy way of defining key metrics.

What is DAX not good at?

  • Operational reporting – detail-heavy, line-by-line reports used for day-to-day operations
  • Wide tables – Tables with a lot of columns
  • Many to Many relationships. There are ways around this, but it can be difficult to resolve

It's worth noting that this visual was provided over a year ago, when Azure Analysis Services was the only way of creating a back-end centralised model. We are now at December 2020 and Power BI has moved on to become not only self-service but a really great way to implement a standard centralised model for enterprise reporting. You do this by creating dataflows in the Service, which establishes your transformations in one specific area for reuse. Then, over this, datasets can be created that can be reused by other developers once you have promoted them.

Thinking in columns not rows

  • OLTP – Online Transactional Processing. (Normalised schemas for frequent updates)
  • OLAP – Online analytical Processing (Fact and Dimensions. Star Schemas etc.)
  • Single column aggregations – Sales by area etc. (OLAP)
  • Large number of rows – Sales by year, we have 10 years of data. Lots of rows (OLAP)
  • Repeated values – Data is flattened unlike in an OLTP
  • Need to quickly apply filters By Area, Postcode, Year etc.

We may only want 3 columns in a table that has 50 columns. We just want the slice of those 3 columns, and we don't want to read the other columns.

For this we store data as columns, not rows.

Vertipaq

Also known as xVelocity, it is the engine used to store data as columns. The data is imported into the Power BI data model as columns, not rows.

Direct Query

If you set up your connection as DirectQuery, Power BI has to translate DAX formulas into relational SQL queries, so you lose a lot of the functionality, such as DAX time intelligence, because it's too complex for SQL.

Compression and Encoding

This is how Power BI Compresses your data. Each column is compressed separately

  • Value Encoding
  • Dictionary Encoding
  • Run-Length Encoding

Value Encoding

Take the lowest value in the range and then store everything else as the delta (difference) from that value.

This makes the stored codes shorter in length, compressing the data.
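For example, if a column holds the values 103, 101 and 109, the engine can store the minimum (101) once and then just the deltas 2, 0 and 8, which need far fewer bits than the original values.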

Dictionary Encoding

Each distinct value in the column gets assigned a number, which is then used in place of the actual data item.

This takes up far less space.
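For example, a Color column holding Red, Black, Red, Silver could be stored as a dictionary (0 = Red, 1 = Black, 2 = Silver) plus the integer list 0, 1, 0, 2.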

Run-length encoding

The data is sorted, and then each value is stored just once, along with the number of times it repeats. Repeating values create excellent compression; unique values, not so much.
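For example, a sorted column holding Black, Black, Black, Red, Red can be stored as just two entries: Black × 3 and Red × 2.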

Adding Business Logic with Calculated Columns and Measures

The following examples have been set up using the old faithful AdventureWorks database.

I'm initially bringing through two tables:

  • Sales.SalesOrderDetail
  • Production.Product

and renaming them to Products and SalesOrderDetail

We are now ready to try some examples

Calculated Columns

Where should you be creating calculated columns: in Power Query using the M language, or in DAX?

The quick answer is: if you can, always create them in Power Query. However, you do need to understand how the calculated column works in DAX in order to understand measures.

  • Expands a table by adding a new column.
  • Operational, detailed information. Can only look at the specific row.
  • It's an expression that produces a column. Computed at the time of data refresh and stored within the table.
  • Limited by a row context, e.g. Price - Discount. They don't take advantage of the columnar database.

In Power BI we are going to set up a line total in SalesOrderDetail, because where possible you should put the new column in the same table as the data being used to create it.

Click on … against the table under Fields in Power BI and choose New Column.

LineTotal = SalesOrderDetail[UnitPrice] * SalesOrderDetail[OrderQty] * (1- SalesOrderDetail[UnitPriceDiscount])

It calculates the values for each row

The new column now sits in the model with the other fields (Go to Data Tab)

And you can use this like any other metric. In this table we are filtering by product colour; Line Total has already been calculated within the row context.

You can now use Line Total in a table visualization: we can treat this column like any other column, and it's now being set in the implicit filter context, which in this case is Color.

An implicit filter context arises when you add a descriptive column that your metric is aggregated by within the visual, or when you add a filter to the visual. We will look at explicit filters later.

Related

When you create a column, you can only use data from the same table unless you add RELATED or RELATEDTABLE to your DAX.

Adding Color into the SalesOrderDetail table from Products:

1 Product can be sold Many times.

Colour = RELATED(Products[Color])

I have added data from another table into a table. RELATED allows you to pull data across from another related table. Note that in the previous DAX I only used data from one table, so I didn’t need to use RELATED.

RELATED is for the one-to-many relationship, using data from the one side in the many side. Colour (from the one side) now sits in SalesOrderDetail (the many side).

RELATEDTABLE

RELATEDTABLE uses the data from the many side in the one side.

We are in the Products table:

TotalSales = SUMX(RELATEDTABLE(SalesOrderDetail),SalesOrderDetail[LineTotal])

SUMX is an iterator: it takes a table and an expression to evaluate. This means we are going through each row in the Products table and running the evaluation. It's using the related table to go through all of the LineTotals that we created in SalesOrderDetail.

So for example

Row for ProductID 707: sum up the LineTotal in SalesOrderDetail where the ProductID is 707, and add the total into Products.

Next iteration, row for ProductID 708: sum up the LineTotal in SalesOrderDetail where the ProductID is 708, and add the total into Products.

And repeat until you have iterated through the entire table for each row

Again, a quick rule of thumb is, if you are creating a calculated column using data from the same table, do it in Power Query Editor. If you have to use data from other tables, use DAX.

Your data will be compressed like any other field if it's been created with Power Query Editor, but it won't be if it's a calculated column, because the column is computed after compression.

Measures

  • Summarises all the data into a single value. Not stored on the table.
  • Analytical. Takes a column and brings back a summary
  • They are computed at runtime so stored temporarily
  • Every time you open a report, your Measures are computed
  • Limited by a filter context (Rather than a row context)
  • A measure looks at the data minus what has been filtered out by the user at that time.
  • They are more loosely associated with a table, so you don’t need to RELATE tables

Again, in Power BI, on the SalesOrderDetail table, create a measure:

Minimum Price = MIN(SalesOrderDetail[UnitPrice])

You can't add your measure to a slicer or use it as a filter, because it's calculated after the filters have been set.

Implicit measure – under the hood, anything that you aggregate gets a measure created for it by the DAX engine, for example your total column. Here we are implicitly filtering UnitPrice by colour to get the minimum value.

If you go into the Data tab, you won't see this measure, because it only exists after you create your visual.

Filter data Using Calculate

Implicit

Applied by the user or the layout of the report. The visuals implicitly filter the data. When you drill up and down on a visual for example.

Explicit

Coded into the DAX Expression. Explicit filtering overrides Implicit filtering

Calculate

Calculate changes the current evaluation context

In Power BI set up a Table containing Products Colour and Unit Price (Set to Average rather than Sum)

BlackAveragePrice = CALCULATE(AVERAGE(SalesOrderDetail[UnitPrice]), Products[Color]="Black")

This calculates the average unit price for the colour Black only. It overrides the implicit filter of colour: it changes the current evaluation context, e.g. White or Yellow.

Add a filter and notice that this is also ignored, because the explicit filter overrides the implicit filter context of Red.

Next create a measure

IsColorFiltered = ISFILTERED(Products[Color])

ISFILTERED returns TRUE when columnName is being filtered directly.

Colour is filtering each of the line items, but it is not filtered for the total, because that is the average of everything.

Once we have filtered on Grey using the slicer, the grand total is also filtered by Grey, so both are now TRUE.

Calculate is designed to be fast. For more complicated examples we need to use Filter.

Calculate only lets you compare a single column to a fixed value

Filter

CALCULATE( SUM ( [Total] ), FILTER (Sales, QTY * Price < 100 ))

FILTER takes two parameters: the table, and then the expression to apply. Here it takes the Quantity and Price from each row of Sales and applies the filter of less than 100.

Now, we want to see the average UnitPriceDiscount where OrderQty * UnitPrice * UnitPriceDiscount is greater than 10 dollars: any time the total discount is greater than 10 dollars, we want that row to contribute to the average.

As an example, In Power BI we need another table of Color and UnitPriceDiscount (Set as Average)

Create a measure in SalesOrderDetail

DAX is a functional language; it's built from functions like SUMX etc.

DAX is evaluated from the innermost parameter outwards, so don't create all your DAX on one line.

AverageDiscountGreaterthanTen =
CALCULATE(
    AVERAGE(SalesOrderDetail[UnitPriceDiscount]),
    FILTER(
        SalesOrderDetail,
        SalesOrderDetail[OrderQty] * SalesOrderDetail[UnitPrice] * SalesOrderDetail[UnitPriceDiscount] > 10
    )
)

Filter takes in two parameters. We need to tell it the table and then the expression we want.

FILTER is creating a table of values to work with, because it is a table function. It's always embedded in other functions, in this case CALCULATE.

You can't have measures inside FILTER.

To start to understand this, I'm going to add details for a specific order and product, going in at the lowest level of granularity.

I wanted to see exactly what was going on in the filter, so I created a calculated column working on the row context:

Qty*Price*Discount = SalesOrderDetail[OrderQty]*SalesOrderDetail[UnitPrice]*SalesOrderDetail[UnitPriceDiscount]

So for Full-Finger Gloves, L: 44 were bought, at around 17 dollars each, with a discount of 15%.

The average of UnitPriceDiscount is the same as the summed UnitPriceDiscount because the products are all identical with the same discount applied.

44 gloves at $17 would bring in $752. Applying the discount of 15% (0.15) shows that $112.83 has been discounted from the total price.

This is above 10, so this row's discount contributes to the average.

ALL

Historical Sales = CALCULATE(SUM([Total]), ALL('Date'))

ALL clears any filters in the entire date table

For this example, we want the sum of order quantity for the specific filter context, divided by the sum of all order quantities. We need CALCULATE because we are manipulating filters.

ALL takes in just the Products table, because we want to undo any filters coming from the Products table (new measure in the Products table).

% of total sales = SUM(SalesOrderDetail[OrderQty]) / CALCULATE(SUM(SalesOrderDetail[OrderQty]), ALL(Products))

So for the Black filter context, the order quantity is 81937, divided by the total of 274914 (we have removed the filter for the entire Products table), so we can see that Black is 29.80% of the order total.

With a slicer on the report, we choose one product, and even though it's the only row in the visual, % of total sales shows 0.13% instead of 100%, which is just what we want.

We still have 18% of total sales instead of it showing 100%, even though that's the only row left in the visual. Perfect. Without ALL, as below, it would have shown 100%:

% of total sales = SUM(SalesOrderDetail[OrderQty]) / CALCULATE(SUM(SalesOrderDetail[OrderQty]))

Looping over the Data with Iterators

Allows you to use multiple columns in an aggregation

What is an iterator?

Power BI's DAX engine is optimized for working with columns, not rows.

Iterators can process in a row-by-row manner. This is less performant because it is working with rows.

Takes in a table parameter (What is it looping through)

Then takes in an expression (For each row, evaluate this expression)

We have used something before that takes in the table and then an expression. The Filter function. The Filter function is an iterator.

Many of them end in X: SUMX, MAXX, CONCATENATEX etc.

Average Gross Sales = AVERAGEX(Category, [Quantity] * [Price])

You could create Quantity * Price as a calculated column, so each row gets this evaluated first, and then average that column separately. The iterator does both in one formula.

Add a new Measure into SalesOrderDetail and start typing in the following (Type it in so you bring up the intellisense)

AverageGrossSales = AVERAGE(SalesOrderDetail[OrderQty] * Uni

If you type in the above, you'll notice IntelliSense stops working when you get to UnitPrice. This doesn't work: the AVERAGE function only accepts a single column.

This is where you could create this part as a calculated column but we want everything in one measure

AverageGrossSales with Discount =
AVERAGEX(
    SalesOrderDetail,
    SalesOrderDetail[OrderQty] * SalesOrderDetail[UnitPrice] * (1 - SalesOrderDetail[UnitPriceDiscount])
)

Also, you can't use AVERAGE here because AVERAGE only accepts one column. AVERAGEX can be used in a measure because it iterates through each row and then aggregates.

CONCATENATEX example

Add the following measure to Colours

Colours Incorrect = CONCATENATEX(Products, Products[Color], " ")

It's only called Incorrect because this is the wrong way to do it.

This measure should show us which colours we have filtered on, but as you can see below, once I have added a card we get the filtered colours for every row, and we don't want that. CONCATENATEX is iterating through every row and adding every colour together.

Colours = CONCATENATEX(DISTINCT(Products[Color]), Products[Color], " ")

If you change the table part of the expression to DISTINCT of Colour, this gives you only the distinct colours filtered on.

This time we added DISTINCT, and if we choose more than one colour, a space appears between each colour.

Evaluation Context

Defines what values an expression can see

Nesting Evaluations

If the above is a little confusing (and it will be), here it is again.

RANKX is an iterator.

It iterates through Products, so we start at the first product, Tulip.

It passes everything from Internet Sales related to Tulips into what is basically a temporary table.

We pass this information into the SUMX, and we repeat for every product.

And now RANKX has information to work with.
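As a rough sketch of this pattern (my own illustration, using the Products and SalesOrderDetail tables from earlier in these notes; the names are illustrative), a ranking calculated column on Products might look like this:

Product Rank =
RANKX(
    ALL(Products),
    // for each product, sum its related sales rows (the temporary table)
    SUMX(RELATEDTABLE(SalesOrderDetail), SalesOrderDetail[LineTotal])
)

RANKX evaluates the inner SUMX once per product over ALL(Products), and then ranks the current product's total against that list.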

Analysing History with Time intelligence

We need to be able to compare periods. We can make use of DAX time intelligence

  • Single date functions – return a single date, e.g. the first date
  • Date table functions – return a table as the output, e.g. year to date: they take in a date column and return it filtered to the year to date
  • Date aggregation functions – e.g. total year to date

Create a Date Table

Why can't I just use the dates in the Sales table? Because the dates in a date table need to be contiguous.

How to 1:

  • Get Data > Blank Query
  • Next click on Advanced Editor

let
    Source = #date(2011,1,1),
    Dates = List.Dates(Source, 3650, #duration(1,0,0,0))
in
    Dates

Create the source date, which is 1st Jan 2011. Then take the source and create Dates: I'm creating 3650 rows with a duration of 1 day between each.

Then click on 'To Table' to convert it to a table, and rename the column to Date.

I can then add more date columns in the M query by duplicating the Date column and changing the duplicates to Year, Month and Day. Make sure Date is typed as a Date.

How to 2: You can also make this table using DAX

  • Modelling tab > New Table

DatesDaxCALENDAR = CALENDAR("01/01/2011", "01/01/2021")

This is your starting point, but we can amend this to include other date columns

DatesDaxCALENDAR = ADDCOLUMNS(CALENDAR("01/01/2011", "01/01/2021"), "Year", YEAR([Date]), "Month", MONTH([Date]), "Day", DAY([Date]))

If you use CALENDARAUTO instead, it looks at the dates in your data and automatically calculates the start and end based on your data set.

Connect your new date table up to the Sales table

For the Power BI exercise, I am going to use the Date table created with M.

Dates as Surrogate Keys

Sometimes when you work with data warehouses you have an integer as a date for the surrogate key. 20190206 for example
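As an illustration (not from the course), a minimal sketch of a calculated column that turns such an integer key back into a real date, assuming a Sales table with an OrderDateKey column (hypothetical names):

Order Date =
VAR k = Sales[OrderDateKey]    // e.g. 20190206 (assumed column name)
RETURN
    DATE(INT(k / 10000), INT(MOD(k, 10000) / 100), MOD(k, 100))    // 2019, 2, 6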

If you have to join to a surrogate key:

  • Modelling > Mark as Date table. Say which column is the date, and now this will work. The only reason to mark as a date table is if you are joining to a surrogate key.

If you move everything to Analysis Services, you MUST have a date table marked as a date table.

Year over Year and Year to Date Analysis

First, Add Sales Header to your Power BI Report from the Database

  • Edit Queries > Recent Source > Choose the Adventureworks Source > Sales.SalesOrderHeader > Import
  • Rename to Sales Header and close and apply
  • Go into relationships and set as follows

Sales Header OrderDate to Date in DatesM (ensuring they are both dates).

Disconnect the other Date table. This is for reference only

Now create this year-to-date measure in the DatesM table, which uses the Date from the date table:

YTD DatesM = TOTALYTD(SUM('Sales Header'[TotalDue]), DatesM[Date].Date)

As you can see this works. Year to date resets when we get to the next year.

Get the Sales from the Previous Year (Added measure to the Sales Header Table)

Prev Year = CALCULATE(SUM('Sales Header'[TotalDue]), PREVIOUSYEAR(DatesM[Date]))

YoY Growth = ([Prev Year] - SUM('Sales Header'[TotalDue])) / SUM('Sales Header'[TotalDue])

So take the current TotalDue away from the previous year's TotalDue, then divide by the current TotalDue.

This is only a starting point really, but it gives you a good idea of some basic DAX queries, how to write them and why.

DAX – ISLOGICAL

This function checks whether a value is a logical value (TRUE or FALSE), and returns TRUE or FALSE.

I just looked at a question that asks: what does this DAX expression do?

is logical measure = IF(ISLOGICAL("true"),"Is boolean or Logical","Is different type")

I could have a guess: if "true" is logical, then "Is boolean or Logical", else "Is different type".

Well, we have added "true" within quotation marks, which makes it text, not Boolean or logical.

As expected. Let's change the DAX:

is logical measure = IF(ISLOGICAL(true),"Is boolean or Logical","Is different type")

true is Boolean, because you can only have true or false.

is logical measure = IF(ISLOGICAL(false),"Is boolean or Logical","Is different type")

Still logical, because it's Boolean, so the card stays the same.

So why would you use ISLOGICAL?

You would mostly use this to test data, so it's great for making sure columns of true and false only contain Boolean true and false information.

You always need to test how good your data is. Add an ISLOGICAL for true/false columns and you can report back on the validity of the data.

SalariedFlag in AdventureWorks is a Boolean column.

Here is where the confusion is. You can't feed SalariedFlag into this as a measure, because a measure needs an aggregation, and you can't aggregate true or false.

So instead I create a calculated column on the Employee table

Is Salary flag true or false? = IF(ISLOGICAL(DimEmployee[SalariedFlag]),"Is boolean or Logical","Is different type")

Great: SalariedFlag has passed the test.

So ISLOGICAL is great for testing within a calculated column.

Buying and Pausing a Power BI A SKU

I want to set up Power BI Dedicated Capacity specifically for development and testing Premium features.

The Power BI Premium SKUs should only be used in production. We want something that we can pause when not in use and that helps us research all the Premium capabilities.

Thankfully we can go for the A SKU (1 through 6)

  • A4 provides the same functionality as the P1 SKU
  • A5 is the same as P2
  • A6 is the same as P3

The only difference is that you don’t get full user access like you do with Premium. Only users with Pro accounts can access the content (Which makes sense because we haven’t moved to Production)

And you can pause it when not in use. This is fantastic and will save lots of money in the development and testing phases.

The question is, which SKU do we require? I know I need to test paginated reports, and this is only available from A4 upwards, so clearly we need the SKU that corresponds to P1.

Buying A4 SKU

Use an account that has at least capacity admin permissions in Power BI.

Log into Azure and search for Power BI Embedded.

Here I have decided to choose our training subscription and training resource group.

 The capacity admin must be a member user or a service principal in your Azure AD tenant. I have selected myself for this role for this development Power BI area

At this point you need to choose the resource size and I know that we need the A4

If we left this on the whole time it would cost £4,467.41 for the month, but obviously we only need it when we are testing Premium functionality, so we would be pausing it most of the time.

It's extremely important that this is paused after use.

I have added a few tags to this service before creating

Go to resource because we want to immediately Pause it

And Pause. I will only unpause the resource when I want to set up an App Workspace on this test capacity to check the Paginated report and other functionality.

The A4 SKU will cost around £6 an hour, so ensure you plan your development time effectively.

What if Parameters in Power BI

I wanted to have a look at What-if parameters in Power BI. This had passed me by, so using AdventureWorks I wanted to get a handle on it.

I have a simple problem: I have a sales goal I'd like to achieve over the months, and I'd like to see how adding percentages onto sales would help me hit my goal.

For example, If I add 1% onto sales, would that help hit the targets?

After analysing current sales, management have decided that we need to be selling £170M each month as the big sales target.

The first thing we do is add this into the report using a constant line

We want to know how to set sales quotas: what do we need to do to push towards the targets? From the Modelling tab, select New Parameter (What if).

This is all fairly standard stuff. We want to increase sales by a percentage for the forecast, so we use a fixed decimal number running from 1 (100%) up to 1.5 (150%).

We want to go up in increments of 0.05 (5%).

Now we have a slicer which we can move to choose percentage increases and a new table called % Sales Forecast.

Within the table are the following:

% Sales Forecast = GENERATESERIES(CURRENCY(1), CURRENCY(1.5), CURRENCY(0.05))

This will return a single-column table containing values. We give it a start, an end and an increment.

GENERATESERIES(<startValue>, <endValue>[, <incrementValue>])

If you go into the Data view you can look at the table created and see that this has generated our single column of forecast values.

% Sales Forecast Value = SELECTEDVALUE('% Sales Forecast'[% Sales Forecast], 1)

This is the measure that has been created in the table (so it doesn't exist in the table structure). It returns the value when the % Sales Forecast column has been filtered down to one distinct value, and 1 otherwise.

Create a measure in the new % Sales Forecast table to take sales and multiply it by that measure:

Total Sales Forecast = [Total Sales]* '% Sales Forecast'[% Sales Forecast Value]

Total Sales was created earlier with this measure:

Total Sales = SUM(FactResellerSalesXL_CCI[Sales Amount])

I decided to add this measure here because it makes sense to hold this in the same area as all the other forecast information.

When you add this new forecast measure you get the following

Because we are only multiplying by 1, it's exactly the same as our sales, but let's change the forecast.

Great. If we multiply sales by 1.15 (15% more sales), then March should exceed our target. So if we can hit this next March, we have achieved our targets.

Power BI December 2019 Updates – KPI Updates (Goal)

Finally, something we have been waiting for for a long time.

It's amazing how such a little thing can cause so much extra complexity.

Previously you could only show this on the KPI

The Goal states 1749, but this isn't a goal. This is last month's measure, not a goal.

This basically means that you can only use KPIs in this way if it makes sense for your data to have a goal. Maybe you might have to add more information about the goal in the visual header tooltip icon.

However, for my report I need it to be stated that this is last month's figure compared to this month's. I have lots of KPIs based on the current month and this time last month, so I had to create them like this.

Instead of 1 visual I have 2: a KPI and a card, hiding the Goal in the KPI. This creates a lot more complexity in the reports, but I had to do it to avoid confusion.

Now, we can reset the Goal to whatever we want

Finally I can reset all my reports so I can reduce the amount of visuals shown.

Such a small change but it makes a big difference.

I give this update 10/10

Creating an open Account Measure using DAX

Thank goodness for the amazing people on the Power BI forums helping with this. I have the solution, but what I wanted to do was create a post to explain the why.

The Problem

I have accounts and events occurring on each account. What I want to know is the time period they are open

Just for this example I have taken one account with 11 activities.

If you run the visual against the start date, all you get is a measure against the start time period.

We are more interested in creating a measure to show how long the activity went on for.

For this example I am going to work with Jan 28 2017, because the end date for this activity is March 21st 2017, so it should span 3 months.

The Date Dimension

To do this you must have a date dimension. For the purposes of this example I'm going to create one in DAX (Create Table) using the following:
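A minimal sketch of what that table could look like, assuming the fact table is called Accounts with StartDate and EndDate columns (illustrative names):

DimDate =
ADDCOLUMNS(
    CALENDAR(
        DATE(YEAR(MIN(Accounts[StartDate])), 1, 1),    // min year from the start dates
        DATE(YEAR(MAX(Accounts[EndDate])), 12, 31)     // max year from the end dates
    ),
    "Year", YEAR([Date]),
    "Month", MONTH([Date])
)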

Notice that I have used the start and end dates to set the min and max years. You should already have a date dimension to work with, though.

Set up the relationship between the Fact table and Date dim

Start date is the Active join and End Date is the Inactive Join

You could create a role playing dimension for the End date and have that as an active join but in this case I’m going to go for the above design.

Account Start flag (New Measure)

  • CALCULATE – you can modify your expression using filters
  • ALLSELECTED – if you have a slicer set to Year (Date), ALLSELECTED keeps the context from the slicer, like 2018
  • The date is less than or equal to the max date in the filter context (sketched below)
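Putting those bullets together, a minimal sketch of the measure (assuming the Accounts fact table and a DimDate table joined on the start date; the names are illustrative):

Account Start Flag =
CALCULATE(
    COUNTROWS(Accounts),    // counts via the active Start Date relationship
    FILTER(
        ALLSELECTED(DimDate[Date]),    // respect the slicer selection
        DimDate[Date] <= MAX(DimDate[Date])    // every date up to the period's max date
    )
)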

Account End Flag (New Measure)

  • ENDOFMONTH – returns the last date of the month in the current context for the specified column of dates
  • The date at the end of the month is less than the max date in the filter context
  • USERELATIONSHIP – because End Date is the inactive relationship, we need to specify that this relationship is used, not Start Date (see the sketch below)
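A matching sketch for the end flag; here EOMONTH stands in for the ENDOFMONTH logic described above, pushing each end date to its month end (again, the table and column names are assumptions):

Account End Flag =
VAR MaxDate = MAX(DimDate[Date])    // max date in the current filter context
RETURN
    CALCULATE(
        COUNTROWS(Accounts),
        FILTER(
            ALLSELECTED(DimDate[Date]),
            EOMONTH(DimDate[Date], 0) < MaxDate    // month-end of the date is before the period's max date
        ),
        USERELATIONSHIP(Accounts[EndDate], DimDate[Date])    // switch to the inactive End Date join
    )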

Add an Open Measure
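With the two flags in place, the open measure is just their difference (a sketch; in DAX, 1 minus BLANK is 1, which is what makes the walkthrough below work):

Account Open = [Account Start Flag] - [Account End Flag]    // 1 - blank = 1 (open), 1 - 1 = 0 (closed)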

To understand the results we need to try and understand how the new measures work

Jan

  • Start Flag =        1 because 18th Jan is less than Jan 31st
  • End Flag  =        NULL because March 21st (Reset to March 31st) is NOT Less than Jan 31st
  • Open =  1- null = 1

Feb

  • Start Flag =        1 because 18th Jan is less than Feb 28th
  • End Flag  =        NULL because March 21st (Reset to March 31st) is NOT Less than Feb 28th
  • Open =  1- null = 1

Mar

  • Start Flag =        1 because 18th Jan is less than Mar 31st
  • End Flag  =        NULL because March 21st (Reset to March 31st) is NOT Less than March 31st
  • Open =  1- null = 1

Apr

  • Start Flag =        1 because 18th Jan is less than April 30th
  • End Flag  =        1 because March 21st (Reset to March 31st) IS less than April 30th
  • Open =  1- 1 = 0

There are usually lots of ways to answer one question, so this is one of many, but the Open flag now allows me to pull up all my activities and show in a matrix where they are open.

Use Data Lake Storage V2 as Dataflow Storage

This blog post follows on from https://debbiesmspowerbiazureblog.home.blog/2019/11/28/setting-up-an-azure-data-lake-v2-to-use-with-power-bi-data-flows-in-service-as-a-data-source/

Dataflows are essentially an online collection and storage tool. Power Query Connects to data at source and collects and transforms that data. The dataflow then stores the data in a table within the cloud. They are stored in Data Lakes which is all automated for you.

Dataflows unify data from all your different sources. It should be noted that a Data Warehouse is still the recommended architecture with Data Flows over the top.

Dataflows also introduce the concept of the Common Data Service (CDS) and the Common Data Model (CDM). CDM allows organisations to use standard data formats to provide consistency across deployments. Azure Data Lake Gen2 storage can now be combined with dataflows, storing the dataflow data to provide a structured, centralised data source.

Thanks to https://docs.microsoft.com/en-us/common-data-model/use for helping me understand the differences between the two

When you integrate CDM with Data Lake Gen 2 you get structural consistency and you can use CDM Folders in the lake that contain your schemas in standard CDM format.

Dataflows store their definitions and data in CDM folders, in the Model.json format. Having a Model.json file shows that you are adhering to CDM.

Of course, this can be quite difficult when you are working with data that does not adhere to CDM format. I can see it being really useful when you are starting from scratch but I have done some work looking at my data sources and they are always quite far from CDM format.

You can find information about the CDM here https://docs.microsoft.com/en-us/common-data-model/

And more information about CDS here

https://docs.microsoft.com/en-us/powerapps/maker/common-data-service/data-platform-intro

Advantages to setting up your own Data Lake Gen2 Store

Essentially, you can get by using the internal Data Lake Store but there are lots of reasons (And some of these advantages will be coming in future releases) why setting your own store up is a good thing.

  • Great for re-use: if you are given access to the workspace you can use the dataflows already created for you.
  • Centralise your prepped data

Remember this doesn’t replace your data warehouse. It just adds a preparation and transformation layer above the data warehouse. Instead of having to wait to get your small change added to the warehouse you can add it to this layer.

  • Data within your data lake can be integrated into other solutions by developers
  • Data Lake Gen 2 is enormously scalable for data
  • Dataflow data and definition file can be leveraged by developers for AI Services

Steps to Set Azure Data Lake Gen2 as Dataflow Storage

  • The storage account must be created in the same Azure Active Directory tenant as the Power BI tenant.
  • The storage account must be created in the same region as your Power BI tenant (see below for how to determine where your Power BI tenant is located).
  • The storage account must have the Hierarchical Name Space feature enabled.
  • Power BI service must be granted a Reader role on the storage account.
  • A File system named powerbi must be created.
  • Power BI services must be authorized to the powerbi filesystem you create.

Once configured, it can't be changed. You can't go back to the default data store for your dataflows.

Set Storage Account in Same Region as your Power BI tenant

Log into Power BI Service, then Help (?) and About Power BI.

The data is stored in North Europe (Ireland).

When you set up your Data Lake, ensure the North Europe region is selected.

Set up Azure Data Lake V2 in Azure Portal

When you go into Azure currently and look for Data Lake V2, you can only find Gen 1.

So the question is, how do you set up Gen 2 in the Azure Portal? (At the time of writing it is the 25th November 2019; hopefully this may get easier in the future.)

First of all I go to the Subscription where I want to add the new data Lake v2

Open up the Portal menu (Now hidden to the left of the screen)

Choose Create a resource

Next choose Storage and then Storage Account.

Note that the Account kind is Storage V2 (General Purpose)

Make sure Location is the same as Power BI Service. I’m not using this functionality at the moment but there is no harm in applying this logic.

It is recommended to set replication setting to Read-access geo-redundant storage (RA-GRS)

For the time being, I am leaving everything else as standard

Next go to Advanced

The most important setting here is Data Lake Storage Gen 2: enable the Hierarchical namespace and your storage account will now be created as Data Lake Storage V2.

Click Review and Create

Grant Reader Role to Power BI Service

This is all built in so it should be fairly straightforward.

In Azure go to your new storage account (If you aren’t already in it) and go to Add role Assignment

Once there, choose the Reader role and select Power BI Service, which as you can see is already in the list.

It takes about 30 minutes for this to take effect.

Create a powerbi file System

Power BI needs to use a filesystem specifically named powerbi, so dataflows can be stored in this specific file system.

We now have a few options available to us. I have some files to add so I am going to add them to a container

Click on Containers And then + File System

Note that to store dataflows it must be named powerbi. Click OK.

Clicking on your new storage account(s), you are told to download Azure Storage Explorer.

I already have Azure Storage Explorer downloaded. If you don't have it, it's something you will absolutely need to work with Azure Storage accounts.

Once downloaded Open Azure Storage Explorer

You will need to Add in your Azure Storage Accounts by clicking the little connector icon

You will be asked to sign into your Account with your Office 365 credentials and 2fa authentication

This will log you into all your Subscriptions and Services

You are good to go

Here you find your subscription, then go to the Data Lake Storage Gen 2 and find the new file system, powerbi.

Grant Power BI permissions to the file system

Before we connect we need to grant permission for Power BI to use the File System (Again this is specific to using DLV2 as a data flow store but at this point we may as well set up the permissions)

Go to the Azure Portal and Azure Active Directory.

Then select Enterprise Applications.

Change the Application type drop-down to All Applications.

Power Query Online and Power BI Premium and Power BI Service are in the list.

You will need the Object IDs of these applications.

Back to Azure Storage Explorer (Manage Access)

Navigate back to powerbi file system, Right click and Choose Manage Access

Click Add, Grab the object ID of Power BI Service to Manage Access

Set Read, Write and Execute Access to Service and Repeat the Process for Power BI Premium

Repeat for Power Query Online but Set Write and Execute Access

The 'Other' entry also needs setting up, as follows.

Connect the datalake Gen Storage Account to Power BI Dataflows

To do this you need to be a Power BI admin. Go to Power BI Service and navigate to the Admin Portal.

From here Connect your Azure Data Lake Storage Gen2.

Add your Subscription ID, Resource group Name and Storage Account name of your Data Lake

It is now connected to Power BI

Allow Admins to Assign Workspaces

Finally, still in the Admin Portal, go to Dataflow Settings.

Switch Allow Workspace admins to assign workspaces to this storage account to On

Workspace admins can now assign workspaces to the filesystem created.

Things to Consider

Here is where it starts to get a little bit hazy

  • This is all very much still in preview and there will be lots of updates coming
  • Once your dataflow storage location is configured it can't be changed, so don't do this on a whim
  • You have to be an owner of the dataflow, or be authorised to the CDM folder in the data lake, to use the dataflow
  • Once you have created a dataflow you can't change its storage location
  • It is your organisation's dataflow storage, so there can only be one

Because of this, and the fact that it's still in development, I am going to wait to set up a central storage account for our dataflows.

I'm still unsure what you would do with dataflows that are already set up. Do they stay in the default area, or can you reprocess them into the central data lake?

What happens if you want to move the data lake to a new subscription? Is that not possible?

I will be going back to this when I have a few more answers to these questions

Setting up an Azure Data Lake V2 to use with power BI dataflows in Service (As a data source)

Previous to the brand new Azure Data Lake, I was adding all the files into Blob Storage. However, Azure Data Lake V2 is built on Blob Storage and Data Lake V1.

It's built for big data, and a fundamental change is that we now have a hierarchical namespace. This organises your files into directories.

So now we can do things like use all files from a specific directory, or delete all files from a specific directory. We can categorise our files within the data lake.

Set up Azure Data Lake V2 in Azure Portal

When you go into Azure currently and look for Data Lake V2, you can only find Gen 1.

So the question is, how do you set up Gen 2 in the Azure Portal? (At the time of writing it is the 25th November 2019; hopefully this may get easier in the future.)

First of all I go to the Subscription where I want to add the new data Lake v2

Open up the Portal menu (Now hidden to the left of the screen)

Choose Create a resource

Next choose Storage and then Storage Account.

Note that the Account kind is Storage V2 (General Purpose)

I've set the Location to North Europe, simply because I know that's where our Power BI data in the Service is stored, and I may as well stick with this.

For the time being, I am leaving everything else as standard

Next go to Advanced

The most important setting here is Data Lake Storage Gen 2: enable the Hierarchical namespace and your storage account will now be created as Data Lake Storage V2.

Click Review and Create

Create a file System within a Container

We now have a few options available to us. I have some files to add so I am going to add them to a container

Click on Containers And then + File System

Click OK

Clicking on your new storage account(s), you are told to download Azure Storage Explorer.

I already have Azure Storage Explorer downloaded. If you don't have it, it's something you will absolutely need to work with Azure Storage accounts.

Once downloaded Open Azure Storage Explorer

You will need to Add in your Azure Storage Accounts by clicking the little connector icon

You will be asked to sign into your Account with your Office 365 credentials and 2fa authentication

This will log you into all your Subscriptions and Services

You are good to go

Here you find your subscription, then go to the Data Lake Storage Gen 2 and find the new file system.

I have added a folder here called Workshop1Files to my File System

Obviously Data Lake Storage gives you so many ways of working with files and automating the files to the storage area. In this case I am going to simply move a file into my new folder to work with

Double click on the folder and then Click Upload and Upload Files

And now your file is in the cloud, in an Azure Data Lake ready to use.

Connect to your Azure File with Power BI Desktop

The first test is can we access this data within Power BI Desktop.

Open Power BI Desktop and Get Data

Choose Azure Data Lake Storage Gen2 (Currently in Beta)

Add the URL

Data Lake Storage Gen2 has the following pattern: https://<accountname>.dfs.core.windows.net/<filesystemname>/<subfolder>

If you right-click on the file in Storage Explorer and go to Properties, there is a difference in structure:

https://<accountname>.blob.core.windows.net/<filesystemname>/<subfolder>

If you try to connect with the original URL from Data Storage you get the following error

And if you change the URL from blob to dfs

There is a missing part to the puzzle. Go back to the Azure Data Lake Storage Account in Azure and Add Storage Blob Data Reader to your account

Then try again, and hopefully you are in.

No need to combine because we have specified the file.

There are different ways you can load the file. I loaded one file but you can load all files in the File System

https://storageaccount.dfs.core.windows.net/filesystemname

or all files under a directory in the file system (You can include sub directories in this)

https://storageaccount.dfs.core.windows.net/filesystemname/directoryname/directoryname

Connect to your Azure File with Power BI Data Flow

I am creating dataflows in the Power BI Service to ensure they can be reused across the company. The question is: can I connect to the above file in the Service via a dataflow?

In Power BI Service, add a Data Flow which takes you into Power BI Query Editor in the Service. I already had some data flows connected to an Azure database.

The data is in Azure Data Lake Storage, so the first thing I do is try the Azure route.

However, there is no Azure Data Lake Storage Gen 2 connector. This must be something coming in the future. So then I go to File and click on Get Data > Text/CSV.

You will need to add the file path and your credentials (as per the previous advice, use dfs, not blob, in the URL). This seems a little flaky at the moment: I chose Organisational Account first before adding the URL, and then it seemed to work.

Remember, go back to Azure Storage Explorer: if you click on Properties, you can grab the URL from here.

We don’t need a Gateway Setting up because everything is now in the cloud.

Clicking Next, nothing happens; it just keeps bouncing back to the same window.

Attempting to use the Blob Storage connector also doesn’t work (Using the Azure Account Key as authentication).

It fails both with blob in the URL and with dfs in the URL.

It would appear that I have currently hit a brick wall, and there is no DLGen2 connector for dataflows.

I will be keeping an eye on this, because obviously, if you are pushing the new generation of data lakes and dataflows, there needs to be a DLGen2 connector for dataflows.

Update

I had a reply back on the Power BI forum (not a good one):

"The feature hasn't been planned. If there is any new message, the document What's new and planned for Common Data Model and data integration will be updated."

I have found this in Ideas

https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/38930032-add-azure-data-lake-storage-gen2-as-a-data-sourc

Please help us get this working by voting for this idea.

Power BI November 2019 updates

To look at the new Power BI updates I went for my go-to data set, the AdventureWorks data warehouse.

I have added the following tables

  • DimEmployee
  • Dimdate
  • DimSalesTerritory
  • DimProduct
  • DimProductSubcategory
  • DimProductcategory
  • Dimcustomer
  • FactInternetSales

Use the Updated Ribbon

Once I have added the data, I want to start using the new Power BI ribbon, which is in preview. Go into Options > Preview features and select the Updated ribbon feature.

You have to restart Power BI to get the new ribbon.

It looks quite different. Let's see how it goes…

Decomposition Tree

A brand new AI visual available in November is the Decomposition Tree. I have used the Key Influencers visual with social media data, which is great, so I am very excited about this new visual.

With the new visual we can do root cause analysis, look at how individual categories contribute to a whole, and rank categories against the selected measure.

Again, this is a preview feature so it needs enabling in options

Just like the Key Influencers visual, we need something to analyse, and data to explain the analysis by.

Unlike Key Influencers, where we analyse by category (e.g. positive, negative and neutral), here we are analysing a metric.

Let's pick something to look at.

Class breaks down the Sales Amount.

Next, I clicked on the + next to H and chose Product Category. We can now see that Bikes make up the majority of the H category.

Next, try using the AI feature. The options are High Value and Low Value. I'm going to choose High Value: it will look through all the categories that are left and pick the category that is the highest contributor to the metric.

The lightbulb against the title lets you know it's been picked by AI.

Let's do the same but look at Low Value.

It picks out the Central region as the very lowest of the explain-by values.

This is fantastic. You can also lock a level so a user can't remove it, change the colours, etc. I can't wait to use this on my actual working data sets.

Advanced Gauge

Straight away, the new ribbon changes how you get to the marketplace.

Now you go to More Visuals and From AppSource to get to the marketplace.

Go for the Advanced Gauge from xViz.

I have Year to date Internet Sales and Quantity of products bought

I also have measures for Last year Year to Date Sales and quantity so I can add these into the gauge.

There are other options like setting minimum and maximum values and tooltips

Hierarchical Filter

I have used the hierarchical slicer before; let's see how the new filter looks (get it from AppSource – Hierarchical Filter by xViz).

You need a hierarchy in your data to use this visual. I have the Group hierarchy (Group, Country and Region). One of the best things about this visual is that not only can you use it to slice your data, but you can also add a measure, which gives you even more information right on the slicer itself.

However, I don't think it comes across as a slicer, i.e. something that the user can interact with.

Financial reporting Matrix by Profitbase

A Matrix with financial formatting options

The video talks about sticky rows and sticky columns, in that you can set them to always be on screen even while scrolling, which is good.

There is a note on this one: it's really hard to delete. It took me a fair while the other day to click in the right place to delete what is possibly a measure called DateCalculations in the columns section, so this may be one to look at in more detail later.

Distribution Visual

More Visuals – From App Source

This is supposed to be great for setting up a global legend; however, no matter what I tried, I couldn't get the visual to display any information, so this one is a fail for me.

Conclusion

Some pretty good updates this month. I'm specifically excited about the Decomposition Tree.

Another major one for me is the LinkedIn Sales Navigator connector; however, I'm not sure I have the right set-up to use it at the moment.

The video explains that you need the Enterprise Plan, in either the admin or reporting user roles, to use the connector, but what is not mentioned is what this is. Is this LinkedIn?

I will be looking at this in more detail hopefully at a later stage.

Create a visual Axis Category from a measure (Create Table with DAX)

Take this data as an example

  • ID      Total
  • 99                   1
  • 99                   1
  • 101                  1
  • 101                  1
  • 101                  1
  • 333                  1
  • 333                  1
  • 333                  1
  • 333                  1
  • 635                  1
  • 635                  1
  • 635                  1
  • 635                  1

If you do the usual measure SUM(Total) you would get 

  • ID       Total
  • 99                   2
  • 101                  3
  • 333                  4
  • 635                  4

But I don't want the visual to show the above. I want the number of occurrences on the axis, and then the number of IDs with that occurrence count as the value.

You can't use a measure on the axis; measures can only be used as values. How do you solve this problem?

Go to Modelling and New Table

SummaryTable = 
SUMMARIZE (
    'fact Fact',
    'fact Fact'[ID],
    "Frequency", COUNT ( 'fact Fact'[ID])) 

SUMMARIZE returns a summary table for the requested totals over a set of groups. In this case, our group is the IDs.

For the summary we are simply counting our IDs as the Frequency.

Next, Add a new Table

PotentialFreqs = GENERATESERIES ( MIN (SummaryTable[Frequency] ),max('SummaryTable'[Frequency]), 1 )

This creates a list of frequencies starting at the minimum and ending at our max frequency, in steps of 1. There is an issue here, though: we have a lot of NULL values, which create a large frequency number that we don't want in our series. This means we have values from 1 to 98, when there should only be values 1 to 4. How do we remove these values?

SummaryTable = 
SUMMARIZE (    
        'fact Fact',    
        'fact Fact'[ID],    
        "Frequency", IF(
                       'fact Fact'[ID] 
                       <>"0",COUNT 
                       ('fact Fact'[ID]))  
          )

Note the addition of the IF statement: if the ID is not zero, then count the IDs; else do nothing.

Create relationships

In order to use the new tables we need to create relationships between them

The SummaryTable is joined to the fact table by the ID (business key) within the fact table.

PotentialFreqs is joined to the SummaryTable via Value to Frequency.

The first visual to try this out with is a stacked bar chart: Value from the PotentialFreqs table is on the axis, and Frequency from the SummaryTable is used as the value.

So we can see that 33 of our IDs have 3 records in the table.

The PotentialFreqs table is really useful if you want to plot your data in a histogram, as it gives you a continuous type to plot against rather than a categorical one.

In conclusion, if you need to use a measure on an axis rather than as a value, create a summary table and then join it to the main tables in your model.

Update

Unfortunately, the above solution doesn't quite work, because I want to base my metrics on each year, and sometimes the IDs are split across years. I will be looking at this later.
