Power BI Dataflows, Computed Entities and Query Folding

When you create a dataflow in Power BI Service without using Premium capacity you may hit issues Like ‘cant refresh dataflow

Linked Entities are when you link objects from different dataflows within Power Query Editor. to create transformations over a linked entity you need to create a computed entity that references the linked entity. There are none of these in this particular project but I do have…..

Computed entities are entities that reference linked entities, and which rely on in-storage calculations for performance and scale. Unlike “normal” entities which use the Power Query M engine to extract data from external data sources, and to load it into CDM Folders in Azure Data Lake Storage gen2
https://ssbipolar.com/2018/10/23/dataflows-in-power-bi-overview-part-6-linked-and-computed-entities/#:~:text=Computed%20entities%20are%20entities%20that,Azure%20Data%20Lake%20Storage%20gen2.

Data had been duplicated from the original table in the dataflow, but it had accidentally been created as a reference so it couldn’t be done.

As soon as you reference another query in a dataflow a Computed Entity is created.

This led to a bit of research on differences between Query Editor in Desktop and Service and how you can avoid the above issues.

Its also very possible that the dataflow in Service is slower than in desktop. After advocating the use of dataflows when creating solutions within teams and large scale organisations I wanted to get more information.

To get more understanding on what is going on in the Dataflow:

The dataflow is stored as csv data. It may be partitioned into multiple physical files.
Power BI service ingests the contents of all of these files and loads them into memory.
It then performs a table-scan on each set of data to do the join or other transformations
It doesn’t use indexing of data typing because its schema-less (without the benefit of any indexing or data typing because schema-less.)
This is inefficient, and slow.
There are possibilities of moving your dataflows into your own Gen2 DataLake to extend capabilities over the data with Azure machine learning

Lets have a look at some Computed entity Examples

Referencing tables

Right click on customer and create reference to create additional the reference table

Note that the new table shows an icon of ⚡

Merging tables

Merging also created computed entities too which means that there will be no way to merge keys into fact tables when doing star schema modelling.

Append

Computed entities also occur with appended queries

So far it would seem that in order for dataflows to be the best option you should be in Premium capacity to make use of computed entities OR your transformations should be done at source.

Dataflows and Query Folding

There are also other things to consider when working with dataflows

It would seem that if you use the dataflow in Service, Power BI Query Editor cant use Query Folding at all.

Query Folding

This happens in Power Query editor to help increase performance. Query folding takes your edits and transformations which are also tracked as native queries in SQL (For example)

This ensures that the queries can done on the original data source without taking up Power BI resources

There are benefits like faster refresh and automatic compatibility with Direct query. This is because the transformations in Power Query Editor are done at source and uses the results to load into power BI tabular model.

You achieve best data refresh performance by ensuring that query folding occurs
https://docs.microsoft.com/en-us/power-bi/guidance/power-query-folding

Before we have a look at Query Folding in a Desktop File lets get some answers on what does and doesn’t support Query Folding. there is no definitive guidance on this so these are only a few examples

What does support Query Folding

Filtering (on rows or columns)
Aggregates and GROUP BY
Joins
Pivot and unpivot
Duplicating columns
Numeric calculations
Simple transformations, such as UPPER, LOWER etc
Removing columns.
Renaming columns
Filtering rows
Merging queries
Appending queries
Custom columns with simple logic
Pivot and Unpivot

What doesn’t support Query Folding

Merge columns
Indexes
Changing Date to Year, Month name etc
Merging queries based on different sources.
Appending queries based on different sources.
Adding custom columns with complex logic.
Changing a column data type.

Desktop Power Query Editor

To look at query folding, the source can’t be a Power BI report over a data file because only a database can support Query folding like Azure SQL DB for example.

Right click on your step. If View Native Query is enabled, the SQL can track the M Query

You can also look at the SQL that has been created

select [].[DateKey] as [DateKey], [].[FullDateAlternateKey] as [FullDateAlternateKey],
[].[DayNumberOfWeek] as [DayNumberOfWeek], [].[EnglishDayNameOfWeek] as [EnglishDayNameOfWeek],
[].[SpanishDayNameOfWeek] as [SpanishDayNameOfWeek], [].[FrenchDayNameOfWeek] as [FrenchDayNameOfWeek],
[].[DayNumberOfMonth] as [DayNumberOfMonth], [].[DayNumberOfYear] as [DayNumberOfYear],
[].[WeekNumberOfYear] as [WeekNumberOfYear], [].[EnglishMonthName] as [EnglishMonthName],
[].[SpanishMonthName] as [SpanishMonthName], [].[FrenchMonthName] as [FrenchMonthName],
[].[MonthNumberOfYear] as [MonthNumberOfYear], [].[CalendarQuarter] as [CalendarQuarter],
[].[CalendarYear] as [CalendarYear], [].[CalendarSemester] as [CalendarSemester],
[].[FiscalQuarter] as [FiscalQuarter], [].[FiscalYear] as [FiscalYear],
[].[FiscalSemester] as [FiscalSemester], [].[FullDateAlternateKey] as [FullDateAlternateKey - Copy]
from [dbo].[DimDate] as [_]

Moving onto the next step. Creating month name from date

Query folding is greyed out. You cant query fold on this step because SQL cant be created to represent creating the month name from a date.

And If the date is duplicated again after this, View Native Query is also greyed out. Once you have done a step that prevents query folding, no more query folding can happen.

This represents quite a complex problem. If you want to use Query folding for faster refresh you need a very in-depth understanding on what you can and cant do with Query folding (Like creating an Index etc) and attempt to do everything first before you introduce steps that will stop query folding.

After developing lots of Power BI reports, I cant even imagine being able to think at this level.

Merge

lets have a look at a step you can do that works with Query folding

Merging your Query allows you to keep Native Query

from [dbo].[DimProduct] as [_]
) as [$Outer]
left outer join [dbo].[DimProductSubcategory] as [$Inner] on ([$Outer].[ProductSubcategoryKey2] = [$Inner].[ProductSubcategoryKey])

So, its really worth doing, especially if you are working with a large data set. However its a difficult one to achieve.

Service Power Query Editor

We have looked at Query folding in Desktop. Now lets look at Query Editor in Service, which is the recommended way to go at enterprise level (dataflows).

When you right click on a Step in Service, the option is not available.

You cant query fold in the dataflow and there currently doesn’t seem to be any plans to do so.

Tips

Anything that cant be query folded would be better off created within your SQL Database reporting area
Look through all your applied steps. find where Native query option is disabled. Are there any steps after this that could use Query folding. It may be worth trying to restructure the order of steps
If you use dataflows you cant do any Query folding which needs to be an accepted negative for going the dataflow route
If you cant query fold there is a possibility of using Native Queries. However be aware that once you do this, no query folding can happen if you are in Desktop.

.xlsx, .csv, .txt files (etc) cant use query folding but don’t tend to be large data sets
Do as much transformational processing as you can at source. for example a SQL Reporting database

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30