When you create a dataflow in Power BI Service without using Premium capacity you may hit issues Like ‘cant refresh dataflow
Linked Entities are when you link objects from different dataflows within Power Query Editor. to create transformations over a linked entity you need to create a computed entity that references the linked entity. There are none of these in this particular project but I do have…..
Computed entities are entities that reference linked entities, and which rely on in-storage calculations for performance and scale. Unlike “normal” entities which use the Power Query M engine to extract data from external data sources, and to load it into CDM Folders in Azure Data Lake Storage gen2https://ssbipolar.com/2018/10/23/dataflows-in-power-bi-overview-part-6-linked-and-computed-entities/#:~:text=Computed%20entities%20are%20entities%20that,Azure%20Data%20Lake%20Storage%20gen2.
Data had been duplicated from the original table in the dataflow, but it had accidentally been created as a reference so it couldn’t be done.
As soon as you reference another query in a dataflow a Computed Entity is created.
This led to a bit of research on differences between Query Editor in Desktop and Service and how you can avoid the above issues.
Its also very possible that the dataflow in Service is slower than in desktop. After advocating the use of dataflows when creating solutions within teams and large scale organisations I wanted to get more information.
To get more understanding on what is going on in the Dataflow:
- The dataflow is stored as csv data. It may be partitioned into multiple physical files.
- Power BI service ingests the contents of all of these files and loads them into memory.
- It then performs a table-scan on each set of data to do the join or other transformations
- It doesn’t use indexing of data typing because its schema-less (without the benefit of any indexing or data typing because schema-less.)
- This is inefficient, and slow.
- There are possibilities of moving your dataflows into your own Gen2 DataLake to extend capabilities over the data with Azure machine learning
Lets have a look at some Computed entity Examples
Right click on customer and create reference to create additional the reference table
Note that the new table shows an icon of ⚡
Merging also created computed entities too which means that there will be no way to merge keys into fact tables when doing star schema modelling.
Computed entities also occur with appended queries
So far it would seem that in order for dataflows to be the best option you should be in Premium capacity to make use of computed entities OR your transformations should be done at source.
Dataflows and Query Folding
There are also other things to consider when working with dataflows
It would seem that if you use the dataflow in Service, Power BI Query Editor cant use Query Folding at all.
This happens in Power Query editor to help increase performance. Query folding takes your edits and transformations which are also tracked as native queries in SQL (For example)
This ensures that the queries can done on the original data source without taking up Power BI resources
There are benefits like faster refresh and automatic compatibility with Direct query. This is because the transformations in Power Query Editor are done at source and uses the results to load into power BI tabular model.
You achieve best data refresh performance by ensuring that query folding occurshttps://docs.microsoft.com/en-us/power-bi/guidance/power-query-folding
Before we have a look at Query Folding in a Desktop File lets get some answers on what does and doesn’t support Query Folding. there is no definitive guidance on this so these are only a few examples
What does support Query Folding
- Filtering (on rows or columns)
- Aggregates and GROUP BY
- Pivot and unpivot
- Duplicating columns
- Numeric calculations
- Simple transformations, such as UPPER, LOWER etc
- Removing columns.
- Renaming columns
- Filtering rows
- Merging queries
- Appending queries
- Custom columns with simple logic
- Pivot and Unpivot
What doesn’t support Query Folding
- Merge columns
- Changing Date to Year, Month name etc
- Merging queries based on different sources.
- Appending queries based on different sources.
- Adding custom columns with complex logic.
- Changing a column data type.
Desktop Power Query Editor
To look at query folding, the source can’t be a Power BI report over a data file because only a database can support Query folding like Azure SQL DB for example.
Right click on your step. If View Native Query is enabled, the SQL can track the M Query
You can also look at the SQL that has been created
select .[DateKey] as [DateKey], .[FullDateAlternateKey] as [FullDateAlternateKey],
.[DayNumberOfWeek] as [DayNumberOfWeek], .[EnglishDayNameOfWeek] as [EnglishDayNameOfWeek],
.[SpanishDayNameOfWeek] as [SpanishDayNameOfWeek], .[FrenchDayNameOfWeek] as [FrenchDayNameOfWeek],
.[DayNumberOfMonth] as [DayNumberOfMonth], .[DayNumberOfYear] as [DayNumberOfYear],
.[WeekNumberOfYear] as [WeekNumberOfYear], .[EnglishMonthName] as [EnglishMonthName],
.[SpanishMonthName] as [SpanishMonthName], .[FrenchMonthName] as [FrenchMonthName],
.[MonthNumberOfYear] as [MonthNumberOfYear], .[CalendarQuarter] as [CalendarQuarter],
.[CalendarYear] as [CalendarYear], .[CalendarSemester] as [CalendarSemester],
.[FiscalQuarter] as [FiscalQuarter], .[FiscalYear] as [FiscalYear],
.[FiscalSemester] as [FiscalSemester], .[FullDateAlternateKey] as [FullDateAlternateKey - Copy]
from [dbo].[DimDate] as [_]
Moving onto the next step. Creating month name from date
Query folding is greyed out. You cant query fold on this step because SQL cant be created to represent creating the month name from a date.
And If the date is duplicated again after this, View Native Query is also greyed out. Once you have done a step that prevents query folding, no more query folding can happen.
This represents quite a complex problem. If you want to use Query folding for faster refresh you need a very in-depth understanding on what you can and cant do with Query folding (Like creating an Index etc) and attempt to do everything first before you introduce steps that will stop query folding.
After developing lots of Power BI reports, I cant even imagine being able to think at this level.
lets have a look at a step you can do that works with Query folding
Merging your Query allows you to keep Native Query
from [dbo].[DimProduct] as [_]
) as [$Outer]
left outer join [dbo].[DimProductSubcategory] as [$Inner] on ([$Outer].[ProductSubcategoryKey2] = [$Inner].[ProductSubcategoryKey])
So, its really worth doing, especially if you are working with a large data set. However its a difficult one to achieve.
Service Power Query Editor
We have looked at Query folding in Desktop. Now lets look at Query Editor in Service, which is the recommended way to go at enterprise level (dataflows).
When you right click on a Step in Service, the option is not available.
You cant query fold in the dataflow and there currently doesn’t seem to be any plans to do so.
- Anything that cant be query folded would be better off created within your SQL Database reporting area
- Look through all your applied steps. find where Native query option is disabled. Are there any steps after this that could use Query folding. It may be worth trying to restructure the order of steps
- If you use dataflows you cant do any Query folding which needs to be an accepted negative for going the dataflow route
- If you cant query fold there is a possibility of using Native Queries. However be aware that once you do this, no query folding can happen if you are in Desktop.
- .xlsx, .csv, .txt files (etc) cant use query folding but don’t tend to be large data sets
- Do as much transformational processing as you can at source. for example a SQL Reporting database