How do you do schema changes to your model without refreshing the entire model?
This is especially applicable for large models using incremental refresh where refreshing the entire data set takes a long time to process.
The ALM toolkit really helps with this process
The examples I am going to use aren’t incremental refresh but it will hopefully give some idea on how to use the ALM toolkit
And a big proviso here. You can only do this with Premium workspaces
Open your model in Power BI desktop. From here go to External tools and open the ALM Toolkit (Ensure you have the latest version of the AML toolkit downloaded)
The source has already been populated against the open model. We now need to set the target
Go to the service. Workspace Settings.
At this point the workspace needs to be in Premium
You need to copy the workspace Connection which is only available in Premium. Once copied, add into Workspace
We can see that the pbix file and the data set in service are identical. ‘Same Definition‘
Lets make some changes
Adding a new measure
add a new measure in your desktp pbix dataset.
Total Product Standard Cost = SUM(FactResellerSales[ProductStandardCost])
And back to ALM toolkit.
Before continuing, go into Options
Make sure that the Processing Option is default and that we are only going to process affected tables
For table updates, retain partitions should be checked, This is very specific to Power BI Incremental loading. Incremental loading works with partitions and we don’t want ALM toolkit to go to the database and affect any of the partitions ‘For example you could have a partition for every year’
And then Select Actions
Hide Skip objects will get rid of everything that hasn’t changed.
Now we can see the one new measure
We want to add the new measure without completely refreshing the data set
Then go to Update
Are you sure? message is returned
Quick note on XMLA Endpoints
At this point you need to have Read Write enabled on the XMLA endpoint
“XMLA endpoints enable open-platform connectivity to Power BI datasets. With these capabilities, customers can leverage a single one-version-of-the-truth semantic model across a range of data-visualization tools”https://powerbi.microsoft.com/en-us/blog/power-bi-open-platform-connectivity-with-xmla-endpoints-public-preview/
To do this you need to be the Power BI Admin or global Admin. In Admin Portal > Capacity settings > Workloads > XMLA Endpoint = read Write
If the next step doesn’t work you may need to speak to your Admin about the above. this is because we are using an external tool to update the schema.
When I tried this, I didn’t notice errors but the changes started to add up as I went along the process.
Back to the Update
Click OK to are you sure? and the measure is quickly updated. the database wasn’t read or updated. All that happened was a quick schema update.
Why would you want to run an upload, full or Incremental for new measures? this is the perfect way of introducing those new measures as quickly as possible.
Adding a calculated Column
Adding calculated columns aren’t advised because they don’t get compressed in the tabular database, Its always worth doing this in Power Query Editor or ideally in your source reporting database.
Back in the desktop file
A simple calculated column is created.
In ALM toolkit go back to Options
Again, the following options should be set
Then go to Compare. Select Actions and Hide Skipped Objects
And here is the new calculated Column.
Next click Validate Selection
If we were working with Incremental refresh and partitions you would see all your partitions here along with the above information.
If you have these partitions when you update you will see a process message, but don’t worry, it hasn’t done any data processing at the source, its simply added the new column and processed only within the model.
You can go into Power BI Service and do a quick Edit on a report,
drag the new column in to see that there is data for that column, but nothing else has been processed.
Adding new tables or new columns at source or in Power Query Editor
In this case we actually need the data to process.
- The measure just needed the DAX to be added to the model.
- The calculated column simply used the data already within the data model without doing a process.
For this new logic we do need to process the data.
Back in the ALM toolkit you need to untick, for table updates retain partitions because we now need to pull the data in for all the partitions and process all that data.
Process only affected tables is an important option here, We don’t want to process anything but the new table or column.
Once you have the settings set above you can go on to validate and update. And importantly this time the new data will be processed into the model, rather than just a schema change.
And every partition will be processed on an incremental load because of the change to the settings.
This is an excellent way of dealing with updates on large models but its important to note
- Only available for Premium
- Ensure Admin Settings – Read Write enabled on the XMLA endpoint