
Azure Logic App – Copying a file from Sharepoint to a Data Lake

I have been asked to set up a Logic App in Azure (the Azure equivalent of Power Automate) to copy specific file(s) from a SharePoint folder and add them to an Azure Data Lake.

The first example file is around 16,000 rows and not likely to grow significantly. The same is true of the other files.

There is a specific use case behind this first Logic App:

  • The data in the CSV file(s) is updated every day, so the file name stays the same
  • We need to copy the file and overwrite the copy in the data lake every day, after the task that updates the SharePoint file has finished (around 5 PM)
  • We want the Logic App to be triggered from Data Factory
  • Once the Logic App has run, we want to trigger the pipeline that populates the SQL database from the file in the data lake.

Set up the Logic App

In Azure, go to Logic App and select New.

Log Analytics: gives you richer debugging information about your Logic Apps at runtime.

Consumption plan: the easiest way to get started and fully managed (pay-as-you-go). Suits workflows that grow slowly or are fairly static.

Standard plan: newer than the Consumption plan, runs in a single tenant and is billed at a flat monthly fee, which can give you cost savings.

Create the Logic App

Once you have added tags and created the resource, it's time to build the Logic App.

Because we want to trigger it from Azure Data Factory, choose the trigger 'When a HTTP request is received'.

The HTTP POST URL will be used in Data Factory to trigger the Logic App.

I have added a JSON schema that covers the important information for this project: the Container for the data lake, the folder, the file name and isFolder (which becomes more important a little later).

{
    "properties": {
        "Container": {
            "type": "string"
        },
        "fileName": {
            "type": "string"
        },
        "folder": {
            "type": "string"
        },
        "isFolder": {
            "type": "boolean"
        }
    },
    "type": "object"
}
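As a rough illustration, a request body that matches this schema would look something like the following (the container, folder and file names here are hypothetical, not the real project values):

{
    "Container": "raw",
    "fileName": "DailyExtract.csv",
    "folder": "sharepoint/lookups",
    "isFolder": false
}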

List Folder

Now we want to list the SharePoint folder, so create a new step and search for List folder.

'Returns files contained in a SharePoint folder.'

Next you have to sign into SharePoint with a valid account that has access to the SharePoint site.

Here is where we have a question. For this test my own username and password have been used, but obviously I change my password periodically, which means the connection will need updating manually whenever that happens.

What we need is a way of logging into SharePoint that isn't tied to a user and that we can use within the Logic App. This needs further thought and investigation.

When you log in, a SharePoint API connection is created in the Azure resource group.

To get the site address you can go into SharePoint, click the … against the file and choose Copy link.

The link needed amending slightly because it needs to be in the form tenant.sharepoint.com/sites/ProjectMainArea

If you have access you should then be able to click the folder icon against File Identifier and select the correct area.

For Each

Next step: for each 'Body' item returned by the List folder step, we get the file content. Go to New step and choose the For each control (because there will be multiple files).

Get File Content

Now we want to Get file content from SharePoint.

'Gets file contents using the file identifier. The contents can be copied somewhere else or used as an attachment.'

You need to use the same SharePoint site address as before. Then click on File identifier and choose Id from the SharePoint dynamic content pop-up.

So here we can see that from the List folder step we have lots of file metadata we can use, like DisplayName, Id, LastModified etc.

We know we need ID for Get File Content

We are at a point where we can run this as a test.

Note that so far we have this set up,

but we hit specific issues:

Status 404 File not found

cannot write more bytes to the buffer than the configured maximum buffer size of 10457600

So we have two issues to resolve, and after a bit of help on the Q&A forums we find out that:

List folder ('Returns files contained in a SharePoint folder') actually also returns folders, which error because they are not files.

Logic Apps aren't really set up for large files. There doesn't appear to be any way to get past the size limit, so we need to check our files and think of ways to bring through smaller data sets if need be.

Thankfully our files are way below the threshold and the business thinks they won't grow too much.

So here is where we can start applying criteria, which we want to do anyway because we only want certain files.

  1. If it's a folder we don't want to use it
  2. If it's over 10457600 bytes in size we don't want to use it
  3. Only bring through files called…….

So we need to change our For each.

Within the For each, add a new step and search for Condition,

and add your conditions (combined with And/Or).

Then you can move Get file content into the True branch.

So if IsFolder is false and the size is less than 10457600 bytes, we grab file A or file B.
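For reference, in the Logic App code view the condition ends up looking roughly like this. This is only a sketch: 'For_each' is the default loop name, 'File A' and 'File B' stand in for the real file names, and the metadata property names come from the List folder output.

"expression": {
    "and": [
        { "equals": [ "@items('For_each')?['IsFolder']", false ] },
        { "less": [ "@items('For_each')?['Size']", 10457600 ] },
        { "or": [
            { "equals": [ "@items('For_each')?['DisplayName']", "File A" ] },
            { "equals": [ "@items('For_each')?['DisplayName']", "File B" ] }
        ] }
    ]
}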

When you now test this Logic App, Get file content should succeed, with most items not even meeting the criteria.

Create Blob

Finally, within the True section we need to add the file to our data lake.

Search for Create blob.

Here you have to sign into your blob storage, which again creates another API connection in Azure.

You have to supply the storage account name and choose an authentication type. Access Key has been used and the details added here. Normally in Data Factory the access key is obtained through a Key Vault, so more work is needed to come up with the most secure way of doing this. There are two other authentication types to choose from.

More investigation is needed into these other approaches.

Now we can do a full test of the Logic App

Testing the Logic App

When you trigger the Logic App:

The Body contains a long list of every object. Really handy to know what the details are inside this action.

For testing, this was copied into a Word document.

Next comes Get file content.

Now most of the files don't satisfy the condition.

Next was clicked to get through to a file in Get file content (the first one appeared as number 32).

And now we can see the Body of the Create blob. (This happens for every file specified in the criteria.)

And if you use the Microsoft Azure Storage Explorer app you can check that the files have indeed been updated (either it's a new file or it overwrites what is already there).

Data Factory

Now we have saved the Logic App, we want to trigger it from Data Factory.

Create a pipeline and add a Web activity.

Copy the HTTP POST URL from the Logic App and paste it here.

For the Body I simply used the JSON from the start of this article.

Now you can trigger this pipeline along with all your other pipelines to load the data into your data lake and then into SQL to be used for analytics.

https://www.mssqltips.com/sqlservertip/5893/transfer-files-from-sharepoint-to-blob-storage-with-azure-logic-apps/


Azure SQL Database Dev to Production Part 4

I have had quite a lot of issues with the whole dev to prod process for the SQL database. My last attempt, which I wrote up in this blog, worked well until I shut the project down. Then, once reopened, I would always lose my project or Git connection, so I went back to the drawing board, did a lot more research, and here are my new findings.

There is a Data Factory part to this, but I have already written a blog about that and it has worked consistently ever since setting up the dev to prod process.

Resources used

  • Azure SQL Database and Server
  • Visual Studio (Enterprise)
  • Azure DevOps

Azure Devops Repository

First of all you need to have Azure DevOps set up (I won't go into detail on this here).

In the DevOps repository I have a folder for Data Factory. The folder for SQLDB will be created later.

  • In DevOps, ensure you have a Git repository created, then click Clone to copy the Git location URL

In this example I am cloning right at the top of the repository

  • Click the Copy Button

Visual Studio

You cannot do this yet in Visual Studio Code. It has to be Visual Studio; I have Visual Studio Enterprise 2019.

  • You can copy the repository location from the Clone copy (or browse a repository – Azure DevOps or GitHub)
  • Make sure you are happy with the path for the local copy
  • Click Clone. Your local repository then shows in Solution Explorer
  • This has added the folder on your C drive (it added the top level and the Data Factory and SQLDB folders)
  • And you can see this project in Solution Explorer
  • Copy the path on the C drive (including the folder, for example SQLDB)
  • In Visual Studio, in the top bar choose File – New – Project
  • Choose SQL Server Database Project – Next

The project will be SQLDB and will contain the SQL objects.

  • Click Create

In Visual Studio SOLUTION EXPLORER: You can see your empty database objects.

On the C Drive, Note you now have a SQLDB folder along with the Data Factory Folder

  • Right click on the database name in Solution Explorer and go to properties

It's important to be on the right version for the target platform.

  • Right click on the database name and choose Import – Database
  • Select the connection for the development database (and use Show connection properties to make sure your username and password are OK and the database connects)
  • Import the objects into the local project (no need to select it in the above box)
  • Then click Finish. Note that all your objects are now in Solution Explorer and on the C: drive (your local copy)
  • Is the project complete? Go to Build – Rebuild Solution, which checks and validates the objects

Any time anything changes you need to rebuild your solution to update the code.

Warnings and Errors

For warnings and errors you can see all the issues by clicking on them. The build may fail because of errors; these always need resolving before you deploy to the target DB, e.g. production.

Example warning: SQL71558: The object reference [staging].[].[KEY] differs only by case from the object definition [staging].[ST2].[Key].

  • Right click on the database in Solution Explorer and go to Properties.
  • In Project Settings, untick validate casing on identifiers

Example warning: SQL71502: Procedure: [dim].[USP_Date] has an unresolved reference to object [sys].[all_objects].

You can add the master database as a reference (right click on References).

Add Database Reference

Rebuild your codebase. It's important here to make sure your warnings and errors have been dealt with.

Rebuild updates your project locally after the import – I will look at how making changes with, for example, SQL Server Management Studio alters the process in a later blog.

  • Publish to the Git repository – go to Git Changes, add a note describing your change and Commit All

Then click the arrow to Push changes to the GIT repository

We now have the code in the repository in DevOps

It seems annoyingly easy to slightly mess up your folder structure. Here I have a SQLDB folder and another SQLDB folder inside it.

I only wanted the one. This keeps happening to me and it's very frustrating. Any pointers to where I went wrong would be really appreciated.

Create your CI (Continuous Integration) Pipeline in Azure Devops

Now we have the Code in GIT we can create our artifacts for the release pipeline.

  • In Azure DevOps go to Pipelines (the build pipelines rather than Releases for this step)
  • Click New Pipeline
  • And choose your repository
  • Select a template

Right click on tasks and remove selected tasks until you are left with the following

You don't really have to do much with these three jobs.

At the pipeline level ensure you use the right agent, for example windows-2019. We had errors because we use the OPENJSON function in the SQL code, but setting the right agent resolved the issue.

All the other jobs are parameterised. This should now be all set.

  • Save and queue, and you can then run the pipeline to create your artifacts (Save and run)
  • There may be warnings here. For some reason the warnings you clear in Visual Studio still show in DevOps. I would like to do a bit more research on this.
  • But if a warning hasn't failed the process, you should now have your continuous integration artifacts.

Create your CD (Continuous Delivery) Release Pipeline in Azure Devops

Now we are onto Continuous Delivery. Moving the new code into Azure SQL DB

  • In Azure DevOps go to Pipelines – Releases – New release pipeline
  • For Artifacts, click Add

We want to use the latest build.

Now add a stage. In our case we are only deploying to Prod, so it's a simple release.

  • Start with an Empty Job

Add a task to the job

Here we choose the production subscription.

And we link to the DACPAC file that was created by the Visual Studio build. The DACPAC contains all the SQL objects.

The database uses variables and you can set these up in the variables tab

You can create a release to update your Production database

Once pushed, check your SQL database in Production to make sure you are happy that your changes have gone through.

And you can save your Visual Studio project and reopen it. The next stage is to update some objects and go through the process again, so watch this space.

Autoscaling with Power BI Premium Gen 2

We have been working with Power BI Premium for a few weeks. Simply switching it on and seeing how it goes.

However, it's worth paying some attention to the autoscaling that you get with Premium Gen 2 (preview).

Auto Scaling

Previously our Power BI Premium capacity could struggle under high load. For example, if we reached full capacity someone's automatic refresh would fail, or if too many things were happening on the capacity, report users would find that reports took longer to render.

There are lots of use cases where this could happen so Auto scaling is definitely something that can help with these issues.

You can now scale and autoscale using Azure Pay as you go, which is around £62 per vCore for a 24 hour period

We use the DTU Pricing structure in Azure for SQL DBs. What is the difference between the DTU and the vCore pricing structure?

Autoscaling is an opt in feature and can be charged to an Azure Subscription

Once the spike is over, scale down happens and you stop paying for the scale up.

Autoscale Notifications

Toast notifications pop up in the Admin Portal's Capacity settings to let your admin know when autoscale is running.

It would be good to be able to tell everyone working with Power BI when this is happening simply for reference

Get Started with setting up Autoscaling in Azure

Go into Azure and select Subscriptions. You need to have decided beforehand which subscription will be used for autoscaling billing.

Next create an autoscaling resource group.

Enable Autoscale in Power BI Admin Portal

To do this you need to be a Power BI admin (or Global admin).

In addition, the person needs to be at least a Contributor on the Azure subscription to go through all the steps successfully.

Go to Capacity Settings

Make sure Premium Generation 2 is already enabled

Then Select Manage Auto Scale

Enable auto scale and then select your Azure Subscription

And then assign the number of vCores to the Autoscale

Here we have set the max of 2.

How many vCores does a Premium P1 capacity have?

8 virtual Cores

Once completed you are all set. There are some questions. Apart from the Toast pop ups are there other ways to monitor and log the usage of Autoscaling?

This needs its own page but there are Apps you can try like https://appsource.microsoft.com/en-us/product/power-bi/pbi_pcmm.capacity-metrics-dxt?tab=Overview

The big takeaway from all this is that we should never be in a situation where we are surprised that we have reached capacity. Or, if we do set up Auto Scaling it should not be used on a day to day basis.

More investigation is needed on how to set up proper monitoring so we have full knowledge about what is going on in Premium Capacity. And we must never forget that there will be Pro workspaces already set up and these shouldn’t go under the radar either.

We will have a look at these issues in future posts

Azure Data Factory – Moving from Development to Production Part 2: Using Key Vault for Linked Services

In Azure Data Factory – Moving from Development to Production we looked at how we can use Azure DevOps to move the JSON code for the development Data Factory into production.

It's going well; however, I have been left with an issue. Every time I move into production, the details for the linked services have to be re-added. Let's have a look at the SQL Server and the Data Lake Gen 2 account.

Development

Notice that the information has been entered manually including the Storage account Key.

Again, in this instance the information has been entered manually. SQL Server Authentication is being used because we have a user in the SQL DB with all the privileges that Data Factory Needs.

DevOps Data Factory release Pipeline

Go into Edit of the Release Pipeline

Within Prod Stage we have an Agent Process

We are looking for the Section Override Template Parameters

Note that currently Account Key and SQL Database Connection String are null.

Provisioning Azure Key vault to hold the Secrets

Managed Identity for Data Factory

Copy your Azure Data Factory Name from Data Factory in Azure

You need to have a Key vault set up in Development

GET and LIST permissions allow Data Factory to read secret information from the Key Vault.

Paste the Data Factory name into Select principal.

In Key Vault, create a secret for the Azure Data Lake Storage.

For the Key Vault secret, I gave it the secret value by copying across the access key from the storage account's Access keys section.

The content type was simply set to the name of the storage account for this exercise.

In Data Factory Create a Linked Service to the Key Vault

Test and ensure it successfully connects

Use the New Key Vault to reset the data Lake Linked Service

How does this Data Lake Linked Service change the DevOps Release Pipeline?

It's time to release our new Data Factory settings into production. Make sure you have published Data Factory to the DevOps Git repository.

Production Key vault Updates

We need to update Production in the same way as Development

  1. In the production Key Vault, add the production Data Factory name to Access policies (as an application) with Get and List on secrets
  2. Ensure that there is a secret for the production data lake key, AzureDataLakeStorageGen2_LS_accountKey
  3. Check that your Key Vault connection works in production before the next step

Azure DevOps Repos

In Azure Devops go to your Data Factory Repos

Notice that your linked service information for the data lake now references the Key Vault secret. It's not hardcoded anymore, which is exactly what we want.
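For illustration, the data lake linked service JSON in the repos ends up looking roughly like this. This is a sketch only – the storage URL, linked service name and Key Vault linked service name below are examples rather than the exact values from this project:

{
    "name": "AzureDataLakeStorageGen2_LS",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://devuksprojectdl.dfs.core.windows.net/",
            "accountKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVault_LS",
                    "type": "LinkedServiceReference"
                },
                "secretName": "AzureDataLakeStorageGen2_LS_accountKey"
            }
        }
    }
}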

Azure DevOps Release Pipeline

Go to Edit in the Data Factory release pipeline

When the job in the Prod stage is clicked on, you can go to the Override template parameters section, and notice there is now an error.

AzureKeyVault1_properties_typeProperties_baseUrl is the missing parameter. Basically, at this point you need to delete the code in the Override template parameters box and then click the button to regenerate the new parameters.

Override with the production information (I saved the old code first so I could re-copy the bits I need).

Once done, notice that the -AzureDataLakeStorageGen2_LS_accountKey "" parameter is now gone, because it's being handled by the Key Vault.

Let's save and create a release.

New failures in the Release

2021-02-08T13:45:13.7321486Z ##[error]ResourceNotFound: The Resource ‘Microsoft.DataFactory/factories/prod-uks-Project-adf’ under resource group ‘prd-uks-Project-rg’ was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix

Make sure that your override parameters are ok. I updated:

  • Data Factory name from Data Factory
  • Primary endpoint data Lake Storage from Properties
  • Vault URI from Key vault Properties

Repeat the Process for SQL Database

With everything in place we need to introduce a connection string into Key Vault

I have a user set up in my SQL database. The user has been granted SELECT, INSERT, UPDATE, DELETE, EXEC and ALTER on all schemas.

I want to include the user name and password in the Connection string and use SQL authentication

Secret Name

AzureSQLDatabase-project-ConnectionString

Secret

Server=tcp:dev-uks-project-sql.database.windows.net,1433;Database=dev-uks-project-sqldb;User Id=projectDBowner;Password=Password1;

The connection string has been set as above. For more information on connection strings see SQL Server connection strings.

Go back to Data factory and set up the new secret for SQL Server

This is successful

Data Factory and DevOps

  1. Back in Data Factory, publish the new linked service code
  2. Go into the Dev repos and check that you are happy with the new Key Vault information in the linked service code
  3. Go to the Prod Key Vault and make sure the secret is set with the connection string for the SQL DB
  4. Test that the Key Vault secret works in Prod
  5. Back in DevOps, go to Release pipelines and Edit for the adf release CD pipeline (release pipelines are continuous delivery; build pipelines are CI, continuous integration)
  6. Edit the Prod stage (I only have Prod) ARM Template Deployment task, and copy the Override template parameters code into a file for later
  7. Delete the code and click the … to get the latest parameter information
  8. Re-add your production parameters; most can be taken from the code you just copied
  9. Create a new release
  10. Go to Linked services in Data Factory and check they are still production. They still use Key Vault and they still work

Now this is all in place, the development Data Factory can be published up to production. There is no need to reset linked services, and all your information about keys and passwords is hidden in the Key Vault.

Azure SQL Database. Publishing from Development to Production Part 2

The Dev to Prod Process

Initially in part one we set up the process with Visual Studio 2019 and Devops and moved all our objects across to Production. Then with Data Factory we moved data from the production data lake into Production Azure SQL DB

We have all the source data in a data lake (and it's been confirmed that the production data lake is the same as the development data lake).

We have a Data Factory in Production that goes through a DevOps release pipeline so we should now be able to use the Production Data Factory to Load all the Production Data into the Production SQL database on a regular basis.

What happens when you already have Objects and data in Target?

Only the changes will be released. So the next time you release into production you are releasing the delta

Lets see this in action

  • The Initial SQL database was released into Production with Visual Studio
  • A Production Data Factory moved all the data into the new Objects
  • Now we have an updated Dev SQL Database.

Open your visual Studio Project

Open the project that was created in the last session.

In SQL Server Object Explorer

You have the Azure Server and Database. Right click and Refresh

And you have the Local Project DB which contains all the Objects. We can Check the Schema differences between the project DB and the latest DB within Azure

Go to the dev SQL Database (Click on the icon to the left of the server to open it up)

On the Azure SQL database, right click and choose Schema Compare.

For the target, click Select Target.

Open the local DB project. The database in this case has the same name as your solution in Solution Explorer. (Now I know I should have given my solution the project name and my project a _DB suffix to differentiate the two.)

Click OK

Click Compare.

Now you get to see what has been deleted. In this case a table and a procedure have been dropped.

Next we can see changes. If the Table Change is clicked on, you get more information about that change in the object definitions. In this case a test column has been added.

This creates a quandary when it comes to loading in the data because this table should be fully populated but the new column will be blank. Is it possible to do a full load for these updated tables with Data Factory, OR do we need to look at something more complex?

And finally, additions. In this case there are lots of new tables and procedures, and two new functions.

Once happy Click Update and your changes will be published into Solution Explorer

To Check, have a look for some of the new tables, SPs etc in Solution Explorer

Once completed you can click the x icon to close the comparison window and you can save your Comparison information

Rebuild the Project in Visual Studio

Now we want to rebuild our project within Solution Explorer.

Right click on the project in Solution Explorer and choose Rebuild. This rebuilds all the files.

  • Rebuild rebuilds your entire project
  • Build only builds the changes

Process your Changes with GIT

Now it's in your project, you need to process those changes with Git.

In Git Changes, Commit All and Push.

And remember to add a message


These objects should now be in DevOps. You can go to the DevOps repos, then to your database-specific project and check for new tables, SPs etc.

My new junk dimension objects are there, so this is all working.

Release the new database objects into Production database

Now all the code is in the repos, we can push the new and updated objects into production with a DevOps release pipeline.

There is already data in my production database. As an initial starting point I do a quick check on a few tables to get a feel for the data.

This SQL Script allows you to do a quick check on row counts in the production database

SELECT
    QUOTENAME(SCHEMA_NAME(sOBJ.schema_id)) + '.' + QUOTENAME(sOBJ.name) AS [TableName],
    SUM(sPTN.Rows) AS [RowCount]
FROM
    sys.objects AS sOBJ
    INNER JOIN sys.partitions AS sPTN
        ON sOBJ.object_id = sPTN.object_id
WHERE
    sOBJ.type = 'U'
    AND sOBJ.is_ms_shipped = 0x0
    AND index_id < 2 -- 0:Heap, 1:Clustered
GROUP BY
    sOBJ.schema_id,
    sOBJ.name
ORDER BY [TableName]
GO

Azure DevOps

Choose the database repository (You should also have a repository for data factory)

Build Pipelines

Go to Pipelines. Before releasing to Prod we actually have to build all our code into an artifact for the release pipeline

Click on your Project _DBComponentsCI pipeline (Continuous integration) set up in Part 1

Lets remind ourselves of this pipeline by clicking Edit


We build the solution file from the repos in DevOps, then copy the files to the staging directory, and finally publish the artifact ready for release.

Come out of Edit and this time choose Run Pipeline

And Run.

Once it's run, there are warnings again, but for the time being I'm going to ignore these.

Otherwise the build pipeline runs successfully.

Release pipeline

Now we have rebuilt the artifact using the build pipeline, go to pipelines, Releases and to your DB_CD release (continuous delivery)

We have a successful release. I can run the above SQL and check for differences. For a start, there were 39 objects and now there are 43, so you can immediately see that our production database has been updated.

The row count shows that we haven’t lost any data. We have simply updated the objects.

Part three will allow us to look more in-depth at the data side. How we deal with the data in Dev and Prod

Azure Data Factory – Moving from Development to Production

When working on larger projects we need to merge changes from Developers. When all the changes are in the central branch we can then have an automated process to move development to Production

Smoke tests

In computer programming and software testing, smoke testing is preliminary testing to reveal simple failures severe enough to, for example, reject a prospective software release

Integration testing

Integration testing is the phase in software testing in which individual software modules are combined and tested as a group. Integration testing is conducted to evaluate the compliance of a system or component with specified functional requirements. It occurs after unit testing and before validation testing

Resources Involved with the current Project

  • Azure DevOps
  • Azure SQL Server
  • Azure SQL Database
  • Azure Data Factory
  • Azure Data Lake Gen 2 Storage
  • Azure Blob Storage
  • Azure Key vault

Each resource has its own specific requirements when moving from Dev to Prod.

We will be looking at all of them separately along with all the security requirements that are required to ensure that everything works on the Production side

This post specifically relates to Azure Data Factory and DevOps

Azure Data Factory CI/CD lifecycle

Git handles creating the feature branches and then merging them back into main (master).

Git is used for version control.

In terms of Data Factories, you will have a dev factory, a UAT factory (if used) and a prod Data Factory. You only need to integrate your development Data Factory with Git.

The Pull request merges feature into master

Once published we need to move the changes  to the next environment, in this case Prod (When ready)

This is where Azure Devops Pipelines come into play

If we are using Azure DevOps pipelines for continuous deployment, the following things will happen:

  • The DevOps pipeline will get the PowerShell script from the master branch
  • Then get the ARM template from the publish branch
  • Deploy the PowerShell script to the next environment
  • Deploy the ARM template to the next environment

Why use Git with Data Factory?

  • Source control allows you to track and audit changes
  • You can do partial saves when, for example, you have an error. Data Factory won't allow you to publish, but with Git you can save where you are and then resolve the issue another time
  • It allows you to collaborate more with team members
  • Better CI/CD when deploying to multiple environments
  • Data Factory is many times faster with a Git back end than when authoring against the Data Factory service directly, because resources are downloaded from Git
  • Adding your code into Git rather than simply into the Azure service is actually more secure and faster to process

Setting Up Git

We already have an Azure Devops Project with Repos and Pipelines turned on

We already have Azure subscriptions and resource groups for both the production and development environments.

There is already a working Data Factory in development

In this example Git was set up through the Data Factory management hub (the toolbox)

DevOps Git was used for this project rather than GitHub because we have Azure DevOps

Settings

The Project Name matches the Project in Devops

The Collaboration branch is used for Publishing and by default it’s the master branch. You can change the setting in case you want to publish from another branch.

Import existing resources to repository means that all the work done before adding Git can be added to the repository.

DevOps is now set up.

If Azure Data Factory is open, close it so we can reopen it and go through the Git process.

Where to find your Azure DevOps

You should now be able to go to your own area in Azure DevOps, open Repos and see the created project.

You will need an account in Azure DevOps, and to access Repos that account's access level must be higher than Stakeholder.

You can then select the project you have created.

Using Git with Azure Data Factory

In Azure Open up your Data Factory

Git has been Enabled (Go to Manage to review Git)

The master branch is the main branch with all the development work on it

We now develop a new feature. Create a feature branch with + New branch.

We are now in the feature branch, and I am simply adding a description to a Stored Procedure activity in the pipeline. However, this is where you would now do your development work rather than in master.

For the test, the description of a pipeline is updated. Once completed, my changes are in the feature 1 branch. I can now save my feature.

You don’t need to publish to save the work. Save all will save your feature, even if there are errors

You can go across to Devops to see your Files and history created in Azure Devops for the feature branch (We will look at this once merged back into Production)

Once happy Create the pull request

This takes you to a screen to include more details

Here I have also included the Iteration we are currently working on in Devops Boards.

A few tags are also added. Usually, someone will review the work and will also be added here.

The next screen allows you to approve the change and Complete the change

In this case I have approved. You can also do other things like make suggestions and reject

Completing allows us to complete the work and removes the feature branch. Now all the development in the feature branch will be added to the main, master branch.

In Data Factory, go back to the master branch and note that your feature updates are included

We now publish the changes in the master branch, which creates the adf_publish branch. This publish branch holds the ARM template that represents the pipelines, linked services, triggers etc.

Once published, In Devops Repos , there are now files to work with

You can see your change within the master branch

(The changes would normally be highlighted on the two comparison screens)

Here we open the Pipelines folder, go to the Compare tab and find the before and after code.

And you can also see your history.

The ARM templates are in the adf_publish branch, if you select that branch.

Once done, we need to move the changes to the next environment, in this case Prod (when ready).

This is where Azure Devops Pipelines come into play

Continuous Development using Azure DevOps

We need another Data Factory object to publish changes to

In this case, the production Data Factory has been created in the Azure portal within the production subscription and production resource group.

Git Configuration is not needed on the Production resource. Skip this step

Create your tags and Review and Create

DevOps Pipelines

For this specific Project, We don’t want to update production automatically when we publish to Dev. We want this to be something that we can do manually.

Go to Pipelines and create a new release Pipeline (In DevOps)

Click on Empty job because we don’t want to start with a template

And because for this project there is no UAT, just production, name the release pipeline Prod.

Click on the X to close the blade

We need to sort out the Artefact section of the pipeline.

Click on Add an artefact and choose an artefact from Azure Repos.

We may as well add the adf_publish branch, which contains the ARM templates,
and the master branch.

The source alias was updated to _adf_publish.

Both are Azure Repos artefacts.

Next we move to Prod and start adding tasks.

Click on '1 job, 0 tasks' to get to the tasks.

Click + against the agent job to add a task. Our task is ARM template deployment.

Click Add.

Then click on the new task to configure it.

The first section is where you select your production environment

Next you need to select the ARM template and the ARM template parameters file. These are updated in the DevOps artefact every time you publish to dev.

The JSON templates are in the adf_publish branch

Now you need to override the template parameters, because these are all for dev and we need them to be production values. These are:

These will be specific to your own Data Factory environment. In this instance we need to sort out the information for the Key Vault and the data lake storage account.

factoryName

This one is easy. The only difference is changing dev to prd

AzureDataLakeStorageGen2_LS_properties_typeProperties_url

The data lake storage account must be set up in both dev and prod before continuing. Go to the production storage account resource.

This information is also stored in our Key Vault as a secret, which we can hopefully use at a later date.

It is taken from the storage account's Properties. We want the primary endpoint for Data Lake Storage.

Copy the Primary Endpoint URL and override the old with the new Prod URL in DevOps

AzureKeyVault1_properties_typeProperties_baseUrl

We need to update https://dev-uks-project-kv.vault.azure.net/

Let's get this overridden. We already have a Key Vault set up in production. Get the URI from Overview in the production Key Vault service,

and add this into our DevOps parameter.

AzureDataLakeStorageGen2_LS_accountKey

This is empty but we could add to it later in the process.

Account keys are the kind of thing that should be kept as secrets in Key Vault in both dev and prod.

Let's get them set up. Just for the time being, let's ensure we have the data lake storage account key within our development and production Key Vaults.

Key Vault

Within the development Key Vault, create a secret with the name AzureDataLakeStorageGen2_LS_accountKey.

And the key from the storage account comes from……

And repeat for the production Key Vault.

For the time being though, let's leave this parameter blank now we have captured the information in the Key Vault. It should come in useful at a later date.

AzureSqlDatabaseTPRS_LS_connectionString

This was also empty within the parameters for dev.

You can get the connection string value by going to your SQL database – Connection strings – PHP, and finding it in the try statement.

And here is the connection string value for production:

Server=tcp:prd-uks-project-sql.database.windows.net,1433;Database=prd-uks-project-sqldb;

You can also add this information into Key Vault as a secret, and repeat for production.

For the first pass we are going to leave this empty, as per the dev parameters. At some point we should be able to set up the security principal so we can change the hardcoded values to secrets.

The parameters created in dev are now overridden with the production values.
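As a rough example, the Override template parameters box ends up containing something along these lines (shown on separate lines for readability; the production resource names below are hypothetical):

-factoryName "prd-uks-project-adf"
-AzureDataLakeStorageGen2_LS_properties_typeProperties_url "https://prduksprojectdl.dfs.core.windows.net/"
-AzureKeyVault1_properties_typeProperties_baseUrl "https://prd-uks-project-kv.vault.azure.net/"
-AzureDataLakeStorageGen2_LS_accountKey ""
-AzureSqlDatabaseTPRS_LS_connectionString ""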

The Pipeline is then named

Create a release

Once Saved. Click back on Releases

For this type of release we only want to run it manually.

Create a Release for our very first manual release

Click back on releases

And click on release 1 to see how it is doing

You can click on Logs under the Stages box to get more information

Now you should be able to go back to the production data Factory and see that everything has been set up exactly like Dev.

Go and have a look at linked Services in the Production data Factory

Note that they are all set with the Production information

We now have a process to move Dev to Prod whenever we want

The Process

Throughout the sprint, the development team will have been working on feature branches. These branches are then committed into the master branch and deployed to dev.

Once you are happy that you want to move your Data Factory across from dev into Prod. Go to DevOps Release pipeline

Create Release to create a new release

It uses the ARM template artefact, which is always up to date after a publish.

This will create a new release and move the new information to Prod.

The aim is for all your resources to be able to move quickly from dev to prod, and we will look at this in further posts.

Data Factory, Moving multiple lookup worksheets from Excel to one lookup table in SQL Server

A current project has an xlsx containing around 40 lookups in individual worksheets

Each worksheet consists of a code and a description

We decide that we want every single lookup in one lookups table in SQL Server.

This will have a Lookup Name, Code and Description that we can then use for the rest of the project

We want to do everything in one go in Data Factory.

For this I'm going to use a simple example with just 3 worksheets.

Azure Data Lake Gen 2

We are going to store the source data within a data lake.

The Source data looks like this

Lookup B worksheet

Lookup C Worksheet

SQL Server

I have an Azure SQL Database and on it I create the one table that all the reference lookups will go into

GO
CREATE TABLE [staging].[Lookups](
    [LabelKey] [int] IDENTITY(1,1) NOT NULL,
    [LabelName] varchar(255) NULL, -- column lengths assumed; the sizes were lost in the original post's formatting
    [Code] [int] NULL,
    [LabelDescr] varchar(255) NULL,
    [Importdate] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [staging].[Lookups] ADD DEFAULT (getdate()) FOR [Importdate]
GO

LabelKey has been added just to create a valid key for the table. LabelName has also been added which will be the name of the worksheet.

Finally ImportDate is added because we want to know exactly what time this data was imported into the table

Now we need to provide Data Factory with a list of worksheets

CREATE TABLE [staging].[LookupNames](
    [LabelKey] [int] IDENTITY(1,1) NOT NULL,
    [Labels] varchar(255) NULL, -- length assumed, as above
    [Importdate] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [staging].[LookupNames] ADD DEFAULT (getdate()) FOR [Importdate]
GO

LookupNames is our seed table and will provide us with the worksheet names.

We have populated it like this:

INSERT INTO [staging].[LookupNames] ([Labels])
SELECT 'Lookup A' UNION
SELECT 'Lookup B' UNION
SELECT 'Lookup C'

Data Factory

Linked Services

Firstly we need to create our linked services: source and destination.

Go to Linked services via the Manage hub

and choose New.

Call it ADLS_LS and select your Azure subscription and storage account.

At this point the connection was tested and was successful so we didn’t need to do anything further

Next, create your Azure SQL Database linked service.

Call it SQLDB_LS (or whatever you feel is the right naming convention – the _LS suffix is good because you can see exactly which objects are linked services in the JSON that gets created).

Again add in your details (we used a role created in the SQL DB specifically for Data Factory, with GRANT EXEC, SELECT, INSERT, UPDATE, DELETE on all the schemas).

Ensure the connection is successful

Data Sets

Now to come up with the actual source and destination datasets. If we parameterise them then we can reuse a single data set for lots of other activities within the pipeline

Click on the … against Datasets and choose New dataset.

Choose the format. In this case it's Excel.

We don't want to specify any of the location values until we get to the pipeline, including the worksheet.

Make sure First row as header is ticked (unless you don't have a header in Excel).

And create parameters.

This means we can use this one dataset for all the source worksheets.
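For reference, the parameterised Excel dataset JSON looks roughly like this. It's a sketch only – the dataset name and the parameter names Container, Directory, File and Worksheet are the ones I would use here, so adjust them to your own:

{
    "name": "Excel_DS",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ADLS_LS",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "Container": { "type": "string" },
            "Directory": { "type": "string" },
            "File": { "type": "string" },
            "Worksheet": { "type": "string" }
        },
        "type": "Excel",
        "typeProperties": {
            "sheetName": { "value": "@dataset().Worksheet", "type": "Expression" },
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": { "value": "@dataset().Container", "type": "Expression" },
                "folderPath": { "value": "@dataset().Directory", "type": "Expression" },
                "fileName": { "value": "@dataset().File", "type": "Expression" }
            },
            "firstRowAsHeader": true
        }
    }
}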

Pipelines

Now to create the pipeline specifically for the lookup

This is the basic pipeline we are going to add.

Lookup

First of all, in Activities search for Lookup and drag it onto the design pane.

This uses the SQL dataset because we are going to use our SQL table that contains all the names of the worksheets.

Note that first row only is not ticked because we are bringing all the information through

ForEach

@activity('GetLookups').output.Value

We are going to feed the entire result set (Value) of the GetLookups lookup into the ForEach.

Sequential is ticked because we are going to move through all the worksheet names in the table (ensure that your worksheets have exactly the same names as those specified in your table).

Click on the Activities (1) to get to the activity

Copy Activity within the Foreach

We now set up the source of the copy activity.

We use all the parameters within the dataset and add in the information from our Azure Data Lake Gen 2 storage resource.

Within our Lookups table there is a column called LabelName, and we are going to populate it with the Labels column from our item. Our item comes from the ForEach loop and was created via the lookup, and that lookup contained all the columns from our LookupNames SQL table.

The data will go into the Lookups table
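As a rough sketch, using the hypothetical dataset and parameter names above, the source side of the copy activity passes the current ForEach item into the Excel dataset like this, with the worksheet name coming from the lookup row (the container, directory and file values are placeholders):

"inputs": [
    {
        "referenceName": "Excel_DS",
        "type": "DatasetReference",
        "parameters": {
            "Container": "sourcedata",
            "Directory": "lookups",
            "File": "Lookups.xlsx",
            "Worksheet": "@item().Labels"
        }
    }
]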

That's everything. You should be able to test your pipeline by clicking Debug; the ForEach should move through each worksheet specified in your LookupNames table and add the data into SQL.

Truncating the lookup table before re-adding data

We want to be able to repeat this process, and unless we add a truncate statement we will keep re-adding the same information.

We can add the following Stored procedure into SQL

/*
05/10/2020 Debbie Edwards - Peak - Truncate lookups
EXEC [staging].[USP_Truncatelookups]
*/
Create PROCEDURE [staging].[USP_Truncatelookups]
AS
BEGIN
IF EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE Name = 'lookups')
TRUNCATE TABLE [staging].[Lookups]
DBCC CHECKIDENT ('Staging.Lookups', RESEED, 1)
END

And this can be added to the pipeline before the ForEach loop and lookup, using a Stored Procedure activity.

You won't be able to see the stored procedure if you haven't granted EXEC access to the specific database role and schema.

Always give the least amount of privilege and then move up if you need to.

--Bring back information about the members in your roles
SELECT DP1.name AS DatabaseRoleName,
       ISNULL(DP2.name, 'No members') AS DatabaseUserName
FROM sys.database_role_members AS DRM
RIGHT OUTER JOIN sys.database_principals AS DP1
    ON DRM.role_principal_id = DP1.principal_id
LEFT OUTER JOIN sys.database_principals AS DP2
    ON DRM.member_principal_id = DP2.principal_id
WHERE DP1.type = 'R'
ORDER BY DP1.name;

--Bring back the permissions granted to a specific database role
SELECT DISTINCT rp.name,
       ObjectType = rp.type_desc,
       PermissionType = pm.class_desc,
       pm.permission_name,
       pm.state_desc,
       ObjectType = CASE
                        WHEN obj.type_desc IS NULL
                             OR obj.type_desc = 'SYSTEM_TABLE' THEN pm.class_desc
                        ELSE obj.type_desc
                    END,
       s.Name AS SchemaName,
       [ObjectName] = ISNULL(ss.name, OBJECT_NAME(pm.major_id))
FROM sys.database_principals rp
INNER JOIN sys.database_permissions pm
    ON pm.grantee_principal_id = rp.principal_id
LEFT JOIN sys.schemas ss
    ON pm.major_id = ss.schema_id
LEFT JOIN sys.objects obj
    ON pm.[major_id] = obj.[object_id]
LEFT JOIN sys.schemas s
    ON s.schema_id = obj.schema_id
WHERE rp.type_desc = 'DATABASE_ROLE'
    AND pm.class_desc <> 'DATABASE'
    AND rp.name = 'db_NameofRole'

You should hopefully now have a good pipeline that loads your lookup information into one lookup table and truncates that table whenever you run the process.

Power BI Dataflow issues. Let the whole dev team know

Currently, if your dataflow fails the only person who will be notified is the owner of the dataflow.

We want all the developers in the team to know. There doesn't appear to be any way to do this at the moment, but there is a workaround that was suggested to me on the Power BI forums by collinq as an idea starter, and I thought I would run with it and see what happens.

It all relies on a Refresh date

Refresh date

In my main dataflow I have the following Query

This was created from a blank query

let
    Source = #table(type table [LastRefresh = datetime], {{DateTime.LocalNow()}})
in
    Source

This gets updated every time there is a refresh on the main dataflow

Create a New Report in Power BI Desktop

Go to Power BI desktop and Get Data

Dataflow is the source of the data

And we only need this object

I am going to edit the Query in Power Query Editor

Last Refresh date has been split into Date and time.

Then a custom column was created for todays date

DateTime.LocalNow()

This was split into Date and Time. It is very likely that we may decide to use time later so this is why it has been added for now.

Now we find the number of days between the last refresh and today

0 -Duration.Days(Duration.From([last Refresh Date]-[Todays Date]))

0- is added to remove the minus at the start of the number so -1 becomes 1
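Pulled together, the query in the new report looks roughly like this. It is only a sketch: the navigation steps to the dataflow entity are replaced by a stand-in #table, and the column names are the ones described above.

let
    // Stand-in for the dataflow entity – in the real report Source comes from the Power BI dataflows connector
    Source = #table(type table [LastRefresh = datetime], {{DateTime.LocalNow()}}),
    // Split out the date parts of the last refresh and of "now"
    AddRefreshDate = Table.AddColumn(Source, "last Refresh Date", each Date.From([LastRefresh]), type date),
    AddTodaysDate = Table.AddColumn(AddRefreshDate, "Todays Date", each Date.From(DateTime.LocalNow()), type date),
    // 0 - … flips the negative difference into a positive day count
    AddDaysBetween = Table.AddColumn(AddTodaysDate, "Days Since Refresh", each 0 - Duration.Days(Duration.From([last Refresh Date] - [Todays Date])), Int64.Type)
in
    AddDaysBetween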

Close and Apply and a card is added

Publish to Service

Power BI Service

Go to the report that has just been created

And Pin the card to a dashboard. In this case, an Issues dashboard has been created

The idea at this point is: if the refresh date isn't updated, the number of days between will become 2 (because today's date will still change), and we can be notified.

This will need testing

Power BI Dashboard (Create Alerts)

Go to the dashboard

Choose Manage Alerts

We want to know if it goes above 1; this means that last night's refresh has failed to happen.

(But if it fails to happen, will the current date refresh?)

At present, an alert will only be sent to the owner of the report and it will be seen within Power BI but we want everyone to know.

This is why we are going to use Power Automate / Logic Apps

We have an Azure Subscription so I can add a Logic App within our subscription

Instead of clicking the above link we are going to go into Azure, but the principle will be the same.

Save the Alert

Schedule a refresh

The new report has created a dataset

go to Settings and Scheduled refresh to keep this up to date

Azure Logic Apps

Create a new Logic App in Azure

Search for Power BI. We want the trigger 'When a data driven alert is triggered (preview)'.

I am signing in with my own credentials (note that my password is updated every month, so if these credentials are used this will need recording in our governance notes).

Our alert has been saved and added to the alert list

Just for the time being the check frequency is being left at 3 times a day.

We have our trigger, now we need to know what will happen on the trigger

New Step

For the time being we chose the fairly easy option of sending an email.

You can search for the dynamic content as you create the body and subject. Here we want to bring to attention the value in the tile and the alert threshold.

  • The HTML <li> element is used to represent an item in a list
  • The <strong> tag is used to separate the text from the rest of the content. Browsers traditionally bold the text found within the <strong> tag
  • The <big> tag is used to make the text one size bigger
  • The <ul> tag defines an unordered (bulleted) list
  • The <a> tag defines a hyperlink, which is used to link from one page to another. The most important attribute of the <a> element is the href attribute, which indicates the link’s destination.

I have added two users to the email so they can both be notified

Save your Logic App. It's ready.

Testing the New Process

The dataflow is scheduled to refresh at 11 PM.

The Dataflow Issues dataset is scheduled to refresh at 12 AM.

On the night of the 28th of September, everything failed. I got the emails because I am the dataflow owner, but no email from the new alert set-up.

Testing has failed.

Let's have a look to see what's happened.

We have two fails, and one on the dataflow we have set up

It looks like the refresh token expired. Please go to this dataset’s settings page, and reenter the OAuth2 credentials for the Extension data source.

Going into the report and we still see this

Which is incorrect.

We would get more of an understanding if we could match up the dates to what is happening.

However it's clearly not updated.

Dataflow Settings

Scheduled refresh is on and set to run at 12 midnight. The errors were emailed through just after 11.

The alert is there.

Let's go back to Desktop and add some more information.

After a refresh in Desktop we can now see this information,

which is correct. This says to me that even though we have a refresh going on, it possibly didn't refresh in the Service. The new report is published up to the Service.

Back in Service

This is interesting. Our new multi row card shows the correct information. However our card still says 1 day which isn’t correct.

A quick refresh of the dataset and we can still see one on the card, so we have a difference between the Service and Desktop.

A refresh of the report and now it's worked: we can see 2 days difference.

So there are a few issues here. Why did it not refresh the card on the data set refresh but it did when the actual report was refreshed?

It's actually the dashboard that is doing the work here. The new multi-row card is pinned to the dashboard. Let's go and have a look at it.

The dashboard only updated once the new visual was pinned to it

So the failure has been that the report and dashboard didn’t refresh, even though it is set to refresh.

You can get to the data set refresh history in Data sets and then Refresh History

And you can get to the Dataflow refresh history via Dataflows

Data Set Issues Refresh History

Dataflow Issues Refresh History

The actual schedule seems to be fine. All I can think of is that at 12 it is possibly still only 1 day, so I could introduce more refreshes of the Dataflow Issues dataset.

Test 2: Adding more refreshes on the Dataflow Issues dataset

It's a very quick refresh because it's just two dates. Let's see if this changes things.

Quick Incremental Refresh check List

Incremental refresh became available for Power BI Pro a few months ago, but when tested there was an issue: 'Error: Resource Name and Location Name need to match'. This should have been fixed in April, so here is a quick checklist of how to approach incremental refresh.

Define your Incremental Refresh Policy

  • What is your data source?
  • Are new rows simply added to the dataset in Power BI?
  • Are records deleted?
  • Can old rows be updated?
  • If rows can be updated, how far back does this go?
  • How many years' worth of data do you want to retain?
  • Which tables in your dataset need incremental refresh?
  • Can you define the static date within the table that will be used for the incremental refresh?
  • Have you published to P…

Each of these points is very important and will establish what you need to do to set up incremental refresh, from your data source up to Power BI Desktop and the Service.

Set up incremental Refresh in Power Query Editor. Create Parameters

Go to Transform data to get to the Power Query Editor (you can either be in Desktop or creating a dataflow in the Service).

The two parameters that need setting up for incremental loading are RangeStart and RangeEnd.

RangeStart and RangeEnd are set in the background when Power BI runs the refresh. They partition the data.

Query folding: the RangeStart and RangeEnd filters will be pushed to the source system. It's not recommended to run incremental processing on data sources that can't query fold (flat files, web feeds). You do get a warning message if the query can't fold.

You can't query fold over a spreadsheet. It's possible to query fold over a SharePoint list, but the recommendation is to set incremental processing up over a relational data store.

For Desktop, allow yourself a good slice of the data to work with, for example one or two years' worth of data.

Filter the data in the Model

Still in Power Query Editor.

Add your parameters to every table in your data set that requires incremental load

Find your static date. E.g. Order date, Received Date etc
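The filter itself is a single step on that static date column; as a sketch, assuming a hypothetical column called OrderDate and a previous step called Source:

= Table.SelectRows(Source, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)

Using >= on one end and < on the other stops a row that sits exactly on a partition boundary from being loaded twice.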

Close and Apply

Define your Incremental Refresh policy in Power BI Desktop

Go to your first table and choose incremental refresh

Example screen shot of an Incremental refresh policy

Store Rows

In the above example we are storing everything for 5 years. It's set to months so the partitions are smaller.

Refresh Rows

Our refresh policy is very simple. Data is inserted, not deleted or imported.

If this was running every single day then you would only need to refresh rows from the last 1 day. However, as a just-in-case, 1 month has been used, in case for any reason the job is suspended or doesn't run.

Detect Data Changes

Detect data changes has been used. The month's data will only be refreshed if the ImportDate for a record has changed (or there are new records).

No records are deleted so we don’t need to worry about this

If you want to use Detect Data changes you must have an Update date on your source data. This may impact your data source.

  • Are you running straight from source into Power BI and there is no Update Date available?

Then you will need to make the decision to have a reporting database layer, where you can add UpdateDate logic to your table.

  • Is there a possibility that records are also deleted?

You need to deal with this slightly differently

Set up soft deletes in your reporting data:

Add an IsDeleted column, and when a record is removed from the source, update LastUpdateTime and set IsDeleted to 1 in the warehouse.

This will come through to your model as an update, and you can filter out all the IsDeleted records.
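A minimal sketch of that soft-delete step, assuming hypothetical table and column names (dbo.FactOrders as the warehouse table, staging.Orders as the latest source extract):

-- Flag warehouse rows that no longer exist in the source instead of deleting them
UPDATE f
SET    f.IsDeleted      = 1,
       f.LastUpdateTime = GETDATE()
FROM   dbo.FactOrders AS f
WHERE  f.IsDeleted = 0
  AND  NOT EXISTS (SELECT 1 FROM staging.Orders AS s WHERE s.OrderId = f.OrderId);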

  • Publish the new Power BI Report and Data Flow

You might be thinking at this point: but I don't want the filters that I have set for Desktop to be applied in the Service. I want to see all my data in the Service.

Don't worry; in the Service, RangeStart and RangeEnd don't keep the dates specified for the filters in Desktop.

They are set via your incremental refresh policy, so they define the partitions for our 60 months. Instead of one RangeStart and one RangeEnd for the whole 5 years, you get a RangeStart and RangeEnd for month 1, a RangeStart and RangeEnd for month 2, and so on, breaking your 5 years down into much smaller partitions to work with.

Testing Your Incremental Refresh

Make sure that before you add the incremental refresh you have done a full process of your data. It's good to get an idea of the time you are working with.

Azure Data Studio

One way of testing is with Azure Data Studio.

https://docs.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio?view=sql-server-ver15

Once installed, connect to the SQL Database that is your data source

So long as you have the profiler extension installed you can launch Profiler.

If you don't have it, you can download the extension.

Once launched, start a profiler session.

Go into the Power BI Service and then Datasets.

Click on Refresh now and then go to Data Studio to see what's happening.

From logon to logout during the run, it took 20 minutes because the entire model is refreshed. Obviously it would be really good if we could get the time down using incremental refresh.

Before you set up Incremental processing, ensure that the services preceding the Power BI Load have been well tested and signed off.

Once incremental refresh is in place, you can refresh again and check your findings in Azure Data Studio.

Due to all the possible data source amendments and issues, the current recommendation is to start with a relational database as your reporting layer. Then you can query fold, and add IsDeleted and UpdateDate columns and logic to this data source.

Azure Built In Tagging Policy. Update Resource by Inheriting Subscription Tag

Ensure your Azure Resources are tagged with the Tag from Subscription

In this example we have the following requirements

  • The costCentre tag has been manually added to the subscription
  • Every resource must be created with a costCentre tag
  • The costCentre tag must exist on the resources
  • Resources inherit the tag from the container they are in, but it can be manually overridden

Costcentre Tag Configuration

  • Modify Resources to add the CostCentre tag from the parent Subscription where missing

Time to assign a new Policy

No need to select a resource group, because they haven't been created yet and this applies to all resource groups that have yet to be created.

Choose Inherit a tag from the subscription if missing

The Assignment Name has been changed to include the tag

And the tag name is added as a parameter.

We need a Managed Identity because this has a modify effect
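For reference, the policy rule behind 'Inherit a tag from the subscription if missing' looks roughly like this (simplified from the built-in definition; the modify operation is why the assignment needs a managed identity with a role such as Contributor):

{
    "if": {
        "allOf": [
            {
                "field": "[concat('tags[', parameters('tagName'), ']')]",
                "exists": "false"
            },
            {
                "value": "[subscription().tags[parameters('tagName')]]",
                "notEquals": ""
            }
        ]
    },
    "then": {
        "effect": "modify",
        "details": {
            "roleDefinitionIds": [
                "/providers/microsoft.authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
            ],
            "operations": [
                {
                    "operation": "add",
                    "field": "[concat('tags[', parameters('tagName'), ']')]",
                    "value": "[subscription().tags[parameters('tagName')]]"
                }
            ]
        }
    }
}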

Then go to Review + Create and Create

Test the New Modify Policy

This is the tag on the Subscription

Create a new resource group (and remember to apply any tags required by policies you have already set).

Within this resource group, create a new resource (don't add the tag).

Once created, the Resource will inherit the Tag from the Subscription