Setting up Azure Data Lake Storage Gen2 to use with Power BI dataflows in the Service (as a data source)

Before the new Azure Data Lake, I was adding all my files to Blob Storage. However, Azure Data Lake Storage Gen2 is built on Blob Storage and Data Lake Storage Gen1.

It's built for big data, and a fundamental change is that we now have a hierarchical namespace, which organises your files into directories.

So now we can do things like use all the files from a specific directory or delete all the files from a specific directory. We can categorise our files within the data lake.

Set up Azure Data Lake Storage Gen2 in the Azure Portal

Currently, when you go into Azure and look for Data Lake, you can only find Gen1.

So the question is, how do you set up Gen2 in the Azure Portal? (At the time of writing it is 25 November 2019; hopefully this will get easier in the future.)

First of all, I go to the subscription where I want to add the new Data Lake Gen2 storage account.

Open up the Portal menu (now hidden to the left of the screen).

Choose Create a resource

Next, choose Storage and then Storage account.

Note that the Account kind is StorageV2 (general purpose v2).

I've set the Location to North Europe, simply because I know that's where our Power BI data in the Service is stored, and I may as well stick with this.

For the time being, I am leaving everything else as standard

Next, go to Advanced.

The most important setting here is Data Lake Storage Gen2. Enable the hierarchical namespace and your storage account will now be created as Data Lake Storage Gen2.

Click Review and Create

Create a File System within a Container

We now have a few options available to us. I have some files to add, so I am going to add them to a container.

Click on Containers and then + File System.

Click OK

Clicking on your new storage account, you are told to download Azure Storage Explorer.

I already have Azure Storage Explorer downloaded. If you don't have it, it's something you will absolutely need in order to work with Azure Storage accounts.

Once downloaded, open Azure Storage Explorer.

You will need to add your Azure Storage accounts by clicking the little connector icon.

You will be asked to sign into your account with your Office 365 credentials and two-factor authentication.

This will log you into all your Subscriptions and Services

You are good to go

Here you find your subscription, then go to the Data Lake Storage Gen2 account and find the new file system.

I have added a folder called Workshop1Files to my file system.

Obviously, Data Lake Storage gives you many ways of working with files and automating file loads into the storage area. In this case I am simply going to move a file into my new folder to work with.

Double click on the folder, then click Upload and Upload Files.

And now your file is in the cloud, in an Azure Data Lake ready to use.

Connect to your Azure File with Power BI Desktop

The first test: can we access this data within Power BI Desktop?

Open Power BI Desktop and Get Data

Choose Azure Data Lake Storage Gen2 (Currently in Beta)

Add the URL

Data Lake Storage Gen2 URLs have the following pattern: https://<accountname>.dfs.core.windows.net/<filesystemname>/<subfolder>

If you right click on the file in Storage Explorer and go to Properties, there is a difference in structure:

http://<accountname>.blob.core.windows.net/<filesystemname>/<subfolder>

If you try to connect with the original URL from Storage Explorer, you get the following error.

And if you change the URL from blob to dfs

There is a missing part of the puzzle. Go back to the Azure Data Lake storage account in Azure and add the Storage Blob Data Reader role to your account.

Then try again and hopefully you are in.

No need to combine because we have specified the file.

There are different ways you can load files. I loaded one file, but you can load all the files in the file system:

https://storageaccount.dfs.core.windows.net/filesystemname

or all the files under a directory in the file system (you can include subdirectories in this):

https://storageaccount.dfs.core.windows.net/filesystemname/directoryname/directoryname

Connect to your Azure File with a Power BI Dataflow

I am creating dataflows in the Power BI Service to ensure they can be reused across the company. The question is, can I connect to the above file in the Service via a dataflow?

In the Power BI Service, add a dataflow, which takes you into the Power Query Editor in the Service. I already had some dataflows connected to an Azure database.

The data is in Azure Data Lake Storage, so the first thing I do is try the Azure route.

However, there is no Azure Data Lake Storage Gen2 option. This must be something coming in the future. So then I go to File and click on Get Data, Text/CSV.

You will need to add the file path and your credentials (as per the previous advice, use dfs not blob in the URL). This seems a little flaky at the moment; I chose Organisational Account first before adding the URL and then it seemed to work.

Remember, go back to Azure Storage Explorer; if you click on Properties, you can grab the URL from there.

We don't need a gateway setting up because everything is now in the cloud.

Clicking Next, nothing happens; it just keeps bouncing back to the same window.

Attempting to use the Blob Storage connector also doesn't work (using the Azure account key as authentication).

With blob in the URL
With dfs in the URL

It would appear that I have hit a brick wall, and there is currently no Data Lake Gen2 connector for dataflows.

I will be keeping an eye on this because obviously, if you are pushing the new generation of data lakes and dataflows, there needs to be a Data Lake Gen2 connector for dataflows.

Update

I had a reply back on the Power BI forum (not a good one):

"The feature hasn't been planned. If there is any news, the document What's new and planned for Common Data Model and data integration will be updated."

I have found this in Power BI Ideas:

https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/38930032-add-azure-data-lake-storage-gen2-as-a-data-sourc

Please help us get this working by voting for this idea.

Power BI November 2019 updates

To look at the new Power BI updates, I went for my go-to data set, the AdventureWorks data warehouse.

I have added the following tables

  • DimEmployee
  • DimDate
  • DimSalesTerritory
  • DimProduct
  • DimProductSubcategory
  • DimProductCategory
  • DimCustomer
  • FactInternetSales

Use the Updated Ribbon

Once I have added the data, I want to start using the new Power BI ribbon, which is in preview. Go into Options, Preview features, and select the Updated ribbon feature.

You have to restart to get the new ribbon.

It looks quite different. Let's see how it goes…

Decomposition Tree

A brand new AI visual available in November is the decomposition tree. I have used the key influencers visual with social media data, which is great, so I am very excited about this new visual.

With the new visual we can do root cause analysis, look at how individual categories contribute to a whole, and rank categories against the selected measure.

Again, this is a preview feature, so it needs enabling in Options.

Just like the key influencers visual, we need something to analyse and data to explain the analysis by.

Unlike key influencers, where we analyse a category (e.g. positive, negative and neutral), here we are analysing a metric.

Let's pick something to look at.

Class breaks down the sales amount.

Next, I clicked on the + next to H and chose product category. We can now see that Bikes makes up the majority of the H category.

Next, try using the AI feature. The options are High value and Low value. I'm going to choose High value. It will look through all the categories that are left and pick the one that is the highest contributor to the metric.

The lightbulb against the title lets you know it was picked by AI.

Let's do the same but look at Low value.

It picks out the Central region as the very lowest of the 'explain by' values.

This is fantastic. You can also lock a level so a user can't remove it, change the colours, etc. I can't wait to use this on my actual working data sets.

Advanced Gauge

Straight away, the new ribbon changes how you go to the marketplace

Now you go to More visuals and From AppSource to get to the marketplace.

Go for the Advanced Gauge from xViz.

I have Year to date Internet Sales and Quantity of products bought

I also have measures for last year's year-to-date sales and quantity, so I can add these into the gauge.

There are other options like setting minimum and maximum values and tooltips

Hierarchical Filter

I have used the hierarchical slicer before; let's see how the new filter looks (get it from AppSource – Hierarchical Filter by xViz).

You need a hierarchy in your data to use this visual. I have the Group hierarchy (Group, Country and Region). One of the best things about this visual is that not only do you use it to slice your data, but you can also add a measure, which gives you even more information right on the slicer itself.

However, I don’t think it comes across as a slicer, something that the user can interact with.

Financial Reporting Matrix by Profitbase

A Matrix with financial formatting options

The video talks about sticky rows and sticky columns, in that you can set them to always be on screen even while scrolling, which is good.

However the Information on the Overview video uses

There is a note on this one: it's really hard to delete. It took me a fair while the other day to click in the right place to delete what is possibly a measure called DateCalculations in the columns section, so this may be one to look at in more detail later.

Distribution Visual

More visuals – From AppSource

This is supposed to be great for setting up a global legend; however, no matter what I tried, I couldn't get the visual to display any information, so this one is a fail for me.

Conclusion

Some pretty good updates this month; I'm especially excited about the decomposition tree.

Another major one for me is the LinkedIn Sales Navigator connector. However, I'm not sure I have the right set-up to use it at the moment.

The video explains that you need the Enterprise Plan, with either the admin or reporting user role, to use the connector, but it doesn't mention what this applies to. Is this LinkedIn?

I will be looking at this in more detail hopefully at a later stage

Azure Analysis Services Security

I have recently been setting up training sessions for Analysis Services, and Azure security has been one of the biggest learning curves.

Let's have a look at what we needed to do with our AdventureWorks demo and all the areas of possibility.

Azure Analysis Services Access Control (IAM)

I have created an Analysis Services server in Azure, which makes me the administrator. I can do everything, including adding models that I create in Visual Studio 2019.

However, I will be working with lots of other people. In the first instance they will be using the models I have set up to create reports in Power BI.

This is only a training demo and is not how I would set up a fully functioning live Analysis Services model.

In Azure we can add role assignments (click on Access Control (IAM)), but what does this mean in terms of actually using the service?

I am an owner. Azure has a number of built-in role-based access control (RBAC) roles to use:

  • Owner – Has full access to all resources including the right to delegate access to others.
  • Contributor – Can create and manage all types of Azure resources but can’t grant access to others.
  • Reader – Can view existing Azure resources.
  • User Access Administrator – Lets you manage user access to Azure resources.

The model above shows how you can use RBAC to segregate duties.

Using the following information, I set my users up as contributors in the hope they would be able to access the models.

With Contributor in place, the users could see the Analysis Services server in Power BI but didn't have access to the model.

As a test, the users were bumped up to Owner and still couldn't see the models in Power BI.

Azure IAM relates to the Analysis Services server, not the models created on it.

Visual Studio 2019 (Roles)

The next step is to see what happens when we create a role in the Analysis Services model in Visual Studio.

In Tabular Explorer (make sure you have downloaded the Analysis Services extension for Visual Studio), click on Roles. I have created a Training role for this demo.

I gave all the users the Read and Process permission.

The permissions are as follows:

  • None – Cannot modify the model in SSMS. Cannot query data.
  • Read – Cannot modify the model in SSMS. Can query data.
  • Read and Process – Can query data. Can run process operations by running a script or package. Cannot make any changes to the database. Cannot view the model in SSMS.
  • Process – Cannot query data. Can run process operations by running a script or package. Cannot make any changes to the database.
  • Administrator – Can modify the model schema and query data.

Once the Training role was set, the model was reprocessed.

Users now have access to the Analysis Services model and can use the model in Power BI

The question is, can the users access the model if they are in this role but are not in the Analysis Services Access Control (IAM)?

Both users were taken out of IAM in Azure and still had access to the models.

New Requirement: Users Need to Add New Models to the Analysis Services Database

The next stage was to allow the users to create their own AdventureWorks models on the Analysis Services server.

With all the above security in place, when a user processed their tabular model, the following error occurred.

This user has Owner access in Azure IAM and Read and Process access on the Analysis Services model.

How do we allow users to actually create and modify models?

SSMS – Roles

SQL Server Management Studio is the place to grant model permissions

Open up SSMS as administrator by right clicking and choosing Run as administrator.

Note that you can see the Training role set up in Visual Studio. Originally, Full control was unticked.

I ticked Full control, which is immediately set on the model. No need to process anything.

The user tried again and still couldn't add the new model to the Analysis Services server.

Only server administrators, or database administrators having Full Control permissions, can deploy a cube from source files to a server, or create roles and assign members.

It would appear that Full Control (Administrator) is not the same as server administrator, because you can't deploy a tabular model with it.

Question: if I reprocess the Visual Studio model with the permission set as Read and Process, will this override the new selection in SSMS?

Yes. If you reprocess your tabular model (because you have updated the model, added new measures, etc.), the role permissions from Visual Studio will override the role permissions set in SSMS. Ensure that any roles changed in SSMS are reflected in Visual Studio.

SSMS – Analysis Services Properties

Right click on the Analysis Services server (under Connect) and choose Properties.

Then select Security.

Note that this is where server administrators are assigned, and only these people can create and deploy their own models to the Analysis Services server.

Going back to Azure, you can also set up admins here.

Note that everything we just did in SSMS is available in Azure. Also, if we redeploy the Analysis Services tabular model via Visual Studio, it doesn't do anything to the Analysis Services admins.

Our roles aren't available to view in Azure, just the Analysis Services admins.

Everything is now working

There are still questions to be answered. Feel free to let me know your thoughts:

  • When would you need to add users to Azure IAM, if they can use the tabular models, and deploy models, just by being added to roles and the server administrator role in Visual Studio and SSMS?

Incremental Processing in Data Factory using a Watermark Table

I am pulling tweets into Azure Table Storage and then processing them into a warehouse.

The following shows the very basic Data Factory set-up.

Connections

I have created a Linked Service for the Azure Storage Table PowerBIMentions

And another Linked Service for my Azure SQL Server Table PowerBIMentions

Datasets

The storage table was set up in Logic Apps; I won't go into detail about the schema here.

I have created a SQL table to mirror the Azure Table Storage.
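As a rough sketch, the mirror table looks something like the following. Only TweetText and CreatedAt come up later in this post; the other columns and the exact types depend on what the Logic App writes and are assumptions here.

CREATE TABLE [staging].[PowerBImentions](
[TweetText] nvarchar(max) NULL,  -- used later in the duplicate check
[CreatedAt] datetime2 NULL       -- the tweet created date, which drives the watermark
-- plus whatever other columns the Logic App captures
) ON [PRIMARY]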

Watermark Table in SQL

The Watermark Table is as follows

CREATE TABLE [staging].[watermarktable](
[TableName] varchar(100) NOT NULL,   -- column lengths are assumed
[WaterMarkDate] datetime NULL,
[WaterMarkValue] varchar(100) NULL,
[WaterMarkOffSetDate] datetimeoffset NULL
) ON [PRIMARY]

For the time being, the watermark value is the date, set in the same format as it appears in the Azure Table Storage.

Ignore the offset date for the time being.
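As a minimal example (the dates here are just placeholder values), the watermark table starts with one row for the source table:

INSERT INTO [staging].[watermarktable] (TableName, WaterMarkDate, WaterMarkValue)
VALUES ('staging.PowerBImentions', '2019-01-01', '2019-01-01T00:00:00.000Z')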

I have then created a stored procedure that records the table name (staging.PowerBImentions) and the maximum created date from what is in the table, with some extra script to format that date into the format used in Azure Table Storage.

To understand how the date should be formatted, you can look at the Azure table using Microsoft Azure Storage Explorer:

https://azure.microsoft.com/en-gb/features/storage-explorer/

As you can see, the format is YYYY-MM-DDTHH:MM:SS.000Z.

I haven't included the actual code for the stored procedure here, but it is fairly straightforward: you take the max date from the table you are interested in and include code to format that value into the correct string.
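As a hedged sketch only (the procedure name is made up, and I am assuming CreatedAt is stored as a datetime in the SQL table), the stored procedure could look something like this:

CREATE PROCEDURE [staging].[usp_SetWatermark_PowerBImentions]  -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @MaxCreatedAt datetime2;

    -- Take the max created date from the table we are interested in
    SELECT @MaxCreatedAt = MAX(CreatedAt)
    FROM [staging].[PowerBImentions];

    -- Format it to match the Azure Table Storage value, e.g. 2019-11-01T16:00:55.000Z
    UPDATE [staging].[watermarktable]
    SET WaterMarkDate  = @MaxCreatedAt,
        WaterMarkValue = FORMAT(@MaxCreatedAt, 'yyyy-MM-dd\THH:mm:ss.fff') + 'Z'
    WHERE TableName = 'staging.PowerBImentions';
END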

Pipeline

I added a pipeline called CopyPowerBIMentionsIncremental.

The source is the storage mentions table.

The sink (destination) is the Azure SQL table.

And then we can easily import the schemas and map them.

Then run the very simple pipeline, which just moves the data and then runs the watermark stored procedure to recalculate the max date time.

Change the Pipeline to include a Lookup Activity

Now, the next time we process, we only want to bring new records into the table.

The query to select the correct date from the watermark table:

SELECT MAX(WatermarkValue) AS WatermarkValue FROM [staging].[watermarktable]
WHERE WatermarkValue IS NOT NULL
AND TableName = 'staging.PowerBImentions'

You can test this in SQL Server Management Studio

CreatedAt gt '@{activity('LookupWatermarkOld').output.firstRow.WaterMarkValue}'

CreatedAt is the column in the source table that holds the created date, and the filter uses the following comparison operators:

  • Equal – eq
  • GreaterThan – gt
  • GreaterThanOrEqual – ge
  • LessThan – lt
  • LessThanOrEqual – le
  • NotEqual – ne

The source then uses this filter with the WaterMarkValue returned by the Lookup activity from the SQL table.

Test the New Watermark Logic

In SSMS:

SELECT COUNT(*) FROM [staging].[PowerBImentions]

There are 16 records and the max date is 2019-11-01T16:00:55.000Z.
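The max date can be checked in the same way (again assuming the CreatedAt column holds the tweet date):

SELECT MAX(CreatedAt) AS MaxCreatedAt FROM [staging].[PowerBImentions]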

Run the new Data Factory pipeline (Debug).

The new record count is 21

The new Max Date is 2019-11-01T17:01:52.000Z

And let's do a quick check to make sure there are no duplicates:

SELECT TweetText, COUNT(*) FROM [staging].[PowerBImentions] GROUP BY TweetText HAVING COUNT(*) > 1

It's all working. We have no duplicates, and this will ultimately help keep the processing down.

There is another way to do incremental processing, using change tracking. I'll be looking at that in a later post.

Quick ‘Logic App’ Tip: Error on For each loop – ‘An action failed. No dependent actions succeeded.’

I have been setting up a Logic App to add tweets to Azure Table Storage. While setting up a loop to add keywords to a table (one tweet may add multiple rows to the keywords table), I came across this error.

The tweet has generated 10 rows of data (because there are 10 keywords).

I have set up an increment counter to add to the ID, so each one should get 1, 2, 3 and so on. If you toggle through the records you notice this:

  • Record – Increment
  • 1 – 1
  • 2 – 6
  • 3 – 3
  • 4 – 7
  • 5 – 2
  • 6 – 4
  • 7 – 10
  • 8 – 8
  • 9 – 9
  • 10 – 5

And the subsequent table storage step doesn't work because of conflicts. Clearly the increment isn't happening in order. There is a quick solution to this one.

Go into Edit mode in the Logic App.

For every For each loop with an increment variable, click … to get to Settings.

Make sure Concurrency Control is switched on and that the degree of parallelism is set to 1. Setting it to 1 will ensure the iterations run sequentially.

Adding a Logic App to Refresh your Analysis Services Data

You have a shiny new Analysis Services model containing all the data you need for your Power BI reports and Dashboards

It has been added into Analysis Services because (a) there is a lot of data, possibly too much for a Power BI import, (b) you want to do incremental processing without needing Power BI Premium, and (c) you want to refresh more than eight times a day.

Everything is set up, but the final part of the puzzle is how to schedule the refresh of the Analysis Services tabular model.

It's easy enough in Power BI: you simply set the schedule in the Service. Let's have a look at how to do this using one of the options: Logic Apps.

Create a Service Principal (SPN)

First we need to create a new Azure Active Directory application and service principal to use with role-based access control. Whenever you have code that needs to access and/or modify resources (the Logic App will refresh the Analysis Services data), you need to create an identity for the app.

Sign into Azure

Go to Azure Active Directory

Next go to App Registrations and + New registration

The web URL is simply set like the example in the documentation.

Then click Register

Assign the Application to a role

Before the app can access resources, it must be assigned a role. For this example we are going to assign a role at the subscription scope, on our proof-of-concept subscription.

Go to All Services and then Subscriptions

I'm going to select the proof-of-concept subscription.

Next, select Access Control (IAM) and Add role assignment.

Note that I have selected the new analysisservicesapp.

Click Save to add the role assignment.

The service principal is set up. Now we need to get the values so we can sign in as the app.

Get Sign in Values

Go to Azure Active Directory

Go to App Registrations again and select the new application

Copy the Tenant ID for later use

Copy the Application ID for later use

Set up your Credentials

I was going to create a certificate, but I found it an extremely complicated process and in the end I couldn't export the created certificate in PowerShell.

Because I haven't been able to create and export a certificate, I am going to use a client secret instead.

Click on New Client Secret

Once added, you need to copy the secret value for use later. You won't be able to access this information again, so DON'T LOSE IT.

Give the Service Principal Authority to Administer Analysis Services

Still in your app, click on API permissions.

Next View API Permissions

+ Add a permission and find Azure Analysis Services in APIs my organisation uses

Ensure that the Permission allows you to read and write all models (Tick the box)

Then Add permissions.

Note that although the permissions have changed, we still need admin consent.

Grant Admin Consent

Granting admin consent requires you to sign in as a global administrator, an application administrator, or a cloud application administrator.

From Azure Active Directory go to App registrations and select the app

Within the app, go to API permissions.

Grant admin consent.

Configure Permissions in Azure Analysis Services

Next we need to ensure the new service principal we created has server administrator permissions in Analysis Services.

Open SQL Server Management Studio by right clicking and choosing Run as administrator, then connect to the Analysis Services server.

Add in your user name (in my case the email address) but not the password.

You add your Office 365 password when you sign into your account (Next).

Right click on the Analysis Services name and go to Properties, then Security.

Click Add, then search for the app that has been created. This can then be added as a server administrator (allowing it to add models, amend models, refresh data, etc.).

Click OK

Create the Logic App

Finally, we can create the Logic App in Azure

The Logic App will be triggered by an HTTP request, which in turn will be triggered by Data Factory, the Azure orchestration tool.

In Azure, go to Logic Apps and Add.

Then add a New step, search for HTTP and choose HTTP.

Method

POST. The POST method sends the data inside the body of the HTTP request.

URI (Uniform Resource Identifier)

URI = https://<your server region>.asazure.windows.net/servers/<aas server name>/models/<your database name>/refreshes

Here is my example:

https://ukwest.asazure.windows.net/servers/adventureworksas/models/AdventureWorksV2/refreshes

The server is the Azure Analysis Services server that the model from Visual Studio was deployed to.

The Model is the model that we want to refresh. There may be multiple models on the server

Headers

As per the example, I have set the Content-Type header to application/json.

Queries

Nothing set

Body

This is where you set up all the processing information:

{
    "Type": "Full",
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Objects": [
        { "table": "DimCustomer" },
        { "table": "DimDate" },
        { "table": "DimEmployee" },
        { "table": "DimProduct" },
        { "table": "DimProductCategory" },
        { "table": "DimProductSubCategory" },
        { "table": "DimSalesTerritory" },
        { "table": "FactResellerSales" }
    ]
}

Type: Full

The type of processing to perform. Refresh command types can be:

  • full – process the entire model and recalculate all the dependents (measures, columns, etc.)
  • clearValues – clear the values in the objects and their dependents
  • calculate – recalculate your formulas
  • dataOnly – refresh the data in the objects; does not force recalculation
  • automatic – if the object needs refreshing and recalculating, do so
  • add – append data to the partition and recalculate dependents (measures, columns, etc.)
  • defragment – clean up values in dictionaries that are no longer used

CommitMode: transactional

Determines whether objects are committed in batches or when complete. Modes include:

  • default
  • transactional
  • partialBatch

It's always a good idea to set up partitions within your data, especially on the fact table, to process the data.

However, the above is an example specifying exactly which tables to process. You can simplify the code if you wish:

{
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Type": "Full"
}

The following is an example of adding partitions into your code. In this example, DimCustomer is one partition:

"table": "DimCustomer",
"partition": "DimCustomer"

The fact table can be made up of many partitions; for example, you could have one for each year. Then you can decide which partitions to process; for example, you may only want to process the current year's data.

Authentication

Active Directory OAuth

Tenant

Use the Tenant ID that we collected previously

Audience

https://*.asazure.windows.net

Client ID

Use the Client ID that we collected Previously

Credential Type

Secret (remember that we added the secret to the app).

Secret

The secret value that we saved (remember, for this one you can't go back to Azure and look at the secret; it is shown one time only).

Save and test the Logic App by clicking Run.

Consume the Logic App with Azure Data Factory

We now have a Logic App that will process the data. We now need to schedule this by creating a Data Factory to orchestrate the processing.

First, we need to copy the HTTP POST URL from the Logic App's 'When a HTTP request is received' trigger.

And then create a new Data Factory in Azure

For the time being I am not enabling Git.

In Author & Monitor, create a pipeline and drag a Web activity across to the design pane.

And then set up the schedule by adding a trigger

This should now be enough to process all your data into the Analysis Services model, which has a live connection into Power BI.

What we need to do now is check that this is doing its job. I will be looking at this in a later blog post.