After 14 years using Microsoft on-premises BI tools (SQL Server, Reporting Services, Integration Services and Analysis Services), it's time to embrace Business Intelligence in the cloud.
Thank goodness for the amazing people on the Power BI Forums for helping with this. I have the solution, but I wanted to write a post to explain the why.
The Problem
I have accounts and events occurring on each account. What I want to know is the time period for which they are open.
Just for this example I have taken one account with 11 activities.
If you run the visual against the start date, all you get is a measure against the start time period.
We are more interested in creating a measure to show how long the activity went on for.
For this example I am going to work with the activity starting on Jan 18 2017, because its end date is March 21st 2017, so it should span 3 months.
The Date Dimension
To do this you must have a date dimension. For the purposes of this example I'm going to create one in DAX as a calculated table, using the following:
Notice that I have used the Start and End Dates to set the min and max years. In practice you should already have a date dimension to work with.
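As a rough sketch (the fact table and column names here, Accounts[Start Date] and Accounts[End Date], are placeholders for your own), the calculated table could look something like this:

Dates =
CALENDAR (
    DATE ( YEAR ( MIN ( Accounts[Start Date] ) ), 1, 1 ),    // 1st Jan of the earliest start year
    DATE ( YEAR ( MAX ( Accounts[End Date] ) ), 12, 31 )     // 31st Dec of the latest end year
)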
Set up the relationship between the fact table and the date dimension.
Start Date is the active join and End Date is the inactive join.
You could create a role-playing dimension for the End Date and have that as an active join, but in this case I'm going to go with the above design.
Account Start flag (New Measure)
CALCULATE – you can modify your expression using a filter.
ALLSELECTED – if you have a slicer set to Year (Date), ALLSELECTED takes the context from the slicer, e.g. 2018.
The date is less than or equal to the max date in the filter context (see the sketch below).
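The exact measure isn't shown here, but a minimal sketch of the described logic, assuming the date table is called Dates and the fact table Accounts, might look like this:

Account Start Flag =
CALCULATE (
    COUNTROWS ( Accounts ),
    FILTER (
        ALLSELECTED ( Dates[Date] ),        // keep the dates selected by the slicer
        Dates[Date] <= MAX ( Dates[Date] )  // every date up to the end of the period on the visual
    )
)

Because the Start Date relationship is the active one, this counts the account rows whose start date falls on or before the last date of the current period.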
Account End Flag (New Measure)
ENDOFMONTH – Returns the last date of the month in the current context for the specified column of dates.
The date at the end of the month is less than the max date in the filter context.
USERELATIONSHIP – because End Date is the inactive relationship, we need to specify that this relationship is used rather than Start Date. A sketch follows.
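Again, a hedged sketch with the same assumed names (using the scalar EOMONTH to express the "reset to the end of the month" comparison described above, where the post refers to ENDOFMONTH):

Account End Flag =
CALCULATE (
    COUNTROWS ( Accounts ),
    FILTER (
        ALLSELECTED ( Dates[Date] ),
        EOMONTH ( Dates[Date], 0 ) < MAX ( Dates[Date] )   // only dates in months that end before the current period
    ),
    USERELATIONSHIP ( Accounts[End Date], Dates[Date] )    // switch to the inactive End Date relationship
)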
Add an Open Measure
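Putting the two together, a sketch of the open measure (assuming the two flag measures above) is simply one flag minus the other:

Open = [Account Start Flag] - [Account End Flag]    // 1 - blank = 1 while open, 1 - 1 = 0 once closed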
To understand the results we need to try and understand how the new measures work
Jan
Start Flag = 1 because 18th Jan is less than Jan 31st
End Flag = NULL because March 21st (reset to March 31st) is NOT less than Jan 31st
Open = 1 - null = 1
Feb
Start Flag = 1 because 18th Jan is less than Feb 28th
End Flag = NULL because March 21st (reset to March 31st) is NOT less than Feb 28th
Open = 1 - null = 1
Mar
Start Flag = 1 because 18th Jan is less than Mar 31st
End Flag = NULL because March 21st (reset to March 31st) is NOT less than March 31st
Open = 1 - null = 1
Apr
Start Flag = 1 because 18th Jan is less than April 30th
End Flag = 1 because March 21st (reset to March 31st) is less than April 30th
Open = 1 - 1 = 0
There are usually lots of ways to answer one question, so this is just one of many, but the Open flag now allows me to pull up all my activities and show in a matrix where they are open.
Dataflows are essentially an online collection and storage tool. Power Query connects to the data at source, then collects and transforms it. The dataflow then stores the data in a table in the cloud. Dataflows are stored in a data lake, which is all automated for you.
Dataflows unify data from all your different sources. It should be noted that a data warehouse is still the recommended architecture, with dataflows sitting over the top.
Dataflows also introduce the concepts of the Common Data Service (CDS) and the Common Data Model (CDM). CDM allows organisations to use standard data formats to provide consistency across deployments. Azure Data Lake Storage Gen2 can now be combined with dataflows to provide a structured, centralised data source.
When you integrate CDM with Data Lake Gen2 you get structural consistency, and you can use CDM folders in the lake that contain your schemas in standard CDM format.
Dataflows store their definitions and data in CDM folders, in Model.json format. If you have a Model.json file, it shows that you are adhering to CDM.
Of course, this can be quite difficult when you are working with data that does not adhere to the CDM format. I can see it being really useful when you are starting from scratch, but having looked at my own data sources, they are always quite far from CDM format.
Advantages to setting up your own Data Lake Gen2 Store
Essentially, you can get by using the internal data lake store, but there are lots of reasons (and some of these advantages will only arrive in future releases) why setting up your own store is a good thing.
Great for re-use: if you are given access to the workspace you can use the dataflows already created for you.
Centralise your prepped data
Remember this doesn’t replace your data warehouse. It just adds a preparation and transformation layer above the data warehouse. Instead of having to wait to get your small change added to the warehouse you can add it to this layer.
Data within your data lake can be integrated into other solutions by developers
Data Lake Gen2 is enormously scalable for data
Dataflow data and definition files can be leveraged by developers for AI services
Steps to Set Azure Data Lake Gen2 as Dataflow Storage
The storage account must be created in the same Azure Active Directory tenant as the Power BI tenant.
The storage account must be created in the same region as your Power BI tenant. To determine where your Power BI tenant is located, see below.
The storage account must have the Hierarchical Name Space feature enabled.
Power BI service must be granted a Reader role on the storage account.
A File system named powerbi must be created.
Power BI services must be authorized to the powerbi filesystem you create.
Once configured, it can't be changed. You can't go back to the default storage for your dataflows.
Set Storage Account in Same Region as your Power BI tenant
Log into the Power BI service, then go to Help (?) and About Power BI.
The data is stored in North Europe (Ireland).
When you set up your data lake, ensure the North Europe region is selected.
Set up Azure Data Lake V2 in Azure Portal
When you currently go into Azure and look for Data Lake Gen2, you can only find Gen1.
So the question is, how do you set up Gen2 in the Azure Portal? (As of 25th November 2019; hopefully this will get easier in the future.)
First of all I go to the subscription where I want to add the new Data Lake Gen2.
Open up the Portal menu (Now hidden to the left of the screen)
Choose Create a resource
Next choose Storage and then Storage Account
Note that the Account kind is Storage V2 (general purpose)
Make sure the Location is the same as your Power BI service. I'm not using this functionality at the moment, but there is no harm in applying this logic.
It is recommended to set replication setting to Read-access geo-redundant storage (RA-GRS)
For the time being, I am leaving everything else as standard
Next go to Advanced
The most important setting here is Data Lake Storage Gen2: enable the Hierarchical namespace and your storage account will be created as Data Lake Storage Gen2.
Click Review and Create
Grant Reader Role to Power BI Service
This is all built in so it should be fairly straightforward.
In Azure, go to your new storage account (if you aren't already in it) and go to Add role assignment.
Once there, choose the Reader role and select Power BI Service, which as you can see is already in the list.
It takes about 30 minutes for this to take effect.
Create a powerbi file System
Power BI needs a filesystem specifically named powerbi so that dataflows can be stored in it.
We now have a few options available to us. I have some files to add so I am going to add them to a container
Click on Containers and then + File System
Note that to store dataflows the file system must be called powerbi. Click OK.
Clicking on your new storage account(s), you are told to download Azure Storage Explorer.
I already have Azure Storage Explorer downloaded. If you don't have it, it's something you will absolutely need to work with Azure Storage accounts.
Once downloaded, open Azure Storage Explorer.
You will need to add your Azure Storage accounts by clicking the little connector icon.
You will be asked to sign into your account with your Office 365 credentials and 2FA authentication.
This will log you into all your subscriptions and services.
You are good to go
Here you find your subscription, then go to the Data Lake Storage Gen2 and find the new file system powerbi.
Grant Power BI permissions to the file system
Before we connect, we need to grant permission for Power BI to use the file system (again this is specific to using Data Lake Gen2 as a dataflow store, but at this point we may as well set up the permissions).
Go to the Azure Portal and Azure Active Directory
Then select Enterprise Applications
Change the Application type drop down to All Applications
Power Query Online and Power BI Premium and Power BI Service are in the list.
You will need the Object IDs of these applications.
Back to Azure Storage Explorer (Manage Access)
Navigate back to the powerbi file system, right click and choose Manage Access.
Click Add and grab the object ID of Power BI Service to manage its access.
Set Read, Write and Execute access for Power BI Service, and repeat the process for Power BI Premium.
Repeat for Power Query Online, but set Write and Execute access.
The Other entry also needs setting up as follows.
Connect the Data Lake Gen2 Storage Account to Power BI Dataflows
To do this you need to be a Power BI admin. Go to the Power BI service and navigate to the Admin Portal.
From here Connect your Azure Data Lake Storage Gen2.
Add your Subscription ID, Resource group Name and Storage Account name of your Data Lake
It is now connected to Power BI
Allow Admins to Assign Workspaces
Finally, still in the Admin Portal, go to Dataflow settings
Switch Allow workspace admins to assign workspaces to this storage account to On
Workspace admins can now assign workspaces to the filesystem created
Things to Consider
Here is where it starts to get a little bit hazy
This is all very much still in preview and there will be lots of updates coming
Once your dataflow storage location is configured it can't be changed, so don't do this on a whim.
You have to be an owner of the dataflow, or be authorised to the CDM folder in the data lake, to use the dataflow.
Once you have created a dataflow you can't change its storage location.
It is your organisation's dataflow storage account, so there can only be one.
Because of this, and the fact that it's still in development, I am going to wait to set up a central storage account for our dataflows.
I'm still unsure what you would do with dataflows that are already set up. Do they stay in the default area, or can you reprocess them into the central data lake?
What happens if you want to move the data lake to a new subscription? Is that not possible?
I will be going back to this when I have a few more answers to these questions
Prior to the brand new Azure Data Lake, I was adding all my files into Blob Storage. However, Azure Data Lake Gen2 is built on Blob Storage and Data Lake Gen1.
It's built for big data, and a fundamental change is that we now have a hierarchical namespace. This organises your files into directories.
So now we can do things like use all files from a specific directory, or delete all files from a specific directory. We can categorise our files within the data lake.
Set up Azure Data Lake V2 in Azure Portal
When you currently go into Azure and look for Data Lake Gen2, you can only find Gen1.
So the question is, how do you set up Gen2 in the Azure Portal? (As of 25th November 2019; hopefully this will get easier in the future.)
First of all I go to the subscription where I want to add the new Data Lake Gen2.
Open up the Portal menu (Now hidden to the left of the screen)
Choose Create a resource
Next choose Storage and then Storage Account
Note that the Account kind is Storage V2 (general purpose)
I've set the Location to North Europe, simply because I know that's where our Power BI data in the service is stored and I may as well stick with this.
For the time being, I am leaving everything else as standard
Next go to Advanced
The most important setting here is Data Lake Storage Gen2: enable the Hierarchical namespace and your storage account will be created as Data Lake Storage Gen2.
Click Review and Create
Create a file System within a Container
We now have a few options available to us. I have some files to add so I am going to add them to a container
Click on Containers and then + File System
Click OK
Clicking on your new storage account(s), you are told to download Azure Storage Explorer.
I already have Azure Storage Explorer downloaded. If you don't have it, it's something you will absolutely need to work with Azure Storage accounts.
Once downloaded, open Azure Storage Explorer.
You will need to add your Azure Storage accounts by clicking the little connector icon.
You will be asked to sign into your account with your Office 365 credentials and 2FA authentication.
This will log you into all your subscriptions and services.
You are good to go
Here you find your subscription, then go to the Data Lake Storage Gen2 and find the new file system.
I have added a folder called Workshop1Files to my file system.
Obviously Data Lake Storage gives you many ways of working with files and automating files into the storage area. In this case I am simply going to move a file into my new folder to work with.
Double click on the folder, then click Upload and Upload Files.
And now your file is in the cloud, in an Azure Data Lake, ready to use.
Connect to your Azure File with Power BI Desktop
The first test is: can we access this data within Power BI Desktop?
Open Power BI Desktop and Get Data
Choose Azure Data Lake Storage Gen2 (Currently in Beta)
Add the URL
Data Lake Storage Gen2 URLs have the following pattern: https://<accountname>.dfs.core.windows.net/<filesystemname>/<subfolder>
If you right click on the file in Storage Explorer and go to Properties, you can see there is a difference in structure.
Connect to your Azure File with Power BI Data Flow
I am creating dataflows in the Power BI service to ensure they can be reused across the company. The question is, can I connect to the above file in the service via a dataflow?
In the Power BI service, add a dataflow, which takes you into the Power Query editor in the service. I already had some dataflows connected to an Azure database.
The data is in Azure Data Lake Storage, so the first thing I do is try the Azure route.
However, there is no Azure Data Lake Storage Gen2 connector. This must be something coming in the future. So then I go to File and click on Get Data, Text/CSV.
You will need to add the file path and your credentials (as per previous advice, use dfs not blob in the URL). This seems a little flaky at the moment; I chose Organisational Account first before adding the URL and then it seemed to work.
Remember, go back to Azure Storage Explorer: if you click on Properties you can grab the URL from here.
We don’t need a Gateway Setting up because everything is now in the cloud.
Clicking Next, nothing happens; it just keeps bouncing back to the same window.
Attempting to use the Blob Storage connector also doesn’t work (Using the Azure Account Key as authentication).
(Screenshots: with blob in the URL, and with dfs in the URL.)
It would appear that currently I have hit a brick wall, and there is no Data Lake Gen2 connector for dataflows yet.
I will be keeping an eye on this because obviously, when you are pushing the new generation of data lakes and dataflows, there needs to be a Gen2 connector for dataflows.
Update
Had a reply back on the Power BI Forum (Not a good one)
To look at the new Power BI updates I went for my go-to dataset, the AdventureWorks data warehouse.
I have added the following tables
DimEmployee
Dimdate
DimSalesTerritory
DimProduct
DimProductSubcategory
DimProductcategory
Dimcustomer
FactInternetSales
Use the Updated Ribbon
Once I have added the data, I want to start using the new Power BI ribbon, which is in preview. Go into Options, Preview features, and select the Updated ribbon feature.
You have to restart Power BI Desktop to get the new ribbon.
It looks quite different. Let's see how it goes…
Decomposition Tree
A brand new AI visual available in November is the Decomposition Tree. I have used the Key Influencers visual with social media data, which is great, so I am very excited about this new visual.
With the new visual we can do root cause analysis, look at how individual categories contribute to a whole, and rank categories against the selected measure.
Again, this is a preview feature so it needs enabling in options
Just like the Key influencers visual, we need something to Analyse and data to explain the analysis by.
Unlike Key Influencers, where we analyse by category (e.g. positive, negative and neutral), here we are analysing a metric.
Let's pick something to look at.
Class breaks down the Sales Amount.
Next, I clicked on the + next to H and chose Product Category. We can now see that Bikes make up the majority of the H category.
Next, try using the AI feature. The options are High Value and Low Value. I'm going to choose High Value: it will look through all the categories that are left and pick the one that is the highest contributor to that metric.
The lightbulb against the title lets you know it was picked by AI.
Let's do the same but look at Low Value.
It picks out the Central region as the very lowest of the Explain By values.
This is fantastic. You can also lock a level so a user can't remove it, change the colours, etc. I can't wait to use this on my actual working data sets.
Advanced Gauge
Straight away, the new ribbon changes how you go to the marketplace
Now you go to More Visuals and From AppSource to get to the marketplace.
Go for the Advanced Gauge from xViz.
I have Year to date Internet Sales and Quantity of products bought
I also have measures for last year's year-to-date Sales and Quantity, so I can add these into the gauge.
There are other options like setting minimum and maximum values and tooltips
Hierarchical Filter
I have used the Hierarchical Slicer before; let's see how the new filter looks (get it from AppSource – Hierarchical Filter by xViz).
You need a hierarchy in your data to use this visual. I have the Group hierarchy (Group, Country and Region). One of the best things about this visual is that not only can you use it to slice your data, but you can also add a measure, which gives you even more information right on the slicer itself.
However, I don't think it comes across as a slicer, something the user can interact with.
Financial reporting Matrix by Profitbase
A Matrix with financial formatting options
The video talks about sticky rows and sticky columns, in that you can set them to always be on screen even while scrolling, which is good.
However the Information on the Overview video uses
There is a note on this one: it's really hard to delete. It took me a fair while the other day to click it in the right place to delete what is possibly a measure called DateCalculations in the columns section, so this may be one to look at in more detail later.
Distribution Visual
More Visuals – From App Source
This is supposed to be great for setting up a global legend; however, no matter what I tried, I couldn't get the visual to display any information, so this one is a fail for me.
Conclusion
Some pretty good updates this month; I'm particularly excited about the Decomposition Tree.
Another major one for me is the LinkedIn Sales Navigator connector. However I’m not sure I have the right set up to use it at the moment.
The video explains that you need an Enterprise Plan with either admin or reporting user roles to use the connector, but what is not mentioned is what this refers to. Is this LinkedIn?
I will be looking at this in more detail hopefully at a later stage
I have recently been setting up training sessions for Analysis Services, and Azure security has been one of the biggest learning curves.
Let's have a look at what we needed to do with our AdventureWorks demo and all the areas of possibility.
Azure Analysis Services Access Control (IAM)
I have created an Analysis Services instance in Azure, which makes me the administrator. I can do everything, including adding models that I create in Visual Studio 2019.
However, I will be working with lots of other people. In the first instance they will be using the models I have set up to create reports in Power BI.
This is only a training demo and is not how I would set up a fully functioning live Analysis Services model.
In Azure we can add role assignments (click on Access Control (IAM)), but what does this mean in terms of actually using the service?
I am an owner. Azure has a number of built-in role-based access control (RBAC) roles to use.
Owner – Has full access to all resources including the right to delegate access to others.
Contributor – Can create and manage all types of Azure resources but can’t grant access to others.
The model above shows how you can use RBAC to segregate duties.
Using the following information, I set my users up as Contributors in the hope they would be able to access the models.
With Contributor in place, the users could see the Analysis Services instance in Power BI but didn't have access to the model.
As a test the users were bumped up to Owner and still couldn't see the models in Power BI.
Azure IAM relates to the Analysis Services resource, not the models created on it.
Visual Studio 2019 (Roles)
The next step is to see what happens when we create a role in the Analysis Services model in Visual Studio.
In Tabular Explorer (make sure you have downloaded the Analysis Services extension for Visual Studio), click on Roles. I have created a Training role for this demo.
I gave all the users the permission Read and Process.
Permission – Description
None – Can't modify the model in SSMS. Can't query data.
Read – Can't modify the model in SSMS. Can query data.
Read and Process – Can query data. Can run process operations by running a script or package. Can't make any changes to the database. Can't view the model in SSMS.
Process – Can't query data. Can run process operations by running a script or package. Can't make any changes to the database.
Administrator – Can modify the model schema and query data.
Once the Training role was set, the model was reprocessed.
Users now have access to the Analysis Services model and can use it in Power BI.
The question is, can the users access the model if they are in this role but not in Analysis Services IAM?
Both users were taken out of IAM in Azure and still had access to use the models.
New requirement for users to add new models to the Analysis Services database
The next stage was to allow the users to create their own AdventureWorks models on the Analysis Services instance.
With all the above security in place, when a user processed their tabular model, the following error occurred.
This user has Owner access in Azure IAM and Read and Process access on the Analysis Services model.
How do we allow users to actually create and modify models?
SSMS – Roles
SQL Server Management Studio is the place to grant model permissions
Open SSMS as administrator by right clicking and choosing Run as Administrator.
Note that you can see the Training role set up in Visual Studio. Originally, Full Control was unticked.
I ticked Full Control, which is set on the model immediately. No need to process anything.
The user tried again and still couldn't add the new model to the Analysis Services instance.
Only server administrators, or database administrators having Full Control permissions, can deploy a cube from source files to a server, or create roles and assign members.
It would appear that Full Control (Administrator) is not the same as server administrator, because you can't deploy a tabular model.
Question: if I reprocess the Visual Studio model with the permission set as Read and Process, will this override the new selection in SSMS?
Yes. If you reprocess your tabular model (because you have updated the model, added new measures, etc.), role permissions from Visual Studio will override the role permissions set in SSMS. Ensure that any roles changed in SSMS are reflected in Visual Studio.
SSMS – Analysis Services Properties
Right click on the Analysis Services server (under Connect) and choose Properties.
Then select Security.
Note that this is where server administrators are assigned, and only these people can create and deploy their own models to the Analysis Services instance.
Going back to Azure, you can also set up admins here.
Note that everything we just did in SSMS is also available in Azure. Also, if we redeploy the Analysis Services tabular model via Visual Studio, it doesn't do anything to the Analysis Services admins.
Our roles aren't available to view in Azure, just the Analysis Services admins.
Everything is now working
There are still questions to be answered. Feel free to let me know your thoughts.
When would you need to add users to Azure IAM, given that you can let them use the tabular models, and deploy models, by adding them to roles and the server administrator role in Visual Studio and SSMS?
We have an Analysis Services instance on our training subscription and resource group, and obviously it is only used within office hours as it is only for training.
The last thing we want is to rely on someone remembering to pause the service every night. As soon as you forget to do this you end up paying for the service when you don't need it.
Clearly, this needs automating. There are lots of how-tos online for this, so let's see how easy it is to set up.
What you need
Azure account
Azure Analysis Services Model
Create an Automation Account
You need an Azure Automation account to run an Azure runbook with PowerShell code.
Search for Automation Accounts.
Make sure Create Azure Run As Account is set to Yes.
Set up Credentials
Runbook credentials are very similar to Azure SQL Server credentials. In your new Automation account, click on Credentials.
I've set up a new account for azure_trainingmantenance with the same user name and a password that I have saved for later.
Connections
To see information like your Tenant ID and Subscription ID, simply click on Connections.
These fields can be added to the PowerShell code later.
Add Modules
All the Analysis Services methods (cmdlets) are in PowerShell modules and will need to be installed if they are not already.
Go to Modules and check that AzureRM.Profile is available.
If not, we need to add it: go to Modules and Browse Gallery.
It takes a while for this to import. You need AzureRM.Profile before the next module, AzureRM.AnalysisServices, so repeat the process: in Modules, go to the gallery, search and import.
You now have the cmdlets for Analysis Services within the new Automation account.
Create a RunBook
Click on Runbooks and add a new runbook.
Now it's time to add the script to set up the automation.
You can get your server name by going to Analysis Services in Azure and copying the server name (this runbook will pause and unpause this specific Analysis Services instance).
Another important part of the code is sorting out weekdays and weekends according to the SKU (stock keeping unit).
Our training Analysis Services instance is on SKU S0 (Standard, 40 query processing units) and we only need it running during the week.
{
    "WeekDays": [1,2,3,4,5], "StartTime": "08:30:00",
    "StopTime": "17:59:59", "Sku": "S0"
}
Note, you could also use the code to scale your Analysis Services instance up and down to keep costs down. S1 has more processing units for peak hours and is more expensive; S0, with fewer processing units, could be used for off-peak hours. In this case you could set the code not only to pause and unpause at set times, but also to add and remove processing units.
{
    "WeekDays": [1,2,3,4,5], "StartTime": "08:00:00",
    "StopTime": "17:59:59", "Sku": "S1"
},
{
    "WeekDays": [1,2,3,4,5], "StartTime": "18:00:00",
    "StopTime": "23:59:59", "Sku": "S0"
},
{
    "WeekDays": [6, 0], "StartTime": "08:00:00",
    "StopTime": "23:59:59", "Sku": "S0"
}
Test Runbook
Whilst in edit mode you can also test your runbook.
If there are any issues, your test will flag them up. My first test flagged the fact that I had entered the wrong resource group name.
Go back to Edit PowerShell Runbook using the breadcrumb trail (you don't need to save the test).
Save and Publish your runbook.
Create a schedule
Azure Automation lets you schedule a runbook to run once or on a recurring schedule.
I believe that in this instance we just need the scheduler to kick off our runbook, which contains all the code required to pause and restart the Analysis Services instance.
Starts today
The schedule is now linked to the runbook.
Lets see it in Action
The runbook is now ready to go. Currently the training Analysis Services instance is off. Once the schedule kicks in, will this turn everything back on again?
The runbook is successful.
After 6pm the Analysis Services instance is paused, and when I get back into the office at 8.30 the server has restarted.
I am pulling tweets into an Azure Table Storage area and then processing them into a Warehouse
The following shows the very basic Data factory set up
Connections
I have created a Linked Service for the Azure Storage Table PowerBIMentions
And another Linked Service for my Azure SQL Server Table PowerBIMentions
Datasets
The Storage Table was set up in Logic Apps. I won't go into detail about the schema here.
I have created a SQL table to mirror the Azure Table Storage.
Watermark Table in SQL
The Watermark Table is as follows
CREATE TABLE [staging].[watermarktable](
    [TableName] varchar(100) NOT NULL,        -- varchar lengths assumed; they were lost in the original listing
    [WaterMarkDate] datetime NULL,
    [WaterMarkValue] varchar(50) NULL,
    [WaterMarkOffSetDate] datetimeoffset NULL
) ON [PRIMARY]
For the time being, the watermark value stores the date in the same format as in the Azure Table Storage.
Ignore the offset date for the time being.
I have then created a stored procedure to add the table name (staging.PowerBIMentions) and the maximum created date from what is in the table, with some extra script to format that date into the format used in Azure Table Storage.
To understand how this should be formatted you can look at the Azure table using Microsoft Azure Storage Explorer.
As you can see, the format is YYYY-MM-DDTHH:MM:SS.000Z.
I haven't included the code for the SP here, but it is fairly straightforward: you take the max date from the table you are interested in and format the value correctly.
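As a rough illustration only (this is not the original SP; the staging table staging.PowerBIMentions and its CreatedAt column are assumed names), it could look something like this:

CREATE PROCEDURE [staging].[usp_UpdateWatermark]
AS
BEGIN
    -- Take the latest created date from the staging table (column name assumed)
    DECLARE @MaxDate datetime = (SELECT MAX(CreatedAt) FROM [staging].[PowerBIMentions]);

    UPDATE [staging].[watermarktable]
    SET WaterMarkDate  = @MaxDate,
        -- Style 126 gives ISO 8601 (YYYY-MM-DDTHH:MM:SS.mmm); append Z to match Azure Table storage
        WaterMarkValue = CONVERT(varchar(30), @MaxDate, 126) + 'Z'
    WHERE TableName = 'staging.PowerBImentions';
END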
Pipeline
Added a Pipeline called CopyPowerBIMentionsIncremental
The Source is the Storage mentions table
The Sink (destination) is the Azure SQL table.
And then we can easily import the schemas and map them.
Then run the very simple pipeline, which just moves the data and then runs the watermark stored procedure to recalculate the max date time.
Change Pipeline to include a lookup activity
The next time we process, we only want to load new records into the table.
The query to select the correct date from the watermark table:
SELECT MAX(WatermarkValue) AS WatermarkValue FROM [staging].[watermarktable] WHERE WatermarkValue IS NOT NULL AND TableName = 'staging.PowerBImentions'
I have been setting up a Logic App to add tweets to Azure table storage. While setting up a loop to add in Keywords into a table (1 tweet may add multiple rows to the Keywords table) I came across this error
The tweet has generated 10 rows of data (because there are 10 keywords).
I have set up an increment counter to add to the ID, so each one will get 1, 2, 3 and so on. If you toggle through the records you notice this:
record Increment
1 1
2 6
3 3
4 7
5 2
6 4
7 10
8 8
9 9
10 5
And the subsequent write to table storage doesn't work because of conflicts. Clearly the increment isn't happening in order. There is a quick solution to this one.
Go into Edit mode in the Logic App.
For every For Each loop with an increment variable, click … to get to Settings.
Make sure Concurrency Control is set to On and the degree of parallelism is set to 1. Setting it to 1 ensures the iterations run sequentially.
You have a shiny new Analysis Services model containing all the data you need for your Power BI reports and Dashboards
It has been added into an Analysis Services instance because (a) there is a lot of data, possibly too much for a Power BI import, (b) you want to do incremental processing without needing Power BI Premium, and (c) you want to refresh more than 8 times a day.
Everything is all set up, but the final part of the puzzle is how to schedule the refresh of the Analysis Services tabular model.
It's easy enough in Power BI: you simply set the schedule in the service. Let's have a look at how to do this using one of the options, Logic Apps.
Create a Service Principal (SPN)
First we need to create a new Azure Active Directory application and service principal to use with role based access control. Whenever you have code that needs to access or modify resources (the Logic App will refresh the Analysis Services data), you need to create an identity for the app.
Sign into Azure
Go to Azure Active Directory
Next go to App Registrations and + New registration
The web URL is simply one set like the example in the documentation
Then click Register
Assign the Application to a role
Before the app can access resources, it must be assigned a role. For this example we are going to assign a role at the subscription scope, for our Proof of Concept subscription.
Go to All Services and then Subscriptions
I'm going to select the Proof of Concept subscription.
Next select Access Control and add Role assignment
Note I have selected the new analysisservicesapp
Click save to add the Role Assignment
The Service Principal is set up. Now we need to get the values so we can sign into the app
Get Sign in Values
Go to Azure Active Directory
Go to App Registrations again and select the new application
Copy the Tenant ID for later use
Copy the Application ID for later use
Set up your Credentials
I was going to create a certificate, but I found it an extremely complicated process and in the end I couldn't export the created certificate in PowerShell.
Because I haven't been able to create and export a certificate, I am going to use a client secret instead.
Click on New Client Secret
Once added, you need to copy the secret value for use later. You won't be able to access this information again, so don't lose it.
Give the Service Principal (SPN) Authority to Administer Analysis Services
Still in your App click on API Permissions
Next View API Permissions
+ Add a permission and find Azure Analysis Services in APIs my organisation uses
Ensure that the Permission allows you to read and write all models (Tick the box)
then Add permissions
Note that although permissions have changed we still need admin consent
Grant Admin Consent
Granting admin consent requires you to sign in as a global administrator, an application administrator, or a cloud application administrator.
From Azure Active Directory go to App registrations and select the app
Within the app, go to API Permissions
Grant admin consent
Configure Permissions in Azure Analysis Services
Next we need to ensure the new service principal has server administrator permissions in Analysis Services.
Open SQL Server Management Studio by right clicking and choosing Run as Administrator, then connect to the Analysis Services instance.
Add in your user name (in my case the email address) but not the password.
You add your Office 365 password when you sign into your account (Next).
Right click on the Analysis Services name and go to Properties, then Security.
Click Add, then search for the app that has been created. This can then be added as a server administrator (allowing it to add models, amend models, refresh data, etc.).
Click OK
Create the Logic App
Finally, we can create the Logic App in Azure
The Logic App will be triggered by an HTTP request, which will in turn be called from Data Factory, the Azure orchestration tool.
In Azure Go to Logic Apps and Add
Then New Step and search for HTTP and then HTTP
Method
POST. POST sends the data inside the body of the HTTP request.
URI (Uniform Resource Identifier)
URI = https://<your server region>/servers/<aas server name>/models/<your database name>/refreshes
The server is the Azure Analysis Services instance; the model (created in Visual Studio) is the one we want to refresh.
There may be multiple models on the server.
Headers
As per the example, I have set the header to Content-Type: application/json.
Queries
Nothing set
Body
This is where you set up all the processing information.
{
    "Type": "Full",
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Objects": [
        { "table": "DimCustomer" },
        { "table": "DimDate" },
        { "table": "DimEmployee" },
        { "table": "DimProduct" },
        { "table": "DimProductCategory" },
        { "table": "DimProductSubCategory" },
        { "table": "DimSalesTerritory" },
        { "table": "FactResellerSales" }
    ]
}
Type: Full
The type of processing to perform. Refresh command types can be:
full – process the entire model and recalculate all the dependents (measures, columns, etc.)
clearValues – clear values in the objects and their dependents
calculate – recalculate your formulas
dataOnly – refresh the data in the objects; does not force recalculation
automatic – if the object needs refreshing and recalculating, then do so
add – append data to the partition and recalculate dependents (measures, columns, etc.)
defragment – clean up values in dictionaries that are no longer used
CommitMode: transactional
Determines whether objects are committed in batches or when complete. Modes include:
default
transactional
partialBatch
It's always a good idea to set up partitions within your data, especially on the fact table, to help with processing.
However, the above is an example specifying exactly what to process. You can simplify the code if you wish:
{
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Type": "Full"
}
The following is an example of adding partitions into your code. In this example, DimCustomer is one partition:
"table": "DimCustomer",
"partition": "DimCustomer"
The fact table can be made up of many partitions, for example one for each year. You can then decide which partitions to process; for example, you may only want to process the current year's data, as in the sketch below.
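As a sketch only (the partition names below are made up for illustration), the Objects array could then list just the partitions you want to refresh:

"Objects": [
    { "table": "FactInternetSales", "partition": "FactInternetSales2018" },
    { "table": "FactInternetSales", "partition": "FactInternetSales2019" }
]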
Authentication
Active Directory OAuth
Tenant
Use the Tenant ID that we collected previously
Audience
https://*.asazure.windows.net
Client ID
Use the Client ID that we collected Previously
Credential Type
Secret (Remember that we added the Secret to the App)
Secret
The secret value that we saved (remember, for this one you can't go back to Azure and look at the secret; it is shown only once).
Save and test the Logic App by Clicking Run
Consume Logic App with Azure Data Factory
We now have a Logic App that will incrementally process the data. We now need to schedule it by creating a Data Factory to orchestrate the processing.
First we need to copy the HTTP POST URL from the When a HTTP request is received activity in the Logic App.
And then create a new Data Factory in Azure.
For the time being I am not enabling Git.
In Author and Monitor, create a pipeline and drag a Web activity across to the design pane.
And then set up the schedule by adding a trigger
This should now be enough to process all your data into the Analysis Services model, which has a live connection into Power BI.
What we need to do is check that this is doing its job. I will be looking at this in a later blog post.
If you do the usual measure SUM(Total) you would get
ID Total
99 2
101 3
333 4
635 4
But I don't want the visual to show the above. I want the number of occurrences on the axis, and then the number of IDs with that number of occurrences as the value.
You can't use a measure on the axis; measures can only be used as values. How do you solve this problem?
SUMMARIZE returns a summary table for the requested totals over a set of groups. In this case, our group is the IDs.
For the summary we are simply counting our IDs as the frequency (sketched below).
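A minimal sketch of the summary calculated table, assuming the fact table is simply called Fact and the business key column is Fact[ID] (both names are placeholders for your own):

SummaryTable =
SUMMARIZE (
    Fact,
    Fact[ID],                           // one row per ID
    "Frequency", COUNT ( Fact[ID] )     // how many records each ID has
)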
Next, Add a new Table
PotentialFreqs = GENERATESERIES ( MIN ( SummaryTable[Frequency] ), MAX ( SummaryTable[Frequency] ), 1 )
This creates a list of frequencies starting at 1 and ending at our max frequency. There is an issue here: we have a lot of NULL values, which create a large frequency number that we don't want in our series of frequencies. This means we have values from 1 to 98 when there should only be values 1 to 4. How do we remove these values?
Note the addition of the IF statement: if the ID is not zero then count the IDs, else do nothing (see the sketch below).
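With the same assumed names, the revised frequency expression might look like this, so that the zero/null IDs return BLANK and drop out of the series:

SummaryTable =
SUMMARIZE (
    Fact,
    Fact[ID],
    "Frequency", IF ( Fact[ID] <> 0, COUNT ( Fact[ID] ) )   // only count real IDs; zero/blank IDs give BLANK
)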
Create relationships
In order to use the new tables we need to create relationships between them
The SummaryTable is joined to the fact table by the ID (business key) within the fact table.
PotentialFreqs is joined to the SummaryTable via Value to Frequency
The first visual to try this out is a stacked bar chart. Value from the PotentialFreqs table is on the axis and Frequency from the SummaryTable is used as the value.
So we can see that 33 of our IDs have 3 records in the table.
The PotentialFreqs table is really useful if you want to plot your data in a histogram, as it gives you a continuous type to plot against rather than a categorical one.
In conclusion, if you need to use a measure on an axis rather than as a value, create a summary table and then join it to the main tables in your model.
Update
Unfortunately the above solution doesn't quite work, because I want to base my metrics on each year, and sometimes the IDs are split across years. I will be looking at this later.