SUMMARIZE returns a summary table for the requested totals over a set of groups. In this case, our group is the IDs.
For the summary we are simply counting our IDs as the Frequency.
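As a sketch, assuming a source table called Data with an [ID] column (both names are placeholders, not the real model), the summary table might look like this:

```
SummaryTable =
SUMMARIZE (
    Data,
    Data[ID],
    "Frequency", COUNTROWS ( Data )  -- rows per ID becomes the Frequency
)
```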
Next, add a new table:
PotentialFreqs = GENERATESERIES ( MIN ( SummaryTable[Frequency] ), MAX ( SummaryTable[Frequency] ), 1 )
This creates a list of frequencies starting at 1 and ending at our maximum frequency. There is an issue here, though: we have a lot of NULL values, and they create a large frequency number that we don't want in our series of frequencies. This means we have values from 1 to 98 when there should only be values 1 to 4. How do we remove these values?
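One hedged way to handle this, assuming the inflated frequency comes from rows with a blank ID, is to exclude the blanks before summarising, so the series only runs up to the genuine maximum (table and column names are the same placeholder assumptions as before):

```
SummaryTable =
SUMMARIZE (
    FILTER ( Data, NOT ISBLANK ( Data[ID] ) ),  -- drop the NULL IDs first
    Data[ID],
    "Frequency", COUNTROWS ( Data )
)
```

With the blank IDs filtered out, GENERATESERIES over MIN and MAX of [Frequency] no longer stretches out to 98.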
I have a drill through in my crime reports so you can drill through to the data from the top-level report. However, I am aware that users sometimes struggle with these cool Power BI features.
A new feature is that if you have a drill through, you get a "Right-click to drill through" message on the tooltip. Is that enough? For some people it's not. Some people don't understand the concept of drill through and may need more visual help.
It's all about making sure the users are getting the full benefit of Power BI, so it's worth adding a few extras.
How about creating a short video of the drill through in action that you can add to the visual?
To create a video of the drill through I am going to use ezvid for this demo, as it's free. Camtasia is the solution I want to move to, because it can save the file as an animated GIF.
I want to try out some of the new visuals in the Power BI October release, whilst also doing my first public upload of a report. The visuals in the xViz suite can be used for free, or you can purchase them for full functionality. I am using the free versions of these visuals.
To start having a look at these new visuals, I have downloaded some crime stats for my local area.
Hierarchy Variance Table (xViz Visual)
Compare measures within a hierarchical table and see the variance.
The first thing I need is a hierarchy, which I created using Crime and Status.
E.g. Possession of Weapons → Court Result unavailable
I went to the marketplace and selected the new Hierarchy Variance Table.
To create the hierarchy, I simply dragged Outcome Status over Category.
Crime is simply a metric: I added a 1 for each row so it can be summed.
Time Period is our date column.
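The Crime metric can be set up either as a calculated column of 1s or as a row-count measure; a quick sketch of both (the CrimeData table name is an assumption):

```
-- Calculated column: a 1 on every row, which the visual then sums
Crime = 1

-- Or the equivalent as a measure
Crime Count = COUNTROWS ( CrimeData )
```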
Note all the information you get from this new visual: a sparkline and an indicator, along with the metrics the visual calculates for you.
In Formatting you can also go into Number Formatting and switch on Semantic Formatting, which colours the cells based on positive or negative values.
You can also right-click on the items in Category to get access to options such as what to display, sorting, and rearranging the hierarchy.
So what does this tell us? I’m going to look at Robbery.
The first thing to note is that if you have a filter on Year, selecting a year makes the data disappear and you are left with just Baseline. If you have a Year filter, remember to ensure interaction is off for this visual.
We have 9 records in total. It seems to be taking Baseline as the total for 2018.
For Target it seems to be taking the data from the first quarter of 2019. It doesn't make sense to pitch 2018 against the first quarter of 2019.
Let's see if Vehicle Crime makes any more sense.
Again, Baseline appears to be the total for 2018. The Target of 4 could be Quarter 1 and Quarter 2, but that would be missing Quarter 3. So the question is: how does it arrive at the Baseline and Target?
Unfortunately, as yet there is nothing online that explains how these metrics are calculated, which means we have no way of knowing at the moment. Until this information is clearly provided, I would say that this visual is not fit for purpose.
I certainly wouldn't want to demo the visual for a customer who asks what Baseline and Target relate to; at present I wouldn't be able to answer that question. If you have an idea, please let me know.
Linear Gauge Table (xViz Visual)
Compare a value (Resolved) to a target goal (% of the number of crimes). Ideally we want 95% of these crimes in the data set resolved.
This is very straightforward. For the actual value I want resolved crimes, and the maximum value will be the total number of crimes.
I have also set Condition to On in Formatting, where under 60% is red, everything over 90% is blue, and everything in between is amber.
Straight away we are in the red for the number of crimes solved.
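Under the hood the comparison is just resolved crimes against a 95% target; a hedged sketch of the supporting measures (the CrimeData table and its Resolved flag are assumptions, not the real columns):

```
Resolved Crimes =
CALCULATE ( COUNTROWS ( CrimeData ), CrimeData[Resolved] = TRUE () )

Total Crimes = COUNTROWS ( CrimeData )

-- 95% of the total is the goal the gauge compares against
Resolved Target = [Total Crimes] * 0.95
```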
Tag Cloud (xViz Visual)
I use the Word Cloud for Twitter data, where I have already used an Azure text API to create a table of keywords. Tag clouds are very similar to word clouds and allow you to look at tags of words in the data.
The Tag Cloud takes advantage of "extra-miler" tags within the data set (the words that mean more within your data). The Tag Cloud finds these keywords and displays them.
In this case I am using the category of crime:
category = category
Size Value = Crime (Metric)
It's not new, but I thought Key Influencers would be a great visual for this report. I have added a conditional column against Outcome to ensure we only have a few conditions to choose from.
For "No action taken", "No suspect identified" is the top influencer. Further questions could be: why is this the case? Should we not be doing better at finding the person who committed the crime?
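If you prefer DAX to the Power Query conditional-column UI, a sketch of an equivalent calculated column looks like this (the outcome strings here are illustrative examples, not the real list of values):

```
Outcome Group =
SWITCH (
    TRUE (),
    CrimeData[Outcome] = "Investigation complete; no suspect identified", "No action taken",
    CrimeData[Outcome] = "Under investigation", "In progress",
    "Other"  -- everything else falls into a catch-all bucket
)
```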
Publish to Web
I now have all the visuals I need to get the information out there. It's not private data, so I can publish to the web. I want to publish to this WordPress site.
First of all, let's publish to My Workspace in the Power BI Service,
then log in to check that the report is in My Workspace.
Publish to Web
Once there, go to File and Publish to Web
For the time being I have copied the Link and HTML
Install WordPress Plugin
You can install the WordPress plugin for Power BI.
However, you need to be on the Business plan to do this. I'm not on the Business plan; it's £20 per month.
So to end this post, here is the link to the public report
We are going to add some new information into the data set
In Power BI Service, Click Edit … against the #Taskmaster dataset
I’m actually going to add 7 more fields
Description, Sentiment and Name are Text
CreatedAt is a date
Count, Favorited and followersCount are Numbers. Click Done
Return to Microsoft Flow
Edit the flow and go to the "Add rows to a dataset" action.
Count is an Expression
It is simply set as 1
You can Test and Save
Any new data that comes through will contain these data items
Creating sentiment is slightly more involved, but let's go ahead and create it.
After Detect Sentiment, update your flow with the following logic:
Create a Sentiment value that defaults to Neutral. If Score is over 0.7, set it to Positive; if it is under 0.3, set it to Negative.
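In Flow this can be written as a single nested if() expression; a sketch, assuming the sentiment action is named Detect_Sentiment and returns a score between 0 and 1:

```
if(greater(body('Detect_Sentiment')?['score'], 0.7),
   'Positive',
   if(less(body('Detect_Sentiment')?['score'], 0.3),
      'Negative',
      'Neutral'))
```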
We can add this to our data set
We now have lots of new information to use in our streaming reports
Back in Power BI Service
I’ve attempted to add a new Custom Streaming Tile to the dashboard based on a Line chart to look at the count of records
Unfortunately this streaming visual doesn't seem to work, and immediately I can see a fatal flaw with using streaming dataset visuals for this kind of data.
These visuals are for sources that push through an almost constant stream of data. They are not for feeds that at some points don't have much data coming through. You need to have lots of data within the time frame of the streaming dataset (for example, 60 minutes).
I have the #Taskmaster Stream set up for Historical Data Analysis
When historic data analysis is enabled, the dataset created becomes both a streaming dataset and a push dataset.
A streaming dataset has no underlying database.
A push dataset, on the other hand, has a few limitations on how much data can be pushed in:
75 max columns
75 max tables
200,000 max rows stored per table in FIFO dataset
5,000,000 max rows stored per table in ‘none retention policy’ dataset
The data is then stored in Power BI and we can actually access it from Desktop.
This means I can create some really nice reports against my taskmaster data set
You can't create hierarchies or calculated columns over the push dataset, only measures. You are very limited in what you can do. The Key Influencers visual is also unable to work with push data, which is a shame, because it is the perfect visual for analysing positive and negative tweets.
I should have brought the date across as a date in the first instance, because Month is now just a number and I can't change this in Power BI. We have a date, but only against the very latest data.
Time was already in the data set, but this consists of date and time, which you can't reset to create a date hierarchy.
I can't add day names to the day number (Taskmaster airs on Wednesday, so I expect the levels to go up then).
So the push dataset is fairly simple to set up, but it's incredibly limiting in what you can do.
Our initial page is by month. I needed to add Month Name into the Flow.
Next, I drill through on the 9th of October. The name of the day would be great here, because Taskmaster is on a Wednesday.
Finally, we drill through to the hourly tweets.
As you can see, we have a few negatives in the data set which, on reading, aren't actually negative. The Cognitive API doesn't recognise sarcasm, for instance.
There you go: we started out with a streaming dataset and ended up with a push dataset.
The push dataset doesn't need a refresh, which is the main reason to go for it.
One last quick thing about this push dataset: when I attempted to republish the report, I got the following error.
To get past this error I had to delete my report in the Power BI Service before trying again. I have never seen this kind of conflict before, and I'm assuming it's an issue with the push dataset.
I was logging into Power BI this morning when I saw this exciting new feature
We are always looking at new solutions to provide good data lineage, so this is well worth a look.
Data lineage includes the data origin, what happens to it and where it moves over time. Data lineage gives visibility while greatly simplifying the ability to trace errors back to the root cause in a data analytics process.
I have an App workspace set up for Adventureworks so lets have a look at Lineage using this project
Column 1 is my data source. I can see I'm using a local database, and I'm also using an xlsx spreadsheet to bring in data.
In most of my projects I'm working on the ETL in Data Factory, transforming data in stored procedures, etc. For example, for a social media feed I have a Logic App that moves tweets to an Azure Storage NoSQL table; Data Factory then transfers this data into a central Azure Data Warehouse. The Power BI lineage would pick up at the data warehouse stage. It won't take into account that there is a lot of work before this.
Column 2 is the data set in Power BI
Column 3 provides Report information
Column 4 displays the Dashboards
You can click on a data flow node to drill down into more detail
Currently you can't go any further to look at the data items.
Click on the link icon to see the data flow for that item, in this case the report.
This is a great start but there definitely needs to be more information here to make it something that you would want to use as a proper Data Lineage Tool
It would be good to see the fields in each Entity for the Data Sets
As an extra, it would be great to see what fields are being used in Measures and calculated Fields
Reports – for me, I'd like to know, for every page in my report:
What fields am I using from the data source?
What calculated columns have I created (even better with the DAX logic)?
Any name changes from data source to Power BI.
What measures have I created (even better with the DAX logic)?
For the dashboard, what items am I using (fields, measures, calculated columns)?
An important part of data lineage is gaining an understanding of the entire process. This includes data transformations before Power BI. If you can't do that in here, it would be great to be able to extract all the information so you can combine it with your other lineage information to provide the full story. For example:
Azure Data Catalog
Azure Data Catalog is a fully managed cloud service. Users can discover and consume data sources via the catalog, which is a single, central place for the whole organisation to contribute to and understand all your data sources.
I have already registered Our Data Catalog, and I have downloaded the desktop app
As an Example I want to connect to Azure Table Storage (Connect using Azure Account name and Access Key)
At this point I'm registering everything in the storage table. Then I can view the information in the Azure Portal.
You can add a friendly name, a description, an expert (in this case, me), tags, and management information.
I have added Data Preview so you can view the data within the object. There is also documentation and column information to look at.
In the Data Catalog you can manually add plenty of description to your tables, along with documentation.
This is great for providing lots of information about your data. You can explore databases and open the information in other formats (great if you need to supply information to another data lineage package).
I will be having a look at the Azure Data catalog in more detail later to see how it could help to provide full data lineage
Azure Data Factory
Data Factory is the Azure ETL orchestration tool. Go into Monitoring for lineage information. However, there doesn't seem to be a way to export this information for use elsewhere, and Data Factory won't take into account the work done in, for example, a stored procedure.
Again, this is another area to look into in more detail.
When you use stored procedures to transform your data, it's harder to provide automated lineage on your code. There are automated data lineage tools for SQL out there, but it would be great if there were a specific tool within Azure that creates data lineage information from your stored procedures.
Azure Logic Apps
Data for my project is collected via Logic Apps before being Processed into an Azure Data Warehouse.
Essentially, we need our data lineage to capture everything in one place.
And, just as important, everything should be as automated as possible. If I quickly create a measure, the data lineage should reflect this with no manual input needed (unless you want to add a description to the new measure explaining why it was created).
RANKX – RANKX is an iterator. It takes a table and an expression to evaluate. RANKX looks at each row in the table and evaluates the expression, returning the ranking for each row in the table argument. RANKX creates a row context because it is an iterator.
Row context – calculated at processing time instead of at run time. This makes it a calculated column rather than a measure, because the value is set on each row. A good example of a row-context calculation is: is this value greater than 100? If yes, set to True; if no, set to False. This is applied on each row.
CROSSJOIN – CROSSJOIN returns a table containing all combinations of the rows from the tables passed into the cross join.
ALL – Returns all the rows in the table, ignoring any applied filters
So essentially you are ranking against ALL months in the date table and ALL groups within the other dimension table. We are ranking the Avg Fact measure.
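Putting those pieces together, the measure being described looks something like this (the table, column, and measure names are assumptions based on the text, not the real model):

```
Rank All =
RANKX (
    -- every month/group combination, ignoring any filters on the visual
    CROSSJOIN ( ALL ( 'Date'[Month] ), ALL ( 'Group'[Group] ) ),
    [Avg Fact]
)
```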
Now, this works to some extent if you add in the table with the date filter set but no filter on the groups you want. Looking at January, for example:
However, later on in the data set the null values are still being given rankings.
And when I apply the filter to only have certain groupings in the table, the RANKX fails even further.
Note that the RANKX is still ranking every single value, even though we have applied a filter for the Group
This is because of the ALL: the CROSSJOIN takes ALL month values and ALL group values, and dismisses the fact that we have applied a filter.
You need ALL because if you just ranked against the one value that the row context is on, every ranking would be 1.
ALLSELECTED is different to ALL: ALL calculates everything, ignoring filters, while ALLSELECTED takes into account the filters on the visual. An important part of this solution is that we are slicing on Year and filtering for specific groups. As a consequence, we need to use ALLSELECTED.
Remember we are ranking the measure against each Group. What is the best Group this Month?
Next stop, ensuring we don’t rank against null values
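A sketch of the revised measure, swapping ALL for ALLSELECTED and skipping blanks so null values never receive a rank (the names are the same assumptions as in the earlier sketch):

```
Rank Selected =
IF (
    NOT ISBLANK ( [Avg Fact] ),  -- don't rank blank values at all
    RANKX (
        -- only the months and groups surviving the slicers and filters
        CROSSJOIN ( ALLSELECTED ( 'Date'[Month] ), ALLSELECTED ( 'Group'[Group] ) ),
        [Avg Fact]
    )
)
```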