Peak Indicators Article on Microsoft Azure

I just wanted to link to an article on working with Azure that I helped write with Paul Clough.

https://www.peakindicators.com/blog/spotlight-on-microsoft-azure-does-it-live-up-to-it-s-reputation-what-is-it-like-to-work-with

It was a real joy to write, and in it I mention my work on Social Media Analytics.

Power BI Service Data Lineage View

I was logging into Power BI this morning when I saw this exciting new feature.

We are always looking for new ways to provide good data lineage, so this is well worth a look.

Data lineage includes the data origin, what happens to it and where it moves over time. Data lineage gives visibility while greatly simplifying the ability to trace errors back to the root cause in a data analytics process. 

Wikipedia

I have an app workspace set up for AdventureWorks, so let's have a look at lineage using this project.

Column 1 is my data source. I can see I'm using a local database, and I'm also using an .xlsx spreadsheet to bring in data.

In most of my projects I'm working on the ETL in Data Factory, transforming data in stored procedures and so on. For example, for a social media feed I have a Logic App that moves tweets into an Azure Table Storage (NoSQL) table. Data Factory then transfers this data across into a central Azure Data Warehouse. The Power BI lineage view would only pick up at the data warehouse stage; it won't take into account that there is a lot of work before this.
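Since the Power BI view only starts at the warehouse, one option is to record the earlier stages yourself. As a minimal sketch (the stage names below come from the pipeline just described, but the graph structure is a hypothetical way to record lineage, not an Azure API):

```python
# Illustrative only: a simple lineage graph mapping each stage to the
# stage(s) it feeds, covering the steps Power BI's lineage view misses.
lineage = {
    "Twitter feed": ["Logic App"],
    "Logic App": ["Azure Table Storage"],
    "Azure Table Storage": ["Data Factory copy"],
    "Data Factory copy": ["Azure Data Warehouse"],
    "Azure Data Warehouse": ["Power BI dataset"],
}

def trace_back(node, graph):
    """Walk upstream from a node to list every stage that feeds it."""
    parents = [src for src, targets in graph.items() if node in targets]
    ancestors = []
    for p in parents:
        ancestors.extend(trace_back(p, graph) + [p])
    return ancestors

# Power BI's own view starts at the warehouse; tracing back from the
# dataset recovers the earlier stages too.
print(trace_back("Power BI dataset", lineage))
```

Tracing back from the dataset returns the Twitter feed, Logic App, Table Storage and Data Factory stages that the built-in view never sees.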

Column 2 is the data set in Power BI

Column 3 provides Report information

Column 4 displays the Dashboards

You can click on a data flow node to drill down into more detail

Currently you can't go any further to look at the individual data items.

Click on the link icon to see the data flow for that item, in this case the report.

This is a great start, but there definitely needs to be more information here to make it something you would want to use as a proper data lineage tool:

  • It would be good to see the fields in each entity for the data sets
  • As an extra, it would be great to see which fields are being used in measures and calculated fields
  • Reports – for every page in my report, I'd like to know:
    • What fields I am using from the data source
    • What calculated columns I have created (even better with the DAX logic)
    • Any name changes from data source to Power BI
    • What measures I have created (even better with the DAX logic)
  • For the dashboards, what items I am using (fields, measures, calculated columns)
  • An important part of data lineage is gaining an understanding of the entire process, including the data transformations that happen before Power BI. If you can't do that in here, it would be great to be able to extract all the information so you can combine it with your other lineage information to provide the full story. For example:

Azure Data Catalog

Azure Data Catalog is a fully managed cloud service. Users can discover and consume data sources via the catalog, and it provides a single, central place for the whole organisation to contribute to and understand all of your data sources.

https://eun-su1.azuredatacatalog.com/#/home

I have already registered our data catalog, and I have downloaded the desktop app.

As an example, I want to connect to Azure Table Storage (connecting using the Azure account name and access key).

At this point I'm registering everything in the storage table. Then I can view the information in the Azure Portal.

You can add a friendly name, a description, an expert (in this case, me), tags and management information.

I have added a data preview so you can view the data within the object. There is also documentation and column information to look at.

In the data catalog you can manually add plenty of descriptive detail to your tables, along with documentation.

This is great for providing lots of information about your data. You can explore databases and open the information in other formats (great if you need to supply information to another data lineage package).

I will be having a look at Azure Data Catalog in more detail later to see how it could help provide full data lineage.

Azure Data Factory

Data Factory is the Azure ETL orchestration tool. Go into Monitoring for lineage information. However, there doesn't seem to be a way to export this information for reuse, and Data Factory won't take into account the work done inside, for example, a stored procedure.

Again this is another area to look into more.

Stored Procedures

When you use stored procedures to transform your data, it's harder to provide automated lineage for your code. There are automated data lineage tools for SQL out there, but it would be great if there were a specific tool within Azure that creates data lineage information from your stored procedures.
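To give a flavour of what such a tool has to do, here is a deliberately crude sketch: it scans a stored procedure's text for INSERT INTO … SELECT … FROM pairs to recover table-level lineage. The procedure body below is a hypothetical example, and real tools use a proper SQL parser rather than a regex; this only handles the simplest case.

```python
import re

# Hypothetical stored procedure text for illustration.
proc_body = """
INSERT INTO dbo.FactTweets (TweetId, TweetText)
SELECT TweetId, TweetText FROM staging.Tweets;
"""

# Find (target, source) table pairs; DOTALL lets the match span lines.
pattern = re.compile(
    r"INSERT\s+INTO\s+([\w.]+).*?FROM\s+([\w.]+)",
    re.IGNORECASE | re.DOTALL,
)
lineage_pairs = [(src, tgt) for tgt, src in pattern.findall(proc_body)]
print(lineage_pairs)  # [('staging.Tweets', 'dbo.FactTweets')]
```

Even this toy version shows why automated lineage for procedural SQL is hard: joins, CTEs, MERGE statements and dynamic SQL all break the simple pattern.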

Azure Logic Apps

Data for my project is collected via Logic Apps before being Processed into an Azure Data Warehouse.

Essentially, we need our data lineage to capture everything in one place.

And just as important, everything should be as automated as possible. If I quickly create a measure, the data lineage should reflect this with no manual input needed (unless you want to add a description to the new measure explaining why it was created).

Azure Fundamentals training (Quick Notes)

I've been spending some time revising for the Azure Fundamentals course. Here is a quick list of some of the more problematic test questions I've come across.

Azure Advisor

  • Detects threats and vulnerabilities
  • Ensures Fault Tolerance
  • Helps reduce spending
  • Protects data from accidental deletions
  • Speeds up your apps

Application Gateway

Multiple instances of a web application are created across three Availability Zones. The company also configures a networking product to evenly distribute service requests based on three different URLs. (Application Gateway fits here, as it supports URL path-based routing.)

Application insights

  • Feature of Azure monitor
  • Visually analyse telemetry data

ATP (Azure Advanced Threat Protection)

  • Pass the ticket – attacker stealing Kerberos data
  • Pass the hash – attacker stealing NTLM data
  • Suspected brute-force attack – multiple attempts to guess a user's password

Compliance

http://servicetrust.microsoft.com – Compliance manager URL

  • Audit Reports – a service within the Trust Portal used to determine Azure compliance with GDPR
  • Compliance Manager – determines whether or not your services meet industry standards
  • GDPR – standards enforced by a government agency
  • Germany – country with a dedicated trustee for customer data
    • Physically isolated instance of Azure
  • Azure Government – only available in the US
  • ISO – international standards from a non-regulatory agency
  • NIST – standards from a non-regulatory agency based in the United States
    • National Institute of Standards and Technology

Cloud Shell, CLI and Powershell

Azure CLI

  • az login
  • Cross-platform command-line tool

Azure Cloud Shell

  • New-AzureRmVM
  • Web based tool after you log onto the Azure portal

Azure Powershell

  • Connect-AzureRmAccount
  • Use when you need to log into Azure without opening a web browser

Azure Governance

  • Locks – Prevent users from deleting resources
  • Advisor – uses information from Security Center to recommend best practices
  • Initiatives – Define a set of policies

Cloud Computing terms

  • Fault Tolerance – Power Outage in a data center. Automatic Failover for continual operation
  • High Availability – Having data available when you need it

Fault tolerance and high availability both apply to the scenario where you are moving on-premises data centres to the cloud: the data is mission critical, there is a need for access to the data sources at all times, and changes are incremental and easy to predict.

  • Elasticity – Sudden spikes in traffic
  • Scalable – Increase the Number of VMs easily

Azure Locks

  • When multiple locks are applied at different scopes, the most restrictive lock takes precedence
  • A lock applies to all resources contained in its scope and to any new resources added to that scope
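The precedence rule can be sketched in a few lines. Azure has two lock levels, CanNotDelete and ReadOnly, with ReadOnly the more restrictive; the numeric ranking below is my own illustration of that rule, not an Azure API.

```python
# Higher number = more restrictive; ReadOnly outranks CanNotDelete.
RESTRICTIVENESS = {"CanNotDelete": 1, "ReadOnly": 2}

def effective_lock(inherited_locks):
    """Return the most restrictive lock inherited from all scopes, or None."""
    if not inherited_locks:
        return None
    return max(inherited_locks, key=lambda lock: RESTRICTIVENESS[lock])

# A ReadOnly lock on the subscription plus a CanNotDelete lock on the
# resource group leaves the resource effectively ReadOnly.
print(effective_lock(["CanNotDelete", "ReadOnly"]))  # ReadOnly
```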

Networking

  • NSG – Network Security Group. Restricts inbound traffic to a virtual machine to specified IP addresses
  • DDoS Protection – Distributed Denial of Service protection. Prevents a flood of HTTP traffic to a VM that hosts IIS
  • Firewall – create a rule that restricts network traffic

RBAC

Limit access to resources at the resource group and resource scopes.

Service Health

  • Notifies you if App Service usage exceeds the usage quota
  • Respond to planned Service outages
  • Implement a web hook to display health incidents

Azure – Data Factory – changing Source path of a file from Full File name to Wildcard

I originally had one file to import into a SQL database: Survey.txt

The files are placed in Azure blob storage ready to be imported

I then use Data Factory to import the file into the sink (Azure SQL Database)

However, the data actually arrives as one file per year. For the full logic I need to be able to add a new file to blob storage and have it imported, and each file name will contain the year.

This means I need to change the Source and Pipeline in Data Factory

First of all, remove the file name from the file path. I used one file to set up the schema; all the files have the same layout, so this should be OK.

Next I go to the pipeline and set up the wildcard there: Survey*.txt

When the pipeline is run, it will pick up every file that matches the pattern, for example all the yearly Survey files.
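Data Factory's wildcard file name behaves like shell-style globbing, so Python's fnmatch module can illustrate which blobs a pattern such as Survey*.txt would pick up. The file names below are hypothetical examples, not actual blobs from my storage account.

```python
from fnmatch import fnmatch

# Hypothetical contents of the blob container, one survey file per year.
blobs = ["Survey2018.txt", "Survey2019.txt", "Survey2020.txt", "Customers.txt"]

# Same matching style Data Factory applies to its wildcard file name.
matched = [name for name in blobs if fnmatch(name, "Survey*.txt")]
print(matched)  # the three yearly Survey files; Customers.txt is excluded
```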
