Azure SQL Database – Debbies Microsoft Power BI, SQL Fabric and Azure Blog

January 26, 2021

Azure SQL Database. Publishing from Development to Production Part 2

The Dev to Prod Process

Initially in part one we set up the process with Visual Studio 2019 and Devops and moved all our objects across to Production. Then with Data Factory we moved data from the production data lake into Production Azure SQL DB

We have all the Source data in a data lake (and its been assured that the Production data lake is the same as the Development data Lake)

We have a Data Factory in Production that goes through a DevOps release pipeline so we should now be able to use the Production Data Factory to Load all the Production Data into the Production SQL database on a regular basis.

What happens when you already have Objects and data in Target?

Only the changes will be released. So the next time you release into production you are releasing the delta

Lets see this in action

The Initial SQL database was released into Production with Visual Studio
A Production Data Factory moved all the data into the new Objects
Now we have an updated Dev SQL Database.

Open your visual Studio Project

Open the project that was created in the last session.

In SQL Server Object Explorer

You have the Azure Server and Database. Right click and Refresh

And you have the Local Project DB which contains all the Objects. We can Check the Schema differences between the project DB and the latest DB within Azure

Go to the dev SQL Database (Click on the icon to the left of the server to open it up)

on the Azure SQL Database. Right click and Choose Schema Compare

For Select Target , select Select Target

Open the Local DB Project. The Database in this case is the same name as your Solution in Solution Explorer. (Now I know I should have given my solution the project name and my Project an extension of _DB to differentiate the two)

Click OK

Click Compare.

Now you get to see what has been deleted. in this case a table and a Procedure has been dropped

Next we can see changes. If the Table Change is clicked on, you get more information about that change in the object definitions. In this case a test column has been added.

This creates a quandary when it comes to loading in the data because this table should be fully populated but the new column will be blank. Is it possible to do a full load for these updated tables with Data Factory, OR do we need to look at something more complex?

And finally additions. In this case there are lots of new Tables procedures and two new functions.

Once happy Click Update and your changes will be published into Solution Explorer

To Check, have a look for some of the new tables, SPs etc in Solution Explorer

Once completed you can click the x icon to close the comparison window and you can save your Comparison information

Rebuild the Project in Visual Studio

Now we want to Rebuild our Project within Solution Explorer

Right click on the Project in Solution Explorer and choose Rebuild. this rebuilds all the files.

Rebuild rebuild your entire project
Build just rebuilds on the changes

Process your Changes with GIT

Now its in your project you need to process those changes with GIT

in Git changes. Commit all and Push

And remember to add a message

This image has an empty alt attribute; its file name is image-94.png

These objects should now be in Devops. You can go to Devops Repos. then to your database specific project and check for new tables, SPs etc

My new junk dimension object are there so this is all working.

Release the new database objects into Production database

now all the code is in the repos we can push the new and updated objects into production with a DevOps Release Pipeline.

There is already data in my production database. As as initial starting point I do a quick check on a few tables to get a feel of the data.

This SQL Script allows you to do a quick check on row counts in the production database

SELECT
QUOTENAME(SCHEMA_NAME(sOBJ.schema_id)) + '.' + QUOTENAME(sOBJ.name) AS [TableName]
, SUM(sPTN.Rows) AS [RowCount]
FROM
sys.objects AS sOBJ
INNER JOIN sys.partitions AS sPTN
ON sOBJ.object_id = sPTN.object_id
WHERE
sOBJ.type = 'U'
AND sOBJ.is_ms_shipped = 0x0
AND index_id < 2 -- 0:Heap, 1:Clustered
GROUP BY
sOBJ.schema_id
, sOBJ.name
ORDER BY [TableName]
GO

Azure DevOps

Choose the database repository (You should also have a repository for data factory)

Build Pipelines

Go to Pipelines. Before releasing to Prod we actually have to build all our code into an artifact for the release pipeline

Click on your Project _DBComponentsCI pipeline (Continuous integration) set up in Part 1

Lets remind ourselves of this pipeline by clicking Edit

This image has an empty alt attribute; its file name is image-188.png

We build the solution file from the Repos in devops. Then Copy files to the staging directory. Finally publish the artifact ready for release.

Come out of Edit and this time choose Run Pipeline

And Run.

Once its run, there are warnings again but for the time being I’m going to ignore these

Otherwise the build pipeline runs successfully.

Release pipeline

Now we have rebuilt the artifact using the build pipeline, go to pipelines, Releases and to your DB_CD release (continuous delivery)

We have a successful release. I can run the above SQL and check for differences. for a start, there were 39 objects and now there are 43 so you can immediately see that our production database has been updated.

The row count shows that we haven’t lost any data. We have simply updated the objects.

Part three will allow us to look more in-depth at the data side. How we deal with the data in Dev and Prod

March 2, 2020

Data Factory to move daily snapshot data in Azure SQL DB to files in a Gen2 Data Lake

I have the following problem to resolve (And this was my initial attempt at figuring out a solution)

I have a data source with data that contains a date. the data contains daily snapshot of a record. this means that a record will be in the data set once per day. This will amount to a lot of data and we would rather hold it as files in Azure Data Lake Gen2 Storage

The logic is to pull daily files from the source database into dated files within the Azure data Lake

Once running this will probably pick up a days data because the rest are already created. However on the initial run I want it to pick up all the days that have been loading.

At the minute I have about 6 months of data to load

Tables in Azure SQL Database (destination)

CREATE TABLE [audit].[MemberDailyMetricsDates](
     [DATEUTC] datetime2 NULL
 ) ON [PRIMARY]

This table collects all the dates from the source snapshot table

Remember, the table records the same record every day with any changes to the record.

CREATE TABLE [audit].[IdWatermarks](
     [TableName] nvarchar NOT NULL,
     [WatermarkValue] [bigint] NOT NULL,
     [WatermarkDate] [datetime] NULL,
     [WatermarkDate2] datetime2 NULL
 ) ON [PRIMARY]
 GO

this is where I add the dates from the tables to show where we are.

For example if we add the last lot of data from 02/01/2020 then this value will be stored in the watermark table. I record them in different formats just in case.

CREATE TABLE [audit].[ProcessingMetrics](
     [ID] [int] IDENTITY(1,1) NOT NULL,
     [TableName] varchar NULL,
     [DateProcessed] [datetime] NULL,
     [DateUTC] datetime2 NOT NULL,
     [Duration] [int] NOT NULL,
     [NoRows] [int] NULL,
 PRIMARY KEY CLUSTERED 
 (
     [ID] ASC
 )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
 ) ON [PRIMARY]
 GO
 ALTER TABLE [audit].[ProcessingMetrics] ADD  DEFAULT (getdate()) FOR [DateProcessed]
 GO

We can record meta data from the data factory Pipeline as it runs

Stored Procedures in SQL Database (destination)

/*Debbie Edwards
 17/12/2019 Created initial SP
 Create a watermark table
 EXEC [audit].[audit].[USP_Watermark]    '2014-12-31 00:00:00.000'
 */

 ALTER PROCEDURE [audit].[USP_Watermark] @NullDate datetime2

 AS   

 BEGIN

 IF EXISTS (Select * FROM [audit].[IdWatermarks]  
 WHERE TableName = 'DailyMetrics')

 DELETE FROM [audit].[IdWatermarks]  
 WHERE TableName = 'DailyMetrics';

 --DECLARE @NullDate datetime2 SET @NullDate = '2014-12-31 00:00:00.000'

 WITH CTEDailyMetricsDates (DATEUTC)
 AS
 (SELECT ISNULL(MAX(DATEUTC),@NullDate) FROM [audit].[DailyMetricsDates]) 

 INSERT INTO [audit].[IdWatermarks] 
 (TableName, WatermarkValue, WatermarkDate, WatermarkDate2)
 SELECT 'DailyMetrics',CONVERT(bigint,CONVERT(datetime, MAX(DATEUTC))), MAX(DATEUTC), MAX(DATEUTC)
 FROM CTEDailyMetricsDates
 END

This is the Stored procedure that you run to create the watermark

Currently I’m just running this for one table. You could redesign to run this SP for different tables.

Also note, If there is nothing in the table I am setting a default date to work from @NullDate

/*Debbie Edwards
 18/12/2019 Created initial SP
 Update Processing details
 EXEC [audit].[USP_UpdateMetrics]
 */
 ALTER PROCEDURE [audit].[USP_UpdateMetrics] @TableName varchar(100),   @DateUTC datetime2, 
 @Duration int,  @NoRows int

 AS   

 BEGIN

 INSERT INTO [audit].[ProcessingMetrics]
 (TableName, DateUTC, Duration, NoRows)
 VALUES

 (@TableName, @DateUTC, @Duration, @NoRows)

 END

the Pipeline will run this stored Procedure to add meta data to the ProcessingMetrics table

Data Lake Gen2 Storage Account

Along with the Destination SQL Database you need to have an Azure Data Lake

Quick Set up for the Azure Data Lake. Setting up an Azure Data Lake V2 to use with power BI dataflows in Service (As a data source) …. and Setting up a service principal will give you information on how to:

Set up the Storage Account
Ensure Hierarchical name space is enabled
Create a file system
Grant reader Role to power BI service
Set up the Service Principal (Set up the App Registration)

Set up Data Factory

Now we are ready to set up everything with a Data Factory

create your data Factory and then go to author and monitor

Connections

AzureSQLDatabasereportingdb is the Source Azure SQL Database

Our Destination is a Destination Azure SQL Database AzureSQLDatabase[destination]reportingdb

And we have a gen 2 Data Lake Storage Account AzureDataLakeStorageGen2 (Which will need the Service Principal account setting up in order to use) See setting up a service principal….

DataSets

the Linked Services are in place. Now we can add Data sets

Source Data Set – Azure SQL database

Add an Azure SQL Server Dataset. the first thing we need are two parameters for the TableName and Table Schema

Connect this up to the Source Linked Service and use the parameters to create a table with schema

The Schema will change when ever you use different table parameters so no need to set at this point

Destination Data Set – Azure SQL database

Destination Data Set – Azure data Lake Gen2

First of all, create a FileName parameter. Because we are creating multiple files, each file needs to be renamed to the UTCDate from the Foreach loop later. We need to set this up within the For each rather than here. Therefore will just set the name as a parameter at this point.

What we want to do is add the file into our file system created in the data lake. and we want to add a date to every single file because these will be the snapshots

In add dynamic content

@dataset().FileName

This is where the actual information for Filename from the data set will be added within the copy activity later.

I imported a schema from a test file I already created.

Pipeline 1 Create Member Daily Metric Dates into destination database

Our first part of the process is to create a list of dates in the source data that we dont yet have in the Data Lake. The first time we run it, thats literally every date we have so far in the source data base

Lets have a look at the Activities in the Pipeline

LookupWatermarkOld

I know that my dates are in datetime2 format so I’m using this and changing it to WatermarkValue so all the steps always use WatermarkValue no matter what the format. Here is the query in full

SELECT MAX(WatermarkDate2) AS WatermarkValue From [audit].[IdWatermarks]
WHERE WatermarkDate2 IS NOT NULL
AND TableName = 'DailyMetrics'

We record the table name of the source in the Watermark table. Basically this will tell us what the date we need to use to work from. remember, we set up a default if there is nothing in the dataset to work with which will run everything into the data lake

CopyDailyMetricsDateUTC

This is where we get to the Copy part of the activity. We are simply copying the dates into the table of the snapshots that we haven’t done yet.

Here we add the parameters of the table we want to use from Source. These parameters were set up in the data set section.

Add a Query. We want the Distinct Date from the source table Where the DATEUTC (The column in the table) is greater than WatermarkValue from the Previous Watermark Activity

SELECT DISTINCT DateUTC FROM [dbo].[DailyMetrics]
WHERE DateUTC > ‘@{activity(‘LookupWatermarkOld’).output.firstRow.WaterMarkValue}’

Now we are at the destination data set

The only item in here is DateUTC

USP.Watermark

This will trigger the Watermark Stored procedure and uses the Azure SQL database destination Linked Service

This is everything required for Section 1 that will run all the dates into our table based on snapshots we haven’t yet prepared

Pipeline 2 CopyMemberDailyMetricsDateUTC

Again, lets have a look at these Activities in more detail

LookupDates

this is the Lookup Activity that takes the full record set of Dates from the destination SQL Server

Note that we haven’t ticked First row only because the entire data set is required

CopyDailyFilesIntoDL (ForEach Activity)

We are taking the output.value from our activity LookupDates

Because it only has one column we don’t need to specify anything further. output.value means the entire record set

Sequential – The Items will be read one by one from the data set

there are 2 activities within this foreach loop. Doubly click on the activity to see what is happening in the loop

Again Lets have a look at this in more detail

CopyDailyMetrics

The source is the table in the source database because we want to store everything in daily snapshot files.

The Query is a SELECT FROM WHERE Statement. Select all the columns from the DailyMetric table where the DateUTC in the source data is Equal to DateUTC in @Item which is generated by the ForEach activity

the sink uses the Data Lake. Remember, in the data set we set the Filename parameter and here is where is gets set. Click on the Filename to view Dynamic content

This is Concatenating a file name, the Item UTC Date and .csv

Use @item() to iterate over a single enumeration in ForEach activity . This value is generated by the ForEach activity (In the Pipeline)

We are setting the date as a string in order to use within the file name. Because the For each loop is setting the item for us and we are inside this activity we can create the filename in this copy activity rather than in the data set its self.

We can then simply Import schemas to view the mappings between the source and sink

SPUpdateDetails

Finally, we want to keep a record of everything we are going to be doing in the loop so we can run our stored Procedure to add the meta data into a table in our source database

We can use out SQL destination Linked Service for this

the Stored procedure contains 4 parameters.

We can take the Date from the Item in the foreach loop.

Duration, and NoRows can be added from metadata. See https://docs.microsoft.com/en-gb/azure/data-factory/copy-activity-overview Monitor programmatically

Duration @activity(‘CopyDailyMetrics’).output.copyDuration
No Rows @activity(‘CopyDailyMetrics’).output.rowsRead

the table name is simple DailyMetrics

USP_TruncateMemberDailyMetricsDates

Finally, now that everything is complete we can truncate the date table. Outside of the foreachloop

the max date is held in Watermark which will be used to create the new files Next time (Set in Pipeline 1)

Test your solution

Now we have everything in place we can test each pipeline. By Clicking debug

You can see the files coming in via Azure Table Storage

If you get errors, a great way of debugging is to go into the code. I had an error and after a file nam it had /n/n.

I simply removed this empty row and it worked.

Pipeline 3

There is now a Pipeline 1 to create the dates. And pipeline 2 to create the files. This is good because they can be tested separately.

We need a top level Pipeline to run them

And this is the new model for the flow of this Data factory

Add a trigger

now we have successfully tested the Pipelines we can set them up in a trigger by adding a parent pipeline that runs all the Pipelines in order

Another Post will be created on Adding Triggers

Considerations

You need to make sure that the last file (When you run it is complete) In our case the snapshot table is run at 12 AM and takes around 20 minutes so we need to set this pipeline off at 1 AM

Add Alerts to Monitor the Pipeline

https://azure.microsoft.com/en-gb/blog/create-alerts-to-proactively-monitor-your-data-factory-pipelines/

Another Post will be created on Adding Monitors

January 10, 2020January 10, 2020

Migrating Adventureworks Data Warehouse from Local Machine (SQL Express) to Azure SQL Server Database

For training courses and general lets have a look at updates to Power BI, an Adventureworks database instance has been set up on my machine, complete with 10 years of data that is great for using for training courses.

However, this database needs moving from its local machine to Azure

Previously bacpac files have been created to import into Azure via SQL Server Management Studio but they have always errored.

Its time to have a look at Azure Data Migration

Source Database

Product – Microroft SQL Server Express (64bit)
Operating System – Windows 1 Pro
Version: 14.0.2027.2
Release: SQL Server 2017

Attempt 1

Microsoft Data Migration Assistant

The Microsoft Data Migration Assistant allows you to upgrade to Azure by detecting compatibility issues.

The first thing to do is to download the msi package and then you can get started

Open the Data Migration Assistant

Click New to create a new Project

Then click create

The server would be connected to using Windows Authentication. However there are errors for this. the credentials must have control server permission.

In SSMS go to Security and Logins to find your own user Login.

Switch to the master database and run the following SQL

GRANT CONTROL SERVER TO [User Name];

Unfortunately the error comes back Securable class ‘server’ not supported in this version of SQL Server.

SSMS is not able to Migrate Express Databases

Attempt 2

b

Move to your Azure connection in SSMS and right click on databases Import Data Tier Application

go through the Import Data Tier guide

Currently we are moving to an S2 model. (Purchase model DTU) which is around £54 a month. When its not being used I will change it to S0, around $10 a month.

There are compatibility issues happening between the Azure database and the SQL Express data base.

The first is regards to COLUMNSTORE. Apparently COLUMNSTORE support is available in S3 and above

S3 costs around £109 pounds a month, twice the cost of S2.

The Solution

Simply Import as S3 and then let Azure Scale to another Standard option by clicking on Pricing tier

You could also move to vCore purchasing options too which is the preferred route as it offers more personalisation.

The DTU-based purchase model are differentiated by a range of compute sizes with a fixed amount of included storage, fixed retention period for backups, and fixed price.

The virtual core (vCore) model provides several benefits:

Higher compute, memory, IO, and storage limits.
Control over the hardware generation to better match compute and memory requirements of the workload.
Pricing discounts for Azure Hybrid Benefit (AHB) and Reserved Instance (RI).
Greater transparency in the hardware details that power the compute; facilitates planning for migrations from on-premises deployments.

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu

I can finally move my Express database into Azure and manage the scaling. All sorted.

However you should be aware that if its Under S3 the index gets disabled and Power BI cant use it in this state

So the question is, If you are importing the data into power BI, will you be happy to keep the database at S3 level when its in use OR would be be OK with removing the columnstore index?

this is something to think about……