DevOps Organisation Settings

Let's have a look in a little more detail at the settings at the DevOps Organisation level and at what the Organisation actually is.

When DevOps was set up, it created an Organisation at the level of [My Name], and another Organisation was then created manually for the company

Organisations can be treated as accounts and each organisation has its own URL. You will also hear organisations being called Collections

You must always start with one organisation but you can have multiple.

Go to Organisation Settings

General

Overview

This gives you an overview of the DevOps organisation and there are a few options in here to be aware of

Privacy URL

Your Privacy URL can be added here to link to your organisation's document describing how you handle internal and external guest data. If you have a public website or app you are required to have a dedicated privacy policy URL.

Q: Do you have a privacy document already in place?

https://www.freeprivacypolicy.com/blog/privacy-policy-url/

Organisation Owner

The owner is set to the person who created the organisation account, but this can be changed.

Projects

New Projects can be set up here. See https://debbiesmspowerbiazureblog.home.blog/2020/03/06/create-a-devops-project/

Users


A good area to find out the Access levels for each user

  • Stakeholder: Partial access and can be assigned to unlimited users for free
  • Basic: Provides access to everything but test plans. Up to 5 users free, then £4.48 per month
  • Basic + Test Plans: Includes test plans. £38.76 per month
  • Visual Studio subscription: For users with a Visual Studio subscription; features are enabled based on Visual Studio Enterprise, Professional, Test Professional or MSDN Platform

Group Rules

DevOps includes group based licensing for Azure Active Directory (Azure AD) Groups and Azure DevOps groups.

Azure Active Directory Groups are created in Azure Active Directory


DevOps Groups and Collection Level Groups can be found within the Permissions section so we can look at this in more detail later.

Add a group rule to assign an access level or extension to the group, and resources in Azure DevOps are assigned to all members of the group.

When users leave a group the licenses are freed and returned to your pool.

Imagine this scenario

It will be easier to add the following Groups

  • Project A Contributor Group and add Debbie and Jo
  • Project A Reader Group and add Tess
  • Project A Administrator Group and add Sarah
  • Project B Reader Group and add Debbie and Tess
  • Project C Contributor and add Jo
  • Project C Administrator and add Debbie

To manage licenses and group rules, you must be a Project Collection Administrator (PCA) for the organization. Check this within Security > Permissions

Click Add a group rule

Both users within this new group have a Visual Studio subscription, so the group level is set to Visual Studio. However, with Group Rules you assign the access level for the users at Group Rule level

At this point I also added a new Group.

You can click on the … Button after Last Evaluated to get to either Manage Group Rules or Manage Members

These are the two rules for this group

What happens when your users have a mix of access levels within the one User Group?

The Users get their Access Levels from the Rule so their Access level would be reset

Billing

Before looking at Billing you can use the Azure Calculator to get more of a feel for what you need

https://azure.microsoft.com/en-gb/pricing/calculator/?service=azure-devops

Note that you are paying for 5 Basic plans (the first 5 are free), but the 10 developers who need the Basic + Test Plans license are the ones adding most to the monthly cost
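As a rough worked example using the prices above: with 10 Basic users and 10 Basic + Test Plans users, the first 5 Basic users are free, the next 5 cost 5 × £4.48 = £22.40 per month, and the Test Plans users cost 10 × £38.76 = £387.60 per month, so around £410 per month in licenses before any extra parallel jobs or storage.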


Why would you add additional parallel CI/CD jobs on either Microsoft-hosted or self-hosted pipelines?

See Parallel jobs for more information

Azure DevOps billing is through Azure, and because billing has not yet been set up we only have access up to the free tier limits

If you click on Set up Billing you need to choose an Azure subscription to add it to

You can't add it to your personal credits because of the spending limit caps. Once set up, you can then manage the paid access for your users, bearing in mind the free usage tier

https://docs.microsoft.com/en-us/azure/devops/organizations/billing/buy-basic-access-add-users?view=azure-devops

Auditing

Allows you to see all the auditable events; you can export the log and filter on specific events

Global Notifications

You will receive notifications for lots of actions in DevOps, like when a build completes or a deployment approval is pending. You can configure the organisation notifications from here

Usage

Usage allows you to see what's been going on in DevOps


You can also filter the information and choose the columns that you are interested in

You can Select

  • TFS (Any application within the Organisation Service account, e.g. TFS https://dev.azure.com/Organisation/)
  • Release Management (Any application within Release Management Service)
  • Analytics (Any application within the Analytics Service)

Statuses

  • Normal
  • Delayed
  • Blocked

And time period, for example, the last 7 days.

Extensions


You can browse the marketplace for additional DevOps services, like Analytics above, to gain insight into the health and status of your DevOps projects

Just Searching for Git brings up lots of free Git Services that you can use

Azure Active Directory

When DevOps was accessed, an email was used that was connected to a Tenant in Azure because we use Office 365

Office 365 uses Azure Active Directory (Azure AD) to manage user identities behind the scenes. Your Office 365 subscription includes a free subscription to Azure AD so that you can integrate Office 365 with Azure AD if you want to sync passwords or set up single sign-on.

Because of this, Azure DevOps connected up to the Azure Active Directory

Security

Policies

If we never want to allow Public Projects we can simply set the above policy to Off.

We can also disallow external guest access through policies

Permissions

The Project Collection Administrator has permissions to administer build resources and permissions for the collection

A collection is a container for a number of projects in Azure DevOps and corresponds to the Organisation.

Let's set up another Project Collection Administrator

We want everyone in the POC Admin Group to also be a Project Collection Administrator

The DevOps Groups that have been created and Collection Level (Enterprise Level) Groups can all be seen here

You can also look at the users and access all the Settings per user from the User tab.

Boards

Process

Gives you more information on what will be available on the board for each type, e.g. Agile, Scrum etc.

Pipelines

Agent Pools

DevOps pipeline pools are scoped to the entire organisation so they can be shared across projects. You don't need to manage agents individually; you can organise them into pools

An agent is installable software that runs one job at a time.

With Microsoft hosted agents, every time you run a pipeline you get a fresh virtual machine which is discarded after use

Settings

This is an example status badge in Boards

You can set variables at queue time unless the above option is enabled. If it is limited, only variables marked as settable at queue time can be set

By default, the collection (Enterprise)-scoped identity is used, unless the Limit job authorization scope to current project is set in Project Settings > Settings.

Deployment Pools

An agent pool (above) defines the sharing boundary for all agents in that pool, whereas deployment pools are about deployment target machines that have agents installed on them.

Deployment pools are convenient when you want to deploy a project to different machines

Deployment groups represent the physical environments; for example, “Dev”, “Test”, “UAT”, and “Production”.

The deployment group is a layer over the deployment pool which makes the targets available to the release definitions within a project

Parallel Jobs

As we have already established, the free tier allows you to have 1 parallel job, but what is a parallel job?

A job is a set of one or more build tasks that run sequentially on the same target.

You could have, for example, a project that contains a web app and a Web API. They don't need to be built sequentially; they can be run in parallel to save time

Each app type can equate to 1 job within a build definition, and each build definition has its own build pool/agent

If you have two licensed build pipelines you could run them concurrently which will decrease overall build time.

You could also split up tests across jobs to be executed across parallel builds which will reduce the overall time
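As a purely illustrative example: if a build definition contains three independent jobs of roughly 10 minutes each, a single parallel job runs them one after another in around 30 minutes, whereas three parallel jobs could finish the lot in roughly 10 minutes, assuming an agent is available for each.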

You need to look at your Projects and what you are building and see if Parallel jobs are relevant to you, or if you are happy for the agents to run your builds sequentially

OAuth Configurations

Register and Authorise your app

https://docs.microsoft.com/en-us/azure/devops/integrate/get-started/authentication/oauth?view=azure-devops

Artifacts

Storage

Check the storage sizes of your artifacts. Remember you get 2 GB free for your artifacts.

Hopefully this gives you a little more information on each section of your organisational settings. You need to understand at a basic level:

  • Who your teams will be and what they will require (Set Access Levels, Groups and Rules up)
  • What you will be creating and can they be built in parallel?
  • What environments will you be releasing to?

Each section can then be looked at in much more detail

Create a DevOps Project

At this point we know that we are using the DevOps Service rather than the on-premises DevOps Server, so go to https://azure.microsoft.com/en-us/services/devops/?nav=min

To get to your initial DevOps page

Which gives you three routes in. Do you…

  • Start Free?
  • Start Free with GitHub?
  • Or Sign in to Azure DevOps if you already have an account?

At this point, we don’t have an account

Making Sense of DevOps Pricing

Start Free or Start Free with Github

Start free with GitHub

Use this option if you already have a GitHub account

In this example the Start free option is used. Because [I] have an Azure account, DevOps is already logged in and it knows what my tenant is.

At this point there is no suggestion of a 30 day trial or any other information, so for the time being let's get started by adding a project. (This is because I have a Visual Studio account. This may be different for users without a Visual Studio subscription.)

Also Note the level https://dev.azure.com/debbieedwards

This is the organisation level and this is my own personal DevOps account. What happens if you want to set up a new Organisation to connect related projects and scale out to enterprise level?

Select New organisation

And New Project

The new Organisation level has been set up with the company's name

I can now start working with DevOps at an organisational level. We can have 5 basic users for free so for the time being, this is what we will stick to.

We know we want a Private DevOps area

Version Control: Git has been selected because this seems to be the one that other team members are the most comfortable with

Work Item Process: Agile, Basic, CMMI, Scrum

The default is Agile, but let's have a quick look at each of these processes

Basic

The simplest model.

Agile

Agile includes Scrum and works great if you want to track user stories and bugs on a Kanban board

Scrum

Supports the Scrum methodology. Really good for tracking backlog items and bugs on a Kanban board.

CMMI

If your team follows more formal methods, use CMMI (Capability Maturity Model Integration).

For this Project, the Agile approach is selected

Invite a user into the Project.

With under 5 users DevOps is free. We won't be doing anything with Test Plans at the moment, so let's add one other user into this area

DevOps Access Levels

  • Stakeholder: can be assigned to users who have no license or subscription and need a limited set of features
  • Basic: provides access to most features. Up to 5 users free
  • Basic + Test Plans: the user has access to Test Plans, but this costs around £34 a month extra and there are no free accounts
  • Visual Studio Subscription: assign to users who already have a Visual Studio subscription

Check Levels of Access


Go back up to the organisational level and select Organisation settings in the bottom left hand corner of the screen


Next go to Users and you can check the settings. Note that one account is under a Visual Studio Enterprise subscription; this is assigned to users who already have a Visual Studio subscription.

We have 1 basic account which will be free at the moment

Checking Spending

Obviously the one thing you want to do is check that you aren’t spending money unnecessarily.

Still in Organisation Settings

Currently there is no billing applicable, but this section will need a more detailed how-to later on

Checking what has been done at organisation level

Here you can see that Tess has been added by me to a group and her access was set to Basic.

There should be a post coming along soon that will look into Organisational Settings in more detail

We have created a project and set up new users with Basic level accounts.

Getting back to your Project

Close DevOps down and then reopen it


This time you can sign in and go straight to your new project

Next time we will start using some of the services on offer like Boards, Repos, Pipelines, Test Plans and Artifacts

Making Sense of Pricing for Azure DevOps to get started

You decide that Azure DevOps is the way to go because you want to make use of all the features. Specifically

  • Azure pipelines to build and release code
  • Boards to do all your planning
  • Repos so you can use, for example, Git as your code repository
  • Artifacts to share packages across projects
  • Test Plans to help you test what you have built

Azure DevOps Services Costings

DevOps is free for Open Source projects and small projects up to 5 users

https://azure.microsoft.com/en-gb/pricing/details/devops/azure-devops-services/

Azure DevOps Services

Individual Services

Taking Azure DevOps Services as the starting point, the first area to look at is Individual Services


There are only two individual services to choose from: Pipelines and Artifacts. This would be useful if you simply want to be able to build code, release it into specific environments and save your artifacts for use across projects

CI/CD – Continuous Integration and Continuous Delivery (or Deployment)

Along with these two services there are sliders so you can optimise the services for your requirements, and it would be helpful if there were more information about the different options

Azure Pipelines Options

First of all we need to understand that Microsoft-hosted will be in the public cloud. The jobs are run on a pool of Microsoft-hosted agents; each time a job is run, a fresh VM gets created and then discarded after use. Self-hosted will be on premises, using self-hosted agents.

With the above option you can only run one job at a time for free, with up to 1,800 minutes per month. And remember that your job is building and releasing code.
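To put the 1,800 free minutes in perspective (illustrative numbers only): that is 30 hours a month, so if an average build and release run takes 15 minutes you get roughly 120 runs per month before the free Microsoft-hosted allowance runs out.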

Let's see what happens with Microsoft-hosted if we move the Microsoft-hosted slider to 10.


It's not clear what this actually means. Would you be paying £298 per month at the top end if you had 10 developers using the service concurrently, or do you simply pay for what you use? So if you don't go to more than one concurrent job at a time, is it still free?

Azure Artifact Options

This is clearer. Artifacts are stored so you are paying for storage.

User Licenses

If you want to use all or most of the services you can get an individual user license, much in the same way that you would buy a Power BI Pro per-user license.

The only difference between the Basic plan and the Basic + Test Plans plan is Test Plans, but there is a fairly big price difference.

The question is, how useful are Test Plans and can you do without them? Test Plans will be looked at in more detail later

Basic Plan

If you are happy to go without Test Plans it's worth looking in more detail at the fine points


However….

Azure Pipelines: includes the free offer from Individual Services, and that free offer is specifically for 1 free parallel job

Does that mean that even paying £4.48 per license you may have extra charges if you run a job in parallel to another developer?

If two developers are running at the same time who gets hit with the charges?

Could this be understood as: if a user puts two jobs out concurrently, and if two users each have a job running, this wouldn't be charged because it's per user?

Artifacts: 2 GB free per month, and then the assumption is that you move on to paying for extra storage. Is this a pay-as-you-go model?

Basic + Test Plans

The same criteria apply to this plan, so the same questions still apply

There are no free plans with this and the cost is £38.76 per user per month, so the assumption is that this plan would only be required for users who will need to test the system

More information is required in regards to Test Plans: are they worth the extra £34.28 a month?

Azure DevOps Server

DevOps Server is the on-premises offering, built on a SQL Server backend. DevOps Server is a good option when all your services are on premises and you have, for instance, Microsoft SQL Server 2019

You can either pay month to month through Azure or buy a 3 year software license.

If you buy through Azure it entitles you to use the cloud service.

With either option you need Windows or Windows Server Licenses for the Servers running Azure DevOps Server 2019

Team Foundation Server is now Azure DevOps Server.

Pricing is not established on the web site so it may need to be a call to the Microsoft Sales team to ensure you get the right fit for your needs.

If you already use the cloud the recommendation will be to go for a DevOps Service

However there are still some questions in relation to this and what is the best option, and can these options be mixed and matched when dealing with different types of users?

Introduction to Azure DevOps

What is Azure?

Taking the first part of Azure DevOps, Azure is Microsoft's cloud computing platform. It hosts hundreds of services in over 58 regions (e.g. North Europe, West US, UK South) and is available in over 140 countries.

As you can see, lots of Azure services have already been consumed throughout these blogs. Azure SQL Databases, Azure Data Lake gen2, Azure Blob Storage, Azure Data Factory, Azure Logic Apps, Cognitive Services etc.

Business processes are split into Infrastructure as a Service, IaaS (VMs etc.), Platform as a Service, PaaS (see the services above), and Software as a Service, SaaS (Office 365, DropBox, etc.)

You can save money by moving to this OpEx model (Operational Expenditure) from the CapEx model (Capital Expenditure) because you pay for what you need as you go, rather than having to spend money on your own hardware, software, data centres etc.

Cloud services use economies of scale, in that Azure can do everything at a lower cost because it's operating at such a large scale, and these savings are passed on to customers.

On Demand Provisioning

When there are suddenly more demands on your service you don’t have to buy in more hardware etc. You can simply provision extra resources very quickly

Scalability in Minutes

Once demand goes down you can easily scale down and reduce your costs, unlike on premises where you have to provision for maximum hardware requirements just in case.

Pay as you Consume

You only pay for what you use

Abstract Resources

You can focus on your business needs and not on the hardware specs (Networking, physical servers, patching etc)

Measurable

Every unit of usage is managed and measurable.

What is DevOps?

A set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while also ensuring high quality

Testing, Reviews, Moving to production. This is the place where developers and the Operations team meet and work together

Pre DevOps

If we don't work within a DevOps framework, what do we do?

Developers will build their apps etc. and finally add them into Source Control

Source Control or Version Control allows you to track and manage code changes. Source Control Management Systems provide a history of development. They can help resolve conflicts when merging code from different sources

Once in Source Control, the testing team can take the source code and create their own builds to do testing

This will then be pushed back to the development team and will go back and forth until everyone is happy. Here we can see that the environments we are using are Dev and Test

Once Complete, it is released into Production.

This is a very siloed approach. Everyone is working separately, things can take time and you will get bottlenecks

The DevOps Approach

Everyone works together. You become a team and time to market becomes faster. Developers and Operations are working as a single team

DevOps Tools

Each stage uses specific tools from a variety of providers and here are a few examples

  • Code – Eclipse, Visual Studio, Team Foundation Services, Jira, Git
  • Build – Maven, Gradle, Apache Ant
  • Test – JUnit, Selenium
  • Release – Jenkins, Bamboo
  • Deploy – Puppet, Chef, Ansible, SaltStack
  • Monitor – New Relic, Sensu, Splunk, Nagios

We need all these tools to work together so we don’t need to do any manual intervention. This means that you can choose the ones that you have experience in.

Components of Azure DevOps

Azure Boards

People with a Scrum and Project Management background will know how to create the features within the Boards. Epics, Stories, Tasks etc

Developers create and work on tasks. Bugs can be logged here by the testers

Azure Repos

Push the development into Source Control to store your information. Check in your code within Azure Repos.

There are lots of repository options to choose from in Repos to suit your needs, like Git or Team Foundation Version Control (TFVC)

Azure Pipelines

Developers build code and that code needs to get to the Repos via a pipeline. The code is built within the pipeline.

The code is then released into Dev, Test, Prod, QA etc. And from, say, the Test or Dev environments we can……..

Azure Test Plans

Test, using Azure Test Plans. For example, if you have deployed a web service, you want to make sure it's behaving correctly. Once tested, the code will go back to the pipeline to be built and pushed to another environment

Azure Artifacts

Collect dependencies and put them into Azure Artifacts

What are dependencies?

Dependencies are logical relationships between activities or tasks that mean the completion of one task is reliant on another.

Azure Boards

Work Items

The artifact that is used to track work on the Azure board.

  • Bug
  • Epic
  • Feature
  • Issue
  • Task
  • Test Case
  • User Story

So you create work items here and interact with them on the board

Boards

Work with Epics, Features, Tasks, Bugs etc.

Includes support for Scrum (an agile process framework for managing complex work, with an emphasis on software development) and Kanban (a method for managing and improving work across human systems that balances demands with capacity)

Backlogs

How do you prioritise your work items?

Sprints

Say your sprint is 2 weeks (10 working days). What work can be accomplished within this sprint?

Dashboards

Overall picture of the particular sprint or release

Repos

We can use Git or Team Foundation Version Control (TFVC). The example uses Git

  • Files
  • Commits
  • Pushes
  • Branches
  • Tags
  • Pull Requests

You create your own branch from the master branch. Do your testing and changes and push back from your branch to the master branch.

Pipelines

Where is your code? It's in Git

Select the Git source, like Azure Repos Git or GitHub etc.

Get the code from the master branch

How do you want to build the project?

Choose from lots of templates: Azure Web App, ASP.NET, Maven, Ant, ASP.NET with containers, C# function, Python package, Android etc.

Next provide the Solution path and the Azure Subscription that you want to deploy to

This takes the source code from the Git repository and builds it.

The build will then give you logs to show you how the build of the project happened

Next time when you check in code, it will automatically trigger the pipeline to build the code

Then the build needs to be Released via a Release Pipeline

This is where you release to the correct Azure Subscription and the code will be deployed. You can also add in approvals to ensure you get the pre approval required to release the code.

Conclusion

This is just a whistle-stop tour of DevOps. Test Plans and Artifacts haven't been discussed in much detail, but it gives you the basics of what is included in DevOps and how you can start to think about using it.

What do you create in Azure and can it be handled within DevOps?

Can we start using the Boards?

How do we get started with Azure DevOps?

Which teams members have the right interests in the specific DevOps areas?

Data Factory to move daily snapshot data in Azure SQL DB to files in a Gen2 Data Lake

I have the following problem to resolve (And this was my initial attempt at figuring out a solution)

I have a data source with data that contains a date. The data contains a daily snapshot of each record, which means that a record will appear in the data set once per day. This will amount to a lot of data, and we would rather hold it as files in Azure Data Lake Gen2 Storage

The logic is to pull daily files from the source database into dated files within the Azure Data Lake

Once running, this will probably pick up one day's data each time, because the rest have already been created. However, on the initial run I want it to pick up all the days that have already been loaded.

At the minute I have about 6 months of data to load

Tables in Azure SQL Database (destination)

CREATE TABLE [audit].[MemberDailyMetricsDates](
     [DATEUTC] datetime2 NULL
 ) ON [PRIMARY]

This table collects all the dates from the source snapshot table

Remember, the table records the same record every day with any changes to the record.

CREATE TABLE [audit].[IdWatermarks](
     [TableName] nvarchar(100) NOT NULL,
     [WatermarkValue] [bigint] NOT NULL,
     [WatermarkDate] [datetime] NULL,
     [WatermarkDate2] datetime2 NULL
 ) ON [PRIMARY]
 GO

This is where I add the dates from the tables to show where we are.

For example if we add the last lot of data from 02/01/2020 then this value will be stored in the watermark table. I record them in different formats just in case.

CREATE TABLE [audit].[ProcessingMetrics](
     [ID] [int] IDENTITY(1,1) NOT NULL,
     [TableName] varchar(100) NULL,
     [DateProcessed] [datetime] NULL,
     [DateUTC] datetime2 NOT NULL,
     [Duration] [int] NOT NULL,
     [NoRows] [int] NULL,
 PRIMARY KEY CLUSTERED 
 (
     [ID] ASC
 )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
 ) ON [PRIMARY]
 GO
 ALTER TABLE [audit].[ProcessingMetrics] ADD  DEFAULT (getdate()) FOR [DateProcessed]
 GO

We can record metadata from the Data Factory pipeline as it runs

Stored Procedures in SQL Database (destination)

/*Debbie Edwards
 17/12/2019 Created initial SP
 Create a watermark table
 EXEC [audit].[USP_Watermark] '2014-12-31 00:00:00.000'
 */

 ALTER PROCEDURE [audit].[USP_Watermark] @NullDate datetime2

 AS   

 BEGIN

 IF EXISTS (Select * FROM [audit].[IdWatermarks]  
 WHERE TableName = 'DailyMetrics')

 DELETE FROM [audit].[IdWatermarks]  
 WHERE TableName = 'DailyMetrics';

 --DECLARE @NullDate datetime2 SET @NullDate = '2014-12-31 00:00:00.000'

 WITH CTEDailyMetricsDates (DATEUTC)
 AS
 (SELECT ISNULL(MAX(DATEUTC),@NullDate) FROM [audit].[MemberDailyMetricsDates]) 

 INSERT INTO [audit].[IdWatermarks] 
 (TableName, WatermarkValue, WatermarkDate, WatermarkDate2)
 SELECT 'DailyMetrics',CONVERT(bigint,CONVERT(datetime, MAX(DATEUTC))), MAX(DATEUTC), MAX(DATEUTC)
 FROM CTEDailyMetricsDates
 END 

This is the Stored procedure that you run to create the watermark

Currently I'm just running this for one table. You could redesign it to run this SP for different tables.

Also note, if there is nothing in the table I am setting a default date to work from: @NullDate
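A quick way to sanity-check the watermark after running the procedure (the date below is just the default used in the comment above):

 EXEC [audit].[USP_Watermark] '2014-12-31 00:00:00.000'

 SELECT TableName, WatermarkValue, WatermarkDate, WatermarkDate2
 FROM [audit].[IdWatermarks]
 WHERE TableName = 'DailyMetrics'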

/*Debbie Edwards
 18/12/2019 Created initial SP
 Update Processing details
 EXEC [audit].[USP_UpdateMetrics]
 */
 ALTER PROCEDURE [audit].[USP_UpdateMetrics] @TableName varchar(100),   @DateUTC datetime2, 
 @Duration int,  @NoRows int

 AS   

 BEGIN

 INSERT INTO [audit].[ProcessingMetrics]
 (TableName, DateUTC, Duration, NoRows)
 VALUES

 (@TableName, @DateUTC, @Duration, @NoRows)

 END

The pipeline will run this stored procedure to add metadata to the ProcessingMetrics table
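As an illustration of what the pipeline effectively executes (the values below are made up; in the pipeline they come from the ForEach item and the copy activity output):

 -- Illustrative values only
 EXEC [audit].[USP_UpdateMetrics]
     @TableName = 'DailyMetrics',
     @DateUTC = '2020-01-02 00:00:00.000',
     @Duration = 20,
     @NoRows = 50000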

Data Lake Gen2 Storage Account

Along with the Destination SQL Database you need to have an Azure Data Lake

Quick Set up for the Azure Data Lake. Setting up an Azure Data Lake V2 to use with power BI dataflows in Service (As a data source) …. and Setting up a service principal will give you information on how to:

  • Set up the Storage Account
  • Ensure Hierarchical name space is enabled
  • Create a file system
  • Grant reader Role to power BI service
  • Set up the Service Principal (Set up the App Registration)

Set up Data Factory

Now we are ready to set up everything with a Data Factory

Create your Data Factory and then go to Author & Monitor

Connections

AzureSQLDatabasereportingdb is the Source Azure SQL Database

Our Destination is a Destination Azure SQL Database AzureSQLDatabase[destination]reportingdb

And we have a gen 2 Data Lake Storage Account AzureDataLakeStorageGen2 (Which will need the Service Principal account setting up in order to use) See setting up a service principal….

DataSets

The Linked Services are in place. Now we can add datasets

Source Data Set – Azure SQL database

Add an Azure SQL Server dataset. The first thing we need is two parameters, for the TableName and Table Schema

Connect this up to the Source Linked Service and use the parameters to create a table with schema

The schema will change whenever you use different table parameters, so there is no need to set it at this point
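For reference, the schema and table boxes in the dataset can each be given dynamic content along these lines (the parameter names TableSchema and TableName are assumptions):

@dataset().TableSchema
@dataset().TableName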

Destination Data Set – Azure SQL database

Destination Data Set – Azure data Lake Gen2

First of all, create a FileName parameter. Because we are creating multiple files, each file needs to be named with the UTCDate from the ForEach loop later. We need to set this up within the ForEach rather than here, therefore we will just set the name as a parameter at this point.

What we want to do is add the file into our file system created in the Data Lake, and we want to add a date to every single file because these will be the snapshots

In add dynamic content

@dataset().FileName

This is where the actual information for Filename from the data set will be added within the copy activity later.

I imported a schema from a test file I already created.

Pipeline 1 Create Member Daily Metric Dates into destination database

The first part of the process is to create a list of dates in the source data that we don't yet have in the Data Lake. The first time we run it, that's literally every date we have so far in the source database

Let's have a look at the activities in the pipeline

LookupWatermarkOld

I know that my dates are in datetime2 format, so I'm using this and aliasing it as WatermarkValue so that all the steps always use WatermarkValue no matter what the format. Here is the query in full:

SELECT MAX(WatermarkDate2) AS WatermarkValue From [audit].[IdWatermarks]
WHERE WatermarkDate2 IS NOT NULL
AND TableName = 'DailyMetrics'

We record the table name of the source in the watermark table. Basically, this tells us the date we need to work from. Remember, we set up a default date if there is nothing in the dataset to work with, which will run everything into the Data Lake

CopyDailyMetricsDateUTC

This is where we get to the Copy part of the activity. We are simply copying the dates of the snapshots we haven't yet processed into the table.

Here we add the parameters of the table we want to use from Source. These parameters were set up in the data set section.

Add a query. We want the distinct dates from the source table where the DateUTC (the column in the table) is greater than the WatermarkValue from the previous watermark activity

SELECT DISTINCT DateUTC FROM [dbo].[DailyMetrics]
WHERE DateUTC > '@{activity('LookupWatermarkOld').output.firstRow.WatermarkValue}'

Now we are at the destination data set

The only item in here is DateUTC

USP_Watermark

This will trigger the Watermark Stored procedure and uses the Azure SQL database destination Linked Service

This is everything required for section 1, which will load all the dates into our table for snapshots we haven't yet prepared

Pipeline 2 CopyMemberDailyMetricsDateUTC

Again, let's have a look at these activities in more detail

LookupDates

This is the Lookup activity that takes the full record set of dates from the destination SQL Server

Note that we haven’t ticked First row only because the entire data set is required

CopyDailyFilesIntoDL (ForEach Activity)

We are taking the output.value from our activity LookupDates

Because it only has one column we don’t need to specify anything further. output.value means the entire record set

Sequential – The Items will be read one by one from the data set

There are 2 activities within this ForEach loop. Double-click on the activity to see what is happening in the loop

Again, let's have a look at this in more detail

CopyDailyMetrics

The source is the table in the source database because we want to store everything in daily snapshot files.

The query is a SELECT FROM WHERE statement: select all the columns from the DailyMetrics table where the DateUTC in the source data is equal to the DateUTC in @item, which is generated by the ForEach activity
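A sketch of that query, following the same pattern as the watermark query above and assuming the ForEach item exposes the DateUTC column:

SELECT * FROM [dbo].[DailyMetrics]
WHERE DateUTC = '@{item().DateUTC}'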

The sink uses the Data Lake. Remember, in the dataset we set the FileName parameter, and here is where it gets set. Click on the Filename to view the dynamic content

This is concatenating a file name, the item's UTC date and .csv
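An expression along these lines would produce the dated file name (the file prefix and column name here are assumptions, not taken from the original pipeline):

@concat('DailyMetrics', formatDateTime(item().DateUTC, 'yyyy-MM-dd'), '.csv')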


Use @item() to refer to a single enumeration of the ForEach activity. This value is generated by the ForEach activity (in the pipeline)

We are setting the date as a string in order to use it within the file name. Because the ForEach loop is setting the item for us and we are inside this activity, we can create the filename in this copy activity rather than in the dataset itself.

We can then simply Import schemas to view the mappings between the source and sink

SPUpdateDetails

Finally, we want to keep a record of everything we are doing in the loop, so we run our stored procedure to add the metadata into a table in our destination database

We can use our SQL destination Linked Service for this

The stored procedure contains 4 parameters.

We can take the Date from the Item in the foreach loop.

Duration and NoRows can be added from the copy activity metadata. See https://docs.microsoft.com/en-gb/azure/data-factory/copy-activity-overview (Monitor programmatically)

  • Duration @activity('CopyDailyMetrics').output.copyDuration
  • No Rows @activity('CopyDailyMetrics').output.rowsRead

The table name is simply DailyMetrics
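Putting the four stored procedure parameters together, the values look something like this (the DateUTC expression is an assumption based on the ForEach item; the others are from above):

  • TableName: DailyMetrics
  • DateUTC: @item().DateUTC
  • Duration: @activity('CopyDailyMetrics').output.copyDuration
  • NoRows: @activity('CopyDailyMetrics').output.rowsRead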

USP_TruncateMemberDailyMetricsDates

Finally, now that everything is complete, we can truncate the date table outside of the ForEach loop

The max date is held in the watermark table, which will be used to create the new files next time (set in Pipeline 1)
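The truncate stored procedure itself isn't shown in this post; a minimal sketch, assuming it only needs to clear the dates table, would be:

 CREATE PROCEDURE [audit].[USP_TruncateMemberDailyMetricsDates]
 AS
 BEGIN
     -- Clear the staged dates; the watermark table keeps the max date for the next run
     TRUNCATE TABLE [audit].[MemberDailyMetricsDates]
 END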

Test your solution

Now we have everything in place we can test each pipeline by clicking Debug

You can see the files coming in to the Data Lake storage account

If you get errors, a great way of debugging is to go into the code. I had an error where, after a file name, the code contained \n\n.

I simply removed this empty row and it worked.

Pipeline 3

There is now a Pipeline 1 to create the dates. And pipeline 2 to create the files. This is good because they can be tested separately.

We need a top level Pipeline to run them

And this is the new model for the flow of this Data factory

Add a trigger

Now we have successfully tested the pipelines, we can set them up on a trigger by adding a parent pipeline that runs all the pipelines in order

Another Post will be created on Adding Triggers

Considerations

You need to make sure that the last snapshot is complete when you run the pipeline. In our case the snapshot table is run at 12 AM and takes around 20 minutes, so we need to set this pipeline off at 1 AM

Add Alerts to Monitor the Pipeline

https://azure.microsoft.com/en-gb/blog/create-alerts-to-proactively-monitor-your-data-factory-pipelines/

Another Post will be created on Adding Monitors
