Azure Data Factory – Moving from Development to Production

When working on larger projects we need to merge changes from Developers. When all the changes are in the central branch we can then have an automated process to move development to Production

Smoke tests

In computer programming and software testing, smoke testing is preliminary testing to reveal simple failures severe enough to, for example, reject a prospective software release

Integration testing

Integration testing is the phase in software testing in which individual software modules are combined and tested as a group. Integration testing is conducted to evaluate the compliance of a system or component with specified functional requirements. It occurs after unit testing and before validation testing

Resources Involved with the current Project

  • Azure DevOps
  • Azure SQL Server
  • Azure SQL Database
  • Azure Data Factory
  • Azure Data Lake Gen 2 Storage
  • Azure Blob Storage
  • Azure Key vault

Each resource has its own specific requirements when moving from Dev to Prod.

We will be looking at all of them separately along with all the security requirements that are required to ensure that everything works on the Production side

This post specifically relates to Azure Data Factory and DevOps

Azure data factory CI/DC Lifecycle

GIT does all the creating of the feature branches and then merging them back into main (Master)

Git is used for version controlling.

In terms of Data Factories, you will have a Dev Factory, a UAT factory (If Used) and a Prod Data factory. You only need to intergrate your development data factory with GIT.

The Pull request merges feature into master

Once published we need to move the changes  to the next environment, in this case Prod (When ready)

This is where Azure Devops Pipelines come into play

If we are using the Azure Devops Pipelines for continuous development the following things will happen

  • The devops Pipeline will get the powershell script from the master branch
  • The get the ARM template from the publish branch
  • Deploy the Power Shell script to the next environment
  • Deploy the arm template to the next environment

Why use Git with data Factory

  • Source control allows you to track and audit changes
  • You can do partial saves when for example you have an error. Data Factory wont allow you to publish but with Git you can save where you are and then resolve issues another time
  • It allows you to collaborate more with team members
  • Better CI/CD when deploying to multiple environments
  • Data Factory is time times faster with a GIT back end that it is when authoring against the data factory service because resources are downloaded from GIT
  • Adding your code into Git rather than simply into the Azure Service is actually more secure and faster to process

Setting Up Git

We already have an Azure Devops Project with Repos and Pipelines turned on

We already have an Azure Subscriptions and Resource Groups for both Production and Development Environments

There is already a working Data Factory in development

In this example Git was set up through the Data Factory management hub (the toolbox)

DevOps Git was used for this project rather than GitHub because we have Azure DevOps

Settings

The Project Name matches the Project in Devops

The Collaboration branch is used for Publishing and by default it’s the master branch. You can change the setting in case you want to publish from another branch.

Import existing resources to repository means that all the work done before adding Git can be added to the repository.

Devops is now set up.

Close Azure Data Factory so we can reopen it again to go through the GIT process (If it is open)

Where to find your Azure DevOps

You should now be able to select your own Area in Devops / Repos and select the created project within Azure Devops and Repos

You will need an Account in Azure DevOPs Click on Repos and that account must be higher than Stakeholder to access Repos

You can then select the Project you have created

Using Git with Azure Data Factory

In Azure Open up your Data Factory

Git has been Enabled (Go to Manage to review Git)

The master branch is the main branch with all the development work on it

We now develop a new Feature. Create a Feature Branch + New Branch

We are now in the feature branch and I am simply adding a description to a Stored Procedure Activity in the pipeline. However this is where you will now do your development work rather than within the master

For the test, the description of a Pipeline is updated. Once completed my changes are in the feature 1 branch. I can now save my feature

You don’t need to publish to save the work. Save all will save your feature, even if there are errors

You can go across to Devops to see your Files and history created in Azure Devops for the feature branch (We will look at this once merged back into Production)

Once happy Create the pull request

This takes you to a screen to include more details

Here I have also included the Iteration we are currently working on in Devops Boards.

A few tags are also added. Usually, someone will review the work and will also be added here.

The next screen allows you to approve the change and Complete the change

In this case I have approved. You can also do other things like make suggestions and reject

Completing allows us to complete the work and removes the feature branch. Now all the development in the feature branch will be added to the main, master branch.

In Data Factory, go back to the master branch and note that your feature updates are included

We now publish the changes  in Master Branch which creates the adf publish branch. This publish branch creates the ARM template that represents the pipelines, linked services, triggers etc.

Once published, In Devops Repos , there are now files to work with

You can see your change within the master branch

(The changes would normally be highlighted on the two comparison screens)

Here we open the Pipelines folder got the compare tab and find the before and after code

And you can also see your history

The Arm templates is in the adf_Publish branch, if you select this branch

Once done we need to move the changes  to the next environment, in this case Prod (When ready)

This is where Azure Devops Pipelines come into play

Continuous Development using Azure DevOps

We need another Data Factory object to publish changes to

In this case, the Production has been created with Azure Portal within the Production Subscription and Production Resource Group

Git Configuration is not needed on the Production resource. Skip this step

Create your tags and Review and Create

DevOps Pipelines

For this specific Project, We don’t want to update production automatically when we publish to Dev. We want this to be something that we can do manually.

Go to Pipelines and create a new release Pipeline (In DevOps)

Click on Empty job because we don’t want to start with a template

And because for this project there is no UAT, just Production name the Release Pipeline Prod

Click on the X to close the blade

We need to sort out the Artefact section of the Pipeline
 
Click on Add an Artefact and choose an artefact from Azure Repos
 
We may as well add adf_Publish branch which contains the ARM templates
And the Master branch

the Source alias was updated with _adf_publish

Both Pipelines are Azure Repos artefacts

Next We move to Prod and Start adding tasks

Click on 1 job, 0 tasks to get to tasks

Click + against Agent Job to add the task Our task is for ARM Template deployment


Click Add

Then click on the new Task to configure

The first section is where you select your production environment

Next you need to select the ARM template and the ARM template parameters file. These are always updated in the Devops artefact everytime you publish to dev.

The JSON templates are in the adf_publish branch

Now you need to override the template parameters because these are all for Dev and we need them to be production.  These are:

These will be specific to your own data Factory environment. In this instance we need to sort out the information for the Key vault and data lake storage account

factoryName

This one is easy. The only difference is changing dev to prd

AzureDataLakeStorageGen2_LS_properties_typeProperties_url

The Production Data lake Storage account must be set up in dev and prod before continuing. Go to this Storage account resource

This information is also stored in our Key Vault as a secret which we can hopefully use at a later date.

It is taken from Storage Account, Properties. We want the Primary endpoint Storage for the data lake

Copy the Primary Endpoint URL and override the old with the new Prod URL in DevOps

AzureKeyVault1_properties_typeProperties_baseUrl

We need to update https://dev-uks-project-kv.vault.azure.net/

Lets get this overridden. We already have a Key vault Set up in production. Get the URI from Overview in the Production Key Vault Service

and lets add this into our DevOps parameter

AzureDataLakeStorageGen2_LS_accountKey

This is empty but we could add to it later in the process.

Account keys are the kind of things that should be kept as secrets in Key vault in both Dev And Prod

Lets get them set up. Just for the time being, lets ensure we have the Data Lake storage account key within our development and Production Key vaults

Key Vault.   

Within Key vault in development create a secret with the nameAzureDataLakeStorageGen2LSaccountKey

And the key from the storage account comes from……

And Repeat for Production Key vault

For the time being through lets leave this blank now we have captured the information in the key vault. It should come useful at a later date

AzureSqlDatabaseTPRS_LS_connectionString

This was also empty within the parameters for dev.

You can get the connection string value by going to your SQL data base. Connection Strings. PHP and finding the try statement

And here is the Connection String value for production

Server=tcp: prd-uks-project-sql.database.windows.net,1433; Database= prd-uks-project-sqldb;

You can also add this information into Key Vault as a secret and repeat for Production

For the first instance we are going to leave empty as per the dev parameters. At some point we should be able to set up the Security Principal so we can change the hardcoded values to Secrets

The parameters created in dev are now overridden with the production values

The Pipeline is then named

Create a release

Once Saved. Click back on Releases

For this tye of release we only want to do it manually

Create a Release for our very first manual release

Click back on releases

And click on release 1 to see how it is doing

You can click on Logs under the Stages box to get more information

Now you should be able to go back to the production data Factory and see that everything has been set up exactly like Dev.

Go and have a look at linked Services in the Production data Factory

Note that they are all set with the Production information

We now have a process to move Dev to Prod whenever we want

The Process

Throughout the sprint, the development team will have been working on Feature branches. These branches are then commited into the master pipeline and deployed to Dev

Once you are happy that you want to move your Data Factory across from dev into Prod. Go to DevOps Release pipeline

Create Release to create a new release

It uses the Artefact of the Arm template which is always up to date after a publish.

This will create a new release and move the new information to Prod

All your resources will be to be able to quickly move from Dev to Prod and we will look at this in further posts

Create your website with WordPress.com
Get started