When working on larger projects we need to merge changes from several developers. Once all the changes are in the central branch, an automated process can move the development work to Production.
In computer programming and software testing, smoke testing is preliminary testing to reveal simple failures severe enough to, for example, reject a prospective software release
Integration testing is the phase in software testing in which individual software modules are combined and tested as a group. Integration testing is conducted to evaluate the compliance of a system or component with specified functional requirements. It occurs after unit testing and before validation testing
Resources Involved with the current Project
- Azure DevOps
- Azure SQL Server
- Azure SQL Database
- Azure Data Factory
- Azure Data Lake Gen 2 Storage
- Azure Blob Storage
- Azure Key Vault
Each resource has its own specific requirements when moving from Dev to Prod.
We will look at each of them separately, along with the security requirements needed to ensure that everything works on the Production side
This post specifically relates to Azure Data Factory and DevOps
Azure Data Factory CI/CD Lifecycle
Git handles the creation of feature branches and the merging of them back into main (master)
Git is used for version control.
In terms of Data Factories, you will have a Dev factory, a UAT factory (if used) and a Prod factory. You only need to integrate your development Data Factory with Git.
The pull request merges the feature branch into master
Once published, we need to move the changes to the next environment, in this case Prod (when ready)
This is where Azure DevOps Pipelines come into play
If we are using Azure DevOps Pipelines for continuous deployment, the following things will happen
- The DevOps pipeline gets the PowerShell script from the master branch
- It then gets the ARM template from the publish branch
- It deploys the PowerShell script to the next environment
- It deploys the ARM template to the next environment
Why use Git with Data Factory
- Source control allows you to track and audit changes
- You can do partial saves, for example when you have an error. Data Factory won’t allow you to publish, but with Git you can save where you are and resolve the issues another time
- It allows you to collaborate more with team members
- Better CI/CD when deploying to multiple environments
- Data Factory is ten times faster with a Git back end than it is when authoring against the Data Factory service, because resources are downloaded from Git
- Adding your code into Git rather than simply into the Azure Service is actually more secure and faster to process
Setting Up Git
We already have an Azure DevOps project with Repos and Pipelines turned on
We already have Azure subscriptions and resource groups for both the Production and Development environments
There is already a working Data Factory in development
In this example Git was set up through the Data Factory management hub (the toolbox)
DevOps Git was used for this project rather than GitHub because we have Azure DevOps
The Project Name matches the project in DevOps
The Collaboration branch is used for Publishing and by default it’s the master branch. You can change the setting in case you want to publish from another branch.
Import existing resources to repository means that all the work done before adding Git can be added to the repository.
DevOps is now set up.
If Azure Data Factory is open, close it so we can reopen it and go through the Git process
Where to find your Azure DevOps
You will need an account in Azure DevOps, and its access level must be higher than Stakeholder to access Repos
In Azure DevOps, click on Repos
You can then select the project you have created
Using Git with Azure Data Factory
In Azure, open up your Data Factory
Git has been enabled (go to Manage to review the Git configuration)
The master branch is the main branch with all the development work on it
We now develop a new feature. Create a feature branch using + New branch
We are now in the feature branch, and I am simply adding a description to a Stored Procedure activity in the pipeline. This is where you will now do your development work, rather than in master
For the test, the description of a pipeline is updated. Once completed, my changes are in the feature 1 branch. I can now save my feature
You don’t need to publish to save the work. Save all will save your feature, even if there are errors
You can go across to DevOps to see the files and history created in Azure DevOps for the feature branch (we will look at this once merged back into Production)
Once happy, create the pull request
This takes you to a screen to include more details
Here I have also included the iteration we are currently working on in DevOps Boards.
A few tags are also added. Usually someone will review the work, and the reviewer is also added here.
The next screen allows you to approve the change and Complete the change
In this case I have approved. You can also do other things like make suggestions and reject
Completing allows us to complete the work and removes the feature branch. Now all the development in the feature branch will be added to the main, master branch.
In Data Factory, go back to the master branch and note that your feature updates are included
We now publish the changes in the master branch, which creates the adf_publish branch. This publish branch holds the ARM template that represents the pipelines, linked services, triggers etc.
Once published, there are now files to work with in DevOps Repos
You can see your change within the master branch
(The changes would normally be highlighted on the two comparison screens)
Here we open the Pipelines folder, go to the Compare tab and find the before and after code
And you can also see your history
The ARM templates are in the adf_publish branch, if you select this branch
Once done, we need to move the changes to the next environment, in this case Prod (when ready)
This is where Azure DevOps Pipelines come into play
Continuous Deployment using Azure DevOps
We need another Data Factory object to publish changes to
In this case, the Production Data Factory has been created in the Azure Portal within the Production subscription and Production resource group
Git Configuration is not needed on the Production resource. Skip this step
Create your tags and Review and Create
For this specific project, we don’t want to update Production automatically when we publish to Dev. We want this to be something that we can do manually.
Go to Pipelines and create a new release Pipeline (In DevOps)
Click on Empty job because we don’t want to start with a template
And because this project has no UAT, just Production, name the Release Pipeline Prod
Click on the X to close the blade
We need to sort out the Artefact section of the Pipeline
Click on Add an Artefact and choose an artefact from Azure Repos
We may as well add adf_Publish branch which contains the ARM templates
And the Master branch
The Source alias was updated with _adf_publish
Both are Azure Repos artefacts
Next we move to Prod and start adding tasks
Click on 1 job, 0 tasks to get to tasks
Click + against Agent job to add the task. Our task is ARM template deployment
Then click on the new Task to configure
The first section is where you select your production environment
Next you need to select the ARM template and the ARM template parameters file. These are updated in the DevOps artefact every time you publish to Dev.
The JSON templates are in the adf_publish branch
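To see which values the template expects before overriding anything, you can read the parameters file directly. The sketch below is in Python and assumes the standard ARM parameters file layout (a top-level `parameters` object whose entries each hold a `value`). The sample content, including the factory name and the Key Vault parameter name, is made up for illustration and will differ in your own factory.

```python
import json


def list_parameters(parameters_json: str) -> dict[str, object]:
    """Return {parameter name: value} from the text of an ARM parameters file."""
    document = json.loads(parameters_json)
    return {
        name: entry.get("value")
        for name, entry in document.get("parameters", {}).items()
    }


# Hypothetical sample, shaped like an ARM parameters file.
# The parameter names and the factory name are assumptions.
SAMPLE = """
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": {"value": "dev-uks-project-adf"},
    "AzureKeyVault_properties_typeProperties_baseUrl": {
      "value": "https://dev-uks-project-kv.vault.azure.net/"
    }
  }
}
"""

if __name__ == "__main__":
    for name, value in list_parameters(SAMPLE).items():
        print(f"{name} = {value}")
```

Listing the parameters this way makes it easier to spot every value that still points at Dev.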
Now you need to override the template parameters because these are all for Dev and we need them to be production. These are:
These will be specific to your own Data Factory environment. In this instance we need to sort out the information for the Key Vault and the Data Lake Storage account
This one is easy. The only difference is changing dev to prd
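When the naming convention is consistent, a production value like this can be derived from the development one. A minimal Python sketch, assuming every resource name starts with the environment code followed by a hyphen (as in dev-uks-project-kv); the helper name `to_prod` is hypothetical.

```python
import re


def to_prod(value: str, dev: str = "dev", prod: str = "prd") -> str:
    """Swap a leading environment code, e.g. dev-uks-... becomes prd-uks-...

    The code is only replaced when it starts a name (at the start of the
    string, or right after '/', ':', '=' or whitespace) and is followed
    by '-', so words that merely contain 'dev' are left alone.
    """
    pattern = rf"(?:^|(?<=[/:=\s])){re.escape(dev)}(?=-)"
    return re.sub(pattern, prod, value)


if __name__ == "__main__":
    print(to_prod("https://dev-uks-project-kv.vault.azure.net/"))
```

This only works if the naming convention really is the same in both environments, so it is worth checking each derived value against the actual production resource.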
The Data Lake Storage account must be set up in both Dev and Prod before continuing. Go to the Production Storage account resource
This information is also stored in our Key Vault as a secret which we can hopefully use at a later date.
It is taken from Storage Account, Properties. We want the primary endpoint for Data Lake Storage
Copy the primary endpoint URL and override the old URL with the new Prod URL in DevOps
We need to update https://dev-uks-project-kv.vault.azure.net/
Let’s get this overridden. We already have a Key Vault set up in Production. Get the URI from Overview in the Production Key Vault service
and let’s add this into our DevOps parameter
This is empty but we could add to it later in the process.
Account keys are the kind of thing that should be kept as secrets in Key Vault in both Dev and Prod
Let’s get them set up. Just for the time being, let’s ensure we have the Data Lake Storage account key within our Development and Production Key Vaults
Within Key Vault in Development, create a secret with the name AzureDataLakeStorageGen2LSaccountKey
And the key from the storage account comes from……
And repeat for the Production Key Vault
For the time being though, let’s leave this blank now that we have captured the information in Key Vault. It should come in useful at a later date
This was also empty within the parameters for dev.
You can get the connection string value by going to your SQL database, selecting Connection strings, opening the PHP tab and finding the try statement
And here is the Connection String value for production
Server=tcp:prd-uks-project-sql.database.windows.net,1433;Database=prd-uks-project-sqldb;
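The connection string follows a fixed pattern, so it can be assembled from the server and database names. A small Python sketch under that assumption; the function name is hypothetical, spaces are omitted, and only the Server and Database parts shown above are included (credentials and other options are deliberately left out).

```python
def sql_connection_string(server: str, database: str, port: int = 1433) -> str:
    """Build the Server/Database part of an Azure SQL connection string."""
    host = f"{server}.database.windows.net"
    return f"Server=tcp:{host},{port};Database={database};"


if __name__ == "__main__":
    print(sql_connection_string("prd-uks-project-sql", "prd-uks-project-sqldb"))
```

Building the string from its two names keeps the Dev and Prod versions in step, because only the arguments change.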
You can also add this information into Key Vault as a secret and repeat for Production
For the first instance we are going to leave empty as per the dev parameters. At some point we should be able to set up the Security Principal so we can change the hardcoded values to Secrets
The parameters created in dev are now overridden with the production values
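The override box in the ARM template deployment task takes the overrides as a single line of `-name value` pairs. As a rough sketch, the line could be generated in Python as below. The quoting of every value is an assumption made for safety, and the second parameter name is hypothetical; use the names from your own parameters file.

```python
def override_arguments(overrides: dict[str, str]) -> str:
    """Join overrides into one line of -name "value" pairs.

    Assumes values contain no double quotes. Empty values are kept
    as "" so the parameter is still listed.
    """
    return " ".join(f'-{name} "{value}"' for name, value in overrides.items())


if __name__ == "__main__":
    # The Key Vault parameter name below is an assumption for illustration.
    production = {
        "AzureKeyVault_properties_typeProperties_baseUrl":
            "https://prd-uks-project-kv.vault.azure.net/",
    }
    print(override_arguments(production))
```

Keeping the overrides in one dictionary means the same list can be reviewed or reused if a UAT stage is added later.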
The Pipeline is then named
Create a release
Once Saved. Click back on Releases
For this type of release we only want to do it manually
Create a Release for our very first manual release
Click back on releases
And click on release 1 to see how it is doing
You can click on Logs under the Stages box to get more information
Now you should be able to go back to the production data Factory and see that everything has been set up exactly like Dev.
Go and have a look at linked Services in the Production data Factory
Note that they are all set with the Production information
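The same check can be done on the linked service JSON rather than by eye. The Python sketch below walks a linked service definition and reports any string value that still contains a Dev marker. The sample definition, the linked service name and the `dev-` marker are assumptions based on the naming used in this post.

```python
from typing import Any, Iterator


def find_strings(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, value) for every string inside nested dicts and lists."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from find_strings(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_strings(child, f"{path}/{index}")
    elif isinstance(node, str):
        yield path, node


def dev_references(linked_service: dict, marker: str = "dev-") -> list[str]:
    """Return the paths of string values that still contain the Dev marker."""
    return [path for path, value in find_strings(linked_service) if marker in value]


if __name__ == "__main__":
    # Hypothetical linked service definition for illustration only.
    key_vault_ls = {
        "name": "AzureKeyVaultLS",
        "properties": {
            "type": "AzureKeyVault",
            "typeProperties": {"baseUrl": "https://prd-uks-project-kv.vault.azure.net/"},
        },
    }
    print(dev_references(key_vault_ls))
```

An empty result means no value in that linked service mentions Dev.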
We now have a process to move Dev to Prod whenever we want
Throughout the sprint, the development team will have been working on feature branches. These branches are then committed into the master branch and deployed to Dev
Once you are happy that you want to move your Data Factory across from Dev into Prod, go to the DevOps release pipeline
Click Create release to create a new release
It uses the ARM template artefact, which is always up to date after a publish.
This will create a new release and move the new information to Prod
All your resources should be able to move quickly from Dev to Prod, and we will look at this in further posts