Lessons learned whilst working with Microsoft Fabric – Part 1: Data Engineering

Any time there is a big change in technology, there is a steep learning curve to go with it. Since Microsoft announced Fabric in May 2023, we have been working hard on getting up to speed with how Fabric works and how it changes the nature of what we do.

  • What new architectures are available for us to work with?
  • How does it change working with Power BI?
  • How do we create our staging and reporting data before loading it into Power BI?
  • How do Fabric pipelines differ from Data Factory pipelines, and from pipelines in Synapse?
  • Keeping up with the monthly updates across Fabric

In my previous post, “The Microsoft data Journey so far. From SQL Server, Azure to Fabric”, I looked at my own journey working with on-premises Microsoft services through to Fabric, along with identifying all the key Fabric areas.

This post is about my initial discoveries whilst actively working with Fabric, specifically using the Lakehouse, and ways of developing your learning within the Microsoft Fabric space.

In Part 1, we explore various topics within the Fabric Data Engineering capabilities. In Part 2, we will delve into Fabric Power BI and semantic modelling resources.

Analytics Engineer

Analytics Engineer is a new title to go along with Fabric, Microsoft's SaaS end-to-end analytics and data platform. One of the first things to do was to go through Microsoft's Analytics Engineering learning pathway, with the aim of taking the exam.

I personally took my time with this because I wanted to get as much out of it as possible, and I passed the exam on the 24th of June 2024, making me a certified Fabric Analytics Engineer.

I highly recommend going through the learning pathway https://learn.microsoft.com/en-us/credentials/certifications/fabric-analytics-engineer-associate/?practice-assessment-type=certification to get equipped with the skills needed to be an Analytics Engineer.

Throughout the course, you learn about Lakehouses, Data Warehouses, Notebooks, Pipelines, Semantic Models and Reporting.

But the main question is: what is Fabric Analytics Engineering, and how does it differ from Data Engineering and Data Analysis?

Data Engineering          

  • Designing and building data pipelines
  • Focusing on maintaining the infrastructure
  • Collecting data from multiple systems and data storage solutions
  • Skills in SQL, Python, DevOps/Git, ETL tools, database management, etc.

Analytics Engineering

  • Specialises in analytics solutions (Data Engineers have a broader focus)
  • Collaborates closely with Data Engineers, business process knowledge owners, Analysts, etc.
  • Transforms data into reusable assets
  • Implements best practices like version control and deployment
  • Works with Lakehouses, Data Warehouses, Notebooks, Dataflows, Data Pipelines, Semantic Models and Reports
  • Skills in SQL, DAX, PySpark, ETL tools, etc.

Analytics Specialist

  • Focuses on analysing data and generating insights to support decision making
  • Creates reports and dashboards and communicates findings
  • Identifies trends and patterns, along with anomalies
  • Collaborates with stakeholders to understand their needs
  • Skills in visualisation tools like Power BI

As you can see, the Analytics Engineer is a bridge between Data Engineering and Analytics, and is sometimes seen as an end-to-end developer with knowledge across all of these specialties, but with a major focus on analytics solutions.

Architecture

With Fabric, we have numerous architectural options. By making strategic choices, we can eliminate the need for data copies across our architecture. Consider the standard architecture using Azure resources: a Data Lake, a SQL Database and Data Factory.

Here we have two copies of the original data: in the Data Lake and in the SQL Database (because you copy the data across to transform it and create your dimensions, facts, etc.).

Finally, the same dims and facts created in the SQL Database are imported and stored in Power BI, creating yet another copy.

This architecture works well: it allows data scientists to use the data in the data lake, and it allows SQL stored procedures to be put in place to process the data into the correct analytical (star) schemas for Power BI.

However, wouldn't it be great if you could remove some of the data copies across the resources?

Fabric leverages the medallion architecture:

  • Bronze layer – our raw, unprocessed data
  • Silver layer – cleaned and transformed data
  • Gold layer – enriched data optimised for analytics

Even within Fabric, there are lots of opportunities to use specific resources to change your architecture depending on the project. For example, you could decide to use Fabric's next-generation Data Warehouse, designed for high performance and scalability: excellent for big data solutions, and it allows you to do cross-database querying, using multiple data sources without data duplication.

However, at this point I have spent my time looking at how we can utilise the delta lake, creating an architecture that uses Delta Parquet files. Can this be a viable solution for those projects that don't need the high-level 'big data' performance of the Fabric Data Warehouse?
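
To make this concrete, here is a minimal PySpark sketch of the medallion pattern inside a Fabric notebook (where a spark session is predefined). The file, table and column names (raw_sales, silver_sales, fact_sales, etc.) are hypothetical placeholders, not from the project.

from pyspark.sql import functions as F

# Bronze: raw files landed in the Files section of the lakehouse
bronze_df = spark.read.option("header", "true").csv("Files/bronze/raw_sales.csv")

# Silver: cleaned and typed, saved as a Delta table
silver_df = (bronze_df
             .dropDuplicates()
             .withColumn("SaleDate", F.to_date("SaleDate", "dd/MM/yyyy")))
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_sales")

# Gold: shaped into analytical (star schema) tables for reporting
gold_df = (spark.table("silver_sales")
           .groupBy("SaleDate", "ProductKey")
           .agg(F.sum("Amount").alias("TotalAmount")))
gold_df.write.format("delta").mode("overwrite").saveAsTable("fact_sales")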

There are significant advantages here, as we are reducing the amount of duplicated data we hold.

And of course, Power BI can use a Direct Lake connection rather than Import mode, allowing you to remove the imported model from Power BI entirely. Even better, with partitioned Delta Parquet files you can have bigger models, reading only the files that you need.

This architecture has been used for some initial project work, and the PySpark code, within notebooks, has proved itself to be fast and reliable. As a Fabric engineer, I would definitely say that if you are a SQL person, it's vital to extend your skills to include PySpark.

However, with some provisos, the Data Warehouse can also use Direct Lake mode, so sometimes it's simply a case of which language you prefer to work in: PySpark or SQL?

Task Flows

Task flows are a great Fabric feature, and incredibly helpful when setting up a new project:

  • Visualise your data processes
  • Create best-practice task flows
  • Classify your tasks into Data Ingestion, Data Storage, Data Preparation, etc.
  • Standardise teamwork with flows that are easy to navigate

Here, the Medallion flow was used, immediately giving us the architecture required to create our resources.

You can either select a previously created task to add to your task flow

Or create a new item. Fabric will recommend objects for the task

One tip from using the medallion task flow: as you can see, the Bronze, Silver and Gold Lakehouses are shown as separate activities. Currently, you can't create one lakehouse and add it to each activity.

If you want to use one lakehouse for all three areas, you need to customise the task flow. As a result, the decision was made to have three lakehouses working together for this project. But that may not be something you wish to do, so customising the flow may be a better option.

Git integration

The Fabric workspace comes with Git integration, which offers a lot of benefits. With Git, you can save your code base to a central repository, allowing for version control, much better collaboration, code and peer reviewing, and CI/CD automation.

There are some current issues, however, especially with branching, as some branching capabilities are still in preview. For an initial test project, a very basic approach was used.

Azure DevOps was used for this project:

https://dev.azure.com

Here, a new project has been added to DevOps: Debbies Training.

Visual Studio

Visual Studio was used to clone the repository, but there are lots of other ways you can do this next step, for example Git Bash.

Connect to the repository that has been created (you need to log in to see your repos).

Click Clone and you will then have a local copy of the code base. Fabric's Git integration doesn't support every item type at the moment, but it does support Notebooks, Reports, Semantic Models and Pipelines, which are the focus of our current learning.

Connect DevOps to Fabric

Back in the Fabric workspace, go to Workspace Settings.

You are now connected to DevOps (note the branch is main).

Later, we want to start using branches, when the underlying Fabric support is better, but for now we have been using the main branch. Not ideal, but we should see this improve a little further down the line.

You can now create your resources and be confident that your code is being stored centrally.

All you need to do is publish changes via the Fabric workspace (Source Control)

Click Commit to commit your changes, adding a change description.

Watch out for updates to this functionality, especially branching.

PySpark and Notebooks

As a SQL developer, I have spent years writing code to create stored procedures to transform data in SQL databases.

SQL, for me, is what I think of as my second language; I'm confident with it. PySpark is fairly new to me. My biggest question was:

Using my SQL knowledge, can I think through a problem and implement that solution with PySpark?

The answer is, yes.

As with anything, learning a new language can be a steep learning curve, but there is lots of help out there to get to grips with it. For example, Copilot has been incredibly useful with ideas and code. On the whole, though, you can apply what you know in SQL and use the same solutions in a PySpark notebook.

Here are a few tips:

  • PySpark, unlike SQL, is case sensitive, so you have to be much more rigorous when writing code in your notebooks.
  • When working with joins in PySpark, you can significantly speed up the creation of the new dataframe by broadcasting the smaller table. Broadcasting optimises the performance of your Spark job by reducing data shuffling (see the sketch after this list).
  • With SQL, you work with temporary tables and CTEs (common table expressions). Dataframes replace this functionality, but you can still think of them in terms of your temporary tables.
  • In SQL, you load the data into tables; with the Lakehouse, you load your data into files. The most common type is Parquet. It's worth understanding the difference between Parquet and Delta Parquet. We looked at this in detail in the last blog, “The Microsoft data Journey so far. From SQL Server, Azure to Fabric”, but we will look at the practicalities of both a little later.
  • Unlike a SQL stored procedure, where during development you can leave your work for a while and come back to the same state, the Spark session will stop after around 20 minutes, so you can't simply leave a notebook mid-session unless you are happy to run it again.
    • Start your session in your notebook.
    • Check the session status in the bottom left corner.

Here we can see the timeout period, which can be reset.
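
Below is a minimal sketch of that broadcast pattern. The dataframes (a large sales_df and a small product_df dimension) are hypothetical placeholders.

from pyspark.sql.functions import broadcast

# Broadcasting ships a copy of the small table to every executor,
# so the large table doesn't need to be shuffled across the cluster
joined_df = sales_df.join(broadcast(product_df), on="ProductKey", how="left")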

Delta Parquet

When working with Parquet files, we can either save as plain Parquet (stored in the Files section of the Lakehouse) or save as Delta Parquet (stored in the Tables section).

Remember: if you want to use the SQL endpoint to run queries over your files, always save as Delta Parquet.

If you want to use a Direct Lake connection to your files for a Power BI semantic model, again, use Delta Parquet.
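
As a minimal sketch, with a hypothetical dataframe df, the two save options look like this:

# Plain Parquet: lands in the Files section of the lakehouse
df.write.mode("overwrite").parquet("Files/staging/stg_sales")

# Delta Parquet: lands in the Tables section, queryable via the SQL endpoint
# and usable by a Direct Lake semantic model
df.write.format("delta").mode("overwrite").saveAsTable("stg_sales")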

One question was: if you are using a Lakehouse and have the opportunity to create Delta Parquet, why wouldn't you save everything with the full Delta components?

There are a few reasons to still go with plain Parquet.

  1. Parquet is supported across various platforms. If you share data with other systems, this may be the best choice.
  2. Parquet is simple, without the ACID transaction features; this may be sufficient.
  3. Plain Parquet files can offer better performance for simple write-once workloads, as there is no transaction log to maintain.
  4. Parquet files are highly efficient for storage, as they don't carry the Delta overheads: a good choice for archiving data.

With this in mind, our project has followed this logic for file creation.

Dims and Facts

Always use Delta Parquet for full ACID functionality and Direct Lake connectivity to Power BI.

Audit Tables

We always keep track of our loads: when the load took place, what file was loaded, how many records, etc. Again, these are held as Delta Parquet. We can use the SQL endpoint if we want to quickly analyse the data, and we can also use the Direct Lake connection to publish the results to a Power BI report.

Even better, our audit tables contain an issue flag. We create the logic in the PySpark notebook to check for issues, and if the flag is 1 (yes), Power BI can immediately notify someone via Alerts that there may be a problem with the data.
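
A minimal sketch of that audit pattern is below. The table name (audit_load), the issue check and the variables (df, file_name) are hypothetical placeholders.

from pyspark.sql import functions as F

row_count = df.count()
issue_flag = 1 if row_count == 0 else 0  # example check: flag empty loads

audit_df = (spark.createDataFrame(
                [(file_name, row_count, issue_flag)],
                "FileName string, RowsLoaded long, IssueFlag int")
            .withColumn("LoadDate", F.current_timestamp()))

# Append, so the audit table builds up a history of every load
audit_df.write.format("delta").mode("append").saveAsTable("audit_load")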

Staging tables

A lot of the basic changes are held in transformed staging tables: we may join lots of tables together, rename columns, add calculated columns, etc., before loading to dims and facts. Staging tables are held as plain Parquet. Basically, we only use the staging tables to load the dim and fact tables, so there is no need for the Delta overheads.

Pipelines

When you create your notebooks, just like SQL stored procedures, you need a way of orchestrating their runs. In Azure, this is where Data Factory came in. Now we have pipelines in Fabric, based on the pipelines from Azure Synapse.

I have used Data Factory (and its predecessor, Integration Services) for many years and have worked with APIs, the Copy activity, data mappings, etc. What I haven't used before is the Notebook activity.

There are five notebooks, which need to be run consecutively.

Iterate through Each Notebook

When creating pipelines, the aim is to reuse activities and objects. So, rather than having five activities in the pipeline, one for every notebook, we want to use one activity that will process all the notebooks.

In the first instance, we aren't going to add series 5 into the data lake.

Create a csv file

Also get the IDs of the workspace and the notebooks. These were taken from the Fabric URLs, e.g.

The file is added into the bronze lakehouse.
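
The exact columns depend on your setup, but as an illustration the file might look something like this (the names and IDs below are placeholders):

notebookName,notebookID,workspaceID
LoadSeries1,aaaaaaaa-1111-2222-3333-bbbbbbbbbbbb,cccccccc-4444-5555-6666-dddddddddddd
LoadSeries2,eeeeeeee-7777-8888-9999-ffffffffffff,cccccccc-4444-5555-6666-dddddddddddd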

Now we should be able to use this information in the pipeline

Create a lookup

In the pipeline, we need a Lookup activity to get the information from the csv file (the Lookup returns its contents as JSON).

Add a ForEach Activity

Drag and drop a ForEach activity onto your pipeline canvas and create an On Success Relationship between this and the Lookup.

Sequential is ticked because there are multiple rows, one per notebook, and we want to move through them sequentially.

Set the ‘Items’ in Settings by clicking to get to the pipeline expression builder

We are using the output.value of our lookup activity.

@activity('GetNotebookNames').output.value

Configure the Notebook Activity Inside ForEach

Inside the ForEach, add a Notebook activity.

Again, click to use the dynamic content expression builder to set:

Workspace ID: @item().workspaceID

Notebook ID: @item().notebookID

Note: when you first set this up, you see Workspace and Notebook Name. The labels change to ID; I think this is because we are using item(), but it can be confusing.

This is the reason the IDs have been added into the csv file. But we still wanted the names in the file as well, to better understand what is going on for error handling and testing.

  • @item(): This refers to the current item in the iteration, which is a row. When you're looping through a collection of items, @item() represents each individual item as the loop progresses.
  • .notebookID: This accesses the notebookID property of the current item; notebookID is a column in the csv file (a sample of the Lookup output follows).
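
To make that concrete: the Lookup's output.value is a JSON array with one object per csv row, along these lines (placeholder values again):

[
  {"notebookName": "LoadSeries1", "notebookID": "aaaaaaaa-…", "workspaceID": "cccccccc-…"},
  {"notebookName": "LoadSeries2", "notebookID": "eeeeeeee-…", "workspaceID": "cccccccc-…"}
]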

Running the pipeline

You can check the inputs and outputs of each activity.

If it fails you can also click on the failure icon.

The above information can help you to create a simple pipeline that iterates through Notebooks.

There are other things to think about:

  • What happens if anything fails in the pipeline?
  • Can we audit the Pipeline Processes and collect information along the way?

Within the notebooks, there is PySpark code that creates auditing Delta Parquet files containing information like date of load, number of rows, name of activity, etc. But there is also pipeline-specific information that can be recorded.

Currently, this pipeline will process either one file or multiple files, depending on what is added to the bronze lakehouse. The PySpark code can deal with either the very first load or subsequent loads (a sketch of this pattern follows).
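
A minimal sketch of that first-load / subsequent-load pattern, with a hypothetical table name and dataframe (new_df):

table_name = "fact_sales"

if spark.catalog.tableExists(table_name):
    # Subsequent loads: append the new rows to the existing Delta table
    new_df.write.format("delta").mode("append").saveAsTable(table_name)
else:
    # Very first load: create the Delta table
    new_df.write.format("delta").mode("overwrite").saveAsTable(table_name)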

With this in place, we can move forward to the semantic model and Power BI reporting.

Conclusion

Most of the time so far has been spent learning how to use PySpark code to create notebooks and our Delta Parquet files. There is so much more to do here: data warehousing, Delta Parquet file partitioning, real-time data loading, setting up offline development for code creation, etc.

The more you learn, the more questions you have. But for the time being we are going to head off and see what we can do with our transformed data in Power BI.

In Part 2, we will look at everything related to Power BI in Fabric.

Microsoft Fabric Part 10. Taskmaster Project. Creating the Reporting and the Project File (PBIP)

So far, we have created our Delta Parquet files in the delta lake using notebooks with PySpark.

We have also created a semantic model in Fabric with Direct Lake storage mode.

It's time to create some visuals, but the question is: do we create them in Fabric, or in a pbix file?

Reporting: Fabric or pbix file?

Fabric – centrally managed.

Pbix – offline development, with version control available via the .pbip project file format.

For this project, we are going to go for a hybrid approach: the semantic model in Fabric, and reports developed in Power BI Desktop with DevOps version control, published into Fabric.

This gives us better opportunities for version control and collaboration.

Get Data

In Power BI Desktop

Power BI Reporting

We won't go into too much detail on the front-end reporting here; we are more interested in other areas of Fabric. So here are the basic pages.

Drill through to Series

And Drill through to Episode

We now have a pbix report we can publish through to our Fabric workspace.

Create the Power BI Project (PBIP)

Instead of simply saving as a pbix ('black box') file, let's save as a project file and see how this can really change how we work with others in Power BI. We should see benefits like:

  • Items are stored in JSON format instead of being unreadable in one file
  • The JSON text files are readable and contain the semantic model and report metadata
  • Source control: finally, real source control for Power BI
  • Amendable by more than one person at a time?
  • The possibility of using Continuous Integration and Continuous Delivery (CI/CD) with Power BI

Saving as a project is in preview, so let's turn it on.

Options and Settings / Options

TMDL

This was mentioned in the Power BI June 2024 updates.

TMDL (Tabular Model Definition Language) is the improvement to PBIP as the semantic model file format for Power BI Project files.

Our semantic model has been created within Fabric; we might look at this in more detail later.

And now we can Save As.

We can see that it's a project from the title.

Clicking on the title shows us the file paths of the objects. We only have the report, because the semantic model has been created within Fabric.

Let's have a look at what we have created in Explorer:

The main project file

The reporting folder

Our objects within the reporting folder.
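
For reference, a project save that contains only a report typically produces a layout roughly like this (the report name is a placeholder, and the exact files vary by Desktop version):

TaskmasterReport.pbip
TaskmasterReport.Report\
    definition.pbir
    report.json
    .platform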

Currently, this is in OneDrive. We want to get this added into DevOps and Git so it can be part of our full source control process along with all the Fabric code.

Azure DevOps build pipelines for continuous integration

First of all, we need to make sure Fabric is connected to Azure DevOps, which it is.

The repository has been cloned to a local drive. This was done using Visual Studio, but there are other ways you can clone.

It would have been better to have saved into this cloned local project, but we can create a folder and move the files instead.

A Power BI folder was created.

And all the objects mentioned above were moved into this folder.

It's in the local repository, but not yet in the cloud.

I have installed Git Bash (https://git-scm.com/downloads) to work with the repository.

In Git Bash, I change directory to the correct local Git directory and use ls to list its contents:

cd source/repos/DebbiesTraining
ls
git status

We can see we have powerbi code that is uncommitted and needs pushing to the central Git repository.

We don't really want to work on the main branch; a better, cleaner process is to create a feature branch: InitialPowerBIFile.
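
Assuming the branch doesn't exist yet, it can be created and switched to in one step, using the same git checkout -b pattern that appears later in this post:

git checkout -b InitialPowerBIFile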

git add .
The . in git add . refers to the current directory, telling Git to stage all changes beneath it.

git add powerbi

just stages the Power BI folder. We don't really want to do anything with the Fabric items, as these are taken care of already.

git commit -m "InitialPowerBIFile"

The -m flag allows you to provide a commit message within the command, and our command commits the staged changes.

So far, we have added the files to the staging area and committed them. Time to push to the cloud:

git push --set-upstream origin InitialPowerBIFile

You will initially get a login screen to authenticate the push.

And this is the line you want to see: we know that we have pushed it to DevOps.

Back in DevOps

Remember to change to the correct branch. You won't see it in main.

We can see the Power BI Items.

Create a Pull Request

Very simple. We are in development and don't have any reviewers, or work on Boards to connect to.

Create and Complete

Power BI Files are now in Main

How do you now work with the Power BI Project file?

Open the project file from your local source control

Let's create a new quick report page for Demographics.

And save

Back to Git

git checkout -b "InitialPowerBIFileAllDemographicsPowerBIPage"
git status

We can see we have modifications.

git add .
git commit -m "InitialPowerBIFileAllDemographicsPowerBIPage"
git push --set-upstream origin InitialPowerBIFileAllDemographicsPowerBIPage

Back in DevOps

Create and complete the merge (if you don't use the reviewing process in development).

We can see the page is in, with lots of new work added into this part of the JSON.

Where is the Semantic Model?

The Semantic model has been created in Fabric.

It shows as Synced in the Git status.

And here it is in DevOps.

Conclusion

The above can be done in Power BI Pro as well as Fabric, which is good news.

However, my worry is that someone creating pbix reporting won't take to the whole Git process. I suspect that, when it comes down to it, it simply won't be done properly.

If you were to go the pbix file route, this would need a lot of governance work to get people to use Git. At enterprise level this would be doable, but I doubt it would become a real part of the process at self-service level.

I did start to question the process after doing some more reading. It feels like, if you created the reporting inside Fabric, the Git process would be simplified, even though the documentation states that for full Git control you should use pbix. At some point, I will be really interested in creating a Power BI report inside Fabric to see how this would work in the process.

In the next blogs, we are going to look at some new features of Fabric. We can also look at using this feature along with DevOps pipelines later (Premium and Fabric only).

Setting up a Board in Azure DevOps (AGILE)

It's time to start tracking projects with Azure DevOps Boards.

With Boards, teams can manage software projects: they can track user stories, backlog items, tasks, features, etc. You can choose the process you want to work with, like Agile or Scrum.

For this example, there is only one developer (me), and I am tracking my progress on a project where I have been the single developer.

Agile is the process that is going to be used.

Agile is an iterative approach to project management and software development that helps teams deliver value to their customers faster and with fewer headaches. Instead of betting everything on a “big bang” launch, an agile team delivers work in small, but consumable, increments.

First, open Azure DevOps:

https://azure.microsoft.com/en-gb/services/devops/

Sign into your DevOps account.

And create a new Project under your Enterprise

Work Items

Now we have a new project, we can start working with Boards, but first we need to understand what our Agile work items are and how they interact with each other.

Epic

I have specific Epics I want to achieve

  • Reporting from the company's main system
  • Social media reporting
  • Reporting for the surveys
  • Reporting for all the telephone enquiries
  • Reporting for complaints
  • Main reporting area for all the data auditing
  • Reporting for report usage

So, just looking at this, I want seven Epics to work with (to start with).

Feature

A feature is some complete behaviour that implements a new business process. So, for example, for the Surveys Epic we want:

  • An overall view of business performance as provided by the surveys
  • Monthly-level reporting on customer satisfaction, with drill-through

User Stories

User stories sit within a feature. These are the smallest changes that will result in a change of behaviour; if you don't observe a change, then it can't be demonstrated.

For example: as the Customer Satisfaction Manager, I want to see the survey results by month, and to see how we are doing over the year and against the same point last year, because we need to know our trends in satisfaction and whether we are doing well as a company.

As the Company Head of Service, I want a full review of our performance, using our scoring system against customer satisfaction, along with how our competitors are doing, for benchmarking.

Task

Tasks sit within a user story and are the smallest independently deployable changes.

  • Get file of Survey data (Pilot project)
  • Move Survey data into the Azure Data Warehouse (Staging area) Incremental loading using Data Factory
  • Establish dimensions and facts
  • Create Dim 1
  • Create Dim 2…….
  • Create Power BI Data Flows
  • Create Top level report by Month of Customer Satisfaction containing last 12 months
  • Create KPIs for Satisfaction against this time last year
  • Drill through to detailed report
  • Drill through to lowest level

Bug

A bug is an error in the code:

  • Incremental Refresh is causing Duplicates
  • NULL data Items in Survey Data set

Issue

An issue is more related to a process: when the system fails to meet user expectations.

  • A report was created based on poorly served customers, but this needs changing to the new business logic.

Test Case

Test cases can validate individual parts of your code. We will look at this item in another blog post.

Boards

Let's start with Boards. These boards are Kanban boards.

A Kanban board is one of the tools that can be used to implement Kanban to manage work at a personal or organizational level.

When the board is first opened up, Epics don't seem to be available.

With Boards selected, go to Configure Team Settings.

Make sure that Epics is ticked under Backlogs.

Now, with Epics selected, click on New Item and start adding in the required Epics.

Next, we need to start adding some Features. It would seem that you can't add the Features first and then connect them to the Epics; you have to create the Features from the Epics.

Go back to the Epic, click on … and Add Feature.

You can then see the Feature within the Epic.

Now we have a Feature, we can add the User Story: go to Features and click on Add User Story.

Same again: move to User Stories and add Tasks.

You can also go into the items and add lots more detail

This link to the Microsoft Documentation gives you lots of information regarding, effort, story points, business value, Priority etc.

It's always good to create the Epic and work your way down into the Tasks.

Retrospective Items

For this example, items are being added for a sprint that was closed some time ago, because the project is being retrospectively moved into Azure Boards.

Epics

I am starting them all from the beginning of this particular project, and for this I can add a start date.

Stories

The start date also applies to Stories, but these will be set to when the stories were originally created.

However, when you close a task and move the whole story into Completed, you can't set a completed date.

If you click on History and look at the state graph, you can't change the New and Resolved times. These are set at the time of the action, which makes it difficult to add past information into the board.

Backlogs

  • Backlogs help you to quickly define work (user stories, backlog items, requirements)
  • You can reorder the backlog so you work on the highest priority first
  • Add details and estimates
  • Assign items to team members and sprints, by either bulk update or drag and drop
  • Map items within a hierarchy
  • Review the portfolio of work
  • Forecast work to estimate deliveries
  • Display rollup progress, counts and totals to show completion of work

Basically, your backlog displays work items as lists, and boards display them as cards.

The Remaining Active User Stories have been dragged to Iteration 1

Work Items

All the work items you create can be viewed, as well as created, in here.

Hopefully this gives you a little head start into the world of Azure DevOps Boards.

DevOps Organisation Settings

Let's have a look in a little more detail at the settings at DevOps organisation level, and at what the organisation is.

When DevOps was set up, it created an organisation named [My Name], and another organisation was then created manually for the company.

Organisations can be treated as accounts, and each organisation has its own URL. You will also hear organisations being called collections.

You must always start with one organisation but you can have multiple.

Go to Organisation Settings

General

Overview

This gives you the overview of the DevOps organisation, and there are a few options in here to be aware of.

Privacy URL

Your privacy URL can be added here, linking to your organisation's document describing how you handle internal and external guest data. If you have a public website or app, you are required to have a dedicated privacy policy URL.

Question: do you have a privacy document already in place?

https://www.freeprivacypolicy.com/blog/privacy-policy-url/

Organisation Owner

The owner is set to the person who created the organisation account, but this can be changed.

Projects

New Projects can be set up here. See https://debbiesmspowerbiazureblog.home.blog/2020/03/06/create-a-devops-project/

Users


This is a good area to find out the access levels for each user:

  • Stakeholder: Partial access; can be assigned to unlimited users for free
  • Basic: Provides access to everything but Test Plans. Up to 5 users free, then £4.48 per month
  • Basic + Test Plans: Includes Test Plans. £38.76 per month
  • Visual Studio subscription: For users with a Visual Studio subscription; features are enabled based on Visual Studio Enterprise, Professional, Test Professional or MSDN Platform

Group Rules

DevOps includes group based licensing for Azure Active Directory (Azure AD) Groups and Azure DevOps groups.

Azure Active Directory Groups are created in Azure Active Directory


DevOps groups and collection-level groups can be found within the Permissions section, so we can look at these in more detail later.

Add a group rule to assign an access level or extension to the group, and resources in Azure DevOps are assigned to all members of the group.

When users leave a group, the licenses are freed and returned to your pool.

Imagine this scenario

It will be easier to add the following Groups

  • Project A Contributor Group and add Debbie and Jo
  • Project A Reader Group and add Tess
  • Project A Administrator Group and add Sarah
  • Project B Reader Group and add Debbie and Tess
  • Project C Contributor and add Jo
  • Project C Administrator and add Debbie

To manage licenses and group rules, you must be a Project Collection Administrator (PCA) for the organization. Check this within Security > Permissions

Click Add a group rule

Both users within this new group have a Visual Studio account, so the group level is set to Visual Studio. However, with group rules you assign the access level for the users at group rule level.

At this point I also added a new Group.

You can click on the … Button after Last Evaluated to get to either Manage Group Rules or Manage Members

These are the two rules for this group

What happens when your users have a mix of access levels within the one User Group?

The users get their access levels from the rule, so their access level would be reset.

Billing

Before looking at billing, you can use the Azure pricing calculator to get more of a feel for what you need:

https://azure.microsoft.com/en-gb/pricing/calculator/?service=azure-devops

Note that in this example you are paying for five Basic plans (the first five are free), but the ten developers who need the Basic + Test Plans license are the ones really adding to the monthly cost.

Why would you add additional parallel CI/CD jobs on either Microsoft-hosted or self-hosted pipelines?

See Parallel jobs for more information

Azure DevOps billing is through Azure, and because billing has not yet been set up, we only have access up to the free tier limits.

If you click on Set up Billing, you need to choose an Azure subscription to add it to.

You can't add it to your personal credits because of the spending limit caps. Once set up, you can then manage paid access for your users, bearing in mind the free usage tier.

https://docs.microsoft.com/en-us/azure/devops/organizations/billing/buy-basic-access-add-users?view=azure-devops

Auditing

Auditing allows you to see all the auditable events; you can export the log and filter on specific events.

Global Notifications

You will receive notifications for lots of actions in DevOps, like when a build completes or a deployment approval is pending. You can configure the organisation notifications from here.

Usage

Usage allows you to see what's been going on in DevOps.

You can also filter the information and choose the columns that you are interested in.

You can Select

  • TFS (Any application within the Organisation Service account, e.g. TFS https://dev.azure.com/Organisation/)
  • Release Management (Any application within Release Management Service)
  • Analytics (Any application within the Analytics Service)

Statuses

  • Normal
  • Delayed
  • Blocked

And time period, for example, the last 7 days.

Extensions


You can browse the marketplace for additional DevOps services, like Analytics above, to gain insight into the health and status of your DevOps projects.

Just searching for Git brings up lots of free Git services that you can use.

Azure Active Directory

When DevOps was accessed, an email address was used that was connected to a tenant in Azure, because we use Office 365.

"Office 365 uses Azure Active Directory (Azure AD) to manage user identities behind the scenes. Your Office 365 subscription includes a free subscription to Azure AD, so that you can integrate Office 365 with Azure AD if you want to sync passwords or set up single sign-on."

Because of this, Azure DevOps connected to the Azure Active Directory.

Security

Policies

If we never want to allow public projects, we can simply set the above policy to Off.

We can also disallow external guest access through policies.

Permissions

A Project Collection Administrator has permissions to administer build resources and permissions for the collection.

A collection is a container for a number of projects in Azure DevOps and corresponds to the Organisation.

Let's set up another Project Collection Administrator.

We want everyone in the POC Admin group to also be a Project Collection Administrator.

The DevOps groups that have been created, and the collection-level (enterprise-level) groups, can all be seen here.

You can also look at the users and access all the Settings per user from the User tab.

Boards

Process

Process gives you more information on what will be available on the board for each process type, e.g. Agile, Scrum, etc.

Pipelines

Agent Pools

DevOps agent pools are scoped to the entire organisation, so they can be shared across projects. You don't need to manage agents individually; you can organise them into pools.

An agent is installable software that runs one job at a time.

With Microsoft-hosted agents, every time you run a pipeline you get a fresh virtual machine, which is discarded after use.

Settings

This is an example status badge in Boards

You can set variables at queue time unless the above option is enabled; if it is limited, only variables marked as settable at queue time can be set.

By default, the collection (Enterprise)-scoped identity is used, unless the Limit job authorization scope to current project is set in Project Settings > Settings.

Deployment Pools

An agent pool (above) defines the sharing boundary for all agents in that pool, whereas deployment pools are about deployment target machines that have agents installed on them.

Deployment pools are convenient when you want to deploy a project to different machines.

Deployment groups represent the physical environments; for example, “Dev”, “Test”, “UAT”, and “Production”.

The deployment group is a layer over the deployment pool, which makes the targets available to the release definitions within a project.

Parallel Jobs

As we have already established, the free tier allows you to have one parallel job. But what is a parallel job?

A job is a set of one or more build tasks that run sequentially on the same target.

You could have, for example, a project that contains a web app and a web API. Their builds don't need to run sequentially; they can run in parallel to save time.

Each app type can equate to one job within a build definition, and each build definition has its own build pool/agent.

If you have two licensed build pipelines, you could run them concurrently, which will decrease overall build time.

You could also split up tests across jobs to be executed across parallel builds, which will reduce the overall time.

You need to look at your projects and what you are building, and see if parallel jobs are relevant to you, or if you are happy for the agents to run your builds sequentially.

OAuth Configurations

Register and Authorise your app

https://docs.microsoft.com/en-us/azure/devops/integrate/get-started/authentication/oauth?view=azure-devops

Artifacts

Storage

Check the storage sizes of your artifacts. Remember you get 2 GB free for your artifacts.

Hopefully this gives you a little more information on each section of your organisational settings. You need to understand, at a basic level:

  • Who your teams will be and what they will require (set up access levels, groups and rules)
  • What you will be creating, and whether builds can run in parallel
  • What environments you will be releasing to

Each section can then be looked at in much more detail

Create a DevOps Project

At this point, we know that we are using DevOps Services rather than the on-premises DevOps Server, so go to https://azure.microsoft.com/en-us/services/devops/?nav=min

To get to your initial DevOps page

Which gives you three routes in. Do you…

  • Start Free?
  • Start Free with GitHub?
  • Or Sign in to Azure DevOps if you already have an account?

At this point, we don’t have an account

Making Sense of DevOps Pricing

Start Free or Start Free with GitHub

Start free with GitHub

Use this option if you already have a GitHub account

In this example, the Start free option is used. Because I have an Azure account, DevOps is already logged in and knows what my tenant is.

At this point there is no suggestion of a 30-day trial or any other information, so for the time being let's get started by adding a project. (This is because I have a Visual Studio account; this may be different for users without a Visual Studio subscription.)

Also note the level: https://dev.azure.com/debbieedwards

This is the organisation level, and this is my own personal DevOps account. What happens if you want to set up a new organisation to connect related projects and scale out to enterprise level?

Select New organisation

And New Project

The new organisation has been set up with the company's name.

I can now start working with DevOps at an organisational level. We can have five Basic users for free, so for the time being this is what we will stick to.

We know we want a Private DevOps area

For version control, Git has been selected, because this seems to be the one that other team members are most comfortable with.

For the work item process, the choices are Agile, Basic, CMMI and Scrum.

The default is Agile, but let's have a quick look at each of these processes.

Basic

The simplest model.

Agile

Agile includes Scrum and works great if you want to track user stories and bugs on a Kanban board.

Scrum

Supports the Scrum methodology. Really good for tracking backlog items and bugs on a Kanban board.

CMMI

If your team follows more formal methods, use CMMI (Capability Maturity Model Integration).

For this Project, the Agile approach is selected

Invite a user into the Project.

With up to five users, DevOps is free. We won't be doing anything with Test Plans at the moment, so let's add one other user into this area.

DevOps Access Levels

  • Stakeholder – can be assigned to users with no license or subscription who need a limited set of features
  • Basic – provides access to most features; up to 5 users free
  • Basic + Test Plans – the user has access to Test Plans, but this costs around £34 a month extra and you don't get any free accounts
  • Visual Studio Subscription – assign to users with a Visual Studio subscription

Check Levels of Access


Go back up to the organisational level and select Organisation settings in the bottom left-hand corner of the screen.

Next, go to Users and you can check the settings. Note that one account is under a Visual Studio Enterprise subscription; this is assigned to users who already have a Visual Studio subscription.

We have one Basic account, which will be free at the moment.

Checking Spending

Obviously the one thing you want to do is check that you aren’t spending money unnecessarily.

Still in Organisation Settings

Currently there is no billing applicable, but this section will need a more detailed how-to later on.

Checking what has been done at organisation level

Here you can see that Tess has been added by me to a group, and her access was set to Basic.

There should be a post coming along soon that will look into the organisation settings in more detail.

We have created a project and set up new users with Basic level accounts.

Getting back to your Project

Close DevOps down and then reopen it.


This time you can sign in and go straight to your new project

Next time, we will start using some of the services on offer, like Boards, Repos, Pipelines, Test Plans and Artifacts.

Making Sense of Pricing for Azure DevOps to get started

You decide that Azure DevOps is the way to go because you want to make use of all the features. Specifically:

  • Azure Pipelines to build and release code
  • Boards to do all your planning
  • Repos so you can use, for example, Git as your code repository
  • Artifacts to share packages across projects
  • Test Plans to help you test what you have built

Azure DevOps Services Costings

DevOps is free for open-source projects and for small projects of up to 5 users.

https://azure.microsoft.com/en-gb/pricing/details/devops/azure-devops-services/

Azure DevOps Services

Individual Services

Taking Azure DevOps Services as the starting point, the first area to look at is individual services.


There are only two individual services to choose from: Pipelines and Artifacts. This would be useful if you simply want to be able to build code, release it into specific environments, and save your artifacts for use across projects.

CI/CD – Continuous Integration and Continuous Delivery (or Deployment).

Along with these two services there are sliders, so you can optimise the services for your requirements, though it would be helpful if there were more information about the different options.

Azure Pipelines Options

First of all, we need to understand that Microsoft-hosted will be in the public cloud: the jobs are run on a pool of Microsoft-hosted agents, and each time a job is run, a fresh VM gets created and then discarded after use. Self-hosted will be on premises, using self-hosted agents.

With the above option, you can run only one job at a time for free, with up to 1,800 minutes of build time per month. And remember that your job is building and releasing code.

Let's see what happens with Microsoft-hosted if we move the slider to 10.

It's not clear what this actually means. Would you be paying £298 per month at the top end if you had 10 developers using the service concurrently, or do you simply pay for what you use? So if you don't go above one concurrent job at a time, is it still free?

Azure Artifact Options

This is clearer. Artifacts are stored so you are paying for storage.

User Licenses

If you want to use all or most of the services, you can get an individual user license, much in the same way that you would buy a Power BI Pro per-user license.

The only difference between the Basic plan and Basic + Test Plans is the Test Plans, but there is a fairly big price difference.

The question is: how useful are Test Plans, and can you do without them? Test Plans will be looked at in more detail later.

Basic Plan

If you are happy to go without Test Plans, it's worth looking in more detail at the fine points.


However….

Azure Pipelines: includes the free offer from Individual Services, and the free offer is specifically for one free parallel job.

Does that mean that, even paying £4.48 per license, you may have extra charges if you run a job in parallel to another developer?

If two developers are running at the same time, who gets hit with the charges?

Could this be understood as: one user putting two jobs out concurrently is charged, but two users with one job running each wouldn't be, as it's per user?

Artifacts: 2 GB free, and then the assumption is that you move on to paying for extra storage. Is this a pay-as-you-go model?

Basic + Test Plans

The same criteria apply to this plan, so the same questions remain.

There are no free plans with this, and the cost is £38.76 per user per month, so the assumption is that this plan would only be required for users who need to test the system.

More information is required in regard to Test Plans: are they worth the extra £34.28 a month?

Azure DevOps Server

DevOps Server is the on-premises offering, built on a SQL Server backend. DevOps Server is a good option when all your services are on premises and you have, for instance, Microsoft SQL Server 2019.

You can either pay month to month through Azure or buy a 3 year software license.

If you buy through Azure it entitles you to use the cloud service.

With either option you need Windows or Windows Server Licenses for the Servers running Azure DevOps Server 2019

Team Foundation Server is now Azure DevOps Server.

Pricing is not set out on the website, so it may need a call to the Microsoft sales team to ensure you get the right fit for your needs.

If you already use the cloud, the recommendation is to go for DevOps Services.

However, there are still some questions in relation to this: what is the best option, and can these options be mixed and matched when dealing with different types of users?

Introduction to Azure DevOps

What is Azure?

Taking the first part of Azure DevOps: Azure is Microsoft's cloud computing platform. It hosts hundreds of services in over 58 regions (e.g. North Europe, West US, UK South) and is available in over 140 countries.

As you can see, lots of Azure services have already been consumed throughout these blogs. Azure SQL Databases, Azure Data Lake gen2, Azure Blob Storage, Azure Data Factory, Azure Logic Apps, Cognitive Services etc.

Cloud offerings are split into Infrastructure as a Service, IaaS (VMs, etc.), Platform as a Service, PaaS (see the services above), and Software as a Service, SaaS (Office 365, Dropbox, etc.).

You can save money by moving to this OpEx model (operational expenditure) from the CapEx model (capital expenditure), because you pay for what you need as you go, rather than having to spend money upfront on your hardware, software, data centres, etc.

Cloud services use economies of scale: Azure can do everything at a lower cost because it operates at such a large scale, and these savings are passed on to customers.

On Demand Provisioning

When there are suddenly more demands on your service, you don't have to buy in more hardware; you can simply provision extra resources very quickly.

Scalability in Minutes

Once demand goes down, you can easily scale down and reduce your costs, unlike on premises, where you have to provision for maximum hardware requirements just in case.

Pay as you Consume

You only pay for what you use

Abstract Resources

You can focus on your business needs and not on the hardware specs (Networking, physical servers, patching etc)

Measurable

Every unit of usage is managed and measurable.

What is DevOps?

DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.

Testing, reviews, moving to production: this is the place where developers and the operations team meet and work together.

Pre DevOps

If we don't work within a DevOps framework, what do we do?

Developers will build their apps and finally add them into source control.

Source Control or Version Control allows you to track and manage code changes. Source Control Management Systems provide a history of development. They can help resolve conflicts when merging code from different sources

Once in source control, the testing team can take the source code and create their own builds to do testing.

This will then be pushed back to the development team, and it will go back and forth until everyone is happy. Here we can see that the environments we are using are Dev and Test.

Once complete, it is released into production.

This is a very siloed approach: everyone is working separately, things can take time, and you will get bottlenecks.

The DevOps Approach

Everyone works together: you become a team, and time to market becomes faster. Developers and operations work as a single team.

DevOps Tools

Each stage uses specific tools from a variety of providers. Here are a few examples:

  • Code – Eclipse, Visual Studio, Team Foundation Server, Jira, Git
  • Build – Maven, Gradle, Apache Ant
  • Test – JUnit, Selenium
  • Release – Jenkins, Bamboo
  • Deploy – Puppet, Chef, Ansible, SaltStack
  • Monitor – New Relic, Sensu, Splunk, Nagios

We need all these tools to work together so we don’t need to do any manual intervention. This means that you can choose the ones that you have experience in.

Components of Azure DevOps

Azure Boards

People with a Scrum or project management background will know how to create the items within Boards: Epics, Stories, Tasks, etc.

Developers create and work on Tasks; Bugs can be logged here by the testers.

Azure Repos

Push the development into Source Control to store your information. Check in your code within Azure Repos.

There are repository types to choose from in Repos to suit your needs, like Git or TFVC (Team Foundation Version Control).

Azure Pipelines

Developers write code, and that code is pushed to Repos and built via a pipeline. The code is built within the pipeline.

The code is then released into Dev, Test, QA, Prod, etc. And from, say, the Test or Dev environments we can……..

Azure Test Plans

Test, using Azure Test Plans. For example, if you have deployed a web service, you want to make sure it's behaving correctly. Once tested, the code will go back to the pipeline to be built and pushed to another environment.

Azure Artifacts

Collect dependencies and put them into Azure Artifacts

What are dependencies?

Dependencies are logical relationships between activities or tasks, meaning that the completion of one task is reliant on another.

Azure Boards

Work Items

A work item is the artifact used to track work on the Azure board:

  • Bug
  • Epic
  • Feature
  • Issue
  • Task
  • Test Case
  • User Story

So you create work items here and interact with them on the board

Boards

Work with Epics, Features, Tasks, Bugs etc.

Boards include support for Scrum (an agile process framework for managing complex work, with an emphasis on software development) and Kanban (a method for managing and improving work across human systems, balancing demands with capacity).

Backlogs

How do you prioritise your work items?

Sprints

Say your sprint is two weeks (10 working days). What work can be accomplished within this sprint?

Dashboards

Overall picture of the particular sprint or release

Repos

We can use Git or Team Foundation Version Control (TFVC). The example uses Git.

  • Files
  • Commits
  • Pushes
  • Branches
  • Tags
  • Pull Requests

You create your own branch from the master branch, do your changes and testing, and merge back from your branch to the master branch.

Pipelines

Where is your code? It's in Git.

Select the Git source, like Azure Repos Git or GitHub, etc.

Get the code from the master branch

How do you want to build the project?

Choose from lots of templates: Azure Web App, ASP.NET, Maven, Ant, ASP.NET with containers, C# Function, Python package, Android, etc.

Next provide the Solution path and the Azure Subscription that you want to deploy to

This takes the source code from the Git repository and builds it.

The build will then give you logs to show you how the build of the project happened

Next time when you check in code, it will automatically trigger the pipeline to build the code

Then the build needs to be Released via a Release Pipeline

This is where you release to the correct Azure subscription and the code is deployed. You can also add in approvals, to ensure you get the sign-off required to release the code.

Conclusion

This is just a whistle-stop tour of DevOps. Test Plans and Artifacts haven't been discussed in much detail, but it gives you the basics of what is included in DevOps and how you can start to think about using it.

What do you create in Azure, and can it be handled within DevOps?

Can we start using the Boards?

How do we get started with Azure DevOps?

Which team members have the right interests in the specific DevOps areas?
