Before the new Azure Data Lake came along, I was adding all my files into Blob Storage. Azure Data Lake Storage Gen2, however, is built on Blob Storage and Data Lake Gen1.
It's built for big data, and a fundamental change is that we now have a hierarchical namespace. This organises your files into directories.
So now we can do things like read all the files from a specific directory, or delete all the files from a specific directory. We can categorise our files within the data lake.
Set up Azure Data Lake V2 in Azure Portal
When you go into Azure currently and look for Data Lake Gen2, you can only find Gen1.
So the question is, how do you set up Gen2 in the Azure Portal? (At the time of writing it is the 25th of November 2019; hopefully this will get easier in the future.)
First of all, I go to the subscription where I want to add the new Data Lake Gen2.
Open up the portal menu (now hidden to the left of the screen)
Choose Create a resource
Next, choose Storage and then Storage Account
Note that the Account kind is Storage V2 (General Purpose)
I've set the Location to North Europe, simply because I know that's where our Power BI data in the Service is stored, so I may as well stick with this.
For the time being, I am leaving everything else as standard.
Next, go to Advanced.
The most important setting here is Data Lake Storage Gen2. Enable the hierarchical namespace and your storage account will now be created as Data Lake Storage Gen2.
Click Review and Create.
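For anyone who would rather script the portal steps above, here is a minimal sketch using Python with the azure-mgmt-storage package. The resource group, account name, and subscription id are placeholders I've invented, and the live call is left commented out so the sketch runs without credentials:

```python
# Sketch only: create a StorageV2 account with the hierarchical namespace
# enabled, which is the setting that makes it Data Lake Storage Gen2.


def gen2_account_parameters(location="northeurope"):
    """Build the create parameters for a Gen2 account as a plain dict.

    is_hns_enabled=True corresponds to enabling the hierarchical
    namespace on the Advanced tab in the portal.
    """
    return {
        "sku": {"name": "Standard_RAGRS"},
        "kind": "StorageV2",        # Account kind: Storage V2 (general purpose)
        "location": location,       # North Europe, as in the walkthrough
        "is_hns_enabled": True,     # the Data Lake Storage Gen2 switch
    }


print(gen2_account_parameters())

# The live call would then look something like this:
# from azure.identity import DefaultAzureCredential
# from azure.mgmt.storage import StorageManagementClient
# client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
# client.storage_accounts.begin_create(
#     "<resource-group>", "<account-name>", gen2_account_parameters()
# ).result()
```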
Create a file System within a Container
We now have a few options available to us. I have some files to add, so I am going to add them to a container.
Click on Containers and then + File System
Click OK
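The same file system can also be created in code; a sketch with the azure-storage-file-datalake Python package (the account and file system names here are made up, and the live calls are commented out so the sketch runs offline):

```python
# Sketch only: create a file system (container) in a Gen2 account.


def dfs_account_url(account_name):
    """Gen2 tooling talks to the dfs endpoint rather than the blob one."""
    return f"https://{account_name}.dfs.core.windows.net"


print(dfs_account_url("mystorageaccount"))

# Live calls, assuming 'pip install azure-storage-file-datalake azure-identity':
# from azure.identity import DefaultAzureCredential
# from azure.storage.filedatalake import DataLakeServiceClient
# service = DataLakeServiceClient(
#     account_url=dfs_account_url("mystorageaccount"),
#     credential=DefaultAzureCredential(),
# )
# service.create_file_system("workshopfiles")
```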
Clicking on your new storage account, you are told to download Azure Storage Explorer.
I already have Azure Storage Explorer downloaded. If you don't have it, it's something you will absolutely need in order to work with Azure storage accounts.
Once downloaded, open Azure Storage Explorer.
You will need to add your Azure storage accounts by clicking the little connector icon.
You will be asked to sign into your account with your Office 365 credentials and 2FA authentication.
This will log you into all your Subscriptions and Services
You are good to go
Here you find your subscription, then go to Data Lake Storage Gen2 and find the new file system.
I have added a folder called Workshop1Files to my file system.
Obviously Data Lake Storage gives you many ways of working with files and automating their delivery to the storage area. In this case I am simply going to move a file into my new folder to work with.
Double click on the folder, then click Upload and Upload Files.
And now your file is in the cloud, in an Azure Data Lake, ready to use.
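Storage Explorer is fine for a one-off upload, but the same move can be scripted. A sketch follows; the account and file system names are placeholders, only the URL helper actually runs here, and the upload itself is commented out:

```python
# Sketch only: upload a local file into the Workshop1Files folder.


def gen2_file_url(account, file_system, *path_parts):
    """Build a dfs URL matching the pattern
    https://<accountname>.dfs.core.windows.net/<filesystemname>/<subfolder>"""
    return f"https://{account}.dfs.core.windows.net/" + "/".join(
        (file_system,) + path_parts
    )


print(gen2_file_url("mystorageaccount", "myfilesystem", "Workshop1Files", "mydata.csv"))

# Live upload, assuming the account, file system, and folder already exist:
# from azure.identity import DefaultAzureCredential
# from azure.storage.filedatalake import DataLakeServiceClient
# service = DataLakeServiceClient(
#     "https://mystorageaccount.dfs.core.windows.net",
#     credential=DefaultAzureCredential(),
# )
# directory = service.get_file_system_client("myfilesystem") \
#                    .get_directory_client("Workshop1Files")
# file_client = directory.create_file("mydata.csv")
# with open("mydata.csv", "rb") as data:
#     file_client.upload_data(data, overwrite=True)
```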
Connect to your Azure File with Power BI Desktop
The first test is: can we access this data within Power BI Desktop?
Open Power BI Desktop and click Get Data.
Choose Azure Data Lake Storage Gen2 (currently in beta).
Add the URL
Data Lake Storage Gen2 URLs have the following pattern: https://<accountname>.dfs.core.windows.net/<filesystemname>/<subfolder>
If you right click on the file in Storage Explorer and go to Properties, there is a difference in structure:
http://<accountname>.blob.core.windows.net/<filesystemname>/<subfolder>
If you try to connect with the original blob URL from Storage Explorer, you get the following error.
And if you change the URL from blob to dfs, it still doesn't work.
There is a missing part to the puzzle. Go back to the Azure Data Lake Storage account in Azure and add the Storage Blob Data Reader role to your account.
Then try again and hopefully you are in.
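The change from blob to dfs is just a host-name swap, which a tiny helper makes explicit (pure string logic, with an invented account and file system name; this is not a Power BI API):

```python
def blob_to_dfs(url):
    """Swap the Blob endpoint for the dfs endpoint that the
    Data Lake Storage Gen2 connector expects."""
    return url.replace(".blob.core.windows.net", ".dfs.core.windows.net", 1)


print(blob_to_dfs(
    "https://mystorageaccount.blob.core.windows.net/myfilesystem/Workshop1Files"
))
# https://mystorageaccount.dfs.core.windows.net/myfilesystem/Workshop1Files
```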
There is no need to combine files because we have specified a single file.
There are different ways you can load files. I loaded one file, but you can also load all the files in the file system:
https://storageaccount.dfs.core.windows.net/filesystemname
or all the files under a directory in the file system (this can include sub-directories):
https://storageaccount.dfs.core.windows.net/filesystemname/directoryname/directoryname
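To sanity-check which files a directory-scoped URL would pick up, here's an illustrative helper (plain string logic, not a Power BI API) plus the matching SDK listing call, left commented out:

```python
def files_in_scope(file_paths, directory=""):
    """Return the paths a directory-scoped load would include.

    An empty directory means the whole file system; sub-directories
    are included, matching the connector behaviour described above.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    return [p for p in file_paths if p.startswith(prefix)]


paths = [
    "Workshop1Files/mydata.csv",
    "Workshop1Files/archive/old.csv",
    "OtherFiles/notes.txt",
]
print(files_in_scope(paths, "Workshop1Files"))
# ['Workshop1Files/mydata.csv', 'Workshop1Files/archive/old.csv']

# Listing the same scope directly against the lake (needs credentials):
# from azure.identity import DefaultAzureCredential
# from azure.storage.filedatalake import FileSystemClient
# fs = FileSystemClient(
#     "https://storageaccount.dfs.core.windows.net",
#     "filesystemname",
#     credential=DefaultAzureCredential(),
# )
# for p in fs.get_paths(path="Workshop1Files", recursive=True):
#     print(p.name)
```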
Connect to your Azure File with Power BI Data Flow
I am creating dataflows in the Power BI Service to ensure they can be reused across the company. The question is, can I connect to the above file in the Service via a dataflow?
In the Power BI Service, add a dataflow, which takes you into the Power Query Editor in the Service. I already had some dataflows connected to an Azure database.
The data is in Azure Data Lake Storage, so the first thing I do is try the Azure route.
However, there is no Azure Data Lake Storage Gen2 option. This must be something coming in the future. So then I go to File and click on Get Data, then Text/CSV.
You will need to add the file path and your credentials. (As per the previous advice, use dfs rather than blob in the URL. This seems a little flaky at the moment; I choose Organisational Account first before adding the URL, and then it seems to work.)
Remember, go back to Azure Storage Explorer: if you click on Properties, you can grab the URL from there.
We don't need to set up a Gateway because everything is now in the cloud.
Clicking Next, nothing happens; it just keeps bouncing back to the same window.
Attempting to use the Blob Storage connector also doesn't work (using the Azure account key as authentication).
It would appear that I have hit a brick wall: there is currently no Data Lake Gen2 connector for dataflows.
I will be keeping an eye on this because, obviously, if you are pushing the new generation of data lakes and dataflows, there needs to be a Gen2 connector for dataflows.
Update
I had a reply back on the Power BI forum (not a good one):
"The feature hasn't been planned. If there is any news, the document What's new and planned for Common Data Model and data integration will be updated."
I have found this in Ideas
Please help us get this working by voting for this idea.