Microsoft Fabric Updates Blog

Construct a data analytics workflow with a Fabric Data Factory data pipeline

Microsoft Fabric Data Factory provides an easy way to build low-code data integration and ETL projects for cloud-scale data analytics. Today, I want to focus on data pipelines in Data Factory and the advantages of using them to orchestrate your Fabric data analytics projects and activities.

What is a data pipeline?

For Azure Data Factory and Azure Synapse users, data pipelines will be very familiar, as we’ve had them in those products for many years. Now that Data Factory and data pipelines are available in Fabric as a SaaS offering, you will find the experience to be nearly identical. However, if you are primarily a Power BI or Power Platform user, you may not have experience with data pipelines. So, today, I’d like to take a few minutes to explain what a data pipeline is.

In the context of Fabric data analytics, you use a data pipeline to build automated workflows that combine the different artifacts you’ve created in your workspace. As an example, in the screenshot below, you can see that I’ve built a pipeline that performs the following tasks:

  1. Find files in a storage folder
  2. Iterate over the files found
  3. Copy each file’s contents to the bronze layer in my Lakehouse
  4. After the data has been loaded to bronze, run a Spark Notebook to transform the data and load it into the silver layer
  5. If the Notebook was successful, send an email to the team and continue
  6. If the Notebook failed, notify the team via a Teams channel and then fail the pipeline
  7. Execute a Dataflow to combine and clean the data, preparing it for the gold layer
  8. Finally, issue a Copy command to load the cleaned data into the gold layer for reporting
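
To make the control flow of those eight steps concrete, here is a minimal Python sketch. It is not Fabric code and calls no Fabric API. The Lakehouse is a hypothetical in-memory stand-in, and the names `run_pipeline`, `transform`, `notify`, and `PipelineFailed` are assumptions made up for illustration. In Fabric you would build the same logic by dragging activities onto the design surface.

```python
from dataclasses import dataclass, field


class PipelineFailed(Exception):
    """Raised when the simulated pipeline must stop, as in step 6."""


@dataclass
class Lakehouse:
    # Hypothetical in-memory stand-in: each layer maps a name to a list of rows.
    bronze: dict = field(default_factory=dict)
    silver: dict = field(default_factory=dict)
    gold: dict = field(default_factory=dict)


def run_pipeline(source_folder, lakehouse, transform, notify):
    # Steps 1-3: find the files, iterate over them, copy each one to bronze.
    for name, rows in source_folder.items():
        lakehouse.bronze[name] = list(rows)

    # Step 4: the "notebook" transforms bronze data and loads it into silver.
    try:
        for name, rows in lakehouse.bronze.items():
            lakehouse.silver[name] = [transform(row) for row in rows]
    except Exception as exc:
        # Step 6: on failure, alert the team channel, then fail the pipeline.
        notify("teams", f"Notebook failed: {exc}")
        raise PipelineFailed(str(exc)) from exc

    # Step 5: on success, email the team and continue.
    notify("email", "Notebook succeeded")

    # Step 7: the "dataflow" combines the silver tables and drops empty rows.
    combined = [row for rows in lakehouse.silver.values() for row in rows]
    cleaned = [row for row in combined if row]

    # Step 8: copy the cleaned data into gold for reporting.
    lakehouse.gold["report"] = cleaned
    return lakehouse
```

Calling `run_pipeline({"a.csv": [" x ", "y"]}, Lakehouse(), str.strip, print)` walks the success branch. Passing a row that `transform` cannot handle walks the failure branch and raises `PipelineFailed` after the Teams-style notification.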

Why would you use a data pipeline?

I created that pipeline design entirely in the web UI in Fabric without writing any code. By clicking the Schedule button in the designer UI, I can now set a schedule that runs my logic automatically on a regular cadence. How often you update your Lakehouse depends on your business requirements and on how often new data arrives at your sources.
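
One way to reason about the cadence is sketched below in Python. This is only an illustration of the trade-off, not how the Fabric scheduler works. The helper names `pick_interval` and `next_runs` are hypothetical, and the rule of thumb (run no more often than new data arrives and no more often than the business needs) is an assumption you should adapt to your own requirements.

```python
from datetime import datetime, timedelta


def pick_interval(source_arrival, business_freshness):
    """Choose the longer of the two intervals (an assumed rule of thumb).

    Running more often than data arrives finds nothing new to load, and
    running more often than the business needs adds cost without benefit.
    """
    return max(source_arrival, business_freshness)


def next_runs(start, interval, count):
    """Return `count` run times, beginning at `start` and spaced `interval` apart."""
    return [start + interval * i for i in range(count)]


# Example: sources deliver every 6 hours, and the business wants hourly freshness.
interval = pick_interval(timedelta(hours=6), timedelta(hours=1))
schedule = next_runs(datetime(2024, 1, 1, 6, 0), interval, 3)
```

Here the arrival rate is the limiting factor, so the sketch settles on a 6-hour cadence even though the business asked for hourly data.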

Separately, inside of Fabric, I can create and manage the artifacts that I just orchestrated above. I created and tested my Notebook in the Data Engineering app, and I used the Data Factory app to create the Dataflow. Data Factory data pipelines in Fabric then bring them all together into a single cohesive logical “pipeline”. In other words, I just created an end-to-end workflow that I can run on a schedule, fully automated. In addition, I can use the central Monitoring Hub feature in Fabric to watch the execution of my pipelines, Notebooks, Dataflows, etc., all from a single pane of glass.

So as you build your analytics project in Fabric, you’ll use data pipelines to piece those artifacts together into an automated workflow to keep your Lakehouse (and subsequently, your business reporting users) updated, refreshed, and cleaned.

How to get started

I hope that this gives you a sense of the value that data pipelines from the Data Factory app inside of Microsoft Fabric can bring to your data analytics projects. To get started, switch over to Data Factory in Fabric and choose New > Data Pipeline. You’ll land on the page shown in the screenshot below, where you can begin adding activities to the low-code design surface and building your own workflows!

Other resources

  • Join the Fabric community to post your questions, share your feedback, and learn from others.
  • Visit Microsoft Fabric Ideas to submit feedback and suggestions for improvements and vote on your peers’ ideas!
  • Check our Known Issues page to stay up to date on product fixes!

Have any questions or feedback? Leave a comment below!

Related blog posts

September 26, 2024 by Ye Xu

Fast Copy in Dataflow Gen2 is now Generally Available! This powerful feature enables rapid and efficient ingestion of large data volumes, leveraging the same robust backend as the Copy Activity in Data pipelines. With Fast Copy, you can experience significantly shorter data processing times and improved cost efficiency for your Dataflow Gen2. Additionally, it boosts … Continue reading “Announcing the General Availability of Fast Copy in Dataflows Gen2”

September 26, 2024 by Leo Li

Fabric Data Pipeline support in the On-Premises Data Gateway is now generally available! The on-premises data gateway allows you to seamlessly bring on-premises data to Microsoft Fabric. With data pipelines and the on-premises data gateway, you can perform high-scale data ingestion of your on-premises data into Fabric. Enhancements Over Self-Hosted Integration Runtime in Azure Data … Continue reading “Announcing the General Availability of Fabric Data Pipeline Support in the On-Premises Data Gateway”