Microsoft Fabric Updates Blog

Ensuring Data Continuity in Fabric Warehouse: Best Practices for Every Scenario

In the era of Generative Artificial Intelligence, data warehouses play a pivotal role for organizations to store, manage, and analyze massive amounts of data. They serve as the foundation for business intelligence and help make informed decisions. However, like any technology, data warehouses are not immune to failures – whether from hardware malfunctions, software bugs, accidental human errors, or malicious attacks. Effective data recovery strategies are crucial to minimizing disruption and ensuring business continuity.

Data Warehouse in Microsoft Fabric offers various tools and techniques to handle data recovery scenarios. In this blog, we will dive deep into the common recovery scenarios and features that help enable seamless end-to-end data recovery and discuss best practices to ensure data resilience.

  1. Accidental Data Corruption

Accidental data corruption in a data warehouse can result from various factors, compromising the accuracy and reliability of stored data. Common causes include human errors such as incorrect data entry, errors in data transformation processes, or the unintentional overwriting of records during updates or migrations. Hardware issues like disk failures or power outages can also lead to incomplete or corrupted data writes. Additionally, software bugs or problems with ETL (Extract, Transform, Load) workflows may introduce inconsistencies while transferring or transforming data between the source and target. Without proper error handling, validation mechanisms, and regular backups, these incidents can cause significant data quality problems within the warehouse.

Scenario: An organization executes a daily Extract, Transform and Load (ETL) job to update critical production tables with new data from multiple sources. However, a recent release introduced an error in the ETL logic, resulting in inaccurate data being loaded into a few tables. The issue was identified only when the reports and dashboards began displaying inaccurate or corrupted data.

Challenge: The organization needs to quickly restore the table to its pre-ETL state without disrupting any of the ongoing operations or losing valuable data, while also identifying the point-in-time on when and how the corruption occurred and impacted tables.


Methodology of Recovery:

Determining the specific point-in-time when the issue occurred

Time travel at the T-SQL Statement level in Fabric warehouse offers the ability to query the past versions of the data up to a retention period of 30 calendar days. So, queries can be submitted as of various prior points-in-time to determine the specific point-in-time when the issue first occurred.

Recovery Methods

Clone of table:

Once the point-in-time of the issue is determined, the impacted incorrect tables can be cloned to a previous stable point-in-time. This brings the table back to the state just before the problem arose, almost instantaneously.

If multiple tables are incorrect, they can be selected together through the user interface of fabric portal and cloned near instantaneously, with minimal downtime.

Due to the nature of independent existence that table clone offers, once the tables are cloned as of a prior stable point-in-time, the incorrect tables can be dropped or modified based on the business scenario.

Note: As a best practice, before deploying any code changes, it is recommended to create a low-cost, zero-copy clone of the tables involved in the scope of ETL logic update as of current point-in-time. This ensures that the clone can be used for immediate recovery if needed.

Restore in place of warehouse:

If multiple tables are corrupted and there is a need to restore the entire warehouse to a prior known stable point-in-time that upholds data integrity restore in-place can be used. The system generated restore points that are created every 8 hours or user-defined restore points can be leveraged (whichever is closer to the prior known stable state) can be used to restore the entire Fabric warehouse.

Once the warehouse is restored as of a prior stable state, the relevant ETL jobs can be run to reload any data that was added after the prior known stable state into the warehouse, ensuring data consistency moving forward.

2. Accidental Drops

Scenario 1: An important production table is accidentally dropped by a user executing a DROP TABLE command. This action removes the table and its data potentially disrupting business processes and reporting.

Challenge: The organization needs to quickly restore the table as it will be losing valuable data.


Recovery Methods

Clone of table:

Clone is a size of metadata operation where only the metadata is copied, while the actual data files are referenced from One Lake. As table clones have independent existence, deleting the original source of the clone does not impact on the clone- it remains fully operational and available for querying. Similarly, deleting the clone leaves the original source table intact and can be queried.

At the table level of granularity, tables can be cloned as of current point in time as well as prior point in time. As a recommended best practice, table clones can be created periodically so that they can be leveraged as a recovery mechanism in scenarios where the table is accidentally dropped. If a table is dropped, the cloned table can be renamed to the dropped table name and the corresponding ETL job can be run to load the data to uphold data consistency.

Restore in place of warehouse:

In scenarios where the table clone is not created, restore in place of the warehouse can be leveraged to recover the dropped tables. Determine the point-in-time the table was dropped and identify the closest restore point before the drop when the table existed in the fabric portal. Utilize this restore point to perform restore in place following which the ETL jobs can be rerun to ensure the data continuity.

Scenario 2: An important Fabric workspace containing the sales transaction details is accidentally dropped. This action is potentially disrupting the business process and reporting significantly as all the critical sales data is in the warehouse within the workspace.

Challenge: The organization needs to quickly restore the workspace as it will be losing valuable data and multiple business processes are impacted.

Recovery Method:

Workspace Restores:

When a workspace is dropped, it isn’t deleted immediately; instead, it enters a soft-deleted state during which the dropped retention period begins. Dropped retention refers to the time during which the dropped workspace remains recoverable, until the retention period expires.

For personal workspaces (My Workspace), the retention period is 30 days. For collaborative workspaces, the default dropped retention period is seven calendar days, but it can be configurable to last between 7 to 90 days by the Fabric administrators.

So, if the workspace is dropped accidentally and is well within the dropped retention period, it can be recovered almost nearly instantaneously from the admin portal by Fabric Administrators.

Conclusion:

Data recovery within Fabric warehouse plays a vital role in preserving data integrity and ensuring seamless business continuity. By implementing robust recovery strategies, organizations can significantly minimize the risk of downtime and data loss. With reliable features like Time travel, zero-copy cloning and restore in place businesses can achieve quick recovery with minimal disruption. In today’s data driven world, investing in a reliable data recovery mechanism is more than a technical requirement, it’s a strategic business necessity. Come embrace the data recovery capabilities that Fabric warehouse provides to safeguard your data, ensuring the warehouse remains secure, resilient, and trusted source of truth.

Related blog posts

Ensuring Data Continuity in Fabric Warehouse: Best Practices for Every Scenario

November 4, 2024 by Salil Kanade

AI is transforming data warehousing, making complex analytics more accessible and efficient. With tools like Copilot for Data Warehouse and AI Skill, Microsoft Fabric offers two powerful, complementary resources that serve both data developers and business users. This blog explores how these tools differ, when to use each, and how they can work together to … Continue reading “Data Warehouse: Copilot & AI Skill”

October 31, 2024 by Jovan Popovic

Fabric Data Warehouse is a modern data warehouse optimized for analytical data models, primarily focused on the smaller numeric, datetime, and string types that are suitable for analytics. For the textual data, Fabric DW supports the VARCHAR type that can store up to 8KB of text, which is suitable for most of the textual values … Continue reading “Announcing public preview of VARCHAR(MAX) and VARBINARY(MAX) types in Fabric Data Warehouse”