Implementing Row Condensing in BimlFlex
Written by Peter Avenant on 2.20.2025
Efficient data processing is crucial in data warehousing, especially when managing historical data in Data Vault Satellites. Row condensing ensures that only meaningful changes are captured, eliminating redundant records. This guide details how to implement advanced row condensing during both the staging and Data Vault loading processes using BimlFlex.
Understanding Row Condensing
Row condensing involves processing data deltas to ensure that only genuine changes are loaded into historized tables. This process prevents the storage of unnecessary records, optimizing storage and query performance.
Consider the following scenario:
| Integration Key | RowEffectiveFromDate | City | Country |
|---|---|---|---|
| awlt~540 | 2025-02-19 | Toronto | Canada |
| awlt~540 | 2025-02-20 | Montreal | Canada |
| awlt~540 | 2025-02-21 | Toronto | Canada |
In this example, the attribute City changes from 'Toronto' to 'Montreal' and back to 'Toronto'. A simple SELECT DISTINCT would not suffice to identify these changes accurately, as it would miss the reversion to 'Toronto'. Advanced row condensing logic is required to capture such 'A-B-A' changes effectively.
Implementing Row Condensing in BimlFlex
BimlFlex provides a robust framework to implement advanced row condensing through its metadata-driven approach. By configuring specific Settings and utilizing BimlFlex's templating system, you can automate the detection and processing of true data changes. BimlFlex provide you with an option Delta Collapse Rows to configure this for both the Persistent Staging and Satellite loads.
Sample Implementation
Below is a sample SQL implementation demonstrating how BimlFlex handles row condensing during the staging process:
SELECT
STG.[Address_SK],
STG.[Address_BK] AS [Integration Key],
STG.[AddressID],
STG.[AddressLine1],
STG.[AddressLine2],
STG.[City],
STG.[StateProvince],
STG.[CountryRegion],
STG.[PostalCode],
STG.[rowguid],
STG.[ModifiedDate],
STG.[RowEffectiveFromDate],
STG.[RowAuditId],
STG.[RowRecordSource],
STG.[RowHash]
INTO
[#TMP_AWLT_01_Sales_01_Address_STAGE]
FROM
(
SELECT
SRC.[Address_SK],
SRC.[Address_BK] AS [Integration Key],
SRC.[AddressID],
SRC.[AddressLine1],
SRC.[AddressLine2],
SRC.[City],
SRC.[StateProvince],
SRC.[CountryRegion],
SRC.[PostalCode],
SRC.[rowguid],
SRC.[ModifiedDate],
SRC.[RowEffectiveFromDate],
SRC.[RowAuditId],
SRC.[RowRecordSource],
SRC.[RowHash],
LEAD(SRC.[RowEffectiveFromDate]) OVER (PARTITION BY SRC.[AddressID], SRC.[RowHash] ORDER BY SRC.[RowEffectiveFromDate]) AS [LEAD_DATE_DIFF],
LAG(SRC.[RowEffectiveFromDate]) OVER (PARTITION BY SRC.[AddressID], SRC.[RowHash] ORDER BY SRC.[RowEffectiveFromDate]) AS [LAG_DATE_DIFF],
COALESCE(LAG(SRC.[RowHash]) OVER (PARTITION BY SRC.[AddressID] ORDER BY SRC.[RowEffectiveFromDate]), SRC.[RowHash]) AS [LAG_HASH_DIFF],
COALESCE(LEAD(SRC.[RowHash]) OVER (PARTITION BY SRC.[AddressID] ORDER BY SRC.[RowEffectiveFromDate]), SRC.[RowHash]) AS [LEAD_HASH_DIFF]
FROM
[#TMP_AWLT_01_Sales_01_Address_PRE_STAGE] AS SRC
LEFT OUTER JOIN
[#TMP_AWLT_01_Sales_01_Address_CURRENT] AS TGT
ON SRC.[Address_BK] = TGT.[Address_BK]
AND SRC.[RowEffectiveFromDate] <= TGT.[RowEffectiveFromDate]
WHERE
TGT.[Address_BK] IS NULL
) AS STG
WHERE
STG.[LAG_DATE_DIFF] IS NULL
OR (STG.[RowHash] <> STG.[LAG_HASH_DIFF] AND STG.[LAG_DATE_DIFF] IS NOT NULL)
OR (STG.[RowHash] <> STG.[LAG_HASH_DIFF] AND STG.[LAG_HASH_DIFF] = STG.[LEAD_HASH_DIFF]);
In this script:
- Window Functions:
LEADandLAGfunctions are used to compare the current row'sRowHashwith the previous and next rows, based on theRowEffectiveFromDate. - Change Detection: Rows are filtered to include only those where a genuine change is detected by comparing
RowHashvalues. - Eliminating Redundancy: The
WHEREclause ensures that only rows representing actual changes are processed further.
By incorporating this logic into your BimlFlex framework, you can automate the row condensing process, ensuring efficient and accurate data loading into your Persistent Staging and Data Vault.
Conclusion
Implementing advanced row condensing in BimlFlex streamlines your data integration processes by capturing only meaningful changes. Leveraging BimlFlex's metadata-driven approach and configurable templates, you can ensure your Data Vault remains efficient, accurate, and free from redundant data.
For more detailed information on configuring these attributes in BimlFlex, refer to the BimlFlex Settings Documentation.
Share