Data Transforms & Data Quality
Complete Guide 2026
Master data cleaning, enrichment and transformation in Salesforce Data Cloud — the layer that determines whether your Unified Profiles are trustworthy or garbage
- What Are Data Transforms?
- Why Data Quality Is the Foundation of Everything
- The 4 Types of Data Transforms
- SQL Transforms — Every Function You Need
- Real Data Cleaning Examples with SQL
- How to Set Up a Data Transform Step by Step
- Data Quality Dimensions — What Makes Data Good
- Real-World Data Quality Scenarios
- Common Data Transform Mistakes
- Quick Quiz
- Interview Questions for This Module
Data Transforms are SQL-based operations that clean, enrich and reshape DLO data before it lands in a DMO. They sit between the raw DLO layer and the harmonized DMO layer — acting as the quality control checkpoint for all incoming data.
Without Data Transforms, whatever arrives in your DLO goes straight into your DMO — raw, unformatted and potentially full of errors. Phone numbers with dashes, parentheses and country codes in 10 different formats. Email addresses in mixed case. Names with leading and trailing spaces. Status codes like 1, 2, 3 instead of Active, Inactive, Pending. All of this flows into Identity Resolution and produces wrong matches or missed matches.
Data Transforms fix this before it causes problems downstream. They are the difference between Identity Resolution that works correctly and Identity Resolution that silently creates wrong customer profiles.
Data Transforms Are Like Airport Security Screening
Imagine an airport where passengers arrive from dozens of different countries — each with different passport formats, different ID types, different languages on their documents. If you just let everyone through without checking or standardizing, chaos follows.
Airport security is the Data Transform layer. It takes everyone coming in — regardless of their origin — and processes them through a consistent, standardized check. Documents are verified. Formats are normalized. Suspicious items are removed.
Only after passing through this standardized process do passengers continue to their destination — just like only after passing through Data Transforms does your data continue to the DMO, Identity Resolution and segmentation.
Data quality is not a nice-to-have in Data Cloud. It is the single most important factor that determines whether every downstream feature works correctly. Identity Resolution, Calculated Insights, Segmentation, Activation, Agentforce personalization — every single one depends on the quality of data flowing through Data Transforms.
Consider what happens when data quality is poor. If email addresses arrive in mixed case — John@Gmail.com from CRM and john@gmail.com from Marketing Cloud — a deterministic email match rule sees these as different values and fails to merge the same customer into one profile. That customer now exists as two separate Unified Profiles. Every Calculated Insight computed for them is wrong. Every segment they belong to may be wrong. Every Agentforce response about them is based on incomplete data.
The root cause of most failed Data Cloud implementations is not technical complexity — it is poor data quality that was never addressed before implementation began. Data Transforms are the tool Salesforce provides to address this — but they only work if you take data quality seriously from the start.
Data Cloud Transforms use ANSI-compatible SQL — similar to standard SQL with some Data Cloud-specific syntax. You write SELECT statements that read from DLO tables and output transformed data to DMOs. Every function below is used in real Data Cloud implementations.
| Function | What It Does | Example |
|---|---|---|
| LOWER() | Converts text to lowercase | LOWER(email) converts John@Gmail.com to john@gmail.com |
| UPPER() | Converts text to uppercase | UPPER(country_code) normalizes us, US, Us to US |
| TRIM() | Removes leading and trailing spaces | TRIM(first_name) turns " John " into "John" |
| REGEXP_REPLACE() | Replaces pattern matches with a string | REGEXP_REPLACE(phone, '[^0-9]', '') strips non-numeric characters |
| CASE WHEN | Conditional value mapping | CASE WHEN status=1 THEN 'Active' ELSE 'Inactive' END |
| COALESCE() | Returns first non-null value | COALESCE(mobile, phone, work_phone) finds best phone |
| CONCAT() | Joins strings together | CONCAT(first_name, ' ', last_name) builds full name |
| DATEDIFF() | Calculates difference between dates | DATEDIFF(day, birth_date, CURRENT_DATE) computes age in days |
| DATE_TRUNC() | Truncates date to specified precision | DATE_TRUNC('month', order_date) groups by month |
| SPLIT_PART() | Extracts part of a delimited string | SPLIT_PART(email, '@', 2) extracts domain from email |
| CAST() | Converts data type | CAST(revenue AS DECIMAL(18,2)) converts string to number |
| NULLIF() | Returns null if two values are equal | NULLIF(email, '') converts empty string to null |
| UNION ALL | Combines rows from multiple queries | Combines contacts from CRM and Marketing Cloud DLOs |
| WHERE NOT LIKE | Filters out rows matching a pattern | WHERE email NOT LIKE '%test%' removes test records |
| IS NOT NULL | Filters out null records | WHERE email IS NOT NULL ensures email is always present |
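To show how several of these functions compose in practice — including UNION ALL combining two sources — here is a minimal sketch. All DLO and field API names (Contact_CRM__dll, Subscriber_MC__dll and so on) are hypothetical placeholders, not standard names:

```sql
-- Combine contacts from two sources into one normalized result set.
SELECT
  contact_id,
  LOWER(TRIM(email))                   AS email,          -- case- and space-normalized
  REGEXP_REPLACE(phone, '[^0-9]', '')  AS phone,          -- digits only
  'CRM'                                AS source_system
FROM Contact_CRM__dll
WHERE email IS NOT NULL

UNION ALL

SELECT
  subscriber_id,
  LOWER(TRIM(email_address))           AS email,
  REGEXP_REPLACE(mobile, '[^0-9]', '') AS phone,
  'Marketing Cloud'                    AS source_system
FROM Subscriber_MC__dll
WHERE email_address IS NOT NULL;
```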
Navigate to Data Transforms in Data Cloud Setup
From the Data Cloud app, go to Setup → Data Transforms → New. This opens the Data Transform editor where you write your SQL transformation logic.
Name your Data Transform
Give the transform a descriptive name following your naming convention. Use names that clearly describe what the transform does — for example Contact_Email_Normalization or Order_Status_Code_Mapping. You will manage many transforms over time so clear naming is essential.
Write your SQL transformation logic
Write your SELECT statement in the editor. Reference your DLO tables by their Data Cloud API names. Use the SQL functions covered in this module to clean, enrich and reshape the data. Always include the Individual ID field in your SELECT to ensure records link to Unified Profiles.
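A minimal sketch of such a transform, assuming a hypothetical Contact_CRM__dll DLO where contact_id serves as the Individual ID:

```sql
-- Example transform: clean and enrich, keeping the Individual ID.
SELECT
  contact_id,                                         -- Individual ID for profile linkage
  COALESCE(mobile, phone, work_phone)  AS best_phone, -- first non-null phone
  CONCAT(TRIM(first_name), ' ', TRIM(last_name)) AS full_name,
  CASE WHEN status = 1 THEN 'Active'
       WHEN status = 2 THEN 'Inactive'
       ELSE 'Pending' END              AS status_label
FROM Contact_CRM__dll;
```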
Preview and validate the output
Use the Preview button to run the transform against a sample of data and see the output before saving. Verify field values look correct. Check that null handling works as expected. Confirm that filtering is removing the right records. Fix any SQL errors before activating.
Map transform output to DMO
After saving the transform, create or update the field mapping for this Data Stream — mapping the transformed output fields to the target DMO fields. The transform output becomes the source for the DMO mapping instead of the raw DLO. This ensures only clean, transformed data reaches the DMO.
Set the refresh schedule
Configure when the Data Transform runs. Typically it should run immediately after the Data Stream refresh — so incoming data is transformed and available in the DMO within the same processing window. For streaming data, transforms run continuously as events arrive.
Activate and monitor
Activate the Data Transform. Monitor the first run by checking the transform job history for errors. Verify the DMO is populated with correctly transformed data using the Data Cloud Explorer. Check that Identity Resolution now matches correctly on the transformed fields.
| Quality Dimension | What It Means | Common Problem | Transform Fix |
|---|---|---|---|
| Completeness | All required fields have values | Email or Individual ID is null for many records | WHERE email IS NOT NULL and COALESCE for optional fields |
| Consistency | Same data looks the same across sources | Email mixed case, phone 10 different formats | LOWER for email, REGEXP_REPLACE for phone |
| Accuracy | Data values are correct and real | Test emails, bot records, placeholder names like Test User | WHERE filters excluding known bad patterns |
| Uniqueness | No duplicate records for same entity | Same customer submitted twice with slightly different data | Primary Key deduplication and Identity Resolution |
| Timeliness | Data is fresh and up to date | Stale customer data not synced in months | Appropriate Data Stream refresh schedule |
| Validity | Data conforms to expected format | Phone field contains letters, email missing @ symbol | REGEXP validation and LENGTH checks in WHERE clause |
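As a hedged illustration of validity checks in a WHERE clause — the email shape test is deliberately crude, and the table name is a placeholder:

```sql
-- Keep only records whose email and phone conform to a basic expected shape.
SELECT *
FROM Contact_CRM__dll
WHERE email LIKE '%@%.%'                                             -- crude email shape check
  AND LENGTH(SPLIT_PART(email, '@', 2)) > 3                          -- domain part exists
  AND LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '')) BETWEEN 10 AND 15; -- plausible digit count
```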
Before writing a single Data Transform, run a data quality audit on every source system. Profile the data — what percentage of email fields are null? How many formats does the phone field have? Are there test or bot records? What are the min and max values for key fields? This audit tells you exactly which transforms you need and saves enormous time later.
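The audit can be expressed as a handful of SQL checks run in whatever query tool reaches the raw data. A sketch, assuming a hypothetical Contact_CRM__dll:

```sql
-- Null and empty rate for a key field
SELECT
  COUNT(*)                                                      AS total_rows,
  SUM(CASE WHEN email IS NULL OR email = '' THEN 1 ELSE 0 END)  AS missing_emails
FROM Contact_CRM__dll;

-- Distinct status codes and their frequency (reveals unmapped or unexpected values)
SELECT status, COUNT(*) AS cnt
FROM Contact_CRM__dll
GROUP BY status
ORDER BY cnt DESC;
```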
🛒 Retail Company — The Shared Family Email Problem
A large retailer discovered that family@smith.com was the registered email for 47 different customer accounts — presumably a shared family email address. Without a transform, Identity Resolution merged all 47 accounts into one Unified Profile — creating a fictional customer with the purchase history of 47 different people and a wildly inflated LTV. The fix was a Data Transform that identified email addresses appearing more than 3 times in the DLO and flagged them as shared — excluding them from email-based Identity Resolution matching. Phone-based matching continued for these records.
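The detection step can be sketched as a grouped count — the threshold of 3 comes from this scenario, while the table and field names are assumptions:

```sql
-- Count how many records share each normalized email; flag heavy reuse as shared.
SELECT
  LOWER(TRIM(email))                        AS email,
  COUNT(*)                                  AS usage_count,
  CASE WHEN COUNT(*) > 3 THEN 1 ELSE 0 END  AS is_shared_email  -- 1 = shared
FROM Contact_CRM__dll
GROUP BY LOWER(TRIM(email));
```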
🏢 B2B SaaS Company — The Test Account Contamination
A SaaS company had 15,000 records in their CRM from internal testing, demo accounts and QA environments. These records — with emails like test@company.com, demo@client.com, qa_user@testing.com — were being ingested into Data Cloud and creating false Unified Profiles that contaminated segments and inflated engagement metrics. A single Data Transform with WHERE email NOT LIKE any of 12 identified test patterns eliminated all 15,000 test records from the DMO permanently. Segment sizes dropped by 8% but accuracy improved dramatically.
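The filter might look roughly like the sketch below — the three patterns shown stand in for the full list of 12 identified in the scenario, and the table name is a placeholder:

```sql
-- Exclude known test, demo and QA email patterns (illustrative subset).
SELECT *
FROM Contact_CRM__dll
WHERE LOWER(email) NOT LIKE 'test@%'
  AND LOWER(email) NOT LIKE 'demo@%'
  AND LOWER(email) NOT LIKE '%@testing.%';
```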
🏦 Financial Services — The Currency Conversion Problem
A bank operating in 15 countries had transaction amounts stored in local currencies in their ERP — USD, GBP, EUR, INR, AED. Calculated Insights trying to compute LTV were summing raw amounts and getting meaningless numbers — comparing rupees and dollars directly. A Data Transform using exchange rate lookup and CASE WHEN converted all transaction amounts to USD equivalent before they landed in the Sales Order DMO. LTV calculations immediately became accurate and comparable across all regions.
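A simplified sketch of the CASE WHEN approach — the rates below are illustrative placeholders, not real exchange rates, and the table name is hypothetical:

```sql
-- Convert local-currency amounts to a USD-equivalent column.
SELECT
  transaction_id,
  amount,
  currency_code,
  amount * CASE currency_code
             WHEN 'USD' THEN 1.00
             WHEN 'GBP' THEN 1.25   -- placeholder rate
             WHEN 'EUR' THEN 1.10   -- placeholder rate
             WHEN 'INR' THEN 0.012  -- placeholder rate
             WHEN 'AED' THEN 0.27   -- placeholder rate
             ELSE NULL              -- surface unmapped currencies instead of guessing
           END AS amount_usd
FROM Transactions_ERP__dll;
```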
📱 Mobile App Company — The Null Individual ID Crisis
A company ingesting mobile app behavioral events discovered that 40% of their Web Cart DMO records had null Individual IDs. Anonymous users who had not logged in generated cart events with no customer identifier — so the data landed in the DMO but was completely orphaned from any Unified Profile. A Data Transform was added to filter out all records where individual_id IS NULL and a separate anonymous profile strategy was implemented using probabilistic matching on device fingerprint for known returning users. Abandoned cart trigger accuracy improved from 60% to 94%.
Mistake 1: Skipping the data quality audit before building transforms
Teams build transforms based on assumptions about their data rather than profiling actual data first. They normalize email case assuming all emails are in mixed format — but the source actually has a different problem entirely. Always run a data profiling exercise before writing transforms. Count nulls, list unique values, identify format variations, find bot and test records. The audit tells you exactly what transforms are needed.
Mistake 2: Not handling null values explicitly
Writing transforms that assume all fields have values. LOWER(email) returns null if email is null — it does not throw an error. But that null email in the DMO will fail to match anything in Identity Resolution. Always add IS NOT NULL checks for fields used in Identity Resolution and use COALESCE for optional fields to provide a fallback value. Null handling is the most overlooked aspect of transform design.
Mistake 3: Building transforms without testing on real data samples
Writing the SQL in the editor, saving it, and assuming it works correctly without previewing on real data. Edge cases only appear with real data — a phone field containing letters from an old data entry error, an email with two @ symbols from a legacy system bug, a null birth date that causes DATEDIFF to fail. Always use the Preview function to validate transform output against at least 1,000 real records before activating.
Mistake 4: Over-engineering transforms to solve source system problems
Using complex transforms to compensate for fundamental source data quality problems that should be fixed at the source. A transform that tries to guess a customer's name from their email address because the name field is 90% null is not a transform — it is a workaround for a source system data collection failure. Data Transforms should normalize and clean data — not fabricate it. Work with source system teams to fix root cause data quality issues.
Mistake 5: Not monitoring transforms after go-live for failures
Source systems change. A field that was always text suddenly starts receiving numbers. A status code that was always 1 or 2 now includes a new code 5 that your CASE WHEN has no mapping for — so it silently falls into the ELSE 'Unknown' branch, miscategorizing a new customer segment for months. Set up monitoring alerts for transform failures and build a quarterly transform review to verify outputs still match business expectations as source systems evolve.
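One defensive pattern, sketched below with hypothetical names, maps known codes explicitly and tags anything new so monitoring can surface it rather than letting ELSE 'Unknown' swallow it:

```sql
-- Map known status codes explicitly; tag unmapped codes for review.
SELECT
  contact_id,
  CASE WHEN status = 1 THEN 'Active'
       WHEN status = 2 THEN 'Inactive'
       WHEN status = 3 THEN 'Pending'
       ELSE CONCAT('UNMAPPED_', CAST(status AS VARCHAR(10)))  -- new code 5 shows up as 'UNMAPPED_5'
  END AS status_label
FROM Contact_CRM__dll;
```

A quick GROUP BY on status_label then reveals any UNMAPPED_ values the moment they appear.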
Answer key: Q1: B — LOWER(email) | Q2: C — REGEXP_REPLACE | Q3: B — UNION ALL | Q4: B — WHERE individual_id IS NOT NULL | Q5: D — Uniqueness
Data Transforms are SQL-based operations that clean, enrich and reshape DLO data before it lands in a DMO. They are the quality control layer between raw source data and the harmonized data model. They are necessary because real-world source data is never clean or consistent. Email addresses arrive in mixed case from different systems — making deterministic matching fail. Phone numbers come in 10 different formats from different countries. Status codes are stored as numbers in one system and text labels in another. Test accounts and bot records contaminate segments. Data Transforms fix all of these problems before they reach Identity Resolution and segmentation, where poor data quality causes silent failures that are difficult to trace back to their root cause.
I would investigate in a specific sequence. First I query the DLO directly to see the raw incoming values for the fields used in match rules — typically email and phone. If I see inconsistent formatting — mixed case emails, phone numbers with different formats — I know the transform is either missing or not working correctly. Next I check the Data Transform configuration for this Data Stream — is a transform configured at all? If yes, I preview the transform output to see what values are actually reaching the DMO. I compare DLO values to DMO values to verify the transform is being applied. If email in the DLO is John@Gmail.com and email in the DMO is also John@Gmail.com instead of john@gmail.com — the LOWER function is either missing or the transform is not running. I check the transform job history for errors and verify the schedule is correct. Once the transform is fixed and run, I re-run Identity Resolution and check whether previously unmatched records now merge correctly.
This is the shared email address problem — common with family email accounts, company-wide addresses or personal assistants managing executive accounts. The solution is a multi-step approach in Data Transforms. First I identify shared email addresses by counting how many unique Individual IDs map to the same email address in the DLO. Any email appearing more than two or three times is likely shared. Second I add a transform that flags these emails as shared — adding a boolean is_shared_email field. Third I modify the Identity Resolution match rules to exclude shared emails from deterministic matching — so these addresses do not cause false profile merges. Records with shared emails fall through to probabilistic matching using name plus city or other signals instead. In extreme cases I maintain an exclusion list of known shared domain patterns like company-wide distribution lists.
A production-ready transform for CRM Contact identity matching would include five key elements. Email normalization using LOWER and TRIM to ensure case-consistent matching. Phone normalization using REGEXP_REPLACE to strip all non-numeric characters so formats match across sources. Null filtering using WHERE to exclude records missing both email and phone — records with no contact point cannot be matched by Identity Resolution anyway. Test record filtering using WHERE NOT LIKE patterns to exclude bot and test emails like test%, example%, noreply% and similar patterns. Finally Individual ID must be selected — typically the Salesforce Contact ID — to link each record to the Unified Profile. The SELECT also includes TRIM for first and last name to handle whitespace, and NULLIF to convert empty strings to proper nulls for fields that Identity Resolution uses as secondary signals.
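Putting those five elements together, a hedged sketch — every API name here is a placeholder, and the null-safe handling in the test-pattern filter is deliberate so phone-only records are not dropped:

```sql
-- Production-style identity transform: normalize, filter, keep the Individual ID.
SELECT
  contact_id                           AS individual_id,  -- links each record to a Unified Profile
  LOWER(TRIM(email))                   AS email,          -- case-consistent matching
  REGEXP_REPLACE(phone, '[^0-9]', '')  AS phone,          -- digits only
  TRIM(first_name)                     AS first_name,
  TRIM(last_name)                      AS last_name,
  NULLIF(TRIM(city), '')               AS city            -- empty string becomes null
FROM Contact_CRM__dll
WHERE (email IS NOT NULL OR phone IS NOT NULL)            -- must have at least one contact point
  AND (email IS NULL                                      -- null-safe: don't drop phone-only rows
       OR (LOWER(email) NOT LIKE 'test%'
           AND LOWER(email) NOT LIKE 'example%'
           AND LOWER(email) NOT LIKE 'noreply%'));
```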
Multi-currency LTV is a classic Data Transform challenge. The approach depends on whether exchange rates need to be historical or current. For current rates, the simplest approach is a CASE WHEN statement mapping each currency code to its current approximate exchange rate to USD — multiplying the transaction amount by the rate to produce a USD equivalent field. This USD equivalent field is what the Sales Order DMO stores and what the LTV Calculated Insight sums. For historical accuracy — more important for financial services — the transform joins the transaction DLO against a currency exchange rate reference table using the transaction date and currency code to get the exact rate applicable on that date. The transform then computes transaction_amount multiplied by historical_rate as the USD equivalent. This reference table itself is ingested into Data Cloud as a separate Data Stream from a currency rate provider, refreshed daily. The key principle is that currency conversion happens in the Transform layer before the DMO — so every Calculated Insight works in a single consistent currency without knowing the original source currency.
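A sketch of the historical-rate variant, assuming a daily rate table ingested as its own Data Stream — all table and field names are hypothetical:

```sql
-- Join each transaction to the exchange rate valid on its date.
SELECT
  t.transaction_id,
  t.amount,
  t.currency_code,
  t.amount * r.rate_to_usd AS amount_usd
FROM Transactions_ERP__dll t
JOIN Currency_Rates__dll r
  ON  r.currency_code = t.currency_code
  AND r.rate_date     = DATE_TRUNC('day', t.transaction_date);
```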