Salesforce Data Cloud Data Streams and Connectors — Complete Guide 2026 | Module 03

☁ Data Cloud Complete Guide — Module 03

Data Streams & Connectors
Complete Guide 2026

Everything about bringing data into Salesforce Data Cloud — every connector type, batch vs streaming, Ingestion API, setup steps and real-world examples

📅 Updated May 2026 ⏲ 20 min read 🎓 Beginner to Advanced 🆕 Module 3 of 15
📍 What Are Data Streams and Connectors?
The entry point for all data in Salesforce Data Cloud

Before Data Cloud can do anything with customer data — unify it, segment it, activate it — that data must first get into Data Cloud. This is the job of Data Streams and Connectors.

No data can exist inside Data Cloud without passing through the ingestion layer. Every Unified Customer Profile, every Calculated Insight, every Segment starts with data that entered through a Data Stream. If your Data Streams are wrong — everything downstream will be wrong. This is why mastering this module is foundational to everything else in the course.

Think of Data Streams and Connectors as the plumbing system of your Data Cloud implementation. Connectors are the pipes connecting your source systems to Data Cloud. Data Streams are the water flowing through those pipes — controlled, configured and monitored.

📍 Core Principle

Every piece of data in Data Cloud — every profile attribute, every behavioral event, every transaction record — entered through a Data Stream. Understanding Data Streams means understanding where your data comes from, how fresh it is, how much it costs and what quality it has when it arrives.

📍 Connector vs Data Stream — The Critical Difference
The most commonly confused concepts in Data Cloud — clarified once and for all

These two terms are used interchangeably by beginners but they are fundamentally different things. Confusing them in an interview signals a lack of hands-on experience.

Factor | Connector | Data Stream
--- | --- | ---
What it is | The integration technology that connects to a source system | A specific configured flow of data from that source
Level | System-level connection — configured once per source | Object-level pipeline — one per object or dataset
Relationship | One Connector powers many Data Streams | One Data Stream per object per Connector
What it defines | Authentication, connection settings, source system | Which object, which fields, refresh schedule, mode
Example | Salesforce CRM Connector — connects to your CRM org | Account Stream, Contact Stream, Opportunity Stream
Created first? | Yes — the Connector is always the prerequisite | Created after the Connector is configured and authenticated
Analogy | The train track connecting two cities | A specific train running on that track carrying specific cargo
💡 Real World Analogy

Think of a TV Cable Subscription

A Connector is like signing up for a cable subscription — you establish the connection between your TV and the cable provider once. That connection is always there in the background.

Data Streams are like the individual channels you choose to watch — ESPN, BBC News, Discovery. You pick which channels flow through your connection. You can add new channels (new Data Streams) or remove existing ones without changing the underlying cable subscription (Connector).

One cable subscription powers dozens of channels. One Salesforce CRM Connector powers dozens of Data Streams — Account, Contact, Lead, Opportunity, Case and more.

📍 Every Connector Type Explained
All the ways data can enter Salesforce Data Cloud
Salesforce CRM Connector
Native Salesforce
Connects directly to your Salesforce CRM org. Supports all standard objects (Account, Contact, Lead, Case, Opportunity) and custom objects. One-click setup using your existing Salesforce credentials. The most common connector in any Data Cloud implementation.
📧 Marketing Cloud Connector
Native Salesforce
Ingests email engagement data from Salesforce Marketing Cloud — opens, clicks, bounces, unsubscribes, subscriber attributes. Bidirectional — Data Cloud segments can also activate back to Marketing Cloud. Critical for building email engagement Calculated Insights.
🛒 Commerce Cloud Connector
Native Salesforce
Ingests order history, product catalog data, cart events and browsing behavior from Salesforce B2C Commerce Cloud. Essential for retail and e-commerce use cases requiring purchase-based segmentation and product recommendations.
📦 Amazon S3 Connector
Cloud Storage
Ingests CSV, JSON or Parquet files from Amazon S3 buckets on a scheduled basis. Used for ERP exports, historical data loads, third-party data files that land in S3. Configure a folder path, file format and schedule. Data Cloud polls the folder and ingests new files automatically.
Google Cloud Storage Connector
Cloud Storage
Equivalent to S3 but for Google Cloud Storage buckets. Ingests files in CSV or Parquet format on a scheduled interval. Popular for companies running Google Cloud infrastructure who export data to GCS before ingesting to Data Cloud.
💾 Azure Data Lake Connector
Cloud Storage
Connects to Microsoft Azure Data Lake Storage for batch file ingestion. Used by enterprises running Microsoft infrastructure with ADLS as their central data repository. Supports CSV, JSON and Parquet file formats.
🔗 Ingestion API Connector
Custom API
A REST API that allows any external system to push data directly into Data Cloud. Supports both streaming (individual events in real-time) and bulk (large batch uploads) modes. Used when no pre-built connector exists or when you need programmatic control over ingestion timing and content.
🔄 MuleSoft Connector
Integration Platform
Uses MuleSoft Anypoint Platform to connect any external system — SAP, Oracle, legacy CRMs, custom databases — to Data Cloud. The most flexible connector for complex enterprise integrations that require data transformation before ingestion.
Snowflake Zero Copy
Zero Copy
Does NOT physically ingest data. Uses Snowflake Secure Share to let Data Cloud query Snowflake tables in place. No duplication, no sync lag, no additional storage cost. Data stays in Snowflake but appears as a native DLO in Data Cloud. Best for large data warehouses.
📊 Google BigQuery Zero Copy
Zero Copy
Equivalent to Snowflake Zero Copy for Google BigQuery datasets. Data Cloud queries BigQuery tables via Google Analytics Hub sharing without moving data. Ideal for companies using BigQuery as their analytics warehouse.
📱 Mobile SDK Connector
SDK
Salesforce CDP Mobile SDK embedded in iOS or Android apps sends behavioral events directly to Data Cloud in real-time — screen views, button clicks, purchases, search queries. Enables real-time behavioral segmentation based on in-app activity.
🌐 Web SDK Connector
SDK
JavaScript tag embedded on your website that tracks page views, clicks, form submissions and custom events directly into Data Cloud as streaming events. Enables real-time web behavioral segmentation and abandoned browse or cart triggers.
Connector Type | Ingestion Mode | Best For | Credit Cost
--- | --- | --- | ---
Salesforce CRM | Batch (scheduled) | CRM object sync — daily or hourly | Low
Marketing Cloud | Batch | Email engagement data | Low
S3 / GCS / ADLS | Batch (file-based) | ERP exports, historical loads | Low
Ingestion API — Bulk | Batch (programmatic) | Custom systems, nightly batch | Low
Ingestion API — Streaming | Real-time streaming | Custom events, IoT, clickstream | High
Mobile SDK | Real-time streaming | In-app behavioral events | High
Web SDK | Real-time streaming | Website behavioral events | High
Snowflake Zero Copy | Real-time query | Large data warehouse access | Very Low
MuleSoft | Batch or Streaming | Complex enterprise integrations | Varies
📍 Batch vs Streaming Ingestion — Deep Dive
The most important architectural decision in every Data Cloud project
📦 Batch Ingestion
Data collected and processed at scheduled intervals — hourly, daily or weekly
📈 High volume — thousands to millions of records per run
💵 Lower Data Credit consumption per record
🕓 Latency of minutes to hours before data is available
Best for: CRM sync, ERP exports, loyalty data, historical loads
Not for: Abandoned cart triggers, real-time personalization
⚡ Streaming Ingestion
Individual events processed as they occur — within seconds
📈 High frequency — millions of small events per day
💵 Higher Data Credit consumption per event
🕐 Latency of seconds — available in Data Cloud almost instantly
Best for: Web events, cart abandonment, in-app behavior, IoT
Not for: Data that changes infrequently — wastes credits

The Decision Framework — When to Use Which

The core question to ask for any data source is: "Does acting on this data 60 minutes from now instead of 60 seconds from now change the business outcome?"

If the answer is NO — use batch. Customer name, address, industry, loyalty tier, account manager — none of these change rapidly. A daily sync is perfectly sufficient and costs a fraction of streaming.

If the answer is YES — use streaming. Cart abandonment becomes significantly less effective after 30 minutes. A fraud signal that is 2 hours old is useless. An IoT alert from a machine that needs maintenance cannot wait until the next batch run.

The biggest mistake architects make is defaulting to streaming for everything because it feels more powerful. The reality is that streaming costs dramatically more Data Credits. A daily batch of 1 million CRM records might cost the same credits as 10 minutes of high-frequency streaming events.
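To make the scale of that difference concrete, the sketch below compares the two modes with a simple back-of-envelope calculation. The credit rates in it are invented placeholders for illustration only; actual Data Cloud credit multipliers are published by Salesforce and vary by contract, so only the shape of the comparison matters.

```python
# Back-of-envelope comparison of batch vs streaming Data Credit consumption.
# NOTE: the two rates below are purely hypothetical placeholders, not actual
# Salesforce pricing. Substitute the multipliers from your own contract.

BATCH_CREDITS_PER_MILLION_ROWS = 1_000       # hypothetical batch rate
STREAM_CREDITS_PER_MILLION_EVENTS = 15_000   # hypothetical streaming rate (~15x batch)

def batch_daily_cost(rows_per_day: int) -> float:
    """Credits consumed by one daily batch run."""
    return rows_per_day / 1_000_000 * BATCH_CREDITS_PER_MILLION_ROWS

def streaming_daily_cost(events_per_day: int) -> float:
    """Credits consumed by one day of streaming events."""
    return events_per_day / 1_000_000 * STREAM_CREDITS_PER_MILLION_EVENTS

# 1M CRM contacts synced once a day vs 5M behavioral events streamed per day
print(f"Daily batch, 1M CRM rows:     {batch_daily_cost(1_000_000):>8,.0f} credits")
print(f"Streaming, 5M events per day: {streaming_daily_cost(5_000_000):>8,.0f} credits")
```

Run against your own volumes and contracted rates, this kind of estimate is usually enough to settle the batch vs streaming question for each source, as the table below illustrates.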

Data Source | Recommended Mode | Why
--- | --- | ---
Salesforce CRM contacts and accounts | Batch — Daily | Changes infrequently, cost-efficient
Website page views and clicks | Streaming | Real-time personalization and triggers
Mobile app behavioral events | Streaming | In-session triggers, real-time recommendations
Shopping cart add and abandon events | Streaming | Cart recovery requires a sub-5-minute trigger
Email engagement opens and clicks | Batch — Hourly | Hourly is sufficient, streaming unnecessary
ERP purchase history | Batch — Daily | Historical data, no real-time need
Loyalty point transactions | Batch — Daily | Points are typically calculated at end of day
IoT sensor events | Streaming | Real-time alerts and predictive maintenance
Fraud transaction signals | Streaming | Stale fraud signals have zero value
Customer demographics from data warehouse | Zero Copy or Batch — Weekly | Static data, no real-time need
📍 The Ingestion API — When and How to Use It
The most flexible ingestion option — for when no pre-built connector fits

What is the Ingestion API?

The Data Cloud Ingestion API is a REST API that allows any external system to push data directly into Data Cloud programmatically. Unlike pre-built connectors that pull data on a schedule, the Ingestion API is push-based — the source system sends data to Data Cloud when it is ready.

This makes it the most flexible ingestion option. Any system that can make an HTTP request can send data to Data Cloud — a custom Java application, a Python script, a Salesforce Flow calling an external API, an IoT device, or a third-party SaaS platform with webhook support.

API Mode | How It Works | Volume | Use Case
--- | --- | --- | ---
Streaming Mode | Push individual JSON events via REST POST in real time | 1 to 100 records per call | Website events, app interactions, IoT sensor readings
Bulk Mode | Upload large files via multipart HTTP upload | Millions of records per file | Historical data migration, nightly batch from custom systems

How the Ingestion API Works

  • Step 1: Create a Connected App in Salesforce for OAuth 2.0 authentication
  • Step 2: Configure an Ingestion API Data Stream in Data Cloud Setup — define the schema of what data you will send
  • Step 3: Authenticate and obtain an access token using OAuth JWT Bearer Flow
  • Step 4: POST JSON payload to the Data Cloud Ingestion API endpoint
  • Step 5: Data lands in the configured DLO within seconds (streaming) or minutes (bulk)
  • Step 6: DLO maps to DMO as configured — data becomes available for segmentation

Example Ingestion API Payload

POST to your Data Cloud Ingestion API endpoint with this JSON structure:

The payload contains a data array with individual event objects. Each object must include the fields matching your configured Data Stream schema — such as individual_id, event_type, product_id, timestamp and any custom fields you defined when setting up the Ingestion API connector.
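As a sketch only, a streaming payload built from the fields named above might look like the following. The field names, values and record count here are hypothetical; the real structure must match the schema you defined when creating the Ingestion API Data Stream.

```json
{
  "data": [
    {
      "individual_id": "IND-0001573",
      "event_type": "product_view",
      "product_id": "SKU-10482",
      "timestamp": "2026-05-04T14:32:09Z"
    },
    {
      "individual_id": "IND-0008841",
      "event_type": "add_to_cart",
      "product_id": "SKU-99310",
      "timestamp": "2026-05-04T14:33:41Z"
    }
  ]
}
```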

The individual_id field is mandatory in every payload — it links the incoming event to an existing Unified Customer Profile. Without it the event lands in the DLO but cannot connect to any profile for segmentation or activation.
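For orientation, here is a minimal Python sketch of Steps 3 and 4 from the list above: exchange a signed JWT for an access token, then POST the payload. The token URL pattern is the standard Salesforce OAuth endpoint, but the ingestion URL, connector name and object name are placeholders; real implementations may also require an additional Data Cloud token exchange, so treat this as a sketch and confirm the exact endpoint paths in the current Salesforce documentation.

```python
import requests

# Placeholders: replace with values from your Connected App and Data Stream setup.
TOKEN_URL = "https://login.salesforce.com/services/oauth2/token"
INGEST_URL = (
    "https://YOUR_DATA_CLOUD_TENANT.example/api/v1/ingest/"
    "sources/YOUR_CONNECTOR_API_NAME/YOUR_OBJECT_NAME"  # hypothetical path
)
SIGNED_JWT = "eyJ..."  # JWT assertion signed with the Connected App's private key

# Step 3: obtain an access token via the OAuth 2.0 JWT Bearer Flow
token_resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": SIGNED_JWT,
    },
    timeout=30,
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# Step 4: POST the JSON payload. Field names and types must match the schema
# defined on the Ingestion API Data Stream, otherwise the records are rejected
# (see the schema note below).
payload = {
    "data": [
        {
            "individual_id": "IND-0008841",
            "event_type": "add_to_cart",
            "product_id": "SKU-99310",
            "timestamp": "2026-05-04T14:33:41Z",
        }
    ]
}
resp = requests.post(
    INGEST_URL,
    json=payload,
    headers={"Authorization": f"Bearer {access_token}"},
    timeout=30,
)
resp.raise_for_status()  # a 2xx response means the events were accepted for ingestion
```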

⚠️ Critical Interview Point

The Ingestion API requires a schema to be defined upfront when creating the Data Stream. Unlike batch connectors that discover the schema from the source, the Ingestion API requires you to define field names and data types before any data is sent. If the incoming payload does not match the schema — fields are missing or types are wrong — the records are rejected.

📍 How to Set Up a Data Stream Step by Step
The exact process for configuring any Data Stream in Salesforce Data Cloud
Step 01: Navigate to Data Cloud Setup

From the App Launcher in Salesforce, open the Data Cloud app. Go to Setup — Data Streams — New. This opens the Data Stream creation wizard.

Step 02: Select Your Connector Type

Choose the appropriate connector from the list — Salesforce CRM, Marketing Cloud, S3, Ingestion API etc. If this is the first Data Stream for this source, you will be prompted to configure the Connector first — providing authentication credentials and connection settings.

Step 03: Select the Object or Data Source

For CRM Connector, select which Salesforce object you want to stream — Account, Contact, Lead, Opportunity or a custom object. For S3, specify the bucket name and folder path. For Ingestion API, define the schema manually.

Step 04: Select Fields to Include

This step is critical for credit management. Only select the fields that are needed for segmentation, Calculated Insights or activation. Do not include every available field. Unneeded fields consume credits to process and store with zero business value.

Step 05: Configure Refresh Schedule

Set how often this Data Stream syncs. Options are typically Hourly, Daily, Weekly or On Demand for batch connectors. For streaming connectors this setting is continuous. Consider business need vs credit cost when setting the schedule — daily is sufficient for most CRM data.

Step 06: Configure Primary Key

Set the Primary Key field — the unique identifier for each record in this Data Stream. For CRM objects this is typically the Salesforce ID field. Primary Key is used for deduplication within the same source — if the same record comes in twice, the newer version replaces the older one.
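The upsert behavior is easy to picture with a small, purely conceptual sketch (this is not Data Cloud code, just an illustration of why the key must be truly unique): records that share a Primary Key collapse into one, with the latest arrival replacing the earlier version.

```python
def upsert(dlo: dict, incoming: list[dict], primary_key: str) -> dict:
    """Conceptual model of Primary Key deduplication: same key means the newest record wins."""
    for record in incoming:
        dlo[record[primary_key]] = record  # later arrivals overwrite earlier ones
    return dlo

incoming = [
    {"Id": "003A1", "Email": "pat@example.com", "City": "Austin"},
    {"Id": "003A1", "Email": "pat@newmail.com", "City": "Austin"},   # same Id, newer version
    {"Id": "003B7", "Email": "sam@example.com", "City": "Denver"},
]

dlo = upsert({}, incoming, primary_key="Id")
print(len(dlo))               # 2 -- the two "003A1" rows collapsed into one record
print(dlo["003A1"]["Email"])  # pat@newmail.com -- the newer version replaced the older

# If the key were a non-unique field such as City, distinct customers would
# silently overwrite each other -- the mistake described later in this module.
```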

Step 07: Set the Data Stream Category

Categorize the Data Stream as Profile, Engagement or Other. Profile data contains customer attributes. Engagement data contains behavioral events. This categorization helps Data Cloud organize data and applies appropriate processing rules for each type.

Step 08: Save and Activate

Save the Data Stream configuration and activate it. Data Cloud will run the first ingestion immediately. Monitor the Data Stream status to confirm it shows Active and check the last successful run timestamp. Errors at this stage typically indicate authentication or permission issues.

✅ After Setup

After the Data Stream runs successfully, navigate to Data Lake Objects to verify the DLO was created with the correct schema and contains data. If the DLO exists but has no data — check the connector authentication and source system permissions. The DLO is your confirmation that ingestion worked correctly.

📍 What Happens After Ingestion — The DLO Explained
Where data lands when it first enters Data Cloud

The Data Lake Object

When a Data Stream runs, data lands in a Data Lake Object (DLO). The DLO is automatically created by Data Cloud — you do not manually create it. It mirrors the schema of your source exactly — same field names, same data types, same structure as the source system.

The DLO is a raw, read-only staging area. You cannot modify the DLO schema or edit its data. It exists as an intermediate layer between your source system and the harmonized DMO layer. Think of it as the loading dock of a warehouse — goods arrive exactly as shipped and wait to be unpacked and organized.

The DLO is NOT used for segmentation, Calculated Insights or activation. You cannot build a segment that filters on DLO data. Segments only work on DMO data. This is why the field mapping step — covered in Module 04 — is so critical. Until DLO data is mapped to a DMO, it is invisible to all downstream Data Cloud features.

DLO Property | Detail
--- | ---
Created By | Automatically by Data Cloud when the Data Stream first runs
Schema | Mirrors the source system exactly — same field names and types
Editable? | No — read-only. Cannot modify schema or data directly
Used for Segmentation? | No — only DMOs are used for segmentation and insights
Deduplication | Primary Key based — same key means the newest record wins (upsert)
Retention | Configurable — typically 90 to 180 days for engagement data
Data Quality | Raw — whatever came from the source, including errors and nulls
Next Step | Field mapping to a DMO via Data Cloud Setup
📍 Real-World Ingestion Design Scenarios
How actual projects design their Data Cloud ingestion layer

🛒 E-Commerce Retailer — Hybrid Ingestion Design

Batch Data Streams: Salesforce CRM Connector (Account, Contact, Order History — daily sync). Amazon S3 (loyalty points CSV export from loyalty platform — daily). Marketing Cloud Connector (email engagement — hourly).
Streaming Data Streams: Web SDK on website (page views, product views, add to cart — real-time). Mobile SDK on iOS and Android app (in-app events — real-time). Ingestion API (checkout completion events from custom payment system — streaming).
Zero Copy: Snowflake (3 years of historical transaction data — queried in place, never ingested).

🏢 B2B Software Company — Product-Led Ingestion

Batch: Salesforce CRM Connector (Account, Contact, Opportunity, Contract — daily). S3 (billing data export from Stripe — daily). Marketing Cloud Connector (email engagement — daily).
Streaming: Ingestion API (product usage events from the SaaS platform — feature used, login, API call — real-time streaming).
Key insight: Product usage data is the most valuable signal for churn prediction and expansion opportunity identification. This company streams product events in real time so their account health scores update within minutes of activity.

🏥 Bank — Compliance-First Ingestion Design

All Data Streams run in the Hyperforce EU region for GDPR.
Batch: Core Banking System via MuleSoft (account balances, transaction summaries — daily with encryption). CRM Connector (Relationship Manager notes, case history). No streaming from core banking — regulatory restriction on real-time PII transmission.
Streaming: Mobile Banking App (login events, feature usage — anonymized before streaming).
Zero Copy: Snowflake (7 years of historical transaction data for ML model training — never copied into Data Cloud, queried in place).

📍 Common Data Stream Mistakes
What goes wrong with ingestion in real projects — and exactly how to avoid it

Mistake 1: Selecting every available field in the Data Stream

The Salesforce CRM Contact object has 200+ fields. Teams select all of them because it feels safer. Every field ingested consumes credits to process and store. The average Data Cloud implementation needs fewer than 20 Contact fields for segmentation. Always audit which fields are actually needed before configuring the stream and ruthlessly cut anything not required.

Mistake 2: Streaming data that should be batched

Setting CRM Contact sync to streaming because it is technically possible. CRM contact demographics — name, email, phone, address — change maybe once a month for most customers. Streaming this at real-time frequency costs 10 to 20 times more credits than a daily batch for zero business benefit. Reserve streaming for behavioral event data where seconds matter.

Mistake 3: Not setting the Primary Key correctly

Using a non-unique field as the Primary Key — like a company name or product category — causes multiple records to overwrite each other. The Primary Key must be a truly unique identifier per record. For CRM objects, use the Salesforce Record ID. For custom systems use whatever your source system uses as the unique row identifier.

Mistake 4: Forgetting to include Individual ID in the Data Stream

Individual ID is what links a DLO record to a Unified Customer Profile. Teams configure a perfect Data Stream, the DLO fills up with data, they map it to a DMO — and then discover none of the data connects to any profile. Without Individual ID mapped in the field mapping step, all that data is orphaned. Every Data Stream for profile or engagement data must have Individual ID mappable.

Mistake 5: Not monitoring Data Stream health after go-live

Teams configure Data Streams, validate them on day one and never check again. Source systems change — fields get renamed, authentication tokens expire, API endpoints change. A Data Stream that was working perfectly can silently fail for days before anyone notices. Set up monitoring alerts for failed runs and build a weekly review of Data Stream status into your operating model.

🧠 Quick Knowledge Check
Test your understanding of Module 03 — answers are in the content above!
Question 01
A company has one Salesforce CRM Connector configured. How many Data Streams can they create from it?
A. Only one — one connector, one stream
B. Up to five — one per cloud
C. Multiple — one per object (Account, Contact, Opportunity etc.)
D. Unlimited but only one can run at a time
Question 02
A shopping cart abandonment trigger must fire within 3 minutes of abandonment. Which ingestion mode should the cart event Data Stream use?
A. Batch — Hourly schedule
B. Streaming — real-time ingestion
C. Zero Copy from Snowflake
D. Batch — Daily schedule
Question 03
What is the correct statement about a Data Lake Object (DLO)?
A. DLOs can be used directly in segment filters
B. DLOs are manually created by the Data Cloud Admin
C. DLOs contain raw data as received and cannot be edited directly
D. DLOs are the same as Data Model Objects
Question 04
A company wants to use their Snowflake data warehouse in Data Cloud without incurring duplication costs or sync delays. Which connector should they use?
A. Ingestion API — Bulk mode to upload Snowflake exports
B. S3 Connector with Snowflake export files
C. Snowflake Zero Copy using Secure Share
D. MuleSoft Connector pulling from Snowflake
Question 05
Data is flowing into a DLO correctly but is not appearing in the DMO or available for segmentation. What is the most likely cause?
A. The Data Stream schedule is set to weekly instead of daily
B. The DLO fields have not been mapped to the DMO yet
C. The Connector authentication has expired
D. The segment filters are incorrect
✅ Answers

Q1: C — Multiple streams per connector | Q2: B — Streaming real-time | Q3: C — Raw data, read-only | Q4: C — Snowflake Zero Copy | Q5: B — DLO fields not mapped to DMO

🎤 Interview Questions for This Module
Data Stream questions that come up in real Data Cloud interviews
Q1
What is the difference between a Connector and a Data Stream in Salesforce Data Cloud?

A Connector is the integration technology that establishes and maintains a connection to a source system — configured once per source with authentication credentials and connection settings. A Data Stream is a specific configured flow of data through that connector — defining which object, which fields, what schedule and what ingestion mode. One Connector powers multiple Data Streams. For example, one Salesforce CRM Connector supports separate Data Streams for Account, Contact, Lead, Opportunity and Case objects. You always configure the Connector first, then create Data Streams on top of it.

One-Liner: "A Connector is the pipe connecting to the source — configured once. A Data Stream is the specific flow of data through that pipe — one per object. One Connector, many Data Streams."
Q2
When would you use streaming ingestion vs batch ingestion? Give real examples.

The decision comes down to whether acting on data within seconds rather than hours changes the business outcome. Streaming is justified when immediacy creates value — abandoned cart events need to trigger a recovery message within minutes, so streaming is essential. Website behavioral events, mobile app interactions, IoT sensor readings and fraud signals all require streaming because delayed processing reduces or eliminates their value. Batch is appropriate when data changes infrequently or when hourly or daily freshness is sufficient — CRM Contact demographics, ERP purchase history, Marketing Cloud email engagement and loyalty tier data all fit batch perfectly and cost significantly fewer Data Credits. The biggest mistake is defaulting to streaming for everything because it feels more capable. Streaming costs 10 to 20 times more credits per record than batch for the same data volume.

One-Liner: "Stream when seconds matter — cart events, behavioral triggers, fraud signals. Batch when daily freshness is sufficient — CRM demographics, ERP history, email engagement. Streaming costs far more credits per record."
Q3
When would you use the Ingestion API over a pre-built connector?

The Ingestion API is the right choice when no pre-built connector exists for the source system, when you need programmatic control over exactly when and what data is sent, or when the source system can push data via webhook but cannot be polled by a connector. Common scenarios include custom-built internal applications, third-party SaaS platforms that support webhooks but are not in the connector library, IoT devices that push events directly, and Salesforce Flows or Apex code that need to write real-time events into Data Cloud from within Salesforce itself. The Ingestion API requires upfront schema definition — you must define field names and types before any data arrives, unlike connectors that discover schema from the source automatically.

One-Liner: "Use Ingestion API when no pre-built connector exists, when the source system is push-based via webhook, or when programmatic control over ingestion timing is required. Remember schema must be defined upfront."
Q4
Data is arriving in the DLO but not showing up in segments. Walk me through how you would diagnose this.

This is a classic field mapping problem. When data arrives in a DLO but is not available for segmentation, the most likely cause is that the DLO fields have not been mapped to a DMO yet. I would first confirm the DLO has data by querying it in Data Cloud. Then I would check the field mapping configuration — navigating to the Data Stream and reviewing whether DLO fields have been mapped to their corresponding DMO fields. I would specifically check whether the Individual ID field is mapped — without it the records exist in the DMO but are not linked to any Unified Customer Profile, making them invisible to segments. I would also verify the DMO refresh has run after the mapping was saved and that Identity Resolution has processed the new profiles. If mapping is correct but data still does not appear in segments, I would check segment filter criteria to ensure they match the values actually in the DMO.

One-Liner: "DLO data not in segments — check in order: DLO field mapping to DMO complete? Individual ID mapped? DMO refresh run? Identity Resolution run? Segment filter criteria match DMO values?"
Q5
How would you design the ingestion layer for a global e-commerce company with 10 million customers?

I would design a hybrid ingestion architecture separating real-time behavioral data from relatively static profile data. For profile data — CRM accounts, contacts and order history — I would use daily batch ingestion via the CRM Connector and Commerce Cloud Connector. This minimizes credit consumption for data that changes infrequently. For behavioral data — website events, mobile app interactions and cart activity — I would use streaming ingestion via the Web SDK and Mobile SDK connectors to enable real-time triggers like abandoned cart recovery within minutes. For historical transaction data in the company's existing Snowflake data warehouse, I would implement Zero Copy to avoid duplication costs and data freshness delays. I would also configure the Marketing Cloud Connector on an hourly batch schedule for email engagement data. Before finalizing any Data Stream I would audit the required fields for each use case and configure only those fields — not the full object schema — to manage credit consumption across 10 million profiles.

One-Liner: "Hybrid ingestion — daily batch for CRM and order history, streaming for web and app behavioral events, Zero Copy for Snowflake historical data. Always audit fields before configuring — only ingest what segmentation and insights actually need."