Module 03: Data Streams & Connectors
Complete Guide 2026
Everything about bringing data into Salesforce Data Cloud — every connector type, batch vs streaming, Ingestion API, setup steps and real-world examples
- What Are Data Streams and Connectors?
- Connector vs Data Stream — The Critical Difference
- Every Connector Type Explained
- Batch vs Streaming Ingestion — Deep Dive
- The Ingestion API — When and How to Use It
- How to Set Up a Data Stream Step by Step
- What Happens After Ingestion — The DLO Explained
- Real-World Ingestion Design Scenarios
- Common Data Stream Mistakes
- Quick Quiz
- Interview Questions for This Module
Before Data Cloud can do anything with customer data — unify it, segment it, activate it — that data must first get into Data Cloud. This is the job of Data Streams and Connectors.
No data can exist inside Data Cloud without passing through the ingestion layer. Every Unified Customer Profile, every Calculated Insight, every Segment starts with data that entered through a Data Stream. If your Data Streams are wrong — everything downstream will be wrong. This is why mastering this module is foundational to everything else in the course.
Think of Data Streams and Connectors as the plumbing system of your Data Cloud implementation. Connectors are the pipes connecting your source systems to Data Cloud. Data Streams are the water flowing through those pipes — controlled, configured and monitored.
Every piece of data in Data Cloud — every profile attribute, every behavioral event, every transaction record — entered through a Data Stream. Understanding Data Streams means understanding where your data comes from, how fresh it is, how much it costs and what quality it has when it arrives.
These two terms are used interchangeably by beginners but they are fundamentally different things. Confusing them in an interview signals a lack of hands-on experience.
| Factor | Connector | Data Stream |
|---|---|---|
| What it is | The integration technology that connects to a source system | A specific configured flow of data from that source |
| Level | System-level connection — configured once per source | Object-level pipeline — one per object or dataset |
| Relationship | One Connector powers many Data Streams | One Data Stream per object per Connector |
| What it defines | Authentication, connection settings, source system | Which object, which fields, refresh schedule, mode |
| Example | Salesforce CRM Connector — connects to your CRM org | Account Stream, Contact Stream, Opportunity Stream |
| Created First? | Yes — Connector is always the prerequisite | Created after Connector is configured and authenticated |
| Analogy | The train track connecting two cities | A specific train running on that track carrying specific cargo |
Think of a TV Cable Subscription
A Connector is like signing up for a cable subscription — you establish the connection between your TV and the cable provider once. That connection is always there in the background.
Data Streams are like the individual channels you choose to watch — ESPN, BBC News, Discovery. You pick which channels flow through your connection. You can add new channels (new Data Streams) or remove existing ones without changing the underlying cable subscription (Connector).
One cable subscription powers dozens of channels. One Salesforce CRM Connector powers dozens of Data Streams — Account, Contact, Lead, Opportunity, Case and more.
Every Connector Type Explained
| Connector Type | Ingestion Mode | Best For | Credit Cost |
|---|---|---|---|
| Salesforce CRM | Batch (scheduled) | CRM object sync — daily or hourly | Low |
| Marketing Cloud | Batch | Email engagement data | Low |
| S3 / GCS / ADLS | Batch (file-based) | ERP exports, historical loads | Low |
| Ingestion API — Bulk | Batch (programmatic) | Custom systems, nightly batch | Low |
| Ingestion API — Streaming | Real-time streaming | Custom events, IoT, clickstream | High |
| Mobile SDK | Real-time streaming | In-app behavioral events | High |
| Web SDK | Real-time streaming | Website behavioral events | High |
| Snowflake Zero Copy | Real-time query | Large data warehouse access | Very Low |
| MuleSoft | Batch or Streaming | Complex enterprise integrations | Varies |
The Decision Framework — When to Use Which
The core question to ask for any data source is: "Does acting on this data 60 minutes from now instead of 60 seconds from now change the business outcome?"
If the answer is NO — use batch. Customer name, address, industry, loyalty tier, account manager — none of these change rapidly. A daily sync is perfectly sufficient and costs a fraction of streaming.
If the answer is YES — use streaming. A cart recovery message becomes significantly less effective once the cart has sat abandoned for more than 30 minutes. A fraud signal that is 2 hours old is useless. An IoT alert from a machine that needs maintenance cannot wait until the next batch run.
The biggest mistake architects make is defaulting to streaming for everything because it feels more powerful. The reality is that streaming costs dramatically more Data Credits. A daily batch of 1 million CRM records might cost the same credits as 10 minutes of high-frequency streaming events.
| Data Source | Recommended Mode | Why |
|---|---|---|
| Salesforce CRM contacts and accounts | Batch — Daily | Changes infrequently, cost-efficient |
| Website page views and clicks | Streaming | Real-time personalization and triggers |
| Mobile app behavioral events | Streaming | In-session triggers, real-time recommendations |
| Shopping cart add and abandon events | Streaming | Cart recovery requires sub-5-minute trigger |
| Email engagement opens and clicks | Batch — Hourly | Hourly is sufficient, streaming unnecessary |
| ERP purchase history | Batch — Daily | Historical data, no real-time need |
| Loyalty point transactions | Batch — Daily | Points calculated end of day typically |
| IoT sensor events | Streaming | Real-time alerts and predictive maintenance |
| Fraud transaction signals | Streaming | Stale fraud signals have zero value |
| Customer demographics from data warehouse | Zero Copy or Batch Weekly | Static data, no real-time need |
What is the Ingestion API?
The Data Cloud Ingestion API is a REST API that allows any external system to push data directly into Data Cloud programmatically. Unlike pre-built connectors that pull data on a schedule, the Ingestion API is push-based — the source system sends data to Data Cloud when it is ready.
This makes it the most flexible ingestion option. Any system that can make an HTTP request can send data to Data Cloud — a custom Java application, a Python script, a Salesforce Flow calling an external API, an IoT device, or a third-party SaaS platform with webhook support.
| API Mode | How It Works | Volume | Use Case |
|---|---|---|---|
| Streaming Mode | Push individual JSON events via REST POST in real-time | 1 to 100 records per call | Website events, app interactions, IoT sensor readings |
| Bulk Mode | Upload large files via multipart HTTP upload | Millions of records per file | Historical data migration, nightly batch from custom systems |
How the Ingestion API Works
- Step 1: Create a Connected App in Salesforce for OAuth 2.0 authentication
- Step 2: Configure an Ingestion API Data Stream in Data Cloud Setup — define the schema of what data you will send
- Step 3: Authenticate and obtain an access token using the OAuth JWT Bearer Flow (sketched in code after this list)
- Step 4: POST JSON payload to the Data Cloud Ingestion API endpoint
- Step 5: Data lands in the configured DLO within seconds (streaming) or minutes (bulk)
- Step 6: DLO maps to DMO as configured — data becomes available for segmentation
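As a rough illustration of Step 3, the Python sketch below obtains a core Salesforce access token via the OAuth JWT Bearer Flow. The consumer key, username and private key file are placeholders, and depending on org configuration an additional exchange of this core token for a Data Cloud token may be required before calling the Ingestion API, so treat this as a sketch of the flow rather than a definitive implementation.

```python
# Sketch only: consumer key, username and key file are placeholders.
import time

import jwt        # PyJWT
import requests

SF_LOGIN_URL = "https://login.salesforce.com/services/oauth2/token"
CONSUMER_KEY = "<connected-app-consumer-key>"    # from the Connected App created in Step 1
USERNAME = "integration.user@example.com"        # pre-authorized integration user
PRIVATE_KEY = open("server.key").read()          # private key matching the app's certificate

# Build and sign the JWT assertion for the OAuth JWT Bearer Flow
assertion = jwt.encode(
    {
        "iss": CONSUMER_KEY,
        "sub": USERNAME,
        "aud": "https://login.salesforce.com",
        "exp": int(time.time()) + 300,
    },
    PRIVATE_KEY,
    algorithm="RS256",
)

# Exchange the signed assertion for a core Salesforce access token
resp = requests.post(
    SF_LOGIN_URL,
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": assertion,
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
# Note: some orgs require a further token exchange for a Data Cloud-scoped token
# before this value can be used against the Ingestion API endpoint.
```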
Example Ingestion API Payload
POST your events to the Data Cloud Ingestion API endpoint as a JSON body; the structure is described here and sketched in code below.
The payload contains a data array with individual event objects. Each object must include the fields matching your configured Data Stream schema — such as individual_id, event_type, product_id, timestamp and any custom fields you defined when setting up the Ingestion API connector.
The individual_id field is mandatory in every payload — it links the incoming event to an existing Unified Customer Profile. Without it the event lands in the DLO but cannot connect to any profile for segmentation or activation.
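Below is a minimal sketch of what Step 4 might look like for a streaming Data Stream, assuming a source API name of web_events and an object called cart_event were configured when the Ingestion API connector was set up. The tenant endpoint, URL path, token and every field other than individual_id are placeholders for whatever your own schema defines; check the connector details in Data Cloud Setup for the exact endpoint.

```python
# Sketch only: endpoint, token, source name, object name and field values are placeholders.
from datetime import datetime, timezone

import requests

TENANT_ENDPOINT = "https://<your-tenant-endpoint>.salesforce.com"
INGEST_URL = f"{TENANT_ENDPOINT}/api/v1/ingest/sources/web_events/cart_event"
ACCESS_TOKEN = "<data-cloud-access-token>"   # obtained via the OAuth flow shown earlier

payload = {
    "data": [
        {
            "individual_id": "003XXXXXXXXXXXXXXX",   # links the event to a Unified Customer Profile
            "event_type": "add_to_cart",
            "product_id": "SKU-10482",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    ]
}

resp = requests.post(
    INGEST_URL,
    json=payload,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()   # payloads that do not match the configured schema are rejected
print(resp.status_code)
```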
The Ingestion API requires a schema to be defined upfront when creating the Data Stream. Unlike batch connectors that discover the schema from the source, the Ingestion API requires you to define field names and data types before any data is sent. If the incoming payload does not match the schema — fields are missing or types are wrong — the records are rejected.
Navigate to Data Cloud Setup
From the App Launcher in Salesforce, open the Data Cloud app. Go to Setup — Data Streams — New. This opens the Data Stream creation wizard.
Select Your Connector Type
Choose the appropriate connector from the list — Salesforce CRM, Marketing Cloud, S3, Ingestion API etc. If this is the first Data Stream for this source, you will be prompted to configure the Connector first — providing authentication credentials and connection settings.
Select the Object or Data Source
For CRM Connector, select which Salesforce object you want to stream — Account, Contact, Lead, Opportunity or a custom object. For S3, specify the bucket name and folder path. For Ingestion API, define the schema manually.
Select Fields to Include
This step is critical for credit management. Only select the fields that are needed for segmentation, Calculated Insights or activation. Do not include every available field. Unneeded fields consume credits to process and store with zero business value.
Configure Refresh Schedule
Set how often this Data Stream syncs. Options are typically Hourly, Daily, Weekly or On Demand for batch connectors. Streaming connectors have no schedule; data flows continuously as events arrive. Weigh business need against credit cost when setting the schedule; daily is sufficient for most CRM data.
Configure Primary Key
Set the Primary Key field — the unique identifier for each record in this Data Stream. For CRM objects this is typically the Salesforce ID field. Primary Key is used for deduplication within the same source — if the same record comes in twice, the newer version replaces the older one.
Set the Data Stream Category
Categorize the Data Stream as Profile, Engagement or Other. Profile data contains customer attributes. Engagement data contains behavioral events. This categorization helps Data Cloud organize data and applies appropriate processing rules for each type.
Save and Activate
Save the Data Stream configuration and activate it. Data Cloud will run the first ingestion immediately. Monitor the Data Stream status to confirm it shows Active and check the last successful run timestamp. Errors at this stage typically indicate authentication or permission issues.
After the Data Stream runs successfully, navigate to Data Lake Objects to verify the DLO was created with the correct schema and contains data. If the DLO exists but has no data — check the connector authentication and source system permissions. The DLO is your confirmation that ingestion worked correctly.
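For teams that prefer to confirm programmatically rather than through the Data Cloud UI, a simple row-count check along the lines of the sketch below is one option. It assumes the Data Cloud Query API is available in the org; the tenant endpoint, access token and DLO API name are placeholders.

```python
# Sketch only: tenant endpoint, token and DLO API name are placeholders.
import requests

TENANT_ENDPOINT = "https://<your-tenant-endpoint>.salesforce.com"
ACCESS_TOKEN = "<data-cloud-access-token>"
DLO_NAME = "Contact_Home__dll"   # the DLO created automatically by your Data Stream

resp = requests.post(
    f"{TENANT_ENDPOINT}/api/v2/query",
    json={"sql": f"SELECT COUNT(*) AS row_count FROM {DLO_NAME}"},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
resp.raise_for_status()
print(resp.json())   # a zero row count points back to connector authentication or source permissions
```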
The Data Lake Object
When a Data Stream runs, data lands in a Data Lake Object (DLO). The DLO is automatically created by Data Cloud — you do not manually create it. It mirrors the schema of your source exactly — same field names, same data types, same structure as the source system.
The DLO is a raw, read-only staging area. You cannot modify the DLO schema or edit its data. It exists as an intermediate layer between your source system and the harmonized DMO layer. Think of it as the loading dock of a warehouse — goods arrive exactly as shipped and wait to be unpacked and organized.
The DLO is NOT used for segmentation, Calculated Insights or activation. You cannot build a segment that filters on DLO data. Segments only work on DMO data. This is why the field mapping step — covered in Module 04 — is so critical. Until DLO data is mapped to a DMO, it is invisible to all downstream Data Cloud features.
| DLO Property | Detail |
|---|---|
| Created By | Automatically by Data Cloud when Data Stream first runs |
| Schema | Mirrors source system exactly — same field names and types |
| Editable? | No — read-only. Cannot modify schema or data directly |
| Used for Segmentation? | No — only DMOs are used for segmentation and insights |
| Deduplication | Primary Key based — same key = newest record wins (upsert) |
| Retention | Configurable — typically 90-180 days for engagement data |
| Data Quality | Raw — whatever came from source including errors and nulls |
| Next Step | Field mapping to DMO via Data Cloud Setup |
🛒 E-Commerce Retailer — Hybrid Ingestion Design
- Batch Data Streams: Salesforce CRM Connector (Account, Contact, Order History, daily sync); Amazon S3 (loyalty points CSV export from the loyalty platform, daily); Marketing Cloud Connector (email engagement, hourly).
- Streaming Data Streams: Web SDK on the website (page views, product views, add to cart, real-time); Mobile SDK on the iOS and Android apps (in-app events, real-time); Ingestion API (checkout completion events from the custom payment system, streaming).
- Zero Copy: Snowflake (3 years of historical transaction data, queried in place, never ingested).
🏢 B2B Software Company — Product-Led Ingestion
- Batch: Salesforce CRM Connector (Account, Contact, Opportunity, Contract, daily); S3 (billing data export from Stripe, daily); Marketing Cloud Connector (email engagement, daily).
- Streaming: Ingestion API (product usage events from the SaaS platform: feature used, login, API call, streamed in real time).
- Key insight: product usage data is the most valuable signal for churn prediction and expansion opportunity identification, so this company streams product events in real time and its account health scores update within minutes of activity.
🏦 Bank — Compliance-First Ingestion Design
All Data Streams run in the Hyperforce EU region for GDPR.
- Batch: Core Banking System via MuleSoft (account balances, transaction summaries, daily with encryption); CRM Connector (Relationship Manager notes, case history). No streaming from core banking due to a regulatory restriction on real-time PII transmission.
- Streaming: Mobile Banking App (login events, feature usage, anonymized before streaming).
- Zero Copy: Snowflake (7 years of historical transaction data for ML model training, never copied into Data Cloud, queried in place).
Mistake 1: Selecting every available field in the Data Stream
The Salesforce CRM Contact object has 200+ fields. Teams select all of them because it feels safer. Every field ingested consumes credits to process and store. The average Data Cloud implementation needs fewer than 20 Contact fields for segmentation. Always audit which fields are actually needed before configuring the stream and ruthlessly cut anything not required.
Mistake 2: Streaming data that should be batched
Setting CRM Contact sync to streaming because it is technically possible. CRM contact demographics — name, email, phone, address — change maybe once a month for most customers. Streaming this at real-time frequency costs 10 to 20 times more credits than a daily batch for zero business benefit. Reserve streaming for behavioral event data where seconds matter.
Mistake 3: Not setting the Primary Key correctly
Using a non-unique field as the Primary Key — like a company name or product category — causes multiple records to overwrite each other. The Primary Key must be a truly unique identifier per record. For CRM objects, use the Salesforce Record ID. For custom systems use whatever your source system uses as the unique row identifier.
Mistake 4: Forgetting to include Individual ID in the Data Stream
Individual ID is what links a DLO record to a Unified Customer Profile. Teams configure a perfect Data Stream, the DLO fills up with data, they map it to a DMO — and then discover none of the data connects to any profile. Without Individual ID mapped in the field mapping step, all that data is orphaned. Every Data Stream for profile or engagement data must have Individual ID mappable.
Mistake 5: Not monitoring Data Stream health after go-live
Teams configure Data Streams, validate them on day one and never check again. Source systems change — fields get renamed, authentication tokens expire, API endpoints change. A Data Stream that was working perfectly can silently fail for days before anyone notices. Set up monitoring alerts for failed runs and build a weekly review of Data Stream status into your operating model.
Quick Quiz answer key: Q1: C — Multiple streams per connector | Q2: B — Streaming real-time | Q3: C — Raw data, read-only | Q4: C — Snowflake Zero Copy | Q5: B — DLO fields not mapped to DMO
Explain the difference between a Connector and a Data Stream.
A Connector is the integration technology that establishes and maintains a connection to a source system — configured once per source with authentication credentials and connection settings. A Data Stream is a specific configured flow of data through that connector — defining which object, which fields, what schedule and what ingestion mode. One Connector powers multiple Data Streams. For example, one Salesforce CRM Connector supports separate Data Streams for Account, Contact, Lead, Opportunity and Case objects. You always configure the Connector first, then create Data Streams on top of it.
How do you decide whether a data source should use batch or streaming ingestion?
The decision comes down to whether acting on data within seconds rather than hours changes the business outcome. Streaming is justified when immediacy creates value — abandoned cart events need to trigger a recovery message within minutes, so streaming is essential. Website behavioral events, mobile app interactions, IoT sensor readings and fraud signals all require streaming because delayed processing reduces or eliminates their value. Batch is appropriate when data changes infrequently or when hourly or daily freshness is sufficient — CRM Contact demographics, ERP purchase history, Marketing Cloud email engagement and loyalty tier data all fit batch perfectly and cost significantly fewer Data Credits. The biggest mistake is defaulting to streaming for everything because it feels more capable. Streaming costs 10 to 20 times more credits per record than batch for the same data volume.
When would you use the Ingestion API instead of a pre-built connector?
The Ingestion API is the right choice when no pre-built connector exists for the source system, when you need programmatic control over exactly when and what data is sent, or when the source system can push data via webhook but cannot be polled by a connector. Common scenarios include custom-built internal applications, third-party SaaS platforms that support webhooks but are not in the connector library, IoT devices that push events directly, and Salesforce Flows or Apex code that need to write real-time events into Data Cloud from within Salesforce itself. The Ingestion API requires upfront schema definition — you must define field names and types before any data arrives, unlike connectors that discover schema from the source automatically.
Data is arriving in a DLO but is not available for segmentation. How would you troubleshoot it?
This is a classic field mapping problem. When data arrives in a DLO but is not available for segmentation, the most likely cause is that the DLO fields have not been mapped to a DMO yet. I would first confirm the DLO has data by querying it in Data Cloud. Then I would check the field mapping configuration — navigating to the Data Stream and reviewing whether DLO fields have been mapped to their corresponding DMO fields. I would specifically check whether the Individual ID field is mapped — without it the records exist in the DMO but are not linked to any Unified Customer Profile, making them invisible to segments. I would also verify the DMO refresh has run after the mapping was saved and that Identity Resolution has processed the new profiles. If mapping is correct but data still does not appear in segments, I would check segment filter criteria to ensure they match the values actually in the DMO.
How would you design the ingestion architecture for a retailer with 10 million customer profiles?
I would design a hybrid ingestion architecture separating real-time behavioral data from relatively static profile data. For profile data — CRM accounts, contacts and order history — I would use daily batch ingestion via the CRM Connector and Commerce Cloud Connector. This minimizes credit consumption for data that changes infrequently. For behavioral data — website events, mobile app interactions and cart activity — I would use streaming ingestion via the Web SDK and Mobile SDK connectors to enable real-time triggers like abandoned cart recovery within minutes. For historical transaction data in the company's existing Snowflake data warehouse, I would implement Zero Copy to avoid duplication costs and data freshness delays. I would also configure the Marketing Cloud Connector on an hourly batch schedule for email engagement data. Before finalizing any Data Stream I would audit the required fields for each use case and configure only those fields — not the full object schema — to manage credit consumption across 10 million profiles.