Streaming Data Ingestion

Streaming data ingestion is the continuous delivery of events or records to a data store with sub-minute end-to-end latency. Unlike batch ingestion (periodic bulk loads via COPY INTO or scheduled Fivetran syncs) or micro-batch pipelines (frequent discrete loads), streaming ingestion maintains an open or continuously polled connection to the event source and delivers records as they arrive.

Role in the Composable CDP Stack

In a composable customer data platform (CDP), streaming ingestion closes the latency gap between event production and the cloud data warehouse (CDW) profile model that drives activation. Without it, profile data is only as fresh as the last batch run, and batch cadences typically range from 15 minutes to several hours. With it, profiles reflect behavioral events within seconds to minutes of occurrence, which enables near-real-time segmentation updates and reduces activation lag.

Streaming ingestion is a prerequisite for use cases where the activation window is shorter than the batch cadence: triggered email on purchase, real-time fraud-risk suppression, same-session retargeting.

Primary Implementations

Snowflake Kafka Connector. Kafka Connect-based ingestion from any Kafka topic to Snowflake staging tables via the INGEST channel (which internally uses the Snowpipe Streaming API). Achieves sub-minute end-to-end latency under normal load. The formal deprecation announcement for v3 Classic is planned for mid-2026; as of 2026-05-21 it has not been made, and existing v3 workloads require no immediate changes. Once it is announced, an 18-month migration window applies (EoL approximately late 2027). New composable streaming pipelines should be built on v4; existing v3 pipelines can keep running without urgency until the migration window opens.
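
As a rough illustration of the connector setup described above, the following Python sketch assembles the JSON payload that would be posted to a Kafka Connect worker's /connectors REST endpoint. The property names follow the Snowflake sink connector's v2/v3-era configuration and may differ in v4, so check them against the documentation for the version in use. The connector name, topics, account URL, user, database, schema, role, and the secrets file path are placeholder assumptions, and the `${file:...}` reference assumes a file config provider is enabled on the worker.

```python
import json


def build_connector_payload(name, topics, account_url, user, database, schema, role):
    """Build a Kafka Connect REST payload for the Snowflake sink connector.

    All argument values are supplied by the caller; nothing here contacts
    Kafka Connect or Snowflake.
    """
    config = {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "topics": ",".join(topics),
        "tasks.max": "2",  # assumed parallelism; tune per topic partition count
        "snowflake.url.name": account_url,
        "snowflake.user.name": user,
        # Assumption: the private key is resolved by a Kafka config provider
        # from a hypothetical secrets file rather than stored inline.
        "snowflake.private.key": "${file:/secrets/snowflake.properties:private_key}",
        "snowflake.database.name": database,
        "snowflake.schema.name": schema,
        "snowflake.role.name": role,
        # Selects the Snowpipe Streaming path instead of file-based Snowpipe.
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    }
    return {"name": name, "config": config}


payload = build_connector_payload(
    name="cdp-events-sink",                      # hypothetical connector name
    topics=["page_views", "purchases"],          # hypothetical topics
    account_url="myorg-myaccount.snowflakecomputing.com:443",
    user="KAFKA_INGEST_USER",
    database="CDP",
    schema="RAW_EVENTS",
    role="KAFKA_INGEST_ROLE",
)
print(json.dumps(payload, indent=2))
```

Keeping the configuration in code makes it easy to template per environment; the same dictionary could equally be written out as a properties file for a standalone worker.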

Snowpipe Streaming SDK. Programmatic low-latency ingestion without a Kafka intermediary. Used when the event source is a custom application or a non-Kafka message broker, or when Kafka infrastructure is not in place. Lower operational overhead for simpler pipelines; achieves latency comparable to the Kafka Connector.
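
The general shape of programmatic streaming ingestion can be sketched without the SDK itself. The example below uses a hypothetical in-memory `InMemoryChannel` as a stand-in for an ingest channel; it is not the Snowpipe Streaming API, and the class, method, and field names are assumptions chosen for illustration. What it shows is the pattern of attaching a monotonically increasing offset to each row so that a producer that restarts and replays events can skip rows the channel has already accepted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass
class InMemoryChannel:
    """Hypothetical stand-in for a streaming ingest channel (not the real SDK)."""

    rows: List[Dict[str, Any]] = field(default_factory=list)
    committed_offset: Optional[int] = None

    def insert_row(self, row: Dict[str, Any], offset: int) -> None:
        # A real channel would buffer and flush to the warehouse; here the
        # row is simply stored and the offset recorded as committed.
        self.rows.append(row)
        self.committed_offset = offset


def ingest(channel: InMemoryChannel, events: Iterable[Tuple[int, Dict[str, Any]]]) -> int:
    """Send events whose offset is past the channel's last committed offset.

    Returns the number of rows actually sent.
    """
    last = channel.committed_offset
    sent = 0
    for offset, row in events:
        if last is not None and offset <= last:
            continue  # delivered before a restart; skip to avoid duplicates
        channel.insert_row(row, offset)
        sent += 1
    return sent


events = [
    (0, {"user_id": "u1", "event": "page_view"}),
    (1, {"user_id": "u1", "event": "add_to_cart"}),
    (2, {"user_id": "u1", "event": "purchase"}),
]

channel = InMemoryChannel()
first_run = ingest(channel, events[:2])  # producer stops after two events
replay_run = ingest(channel, events)     # restart replays the full stream
print(first_run, replay_run, len(channel.rows))
```

In a real pipeline the committed offset would come from the ingestion service rather than local state, which is what lets a restarted producer resume without writing duplicates.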

Fivetran connectors (append-mode). Fivetran's append-mode connectors can approximate streaming ingestion for REST API event sources at 1–15 minute cadences, but they do not maintain an open connection, so latency is bounded by the connector's sync frequency. On the activation side, Fivetran Activations Live Syncs provide near-real-time delivery at similar cadences.

Key Distinctions

Streaming vs. batch ingestion. Batch ingestion uses COPY INTO or a scheduled Fivetran sync triggered on a clock; streaming ingestion uses an open channel or continuous poll. The architectural choice depends on the activation latency requirement.

Streaming ingestion vs. real-time edge processing. Streaming ingestion delivers to the CDW profile store within 1–5 minutes; the Adobe Experience Platform (AEP) Edge Network's real-time event processing acts on the event within the originating web or mobile request (under 100 ms). These are different capability tiers. Composable stacks using streaming ingestion cannot replicate AEP's sub-second, edge-triggered activation for journey entry, offer decisioning, or event-based routing; those are AEP-locked capabilities. Composable streaming ingestion covers the near-real-time profile freshness use case, not the intra-session edge decisioning use case.
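
The tiers described in this section can be expressed as a simple decision rule. The sketch below is illustrative only: the function name and the example activation windows are assumptions, the 5-minute bound is the upper end of the streaming delivery range quoted above, and the default batch cadence of 15 minutes is the low end of the typical range mentioned earlier. It deliberately plans against streaming's worst case, so any requirement tighter than 5 minutes is routed to the edge tier.

```python
from datetime import timedelta

# Upper end of the 1-5 minute streaming delivery range quoted in the text.
STREAMING_WORST_CASE = timedelta(minutes=5)


def required_tier(activation_window, batch_cadence=timedelta(minutes=15)):
    """Return the cheapest tier whose latency fits inside the activation window.

    activation_window: longest tolerable delay between event and activation.
    batch_cadence: interval between batch loads (assumed default: 15 minutes).
    """
    if activation_window >= batch_cadence:
        return "batch"
    if activation_window >= STREAMING_WORST_CASE:
        return "streaming"
    # Tighter than streaming can reliably deliver: needs in-request processing.
    return "edge"


# Hypothetical activation windows for illustration.
use_cases = {
    "weekly newsletter audience": timedelta(hours=24),
    "triggered email on purchase": timedelta(minutes=10),
    "in-session offer decisioning": timedelta(milliseconds=100),
}

for name, window in use_cases.items():
    print(f"{name}: {required_tier(window)}")
```

A team with a measured streaming latency well below 5 minutes could lower `STREAMING_WORST_CASE` accordingly, but the gap to in-request decisioning remains a difference in architecture rather than tuning.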
