Configure a Kafka source connector for real-time event streaming
Configuring a Kafka source connector for real-time event streaming produces a continuously running pipeline that translates Kafka topic messages into XDM-formatted experience events and delivers them to a downstream data platform's streaming ingestion endpoint. The primary outputs are a registered connector instance in Kafka Connect, a verified RUNNING status for that connector, and an active dataflow in the target platform showing incoming records.
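The registration and status check described above can be sketched against the Kafka Connect REST API (POST /connectors and GET /connectors/{name}/status). This is a minimal illustration and not a definitive setup: the worker URL, the connector name, the endpoint URL, and the helper function names are hypothetical, and the connector class and the aep.endpoint key are assumptions that should be checked against the connector's own documentation.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker listening on the default REST port.
CONNECT_URL = "http://localhost:8083"


def build_connector_payload(name, topics, endpoint):
    """Build the JSON body for POST /connectors."""
    return {
        "name": name,
        "config": {
            # Assumption: class name and aep.* key; verify against the connector docs.
            "connector.class": "com.adobe.platform.streaming.sink.impl.AEPSinkConnector",
            "topics": ",".join(topics),
            "tasks.max": "1",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            "aep.endpoint": endpoint,
        },
    }


def is_fully_running(status):
    """Return True only when the connector and every task report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    tasks_ok = bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)
    return connector_ok and tasks_ok


def register_connector(payload, base_url=CONNECT_URL):
    """Register the connector instance with the Connect cluster."""
    request = urllib.request.Request(
        f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_status(name, base_url=CONNECT_URL):
    """Fetch the connector and task states for a registered connector."""
    with urllib.request.urlopen(f"{base_url}/connectors/{name}/status") as response:
        return json.load(response)
```

Checking the task states as well as the connector state matters because a connector can report RUNNING while one of its tasks has FAILED, in which case no records reach the platform. The third output, the active dataflow showing incoming records, still has to be confirmed in the target platform itself.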
Key decisions include: which Kafka topics to consume (and whether to map each topic to a single dataset or fan one topic out across multiple datasets), whether to enable schema validation at the connector layer or rely on the ingestion endpoint for validation, and how to handle consumer group offsets and replay behavior for error recovery. Distributed mode (started with connect-distributed, as opposed to standalone) is strongly recommended for production because it provides fault tolerance and horizontal scaling across multiple Kafka Connect worker nodes.
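The topic mapping and error-recovery decisions above might translate into connector configuration along the following lines. This is a sketch under stated assumptions: the helper names, topic names, and endpoint URLs are hypothetical, and the aep.endpoint key is assumed. The errors.* keys and the consumer.override.* prefix are standard Kafka Connect settings, though the worker must permit client overrides through connector.client.config.override.policy.

```python
def consumer_group_for(connector_name):
    """Kafka Connect sink connectors consume under the group 'connect-<name>' by default."""
    return f"connect-{connector_name}"


def with_error_handling(config, dlq_topic, start_from="earliest"):
    """Return a copy of config with dead-letter routing and a starting offset policy."""
    merged = dict(config)
    merged.update({
        # Keep the task alive on bad records and route them to a dead letter topic.
        "errors.tolerance": "all",
        "errors.deadletterqueue.topic.name": dlq_topic,
        "errors.deadletterqueue.context.headers.enable": "true",
        # Applies only when the consumer group has no committed offsets yet.
        "consumer.override.auto.offset.reset": start_from,
    })
    return merged


def one_connector_per_topic(topic_to_endpoint, base_config):
    """Build one config per topic so each topic maps to exactly one dataset endpoint."""
    configs = {}
    for topic, endpoint in topic_to_endpoint.items():
        name = f"sink-{topic}"  # hypothetical naming scheme
        config = dict(base_config)
        config["topics"] = topic
        config["aep.endpoint"] = endpoint  # assumption: key name used by the connector
        configs[name] = with_error_handling(config, dlq_topic=f"{topic}.dlq")
    return configs
```

Running one connector per topic gives each topic its own consumer group, so offsets for a single topic can be reset and replayed without disturbing the others. Replay after a failure is then a matter of stopping that connector and resetting offsets for its group.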
This task carries over closely across CDP architectures because Kafka is infrastructure-layer technology that is independent of the downstream platform. Note that the connector is a source only from the downstream platform's point of view; in Kafka Connect terms, a connector that reads from topics and writes to an external system is a sink connector. The Adobe Experience Platform (AEP) Sink Connector is AEP-specific, but the functional equivalent for a Snowflake-based composable CDP is the Snowflake Kafka Connector, which streams topic messages directly into Snowflake tables. For dbt-managed pipelines, the upstream source is still Kafka, but the ingestion lands in a staging table that dbt then transforms. Teams evaluating composable alternatives should note that the schema contract (XDM vs. raw JSON vs. Avro) shifts depending on which connector is used, so schema governance decisions made at this stage have downstream implications for the data transformation layer.
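For comparison, the Snowflake-side equivalent could be configured roughly as follows. This is an illustrative sketch, not a verified deployment: the helper name, the account URL, user, database, schema, and table names are hypothetical placeholders, and the snowflake.* keys should be confirmed against the Snowflake Kafka Connector documentation for the version in use.

```python
def build_snowflake_sink_config(topic_to_table, account_url, user, database, schema):
    """Build a Snowflake Kafka Connector config that maps each topic to a staging table."""
    topic_map = ",".join(f"{topic}:{table}" for topic, table in topic_to_table.items())
    return {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "topics": ",".join(topic_to_table),
        "snowflake.topic2table.map": topic_map,
        "snowflake.url.name": account_url,
        "snowflake.user.name": user,
        # Placeholder only: supply the key through a secrets mechanism, never inline.
        "snowflake.private.key": "<private-key-from-secret-store>",
        "snowflake.database.name": database,
        "snowflake.schema.name": schema,
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        # Raw JSON is the schema contract here, in contrast to XDM on the AEP side.
        "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    }


# Hypothetical example values for a staging schema that dbt would read from.
example = build_snowflake_sink_config(
    {"web-events": "STG_WEB_EVENTS", "app-events": "STG_APP_EVENTS"},
    account_url="myaccount.snowflakecomputing.com:443",
    user="KAFKA_LOADER",
    database="CDP",
    schema="STAGING",
)
```

The value converter is where the schema contract becomes visible: swapping the JSON converter for an Avro one changes what lands in the staging table and therefore what the dbt models downstream have to parse.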