Validate and monitor a streaming ingestion pipeline
Produce a test event through a streaming pipeline, confirm end-to-end delivery and schema compliance at the ingestion endpoint, and verify the event appears on the target profile or dataset within the expected latency window.
Validating and monitoring a streaming ingestion pipeline confirms that the full path from event producer through the message broker, connector, and ingestion endpoint to the profile store is functioning correctly. The primary outputs are a confirmed event record on the target profile, a healthy connector status, and an established baseline for ongoing throughput and error-rate monitoring.
Key decisions during validation include: choosing representative test payloads that exercise all required schema fields, determining the acceptable end-to-end latency threshold for the use case, and establishing what constitutes a healthy connector state vs. a transient lag vs. a hard failure. The _id uniqueness requirement is particularly important — teams commonly encounter silent data loss when replay tests reuse the same event ID, since the platform deduplicates by ID without surfacing an explicit error. Setting up alerting on connector task state transitions (RUNNING → FAILED) is a prerequisite for production readiness.
This validation pattern is highly portable across streaming architectures. For a Snowflake-based composable CDP, the equivalent validation involves producing to the Kafka topic, confirming the Snowflake Kafka Connector has committed the offset, and querying the landing table for the test record. For any architecture, the three-layer validation approach (producer confirmation, connector health, downstream record presence) remains the recommended pattern regardless of which specific technologies are in use.
Side-by-side implementations
In the AEP Kafka ingestion flow, validation proceeds in three layers. At the producer layer, use the Kafka console producer (bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic aep) to manually publish a well-formed XDM payload with a valid datasetId, imsOrgId, schemaRef, and a unique _id field. At the connector layer, use the Kafka Connect REST API (GET Check Kafka Connect Connector Status) to confirm the connector task remains in RUNNING state after message production; a failed task indicates a schema or connectivity issue. At the AEP layer, navigate to the demo website Profile Viewer for the test profile's phone number or ECID and verify that the new experience event (e.g., eventType: callCenterInteractionKafka) appears under "Other Events" within seconds of production. If the event does not appear, check that the datasetId matches the dataset linked to the HTTP API Source Connector and that the _id field is unique — duplicate _id values are deduplicated and silently discarded.
Capability: Identity Resolution
Three-layer validation for the Snowflake Kafka Connector mirrors the AEP pattern exactly. At the producer layer, use kafka-console-producer.sh to send a test JSON event to the Kafka topic (same command as the AEP exercise), including a unique _id, timestamp, and a known identity value such as a phone number. At the connector layer, GET the connector status via the Kafka Connect REST API (GET /connectors/<name>/status) and confirm RUNNING task state — identical to the AEP Sink Connector validation step. At the Snowflake layer, query the target Snowflake table for the test record using the known _id or phone number value. The _id uniqueness deduplication requirement is enforced by the Snowflake Kafka Connector's INGEST channel deduplication. The datasetId alignment requirement from AEP becomes a snowflake.schema.name and table routing configuration requirement. The three-layer pattern is fully portable and transfers directly from the AEP exercise.
Capability: Reverse-ETL (CDW-to-Destination Sync)
Parallel implementation not yet available.
Task-level sources
- technical-training/module15/index.md
- technical-training/module15/ex4.md
How is this implementation?
Sign-in-gated. Tomorrow morning's curriculum-ingestor consumes your feedback: "Inaccurate" queues the task for re-review, "needs update" queues it for a refresh, and "one vendor panel is wrong" re-drafts just that panel.