Ingest batch data files via source connectors

This task produces a completed batch of records in a platform dataset, typically representing an offline data source such as a CRM export, loyalty database snapshot, or call-center interaction log. The output is a dataset with a non-zero record count, a Batch ID whose status is success, and optionally governance labels applied to sensitive fields. Batch ingestion is the standard path for onboarding historical data that does not arrive via a real-time SDK.

Two ingestion paths. The UI Workflow ("Map CSV to XDM Schema") is appropriate for ad-hoc, one-time file loads performed by analysts or data engineers who are not writing code. It provides an interactive column-mapping interface with platform-suggested mappings. The Data Landing Zone (DLZ) path is appropriate for automated pipelines: files are pushed to a SAS-authenticated Azure Blob container using AzCopy or any Azure SDK, then a source connector dataflow maps columns to XDM fields and submits the batch. Both paths enforce the same XDM schema contract.
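As a rough illustration of the DLZ push step, the sketch below builds (but does not send) an Azure Blob Storage Put Blob request from a container-level SAS URI, using only the Python standard library. The function name build_put_blob_request, the example URI, and the blob path are hypothetical. AzCopy or an Azure SDK is likely the better choice for large files, since a single Put Blob call is size-limited.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import Request


def build_put_blob_request(container_sas_uri: str, blob_name: str, payload: bytes) -> Request:
    """Build a Put Blob request for a file inside a SAS-authenticated container.

    Assumes container_sas_uri has the form
    https://<account>.blob.core.windows.net/<container>?<sas-token>.
    """
    parts = urlsplit(container_sas_uri)
    # Insert the blob name between the container path and the SAS query string.
    blob_path = parts.path.rstrip("/") + "/" + quote(blob_name)
    url = urlunsplit((parts.scheme, parts.netloc, blob_path, parts.query, ""))

    request = Request(url, data=payload, method="PUT")
    # Put Blob requires the blob type header; BlockBlob suits ordinary files.
    request.add_header("x-ms-blob-type", "BlockBlob")
    request.add_header("Content-Type", "text/csv")
    return request


if __name__ == "__main__":
    # Placeholder SAS URI: substitute the one issued for your landing zone.
    demo = build_put_blob_request(
        "https://example.blob.core.windows.net/dlz-user-container?sv=2020&sig=abc",
        "crm/export.csv",
        b"first_name,crm_id\nAda,C-1\n",
    )
    print(demo.get_method(), demo.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen. Building and sending are kept separate here so the URL and headers can be inspected before any data leaves the machine.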

Column mapping decisions. Each source column you want to keep must be mapped to an XDM target field path (e.g., first_name → person.name.firstName); unmapped columns are silently dropped. The primary identity field in the target schema (typically crmId or emailId) must be mapped so that batch records can be linked to existing profiles. The _id and timestamp fields are required for ExperienceEvent schemas; Profile schemas require only a primary identity.
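The mapping rules can be sketched locally before any data is submitted. The example below is a minimal illustration, not the platform's mapping engine: it applies a column-to-XDM-path mapping to CSV text, reports which columns would be dropped, and refuses to run if the identity path is unmapped. The names MAPPING, set_path, and map_rows, and the sample column names, are hypothetical.

```python
import csv
import io

# Hypothetical mapping from source column names to XDM field paths.
MAPPING = {
    "first_name": "person.name.firstName",
    "last_name": "person.name.lastName",
    "crm_id": "crmId",
}


def set_path(record: dict, dotted_path: str, value) -> None:
    """Write value into a nested dict at a dotted path such as a.b.c."""
    *parents, leaf = dotted_path.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def map_rows(csv_text: str, mapping: dict, identity_path: str):
    """Return (records, dropped_columns) for the given CSV text."""
    if identity_path not in mapping.values():
        raise ValueError(f"primary identity {identity_path!r} is not mapped")

    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns present in the file but absent from the mapping are dropped.
    dropped = sorted(set(reader.fieldnames or []) - set(mapping))

    records = []
    for row in reader:
        record = {}
        for column, path in mapping.items():
            if column in row:
                set_path(record, path, row[column])
        records.append(record)
    return records, dropped


if __name__ == "__main__":
    sample = "first_name,crm_id,favorite_color\nAda,C-1,blue\n"
    records, dropped = map_rows(sample, MAPPING, identity_path="crmId")
    print(records)
    print("dropped:", dropped)
```

Surfacing the dropped columns explicitly is a deliberate choice: because unmapped columns disappear without warning during ingestion, a pre-flight report like this makes the loss visible while it is still cheap to fix.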

Data governance. Adobe Experience Platform (AEP) allows governance labels (contractual, identity, sensitive) to be applied at dataset or field level immediately after ingestion. Labels applied at the dataset level are inherited by all fields, and field-level labels add to the labels a field inherits. This labeling controls downstream activation: when an enabled data usage policy restricts a label such as I1 (directly identifiable data) for a destination's marketing action, fields carrying that label cannot be activated to that destination.

Parallel viability (high). CSV batch ingestion into a schema-governed table has an equivalent in most modern data platforms: Snowflake's COPY INTO, BigQuery's batch load job, dbt seeds, Segment's CSV Import. The practitioner skill (column mapping, identity field designation, governance labeling) maps cleanly across platforms. Phase 3 will document Snowflake COPY INTO with identity column configuration.

Sources