Configure deterministic identity matching and stitching
Define identity namespaces, designate primary and secondary identity fields on XDM schemas, and validate that the identity graph correctly links profile fragments from multiple datasets and channels into a single unified profile.
This task produces a functioning identity graph that links anonymous behavioral events to authenticated profile records, enabling the Real-time Customer Profile to present a unified view of a customer across all channels. The output is verifiable in the Profile UI: a single profile record that shows events from both pre-login and post-login sessions, attributes from multiple source datasets, and segment membership derived from the combined data.
Deterministic vs. probabilistic stitching. Deterministic stitching fires when two records share an exact, declared identity value — for example, the same ECID value appearing in both an anonymous event dataset and a registration event. No inference is required; the match is exact. Probabilistic stitching (device graph, co-op) is a separate, optional layer that infers identity links from behavioral signals when no shared key exists. This task covers deterministic configuration only; probabilistic methods introduce additional complexity and vendor-specific data-sharing agreements.
Namespace hierarchy. Identity resolution platforms generally use a namespace concept to prevent false matches between identifiers from different systems (e.g., a CRM customer ID of "12345" should not match an email-system message ID of "12345"). Declaring the correct namespace for each identity field is therefore a data-quality gate, not just a labeling exercise. In AEP, standard namespaces (ECID, Email, Phone) are pre-provisioned; custom namespaces (CRM ID, Loyalty ID) must be created in the Identities workspace of the Platform UI, or through the Identity Service API, before they can be referenced in schemas.
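The namespace gate described above can be sketched in a few lines. This is a minimal illustration that assumes identities are modeled as (namespace, value) pairs; the `Identity` type, the `same_identity` helper, and the namespace codes `CRMID` and `MessageID` are hypothetical names chosen for the example, not part of any platform API.

```python
from typing import NamedTuple


class Identity(NamedTuple):
    """A namespace-qualified identifier (hypothetical illustration type)."""
    namespace: str
    value: str


def same_identity(a: Identity, b: Identity) -> bool:
    # Deterministic match: the namespace AND the value must both be equal.
    return a.namespace == b.namespace and a.value == b.value


crm_customer = Identity("CRMID", "12345")
email_message = Identity("MessageID", "12345")
crm_again = Identity("CRMID", "12345")

print(same_identity(crm_customer, email_message))  # False: same value, different namespace
print(same_identity(crm_customer, crm_again))      # True: namespace and value both match
```

Comparing raw values alone would have linked the CRM customer to the unrelated message ID; qualifying every value with its namespace is what rules that match out.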
Merge rules. The identity graph produces a set of linked fragments; the profile merge rule determines how fragments from different datasets are combined when a conflict exists (e.g., two datasets report different values for homeAddress.city). In AEP, the default merge policy is timestamp-ordered: the most recently updated value wins. Custom merge rules can instead prefer specific datasets (dataset precedence), so that a value from a higher-priority dataset wins regardless of timestamp, or use union semantics. Merge rule selection affects which attribute values appear in the merged profile record.
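To make the effect of merge rule selection concrete, the sketch below resolves the same homeAddress.city conflict under two common strategies: most-recent-timestamp wins and fixed dataset precedence. The fragment layout (`dataset`, `updated_at`, `attributes`) and both function names are assumptions made for illustration; they do not mirror any platform's internal representation.

```python
from typing import Any

# A fragment is one dataset's view of the profile (hypothetical layout).
Fragment = dict[str, Any]


def merge_timestamp_ordered(fragments: list[Fragment]) -> dict[str, Any]:
    """Most recently updated fragment wins on each conflicting attribute."""
    merged: dict[str, Any] = {}
    for frag in sorted(fragments, key=lambda f: f["updated_at"]):
        merged.update(frag["attributes"])  # later fragments overwrite earlier ones
    return merged


def merge_dataset_precedence(fragments: list[Fragment], precedence: list[str]) -> dict[str, Any]:
    """Datasets listed earlier in `precedence` win, regardless of timestamp."""
    rank = {name: i for i, name in enumerate(precedence)}
    merged: dict[str, Any] = {}
    # Apply the lowest-priority dataset first so the highest-priority one overwrites last.
    for frag in sorted(fragments, key=lambda f: rank[f["dataset"]], reverse=True):
        merged.update(frag["attributes"])
    return merged


fragments = [
    {"dataset": "crm", "updated_at": 100,
     "attributes": {"homeAddress.city": "Austin", "loyaltyTier": "gold"}},
    {"dataset": "web", "updated_at": 200,
     "attributes": {"homeAddress.city": "Dallas"}},
]

by_time = merge_timestamp_ordered(fragments)
by_dataset = merge_dataset_precedence(fragments, ["crm", "web"])
print(by_time["homeAddress.city"])     # Dallas
print(by_dataset["homeAddress.city"])  # Austin
```

Non-conflicting attributes such as loyaltyTier survive under either strategy; only the contested value changes.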
Parallel viability (moderate). Deterministic identity resolution is available in Segment (Personas Identity Graph), Snowflake (identity join tables with Hightouch or dbt), and every major CDP. The namespace + exact-match pattern is universal; the UI surface for declaring namespaces and validating the graph varies. Phase 3 will document Segment Identity Graph configuration and a Snowflake identity resolution SQL pattern.
Side-by-side implementations
AEP Identity Service maintains a real-time identity graph that links all identity namespace–value pairs observed across datasets. Identity fields declared in XDM schemas (with the Identity checkbox and a namespace assignment such as Email, ECID, Phone, or a custom CRM namespace) are automatically indexed by Identity Service on ingestion. When two records share a common identifier — e.g., an anonymous ECID event and an authenticated email profile — Identity Service creates an edge in the graph, and the Real-time Customer Profile merges the corresponding fragments. The Profiles → Browse UI allows validation: enter a namespace and value, open the resulting profile, and inspect the Attributes, Events, and Segment Membership tabs. All four identity namespace types (ECID, Email, Phone, CRM ID) should be visible in the profile if data has been correctly ingested.
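As a rough illustration of how a record that carries more than one identity yields a graph edge, the sketch below extracts co-occurring identity pairs from a payload shaped loosely like the XDM `identityMap` field (namespace code mapped to a list of entries with an `id`). The `identity_edges` helper and the sample values are hypothetical, and how Identity Service builds its graph internally is not shown here.

```python
from itertools import combinations


def identity_edges(identity_map: dict) -> list:
    """Return every pair of (namespace, id) identities that co-occur in one record."""
    nodes = sorted(
        (namespace, entry["id"])
        for namespace, entries in identity_map.items()
        for entry in entries
    )
    return list(combinations(nodes, 2))


# A login event carries both the device identifier and the authenticated email.
login_event = {
    "ECID": [{"id": "ecid-111", "primary": True}],
    "Email": [{"id": "pat@example.com", "primary": False}],
}
# A pre-login event carries only the device identifier.
anonymous_event = {"ECID": [{"id": "ecid-111", "primary": True}]}

print(identity_edges(login_event))
# [(('ECID', 'ecid-111'), ('Email', 'pat@example.com'))]
print(identity_edges(anonymous_event))
# []
```

The anonymous event contributes no edge of its own, but it shares the ECID node with the login event, which is what lets the pre-login and post-login fragments resolve to the same profile.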
Capability: Identity Resolution
In a Snowflake-native identity graph, deterministic stitching is implemented as a set of JOIN queries — or a dedicated identity map table — that maps each known identifier (email, phone, loyalty ID, ECID) to a single canonical `person_id`. A dbt model defines the stitching logic: union all identity pairs from all event and profile source tables, then apply transitive closure to merge overlapping identifier clusters into one canonical ID. The resulting `identity_map` table is stored in Snowflake and joined into all downstream dbt models that need a unified person view. Namespace priority (preferring email over phone over ECID when multiple identifiers are present) is encoded as `CASE WHEN` ordering within the dbt model's deduplication logic.
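The transitive-closure step can be sketched outside SQL to show the logic. The example below uses a union-find structure in Python rather than a dbt model, so it is a stand-in for the technique, not the warehouse implementation. The namespace names, the priority table, and the choice to derive `person_id` from the highest-priority identifier in each cluster are all assumptions made for illustration.

```python
# Lower number = preferred namespace when choosing the canonical identifier.
NAMESPACE_PRIORITY = {"email": 0, "phone": 1, "ecid": 2}


def build_identity_map(edges):
    """Map each (namespace, value) identity to one canonical person_id."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union step: every edge merges the clusters of its two endpoints.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group identities by cluster root.
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)

    # Pick the highest-priority identifier in each cluster as the canonical one.
    identity_map = {}
    for members in clusters.values():
        canonical = min(members, key=lambda n: (NAMESPACE_PRIORITY[n[0]], n[1]))
        person_id = f"{canonical[0]}:{canonical[1]}"
        for member in members:
            identity_map[member] = person_id
    return identity_map


edges = [
    (("ecid", "ecid-111"), ("email", "pat@example.com")),
    (("email", "pat@example.com"), ("phone", "+15550100")),
    (("ecid", "ecid-999"), ("phone", "+15550199")),
]
identity_map = build_identity_map(edges)
print(identity_map[("ecid", "ecid-111")])  # email:pat@example.com
print(identity_map[("ecid", "ecid-999")])  # phone:+15550199
```

The first two edges share the email node, so the ECID, email, and phone collapse into one cluster even though the ECID and phone never appear together in a single record.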
Capability: Identity Resolution
Hightouch provides a native Identity Graph feature that resolves and merges profile records from multiple sources within the activation context. A Hightouch Identity Graph is configured by declaring resolution rules: two records match when they share the same value in a specified column (email, phone, or a custom CRM ID). The resolution algorithm applies transitive closure: if record A matches record B via email and record B matches record C via phone, all three records merge into a single canonical profile. The output `canonical_id` is exposed as a derived column in all Hightouch models that reference the Identity Graph, enabling downstream segments and syncs to operate on the merged profile without a separate SQL JOIN step. Namespace priority (prefer email over phone over `anonymous_id` when resolving conflicts) is configured as a priority-order list in the Identity Graph settings.
Capability: Identity Resolution
Task-level sources
- technical-training/module3/index.md
- technical-training/module3/ex1.md
- technical-training/module3/ex2.md