Reverse ETL: Database to Events

Merge database records into your event pipeline with synthetic events — your way, with YAML.

What is Reverse ETL?

Reverse ETL connects your database systems to your CDP, transforming database records into event-like streams that complement your real-time event capture.

Whether you're working with orders from your MySQL database, users from your Postgres system, or product catalogs from your SQL Server — Growcado lets you map these to CDP events with simple YAML configuration.

🔄 Why it matters

  • Unlock historical data that existed before your tracking implementation

  • Enrich your customer profiles with data from systems that don't generate events

  • Maintain consistency across all customer touchpoints and systems

  • Stitch together fragmented identities by connecting database records to known users

No engineers or complex ETL pipelines needed. Just clean YAML and your database connection.

Config: Defining Synthetic Event Sources

In your YAML config, you'll define which database tables should be mapped to events, how to identify users, and what properties to include.

Here's an example:

synthetic_event_mappings:
  orders_to_completed_events:
    enabled: true
    source:
      schema: public
      table: store_mysql__order
      primary_key: order_id
      timestamp_field: order_cdt
    event:
      type: OrderCompleted
    condition: "order_status > 0"
    identity:
      user_id: user_id_fk
    traits:
      mobile_number: mobile

🛠️ How it works:

  • Source: Tell Growcado where to find your data (schema, table, keys)

  • Event: Define what event type this data should become

  • Condition: Only generate events for records that match your criteria

  • Identity: Map database columns to customer identifiers

  • Traits: Extract fields that should become customer traits

  • Properties: Control how database fields map to event properties
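To make these pieces concrete, here is a minimal Python sketch of what applying the orders_to_completed_events mapping to one database row might look like. It is not Growcado's implementation. The MAPPING dictionary, the row_to_event function, and the sample row are hypothetical, and the sketch assumes that in the identity and traits blocks the left-hand name is the CDP field and the right-hand name is the database column.

```python
from typing import Optional

# Hypothetical in-memory form of the orders_to_completed_events mapping.
MAPPING = {
    "primary_key": "order_id",
    "timestamp_field": "order_cdt",
    "event_type": "OrderCompleted",
    # The YAML condition "order_status > 0", written as a Python predicate.
    "condition": lambda row: row["order_status"] > 0,
    # Assumed direction: CDP name on the left, database column on the right.
    "identity": {"user_id": "user_id_fk"},
    "traits": {"mobile_number": "mobile"},
}


def row_to_event(row: dict, mapping: dict) -> Optional[dict]:
    """Return a synthetic event for one row, or None if the condition fails."""
    if not mapping["condition"](row):
        return None
    return {
        "event_type": mapping["event_type"],
        "timestamp": row[mapping["timestamp_field"]],
        "identity": {name: row[col] for name, col in mapping["identity"].items()},
        "traits": {name: row[col] for name, col in mapping["traits"].items()},
        "source_key": row[mapping["primary_key"]],
    }


# Sample row (made-up values) shaped like the store_mysql__order table.
order_row = {
    "order_id": 12345,
    "order_cdt": "2023-05-14T15:32:11",
    "order_status": 1,
    "user_id_fk": 789,
    "mobile": "+15550100",
}
event = row_to_event(order_row, MAPPING)
print(event)
```

A row whose order_status is 0 returns None, which is the sketch's stand-in for "no event generated".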

Example in action

Let's walk through a real-world example of mapping order data to synthetic events:

  1. You have a MySQL orders table with customer purchases

  2. You want these orders to appear as OrderCompleted events in your CDP

  3. Each order has line items stored in a JSON column

Growcado will:

  1. Connect to your database and read the orders table

  2. Convert each order to a synthetic event

  3. Map all the identifiers to the correct profiles

  4. Transform line items into properly structured product arrays

  5. Make these events indistinguishable from real tracked events
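The first two steps above can be sketched in a few lines of Python. This is an illustration only: it uses an in-memory SQLite database as a stand-in for MySQL, and the function read_orders_as_events, the sample rows, and the shape of the resulting event dictionaries are assumptions rather than Growcado internals.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL orders table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE store_mysql__order ("
    "order_id INTEGER PRIMARY KEY, user_id_fk INTEGER, "
    "order_status INTEGER, order_cdt TEXT, order_total_price REAL)"
)
conn.executemany(
    "INSERT INTO store_mysql__order VALUES (?, ?, ?, ?, ?)",
    [
        (12344, 456, 1, "2023-05-14T14:18:05", 29.99),
        (12345, 789, 1, "2023-05-14T15:32:11", 59.99),
        (12346, 789, 0, "2023-05-14T16:00:00", 10.00),  # fails the condition
    ],
)


def read_orders_as_events(connection: sqlite3.Connection) -> list:
    """Read rows matching the condition and convert each to an event dict."""
    rows = connection.execute(
        "SELECT * FROM store_mysql__order "
        "WHERE order_status > 0 ORDER BY order_cdt"
    )
    return [
        {
            "event_type": "OrderCompleted",
            "user_id": row["user_id_fk"],
            "timestamp": row["order_cdt"],
            "properties": {
                "order_id": str(row["order_id"]),
                "total": row["order_total_price"],
            },
        }
        for row in rows
    ]


events = read_orders_as_events(conn)
for e in events:
    print(e)
```

Pushing the condition into the SQL query keeps non-matching rows from ever leaving the database, which is one reasonable way to implement the filter.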

🔍 Advanced Configuration

Property Mapping

For each database column, you can control exactly how it appears in your events:

properties:
  order_id:
    target: order_id
    type: string
  order_total_price:
    target: total
    type: numeric
  order_shipping_price:
    target: shipping
    type: numeric
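As a rough illustration of what this mapping does, the sketch below renames columns to their target names and casts values to the declared type. The names PROPERTY_MAP, CASTERS, and map_properties are hypothetical, and treating numeric as a Python float and skipping missing or NULL columns are assumptions. Growcado's actual casting rules may differ.

```python
# Python form of the properties block above.
PROPERTY_MAP = {
    "order_id": {"target": "order_id", "type": "string"},
    "order_total_price": {"target": "total", "type": "numeric"},
    "order_shipping_price": {"target": "shipping", "type": "numeric"},
}

# Assumed casting rules: "string" -> str, "numeric" -> float.
CASTERS = {"string": str, "numeric": float}


def map_properties(row: dict, property_map: dict) -> dict:
    """Rename each mapped column to its target and cast it to its type."""
    properties = {}
    for column, spec in property_map.items():
        value = row.get(column)
        if value is None:
            continue  # assumption: missing or NULL columns are skipped
        properties[spec["target"]] = CASTERS[spec["type"]](value)
    return properties


sample_row = {
    "order_id": 12345,
    "order_total_price": "59.99",
    "order_shipping_price": 5,
    "internal_note": "not mapped, so not emitted",
}
mapped = map_properties(sample_row, PROPERTY_MAP)
print(mapped)
```

Columns that are not listed in the mapping, such as internal_note here, never reach the event.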

Handling Complex Data

Work with nested JSON data using JSON mappings:

order_item_json:
  target: products
  type: json_array
  json_mappings:
    - source_key: item_id
      target_key: product_id
      type: numeric
    - source_key: item_name
      target_key: name
      type: string
    - source_key: category_name
      target_key: category
      type: string
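The effect of json_mappings can be pictured with a short Python sketch: parse the JSON column, keep only the mapped keys, rename them, and cast each value. The function names and sample data are hypothetical. Dropping unmapped keys and turning whole numbers into integers are assumptions made for this illustration.

```python
import json

# Python form of the json_mappings list above.
JSON_MAPPINGS = [
    {"source_key": "item_id", "target_key": "product_id", "type": "numeric"},
    {"source_key": "item_name", "target_key": "name", "type": "string"},
    {"source_key": "category_name", "target_key": "category", "type": "string"},
]


def cast(value, type_name: str):
    """Cast a value; whole numbers become int, other numerics stay float."""
    if type_name == "numeric":
        number = float(value)
        return int(number) if number.is_integer() else number
    return str(value)


def map_json_array(raw_json: str, mappings: list) -> list:
    """Turn a JSON array of line items into a list of product dicts."""
    products = []
    for item in json.loads(raw_json):
        product = {}
        for m in mappings:
            if m["source_key"] in item:
                product[m["target_key"]] = cast(item[m["source_key"]], m["type"])
        products.append(product)
    return products


# Made-up contents of the order_item_json column for one order.
raw = (
    '[{"item_id": "42", "item_name": "Mug", '
    '"category_name": "Kitchen", "warehouse_bin": "A7"}]'
)
products = map_json_array(raw, JSON_MAPPINGS)
print(products)
```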

Deduplication

Prevent duplicate events when processing the same data multiple times:

deduplication:
  enabled: true
  match_keys:
    - source: order_id
      target: property_order_id
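One way to picture this check is the Python sketch below. It assumes that property_order_id refers to the order_id property on already-stored events and that the property is stored as a string. Both are interpretations of the config above, not confirmed behavior, and filter_new_rows is a hypothetical helper.

```python
def filter_new_rows(rows, existing_events,
                    source_key="order_id", target_property="order_id"):
    """Keep only rows whose match key is not already present on an event."""
    seen = {
        event["properties"][target_property]
        for event in existing_events
        if target_property in event["properties"]
    }
    fresh = []
    for row in rows:
        key = str(row[source_key])  # compare as strings, like the stored property
        if key in seen:
            continue
        seen.add(key)  # also guards against duplicates within the same batch
        fresh.append(row)
    return fresh


# Made-up data: one order already imported, one new, one repeated in the batch.
existing = [{"event_type": "OrderCompleted", "properties": {"order_id": "12344"}}]
batch = [{"order_id": 12344}, {"order_id": 12345}, {"order_id": 12345}]
new_rows = filter_new_rows(batch, existing)
print(new_rows)
```

Re-running the same batch after its events are stored yields an empty list, which is the property that makes repeated processing safe.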

Adding Constants

Add fixed values to every generated event:

constants:
  currency: USD

Key Features

  • Bidirectional identity resolution — connects database records to existing profiles

  • Delta processing — only processes new or changed records

  • Full history import — backfill your CDP with historical data

  • Custom transforms — advanced property mapping and typing

  • Schema enforcement — ensures data consistency across sources

  • Just-in-time generation — creates events only when needed

📊 Where Synthetic Events Live

Once processed, synthetic events are stored in the same table and format as regular tracked events in your warehouse:

cdp_events table structure:

event_id     event_type      user_id   timestamp            properties
evt_a9f2...  OrderCompleted  user_789  2023-05-14T15:32:11  {"order_id":"12345","total":59.99,...}
evt_b3c1...  PageViewed      user_789  2023-05-14T15:30:22  {"page_path":"/checkout",...}
evt_d7e4...  OrderCompleted  user_456  2023-05-14T14:18:05  {"order_id":"12344","total":29.99,...}

The only difference? A metadata field (context_source: synthetic) that helps you trace data lineage when needed.

All computed traits, identity resolution, and segment memberships apply equally to both synthetic and regular events.
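If you need to separate the two kinds of events for a lineage check, a filter on that metadata field is enough. The sketch below assumes each event is available as a dictionary with a top-level context_source key and that regular events either omit the key or carry a different value. Where the field actually lives in your warehouse may differ, and split_by_lineage is a hypothetical helper.

```python
def split_by_lineage(events):
    """Split events into (synthetic, tracked) using the context_source field."""
    synthetic = [e for e in events if e.get("context_source") == "synthetic"]
    tracked = [e for e in events if e.get("context_source") != "synthetic"]
    return synthetic, tracked


# Made-up events; only the context_source value matters here.
sample_events = [
    {"event_type": "OrderCompleted", "user_id": "user_789", "context_source": "synthetic"},
    {"event_type": "PageViewed", "user_id": "user_789"},
    {"event_type": "OrderCompleted", "user_id": "user_456", "context_source": "synthetic"},
]
synthetic_events, tracked_events = split_by_lineage(sample_events)
print(len(synthetic_events), len(tracked_events))
```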
