Event Schema Design for SaaS: The Decisions That Will Haunt You Later

Priya Nakashima July 15, 2025 11 min read

The engineering team at a growing B2B SaaS product had been tracking events for 18 months before they realized the problem. Their original page_viewed event used the property page_name. Three developers later, some pages were tracked with page_title, some with page_name, and two were tracked with screen_name (a holdover from a mobile SDK someone had copy-pasted). Their funnel analysis for a specific user flow returned different numbers depending on which analytics query you ran — not because the data was wrong, but because the schema had diverged and nobody had noticed.

That's a medium-severity schema problem. The severe version is when a schema decision made on day 30 of instrumentation creates a structural limitation that prevents you from answering a key product question two years later — and the fix requires re-instrumenting and losing historical continuity. Event schema decisions have long half-lives. Most teams underinvest in them early and pay interest for years. This article covers the decisions that matter most and how to make them with future use cases in mind.

The Three Layers of Your Event Schema

Before getting into specific decisions, it helps to be clear about what an event schema actually consists of. There are three layers, and they need to be designed together:

Layer 1: Event Taxonomy

The names and categories of your events. What actions are tracked? How are they named? Is the naming convention consistent across the product? The event taxonomy is the most visible layer of your schema and the one most commonly debated in tracking plan reviews.

Layer 2: Property Schema

The properties attached to each event. What contextual information is captured with each action? Are property names consistent across events that share the same concept? Are property data types consistent? This layer is less visible but often more consequential — the property schema determines what questions you can answer in your analytics UI without writing custom SQL.

Layer 3: Identity Resolution

How users are identified across sessions, devices, and platforms. How anonymous users are merged with identified users. How group-level (account-level) identity is tracked separately from user-level identity. Identity resolution is the foundational layer — if it's broken, no amount of good event naming or clean properties will save your analysis.

Decision 1: Noun-Verb vs. Verb-Noun Naming

The most common event naming debate is whether to name events as Object Action (noun-verb: "Project Created") or Action Object (verb-noun: "Created Project"). Both conventions have advocates. What matters more than which you pick is that you pick one and enforce it.

Our default recommendation is noun-verb, for one practical reason: when you have 150+ events in your tracking plan and you're searching for "all events related to projects," noun-verb naming groups them alphabetically. Project Created, Project Deleted, Project Shared, Project Updated are adjacent in any alphabetically sorted list. Created Project, Deleted Project, Shared Project, Updated Project are scattered across the alphabet by verb.
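The grouping effect is easy to see with a quick sort. This is a minimal sketch; the event names are hypothetical examples, not a recommended taxonomy.

```javascript
// Hypothetical event names, written under both conventions.
const nounVerb = ["Project Shared", "Report Created", "Project Created", "Report Deleted", "Project Deleted"];
const verbNoun = ["Shared Project", "Created Report", "Created Project", "Deleted Report", "Deleted Project"];

// Copy before sorting so the input arrays are left untouched.
const sortedNounVerb = [...nounVerb].sort();
const sortedVerbNoun = [...verbNoun].sort();

// Noun-verb: all Project events are adjacent, then all Report events.
console.log(sortedNounVerb);
// Verb-noun: Project and Report events interleave, grouped by verb instead.
console.log(sortedVerbNoun);
```
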

The exception is if your product is deeply action-oriented and team members naturally describe behavior by action first. Consistency with how your team talks about the product is worth something.

Decision 2: Granularity — How Specific Is Each Event?

This is the decision that haunts teams most often. Consider a product that has multiple types of exports. You can track this as:

// Option A: High granularity (separate event per export type)
track("CSV Export Initiated")
track("PDF Export Initiated")
track("API Export Initiated")

// Option B: Low granularity (single event, type as property)
track("Export Initiated", { export_type: "csv" })
track("Export Initiated", { export_type: "pdf" })
track("Export Initiated", { export_type: "api" })

Option B is almost always the right call. Here's why: with Option A, your analytics UI funnel analysis treats each event as a separate funnel step, making it impossible to ask "how many users exported anything" without a union query. With Option B, you can filter by the export_type property to analyze specific types, or drop the filter to analyze exports in aggregate. Option A can answer the specific questions; Option B can answer both the specific and the aggregate questions.

The corollary: any time you're tempted to create separate event names for variants of the same action, ask whether the variant is better captured as a property. Most of the time, the answer is yes.
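To make the aggregate-versus-specific point concrete, here is a rough sketch of both questions answered from Option B data. The row shape (userId, event, properties) and the countUsers helper are assumptions for illustration, not the format of any particular analytics platform.

```javascript
// Hypothetical raw events, as they might appear in an export or warehouse table.
const exportEvents = [
  { userId: "u1", event: "Export Initiated", properties: { export_type: "csv" } },
  { userId: "u1", event: "Export Initiated", properties: { export_type: "pdf" } },
  { userId: "u2", event: "Export Initiated", properties: { export_type: "pdf" } },
  { userId: "u3", event: "Export Initiated", properties: { export_type: "api" } },
  { userId: "u3", event: "Project Created", properties: {} },
];

// Count distinct users who fired an event, optionally narrowed by property values.
function countUsers(rows, eventName, filter = {}) {
  const users = new Set();
  for (const row of rows) {
    if (row.event !== eventName) continue;
    const matches = Object.entries(filter).every(
      ([key, value]) => row.properties[key] === value
    );
    if (matches) users.add(row.userId);
  }
  return users.size;
}

// Aggregate question: how many users exported anything?
const anyExport = countUsers(exportEvents, "Export Initiated");
// Specific question: how many users exported a PDF?
const pdfExport = countUsers(exportEvents, "Export Initiated", { export_type: "pdf" });

console.log(anyExport, pdfExport); // 3 2
```

With Option A, the aggregate question would need the same helper called once per event name and the resulting user sets merged by hand.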

Decision 3: Event Properties — The Schema-Within-the-Schema

Property design is where most instrumentation debt accumulates silently. The decisions that matter:

Property Naming Consistency

If a concept appears across multiple events, it must have the same property name in every event. "The resource being acted on" might be project_id in some events, resource_id in others, and document_id in a third set. Any analysis that spans those events then needs a lookup table to reconcile the names. The fix is to decide once: if the universal identifier for a resource is going to be called object_id, use that everywhere, with an additional property such as object_type: "project" to provide the specific context.
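One way to apply that decision to code that still emits the older names is a small normalizing step before the track call. This is a sketch under assumptions: the LEGACY_ID_KEYS mapping and the normalizeResource helper are hypothetical, and an untyped name like resource_id would need its object_type supplied from somewhere else.

```javascript
// Hypothetical mapping from legacy, per-event identifier names to an object_type value.
const LEGACY_ID_KEYS = { project_id: "project", document_id: "document" };

// Return a copy of the properties with legacy identifiers rewritten to
// the shared object_id / object_type pair. The input object is not modified.
function normalizeResource(properties) {
  const normalized = { ...properties };
  for (const [legacyKey, objectType] of Object.entries(LEGACY_ID_KEYS)) {
    if (legacyKey in normalized) {
      normalized.object_id = normalized[legacyKey];
      normalized.object_type = objectType;
      delete normalized[legacyKey];
    }
  }
  return normalized;
}

console.log(normalizeResource({ project_id: "p_123", source: "sidebar" }));
// { source: 'sidebar', object_id: 'p_123', object_type: 'project' }
```
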

Property Data Type Enforcement

A boolean property that sometimes comes in as a boolean (true/false) and sometimes as a string ("true"/"false") will cause silent analysis errors in most analytics platforms. Type coercion happens inconsistently across platforms, and the inconsistency is usually invisible until someone asks a question that requires filtering on that property. Define property types in your tracking plan and enforce them in code where possible, before the event leaves the client.
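A type check of this kind can be very small. The sketch below assumes a hypothetical tracking-plan entry that maps each property name to its expected typeof result; the property names and the findTypeErrors helper are illustrative, not part of any SDK.

```javascript
// Hypothetical tracking-plan entry: property name -> expected `typeof` result.
const exportInitiatedSpec = {
  export_type: "string",
  row_count: "number",
  is_scheduled: "boolean",
};

// Return a list of human-readable type violations.
// An empty list means every property that is present has the expected type.
function findTypeErrors(spec, properties) {
  const errors = [];
  for (const [name, expected] of Object.entries(spec)) {
    if (!(name in properties)) continue; // presence is a separate check from type
    const actual = typeof properties[name];
    if (actual !== expected) {
      errors.push(`${name}: expected ${expected}, got ${actual}`);
    }
  }
  return errors;
}

console.log(findTypeErrors(exportInitiatedSpec, { export_type: "csv", is_scheduled: "true" }));
// [ 'is_scheduled: expected boolean, got string' ]
```
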

Required vs. Optional Properties

Every event should have a small set of required properties — properties that are always present and never null. These should include: the user ID (or anonymous ID), the timestamp, and any context properties your analysis will need for filtering (app version, platform, environment). Optional properties can be null or missing. The discipline is never having a "required" property that sometimes doesn't fire — that's a bug dressed up as a schema decision.
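One way to hold that line is to merge the shared context in a single place and fail loudly when something is missing. This sketch assumes the SDK supplies the user ID and timestamp itself, so it only checks the context properties; REQUIRED_CONTEXT and buildProperties are hypothetical names.

```javascript
// Context properties every event must carry (taken from the examples above).
const REQUIRED_CONTEXT = ["app_version", "platform", "environment"];

// Merge shared context with event-specific properties and reject the event
// if any required property is missing or null.
function buildProperties(context, eventProperties) {
  const merged = { ...context, ...eventProperties };
  const missing = REQUIRED_CONTEXT.filter(
    (key) => merged[key] === undefined || merged[key] === null
  );
  if (missing.length > 0) {
    throw new Error(`Missing required properties: ${missing.join(", ")}`);
  }
  return merged;
}

const context = { app_version: "4.2.0", platform: "web", environment: "production" };
console.log(buildProperties(context, { export_type: "csv" }));
```

Throwing is a deliberate choice for development and test builds; in production you may prefer to log the problem and still send the event.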

Decision 4: User Identity and the Alias Problem

User identification is where event schema intersects with data quality in the most consequential way. The standard flow:

// Anonymous user lands on page
analytics.page()  // fires with anonymous ID (auto-generated)

// User starts signup form
analytics.track("Signup Started", { plan_tier: "free" })

// User completes signup — THIS is the critical moment
analytics.identify(userId, {
  email: "user@example.com",
  plan_tier: "free",
  signup_date: "2025-07-15"
})

// If you're using a CDP, alias the anonymous ID to the user ID
analytics.alias(userId)  // merges anonymous pre-signup events to identified user

The most common identity error: identify() fires on the signup completion page but not on subsequent logins. Users who return on a new device or browser, or in a session where the stored ID has been cleared, get a fresh anonymous ID, and their return sessions are disconnected from their earlier identified sessions. identify() must fire on every login, not just on signup.
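A simple way to avoid this is to route every authentication path through one function that calls identify(). The sketch below uses a stand-in analytics object that only records calls so it can run on its own; onAuthenticated, handleSignup, and handleLogin are hypothetical names for wherever your app finishes authentication.

```javascript
// Stand-in for the analytics SDK so the sketch is self-contained; it only records calls.
const calls = [];
const analytics = {
  identify: (userId, traits) => calls.push({ method: "identify", userId, traits }),
  track: (event, properties) => calls.push({ method: "track", event, properties }),
};

// Hypothetical hook invoked after ANY successful authentication:
// signup, password login, SSO, or a restored session.
function onAuthenticated(user) {
  analytics.identify(user.id, { email: user.email, plan_tier: user.planTier });
}

function handleSignup(user) {
  onAuthenticated(user);
  analytics.track("Signup Completed", { plan_tier: user.planTier });
}

function handleLogin(user) {
  onAuthenticated(user); // same identify call as signup, on every login
}

const user = { id: "usr_42", email: "user@example.com", planTier: "free" };
handleSignup(user);
handleLogin(user);
console.log(calls.map((call) => call.method)); // [ 'identify', 'track', 'identify' ]
```
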

For B2B SaaS with team accounts, you also need group() calls to track account-level membership:

// After login or signup, set the account context
analytics.group(accountId, {
  account_name: "Acme Corp",
  plan_tier: "pro",
  account_created_date: "2024-03-10",
  seat_count: 12
})

Without group-level tracking, you cannot answer account-level retention questions — which is a significant gap for any B2B product where churn is measured at the account level, not the user level.
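The gap between the two views shows up even in a toy data set. In this sketch the activity rows, the week field, and the retention helper are all hypothetical; it assumes each event has already been joined to the account its user belongs to, which is what group() calls make possible.

```javascript
// Hypothetical activity rows: one per (user, week) in which the user was active.
const activity = [
  { accountId: "a1", userId: "u1", week: 0 },
  { accountId: "a1", userId: "u2", week: 4 },
  { accountId: "a2", userId: "u3", week: 0 },
];

// Share of entities (users or accounts) active in baseWeek that are also active in laterWeek.
function retention(rows, key, baseWeek, laterWeek) {
  const base = new Set(rows.filter((r) => r.week === baseWeek).map((r) => r[key]));
  const later = new Set(rows.filter((r) => r.week === laterWeek).map((r) => r[key]));
  if (base.size === 0) return 0;
  const retained = [...base].filter((id) => later.has(id)).length;
  return retained / base.size;
}

const userRetention = retention(activity, "userId", 0, 4);
const accountRetention = retention(activity, "accountId", 0, 4);
console.log(userRetention, accountRetention); // 0 0.5
```

Account a1 is still active in week 4 through a different user, which user-level retention alone would report as churn.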

Decision 5: The Tracking Plan as a Living Contract

The biggest operational failure in event schema management isn't a bad initial decision — it's the absence of a change process. Schemas drift when engineers add events ad-hoc, when mobile and web teams implement the same event differently, and when "temporary" tracking additions get shipped and forgotten.

A minimal tracking plan that actually stays current has three elements:

1. A single source of truth document, usually a spreadsheet or a dedicated tool like Avo or Iteratively, that lists every event, its trigger condition, its required properties, and the owner (which squad or PM is responsible for its accuracy). This document must be the authoritative reference, not optional reading.
2. A PR review gate: any code change that adds, modifies, or removes a track(), identify(), group(), or page() call should require a review from the analytics owner. This doesn't need to be heavyweight; it just needs to exist. Without it, schema changes go to production silently.
3. A schema validation step: most CDPs (Segment, RudderStack) have a schema enforcement feature that rejects events that don't match a registered schema. Turn it on, but start in "warning mode" before switching to "blocking mode." You'll discover schema drift you didn't know existed.
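The review gate can be backed by a lightweight automated check. The sketch below scans source text for track() calls with a literal event name and reports any name that is not in the plan. It is an assumption-heavy illustration: registeredEvents stands in for your tracking plan, the regex only catches string-literal event names, and findUnregisteredEvents is a hypothetical helper, not a feature of any CDP.

```javascript
// Hypothetical set of event names registered in the tracking plan.
const registeredEvents = new Set(["Export Initiated", "Project Created", "Signup Started"]);

// Find event names passed as string literals to track() that the plan does not list.
function findUnregisteredEvents(source, plan) {
  const pattern = /\btrack\(\s*["']([^"']+)["']/g;
  const found = new Set();
  for (const match of source.matchAll(pattern)) {
    if (!plan.has(match[1])) found.add(match[1]);
  }
  return [...found];
}

// Example input: in CI this would be the contents of the changed files.
const sampleSource = `
  analytics.track("Export Initiated", { export_type: "csv" });
  analytics.track('CSV Export Initiated');
`;

console.log(findUnregisteredEvents(sampleSource, registeredEvents));
// [ 'CSV Export Initiated' ]
```

A check like this complements the human review rather than replacing it, since it cannot see event names that are built dynamically.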

The Decisions You Can't Easily Undo

Most schema decisions can be fixed with some migration pain. A few are genuinely difficult to reverse:

Event naming convention. If you've been firing User Signed Up for two years and want to change to Signup Completed, you'll have a historical discontinuity in every funnel and retention chart. The old event name can't retroactively be renamed in most analytics platforms. The convention you ship with is the one you'll live with.

User ID strategy. If you ship with email addresses as user IDs (a common shortcut in early instrumentation) and later want to use account-scoped UUIDs, the migration requires re-aliasing every historical user record. This is not impossible, but it's expensive and error-prone.

Timestamp semantics. Are you tracking server-side event timestamps or client-side? If client-side, are you accounting for clock drift and timezone inconsistencies? This matters for any time-windowed analysis (retention windows, activation windows). The analytics infrastructure you choose will make assumptions about this; make sure they match your actual instrumentation.
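For client-side timestamps, one common correction is to measure how old the event was when the client sent it, then subtract that age from the server's receive time. This sketch assumes each event carries a client event time and a client send time, that the server records its own receive time, and that network latency is negligible; the field names and the correctTimestamp helper are hypothetical.

```javascript
// Estimate when the event happened in server time.
// eventAge is measured entirely on the client clock, so a skewed clock cancels out.
function correctTimestamp({ clientTimestamp, clientSentAt, serverReceivedAt }) {
  const eventAgeMs = Date.parse(clientSentAt) - Date.parse(clientTimestamp);
  return new Date(Date.parse(serverReceivedAt) - eventAgeMs).toISOString();
}

// The device clock here runs about an hour ahead of the server.
const corrected = correctTimestamp({
  clientTimestamp: "2025-07-15T10:00:00.000Z", // when the event occurred (client clock)
  clientSentAt: "2025-07-15T10:00:05.000Z",    // when the batch was sent (client clock)
  serverReceivedAt: "2025-07-15T09:00:06.000Z", // when the server received it (server clock)
});

console.log(corrected); // 2025-07-15T09:00:01.000Z
```

Check whether your analytics infrastructure already applies a correction of this shape before adding your own, since applying it twice would distort the timestamps.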

We're not saying you need a perfect schema on day one. That's not realistic, and the constraints of early-stage products mean some pragmatic shortcuts are appropriate. We're saying the shortcuts with the longest half-lives (naming conventions, user ID strategy, timestamp semantics) are worth getting right from the start, even if everything else is iterative.

The event schema you design in the first 60 days of instrumentation will still be answering — or failing to answer — questions in year three. The investment in getting it right is one of the highest-leverage technical decisions a product team makes, and it's usually treated as the lowest-urgency item in the sprint backlog right up until it isn't.
