Stop Doing OLAP In Your OLTP Database


You built this table. I know you did. I’ve seen it in your code reviews for the last two decades.

You’re tracking history. You’re being responsible. This is mature engineering. You’re not just clobbering the price when it changes, you’re keeping a record. Business requirement fulfilled. Future you will thank past you. Auditors will smile. Your manager will be impressed.

Maybe you even created a stored procedure for tracking prices. In the running for that lead developer promotion, aren’t you? Of course, you need to block direct inserts and updates on the product_prices table. You remembered that, yes? Or is everyone logging into your database as postgres?

CREATE OR REPLACE PROCEDURE set_product_price(
    p_product_id    INT,
    p_price         DECIMAL,
    p_starts_at     TIMESTAMP
)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE product_prices
    SET ends_at = p_starts_at
    WHERE 1 = 1
        AND product_id = p_product_id
        AND ends_at = '9999-12-31 23:59:59.999';

    INSERT INTO product_prices (
        product_id,
        price,
        starts_at,
        ends_at
    )
    VALUES (
        p_product_id,
        p_price,
        p_starts_at,
        '9999-12-31 23:59:59.999'
    );

    COMMIT;
END;
$$;

It’s wrong. Not wrong the way your normalization schema was wrong. It’s wrong the way a Swiss Army knife is wrong when you need a chef’s knife. It will cut things. It will just cut them worse, slower, and you look like you don’t know what you’re doing.

OLTP and OLAP are not the same thing

Be precise. This is one of those things that actually matters.

OLTP (Online Transaction Processing) answers the question: What is true right now? It’s your production database, serving live traffic. When a customer adds something to their cart, your app needs to know the current price. Fast. Every response served hot in single-digit milliseconds. Deterministic. One answer.

OLAP (Online Analytical Processing) answers the question: What was true at some point, and how has it changed over time? It’s your data warehouse, serving reports and dashboards. When your finance team needs to know how a price has moved over the last two years, they want history. You’ve got time. Start the report generation, kick off a cube build, and go get a coffee.

Two questions. Two shapes of query. Two completely different access patterns.

That starts_at / ends_at pattern you’re using? That’s a Slowly Changing Dimension (Type 2). You’re describing a data warehouse. The people who invented it were not thinking about serving live application requests. They were thinking about analytical queries over historical time.

You took a data warehousing pattern and bolted it onto your operational database. Now every query pays the price.

The cost of being “responsible”

Here’s what you actually want when someone hits your checkout:

SELECT price FROM products WHERE product_id = ?

Here’s what you have to write now:

SELECT price
FROM product_prices
WHERE product_id = ?
  AND starts_at <= NOW()
  AND NOW() < ends_at

Every. Single. Time. Every query. Every developer. Every service that touches pricing. One mistake — one forgotten WHERE clause — and you’re serving a stale price from 2019. Nobody notices for three weeks.

We could add an is_current flag. Sure. You could. But then you’d just be more wrong. How do you future date expected price changes? Being the newest record does not guarantee being the current record. The is_current flag is trying to turn your SCD OLAP table back into an OLTP table and still failing.

A simple index on product_id doesn’t help you much when you also have to filter on date ranges. The query planner has to work harder. Your operational traffic is paying an analytical tax on every read.

It’s everywhere

Once you know what to look for, you’ll see this pattern all over production systems.

Employee compensation. salary with effective_on on a compensation table. “What does this person make right now?” should be a lookup. Instead it’s an expedition through every raise they’ve ever had.

Tax rates. tax_rate with date ranges on a jurisdiction table. How often do national, state, county, and local taxes change? Your checkout is doing a date-range filter on every single transaction to find the one row that’s current.

Role-based access control. user_roles with granted_at and revoked_at instead of just the current roles. “Can this user do this thing?” is the most latency-sensitive question in your application, and you’ve turned it into a temporal query. Congratulations.

Feature flags. feature_toggles with enabled_from and enabled_until. Your feature flag system is supposed to be sub-millisecond. You just added a date range filter to every flag check. Your infrastructure team will love you.

Addresses. Customer shipping addresses with starts_at. “Where do we ship this?” is operational. “Where did they used to live?” is analytical. One table is now trying to answer both, which means every shipment query filters by date.

The pattern behind all of these: someone asked “what if X changes?” and instead of designing for change — update the current value, record the event — they designed for time travel. Every row gets a validity window. Every query becomes a range scan. The simple question (“what is true now?”) becomes as expensive as the complex question (“what was true then?”).

The fix

It’s not complicated.

Operational table: products with price. That is the current price. Period. No date ranges. No is_current. No null-checking. One lookup, one answer.

History table: product_price_history with price, effective_at, created_at. Append-only. Never updated, only inserted. When the price changes: update products.price, insert a row into product_price_history. Two statements, one transaction.

Now the operational query is trivial:

SELECT price FROM products WHERE product_id = ?

And your analysts don’t get shortchanged. The history query is just as clean:

SELECT price, effective_at
FROM product_price_history
WHERE product_id = ?
ORDER BY effective_at ASC

Each table does one job. Each query is unambiguous. There’s no flag to keep in sync, no date range to remember, no subtle bug waiting to surface when someone forgets a WHERE clause.

But I need the historical price operationally

Every time. Someone always says this.

“What if a customer disputes their charge? I need to know what the price was when they placed their order.”

That’s not a pricing question. That’s an order question. The order_lines table stores the price at the time of sale. When the order is placed, you record the product’s unit_price right there on the line item. That value never changes. That is a fact about what happened. Not a query about what the current price is or was.

This is the same distinction I made about shipping labels in a previous article. A shipping_labels table with city and state isn’t denormalized. These are the values that were on the label when it shipped. The label is the fact. Same logic here. The order line is the fact. The price at the time of sale is a property of the order, not a temporal query over pricing history.

“What about auditing?”

Audit logs are append-only event records. They belong in their own table, or better, their own store. The history table is the audit trail, but it’s clearly separate from the operational table. Your auditors are not competing for the same data as your customers.

You (probably) don’t need a data warehouse

I know what you’re thinking: “This sounds like a data warehouse problem, and we don’t have one.”

Fine. You don’t need one to fix this. This is basic separation of concerns.

Same database. Same Postgres instance. Different tables, different concerns. products serves your application. product_price_history serves your analysts. Put them both in your existing database if that’s what you have. The point isn’t “buy a data warehouse.” The point is: stop making one table serve two masters.

A data warehouse is a better home for history tables eventually, because it’s optimized for analytical queries and won’t compete with your operational traffic for I/O. Redshift is a fundamentally different product from RDS for a reason. But that’s a future decision. The immediate decision is to stop letting your OLTP table pretend to be an OLAP table. Separate the concerns. Write the history table. Get the date ranges out of your operational schema.

Every table answers one question

When we talked about normalization, the rule was about columns: every non-key column should depend on the primary key, the whole key, and nothing but the key.

The history-tracking instinct is good. When prices change, you should know. The impulse to preserve that information is right. But the implementation is wrong. You don’t track history by making your operational table a timeline. You track history by recording events.

Your database has two jobs. Gotta keep ‘em separated.


Photo by Lukas Blazek.