Why We Moved Streaming Events Out of Process Memory

The first streaming version worked on one worker, but a shared event store was the safer boundary once generation and stream requests could land on different processes.

The first version of streaming progress in Place2Page was reasonable.

Generation steps were stored in process memory, and the stream endpoint read that in-memory state and pushed events to the browser.

For a single process, that worked.

The trouble is that a lot of designs look stable right up until the moment they have to cross a process boundary.

What made the first version fragile

Two issues showed up quickly.

The first was cleanup. Completed stream state could linger in memory longer than intended, because cleanup depended on timing and later access patterns.

The second was more important.

Once background generation work and /stream requests were allowed to hit different workers, the system no longer had one shared in-memory truth.

That meant a user could open a perfectly valid running page and still miss live step or preview events, simply because the stream request was reading from a different process than the one writing them.

That is the kind of bug that makes a product feel random.

The real boundary was not the function call

This was the main lesson.

The interesting boundary was not "which helper publishes an event." It was "what storage can both sides trust."

If generation and streaming can happen on different workers, then process memory is the wrong contract.

It is too local. It hides failure modes until the system grows just enough to expose them.

The fix was to make events durable and shared

Instead of holding stream state in memory, Place2Page moved events into a database table.

The shape is intentionally simple (a schema sketch follows the list):

  • project_id
  • seq
  • event_json
  • is_terminal
  • created_at
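
As a minimal sketch of that table, here is what it could look like using sqlite3 as a stand-in for the real database. The column names come from the list above; the types, constraints, and helper name are assumptions, not the production schema:

    import sqlite3

    # Stand-in schema: one row per streamed event, ordered within a project by seq.
    SCHEMA = """
    CREATE TABLE IF NOT EXISTS stream_events (
        project_id  TEXT    NOT NULL,
        seq         INTEGER NOT NULL,
        event_json  TEXT    NOT NULL,
        is_terminal INTEGER NOT NULL DEFAULT 0,
        created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (project_id, seq)
    )
    """

    def init_db(path: str = "events.db") -> sqlite3.Connection:
        conn = sqlite3.connect(path)
        conn.execute(SCHEMA)
        conn.commit()
        return conn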

That small shift changed the runtime behavior in a useful way.

Now the generation worker appends events to shared storage. The stream endpoint reads new events by sequence. That gives every worker access to the same ordered event history instead of depending on whatever still happens to live inside one Python process.
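
In code, that contract is two small operations against the table. This is a sketch under the schema above, not the actual implementation; it assumes a single writer per project, which holds for a bounded generation job:

    import json

    def append_event(conn, project_id: str, payload: dict, terminal: bool = False) -> None:
        # Writer side: allocate the next seq and append in one transaction.
        # Assumes one writer per project; a busier system would want a
        # database-side sequence instead of MAX(seq) + 1.
        with conn:
            (next_seq,) = conn.execute(
                "SELECT COALESCE(MAX(seq), -1) + 1 FROM stream_events WHERE project_id = ?",
                (project_id,),
            ).fetchone()
            conn.execute(
                "INSERT INTO stream_events (project_id, seq, event_json, is_terminal) "
                "VALUES (?, ?, ?, ?)",
                (project_id, next_seq, json.dumps(payload), int(terminal)),
            )

    def events_after(conn, project_id: str, last_seq: int):
        # Reader side: everything newer than the caller's last seen seq, in order.
        rows = conn.execute(
            "SELECT seq, event_json, is_terminal FROM stream_events "
            "WHERE project_id = ? AND seq > ? ORDER BY seq",
            (project_id, last_seq),
        ).fetchall()
        return [(seq, json.loads(ej), bool(term)) for seq, ej, term in rows]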

What got better immediately

This design solved more than one problem at once.

It improved multi-worker correctness because both the writer and the reader now look at the same source of truth.

It made replays and future resume semantics easier to reason about because sequence numbers give the stream a clear ordering model. The current reconnect path still restarts from the beginning of the stream, but it now replays from shared state instead of worker-local memory.

It also made terminal behavior easier to reason about, because "done" and "error" are stored as explicit terminal events instead of as a temporary in-memory condition.
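
Put together, the stream endpoint's inner loop can be a plain generator over that log. This is a sketch built on the hypothetical helpers above, not the real endpoint, but it shows the shape: replay from the start, follow new rows by sequence, stop at the terminal event.

    import time

    def stream_events(conn, project_id: str, poll_interval: float = 0.5):
        # Every (re)connect replays from seq 0 and then follows the log.
        last_seq = -1
        while True:
            for seq, event, terminal in events_after(conn, project_id, last_seq):
                last_seq = seq
                yield event
                if terminal:
                    # "done" / "error" live in the log itself, so the loop
                    # ends without any worker-local bookkeeping.
                    return
            time.sleep(poll_interval)  # simple polling; acceptable for bounded jobs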

That is a much healthier contract than "keep this object around long enough and hope the right worker sees it."

What we accepted in return

This was not a free change.

The system now writes streaming events to the database, and the stream endpoint internally polls for new rows.

That means:

  • more database writes during generation
  • retention and cleanup policies now matter (a cleanup sketch follows this list)
  • the stream is "shared and durable" rather than "pure push with zero persistence"
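
For the retention point in particular, a cleanup pass can stay small. This is a hypothetical policy (drop event logs whose terminal event is older than a week), not a description of what Place2Page actually runs:

    def prune_finished_streams(conn, keep_days: int = 7) -> int:
        # Hypothetical retention pass: delete event logs for projects whose
        # terminal event fell out of the retention window.
        cur = conn.execute(
            """
            DELETE FROM stream_events
            WHERE project_id IN (
                SELECT project_id FROM stream_events
                WHERE is_terminal = 1
                  AND created_at < datetime('now', ?)
            )
            """,
            (f"-{keep_days} days",),
        )
        conn.commit()
        return cur.rowcount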

That is a real trade-off.

But it was the right one for this product stage. Place2Page did not need a full event bus to get reliable cross-worker behavior. It needed a contract that survived ordinary scaling and reconnection scenarios.

Why this pattern fit the product

The generation stream is not an open-ended chat system. It is a bounded job with ordered status updates and a terminal end state.

That makes an append-only event log a natural fit.

We are not trying to synchronize dozens of live participants. We are trying to give one user a reliable answer to a simple question:

"What has happened in my generation job so far?"

Once you frame the problem that way, durable ordered events make more sense than worker-local memory.

Closing

The first streaming implementation was not wrong. It was local.

That is a useful distinction.

A lot of engineering work is really about noticing when a local assumption has become a distributed system assumption.

For Place2Page, moving stream events into shared storage was the point where the architecture caught up with the actual runtime shape of the product.
