Realtime Interaction-Based Content Recommendations

The problem

Members interact with content on the feed constantly: clicks, reactions, dwell time, comments. Each of these is a strong signal about what the member finds interesting right now. But the feed's existing retrieval sources were either too coarse or too stale to capture that signal in real time. A member could engage deeply with a particular topic, refresh their feed seconds later, and see content unrelated to what they had just been reading.

The opportunity was to turn a fresh interaction into a recommendation almost immediately: see what the member just engaged with, find content semantically similar to it, surface that as the next candidate. We called this Activity History, and it became a new first-stage retrieval source for the feed.

The core idea

The primitive is straightforward: generate an embedding for the post, then find its nearest neighbors. An LLM converts a piece of content into a dense vector. Two pieces of content that are semantically similar end up close to each other in vector space. Cosine similarity measures that closeness.
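To make the primitive concrete, here is a minimal sketch of cosine similarity and a brute-force nearest-neighbor search in plain Python. The function names, the toy 3-dimensional vectors, and the post IDs are illustrative assumptions. A production system would use LLM-generated embeddings and a dedicated vector retrieval service rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    items: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return up to k (item_id, similarity) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real ones have far more dimensions.
items = {
    "post_a": [0.9, 0.1, 0.0],
    "post_b": [0.0, 1.0, 0.0],
    "post_c": [0.8, 0.2, 0.1],
}
top = nearest_neighbors([1.0, 0.0, 0.0], items, k=2)
print([item_id for item_id, _ in top])  # post_a and post_c rank above post_b
```

The linear scan is only for illustration. At feed scale the same ranking is typically computed by an approximate or GPU-accelerated index.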

Everything downstream of that primitive is the engineering work of making it operate at scale, in real time, with high coverage and low cost. The next three sections walk through the system's iterations, each motivated by a problem the previous one exposed.

Phase 1: Item-to-item retrieval

The first version of Activity History was an item-to-item lookup. When a member visited the feed, the system fetched the IDs of posts the member had interacted with in the last three days, looked up each post's precomputed query embedding in a key-value store, and sent each embedding to a GPU-accelerated vector retrieval service. The service returned the nearest items in embedding space, and those became the candidates we surfaced.
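The serve-time flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: `phase1_candidates`, the dictionary standing in for the key-value store, and the `search` callable standing in for the vector retrieval service are all hypothetical names. De-duplicating candidates is also an assumption.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def phase1_candidates(
    recent_post_ids: List[str],
    query_embedding_store: Dict[str, Vector],
    search: Callable[[Vector, int], List[str]],
    k: int = 10,
) -> List[str]:
    """Item-to-item lookup: one store read and one retrieval call per recent post."""
    candidates: List[str] = []
    seen = set()  # assumed: de-duplicate candidates across posts
    for post_id in recent_post_ids:
        embedding = query_embedding_store.get(post_id)
        if embedding is None:
            continue  # no precomputed embedding, so this post contributes nothing
        for item_id in search(embedding, k):
            if item_id not in seen:
                seen.add(item_id)
                candidates.append(item_id)
    return candidates


def fake_search(embedding: Vector, k: int) -> List[str]:
    """Stand-in for the retrieval service; returns fixed hypothetical item IDs."""
    return ["item_1", "item_2", "item_3"][:k]


store = {"post_a": [0.1, 0.9]}  # post_b has no precomputed embedding
hit = phase1_candidates(["post_a", "post_b"], store, fake_search, k=2)
miss = phase1_candidates(["post_b"], store, fake_search, k=2)
```

The `miss` case returns an empty list. A request ends up with no candidates when the member has no interactions in the lookback window or when none of the recent posts has a stored embedding.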

Architecture

Reused: the existing embedding inference service, key-value store, and vector retrieval service. New: a lookup path keyed by recent post IDs.

Pros

Online query-embedding lookup, low latency at serve time
Reused existing inference, storage, and retrieval services

Cons

Low coverage. Only ~47% of requests returned a candidate.
3-day lookback meant members with infrequent interactions were skipped
Many recent posts had no precomputed embedding
Duplicated data between query- and item-embedding stores

Phase 2: Member-to-item retrieval

Coverage was the dominant problem in Phase 1. The fix had two parts: re-key the query-embedding store by member ID instead of post ID, and generate the embedding at the moment a member interacts, so that the data is fresh and the serve-time lookup is a single read keyed by member ID.

That required a new nearline streaming pipeline: consume the interaction event stream, filter for eligible interactions, call the LLM inference service to generate the embedding, and write it to the member-keyed store. Lookback was extended from 3 days to 7.
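A minimal sketch of that pipeline, with in-memory stand-ins for the event stream, the inference service, and the member-keyed store. The `Interaction` type, the `ELIGIBLE_TYPES` rule, and the function names are assumptions made for illustration. The source does not specify which interactions count as eligible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
MemberStore = Dict[str, List[Tuple[float, Vector]]]

LOOKBACK_SECONDS = 7 * 24 * 60 * 60  # the 7-day lookback from the text
ELIGIBLE_TYPES = {"click", "reaction", "comment"}  # assumed eligibility rule


@dataclass(frozen=True)
class Interaction:
    member_id: str
    post_id: str
    kind: str
    timestamp: float  # seconds since epoch


def process_events(
    events: Iterable[Interaction],
    embed: Callable[[str], Vector],
    member_store: MemberStore,
) -> int:
    """Filter, embed, and write under the member ID. Returns the number of embed calls."""
    calls = 0
    for event in events:
        if event.kind not in ELIGIBLE_TYPES:
            continue
        vector = embed(event.post_id)  # one inference call per eligible interaction
        calls += 1
        member_store.setdefault(event.member_id, []).append((event.timestamp, vector))
    return calls


def read_member_embeddings(member_id: str, member_store: MemberStore, now: float) -> List[Vector]:
    """Serve-time path: a single read keyed by member, trimmed to the lookback window."""
    cutoff = now - LOOKBACK_SECONDS
    return [vec for ts, vec in member_store.get(member_id, []) if ts >= cutoff]


now = 1_000_000.0
events = [
    Interaction("m1", "p1", "click", now - 60),
    Interaction("m1", "p2", "impression", now - 30),  # not eligible, skipped
    Interaction("m1", "p3", "comment", now - 8 * 24 * 60 * 60),  # outside the lookback
]
store: MemberStore = {}
calls = process_events(events, lambda post_id: [float(len(post_id))], store)
fresh = read_member_embeddings("m1", store, now)
```

Here `calls` counts one inference call per eligible interaction, which is the cost that grows with interaction volume.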

Architecture

New: an interaction-driven embedding pipeline and a member-keyed query store. Reused: inference, retrieval, and item-embedding services.

Pros

Coverage jumped, driven by the 7-day lookback and the member-keyed lookup
One round-trip at serve time instead of two
Embeddings stay fresh, generated the moment a member interacts

Cons

Big increase in QPS on the inference service: every eligible interaction triggered an embedding call
New storage capacity needed for the member-keyed query store

Empty rate

52.9%→31.7%

Share of feed requests where Activity History returned zero candidates. Measured as 1-day pre-ramp vs. 1-day post-ramp on a single plugin.
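As a worked check on these figures, the short sketch below relates empty rate to coverage and computes the absolute and relative change. The helper name `empty_rate` and the request counts are illustrative assumptions. Only the two percentages come from the measurement above.

```python
def empty_rate(total_requests: int, empty_requests: int) -> float:
    """Share of requests that returned zero candidates."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return empty_requests / total_requests


before = 0.529  # empty rate before the ramp
after = 0.317   # empty rate after the ramp

coverage_before = 1 - before           # roughly 0.471, matching the ~47% Phase 1 coverage
coverage_after = 1 - after             # roughly 0.683
absolute_drop = before - after         # roughly 0.212, i.e. 21.2 percentage points
relative_drop = absolute_drop / before # roughly 0.40, i.e. about 40% fewer empty requests

print(f"coverage: {coverage_before:.1%} -> {coverage_after:.1%}")
print(f"empty rate fell {absolute_drop:.1%} absolute, {relative_drop:.0%} relative")
```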

Lookback window

3d→7d

How far back the system considers a member's interactions when generating retrieval candidates.

Phase 3: Cutting inference cost

Phase 2 fixed coverage but created a new problem: the inference service was now being called on every eligible interaction, including for items that already had an embedding generated as part of the item-creation flow. That was wasted compute. The fix was to add an intermediate existence check. Before invoking the inference service, check whether the embedding already exists in the item-embedding store, and if it does, reuse it directly.
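The existence check can be sketched as a read-before-compute helper. The names `get_or_create_embedding`, `item_store`, and `fake_embed` are hypothetical, and the dictionary stands in for the item-embedding store. Whether a newly generated embedding is written back to that store is not stated in the source, so this sketch leaves it out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def get_or_create_embedding(
    post_id: str,
    item_embedding_store: Dict[str, Vector],
    embed: Callable[[str], Vector],
) -> Tuple[Vector, bool]:
    """Return (embedding, inference_called). Reuse the stored embedding when present."""
    existing = item_embedding_store.get(post_id)  # one extra read, no compute
    if existing is not None:
        return existing, False
    return embed(post_id), True  # fall back to the inference service


item_store = {"p1": [0.1, 0.2]}  # p1 was embedded in the item-creation flow
inference_calls: List[str] = []


def fake_embed(post_id: str) -> Vector:
    """Stand-in for the inference service; records each call."""
    inference_calls.append(post_id)
    return [0.0, 0.0]


results = [get_or_create_embedding(p, item_store, fake_embed) for p in ["p1", "p2", "p1"]]
print(inference_calls)  # only p2 reached the inference service
```

In this toy run, three interactions produce a single inference call. The real reduction depends on how many interacted-with items already have a stored embedding.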

Architecture

Identical shape to Phase 2, with a short-circuit lookup against the item embedding store before any inference call.

Pros

~3× drop in inference QPS with no loss of coverage
Short-circuit logic is cheap: one extra read, no compute

Cons

Slight increase in read load on the item-embedding store

LLM inference QPS

~3× reduction

Measured across three production fabrics over a 5-day window post-ramp. Average call rate fell roughly 3×.

Coverage

no regression

Empty rate stayed at the Phase 2 level (~31.7%). The cost optimization was lossless.

Impact

Activity History is now one of a few "high-quality" sources for out-of-network content recommendations on LinkedIn's Feed (Majority Member Experience). The pipelines and infrastructure built along the way are being reused to power upcoming supervised item-to-item content retrieval experiments.