Realtime Interaction-Based Content Recommendations

A new real-time recommendation source for LinkedIn's feed that surfaces content based on a member's most recent interactions on the platform.

The problem

Members interact with content on the feed constantly: clicks, reactions, dwell time, comments. Each of these is a strong signal about what the member finds interesting right now. But the feed's existing retrieval sources were either too coarse or too stale to capture that signal in real time. A member could engage deeply with a particular topic, refresh their feed seconds later, and see content unrelated to what they had just been reading.

The opportunity was to turn a fresh interaction into a recommendation almost immediately: see what the member just engaged with, find content semantically similar to it, surface that as the next candidate. We called this Activity History, and it became a new first-stage retrieval source for the feed.

The core idea

The primitive is straightforward: generate an embedding for the post, then find its nearest neighbors. An LLM converts a piece of content into a dense vector. Two pieces of content that are semantically similar end up close to each other in vector space. Cosine similarity measures that closeness.

[Figure: embedding space. Post 1 (the post the member just interacted with) sits near Post 2, Post 3, and Post 4 (recommended), apart from unrelated content. Flow: member interacts with Post 1 on Feed (react, comment, save, etc.) → fetch Post 1's LLM embedding [ 0.123, 0.453, 0.873, … ] → fetch similar posts via cosine similarity.]
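The primitive can be sketched in a few lines. This is a minimal brute-force illustration, not the production retrieval path: the toy three-dimensional vectors, the item IDs, and the function names are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, items, k):
    """Return the ids of the k items most similar to the query, best first."""
    scored = sorted(
        items.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:k]]

# Hypothetical toy embeddings; real LLM embeddings have far more dimensions.
ITEM_EMBEDDINGS = {
    "post_2": [0.9, 0.1, 0.0],
    "post_3": [0.8, 0.2, 0.1],
    "post_4": [0.7, 0.3, 0.0],
    "unrelated": [0.0, 0.1, 0.9],
}
post_1 = [1.0, 0.1, 0.0]  # the post the member just interacted with

top3 = nearest_neighbors(post_1, ITEM_EMBEDDINGS, k=3)
print(top3)
```

A full scan like this is linear in the number of items, which is why a dedicated vector retrieval service takes over at feed scale.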

Everything downstream of that primitive is the engineering work of making it operate at scale, in real time, with high coverage and low cost. The next three sections walk through how the system evolved across three iterations, each motivated by a problem the previous one exposed.

Phase 1: Item-to-item retrieval

The first version of Activity History was an item-to-item lookup. When a member visited the feed, the system fetched the IDs of posts the member had interacted with in the last three days, looked up each post's precomputed query embedding from a key-value store, and sent that embedding to a GPU-accelerated vector retrieval service. The service returned the nearest items in embedding space, and those became the candidates we surfaced.
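The serve-time flow described above might look roughly like the following. It is a sketch under stated assumptions: the dict stands in for the key-value store, `retrieve` stands in for the vector retrieval service, and every name here is hypothetical.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(days=3)  # Phase 1 lookback window, per the text

def phase1_candidates(member_interactions, query_embedding_store, retrieve, now):
    """Item-to-item lookup: recent post IDs -> stored embeddings -> neighbors.

    member_interactions: list of (post_id, timestamp) pairs for one member.
    query_embedding_store: dict of post_id -> embedding (key-value store stand-in).
    retrieve: callable(embedding) -> list of candidate IDs (retrieval stand-in).
    """
    recent_ids = [pid for pid, ts in member_interactions if now - ts <= LOOKBACK]
    candidates = []
    for pid in recent_ids:
        embedding = query_embedding_store.get(pid)
        if embedding is None:
            continue  # no precomputed embedding, so this post yields nothing
        candidates.extend(retrieve(embedding))
    return candidates

# Hypothetical data for illustration only.
now = datetime(2024, 1, 10)
store = {"post_a": [1.0, 0.0]}
fake_retrieve = lambda embedding: ["cand_x", "cand_y"]

fresh = phase1_candidates([("post_a", now - timedelta(days=1))], store, fake_retrieve, now)
stale = phase1_candidates([("post_a", now - timedelta(days=5))], store, fake_retrieve, now)
missing = phase1_candidates([("post_b", now - timedelta(days=1))], store, fake_retrieve, now)
```

In this sketch a request comes back empty in two ways: the only interaction falls outside the window (`stale`), or the post has no stored embedding (`missing`).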

Architecture

Phase 1 reused the existing embedding inference service, the existing key-value store, and the existing vector retrieval service. The only new piece was a lookup path keyed by recent post IDs.

[Figure: Phase 1 architecture. Serve path: member visits Feed → fetch posts the member recently interacted with (3-day lookback window) → query embedding store, keyed by post ID → GPU-based vector retrieval service → candidates. Separately: LLM embedding inference upon post creation/update → item embedding store.]

Pros

  • Online query-embedding lookup, low latency at serve time
  • Reused existing inference, storage, and retrieval services

Cons

  • Low coverage. Only ~47% of requests returned a candidate.
  • 3-day lookback meant members with infrequent interactions were skipped
  • Many recent posts had no precomputed embedding
  • Duplicated data between query- and item-embedding stores

Phase 2: Member-to-item retrieval

Coverage was the dominant problem in Phase 1. The fix had two parts: re-key the query-embedding store by member ID instead of post ID, and generate the embedding at the moment a member interacts. The data stays fresh, and the lookup at serve time becomes a single read keyed off the member.

That required a new nearline streaming pipeline: consume the interaction event stream, filter for eligible interactions, call the LLM inference service to generate the embedding, and write it to the member-keyed store. Lookback was extended from 3 days to 7.
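A minimal sketch of that pipeline's per-event logic, assuming a simple in-memory dict in place of the member-keyed store. The event shape, the `ELIGIBLE_ACTIONS` set, and the function names are assumptions; the actual eligibility rules are not described here.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(days=7)  # Phase 2 lookback window, per the text
ELIGIBLE_ACTIONS = {"react", "comment", "save"}  # assumed eligibility rule

def process_event(event, embed, member_store):
    """Handle one interaction event: filter, embed, write keyed by member.

    Returns True if an embedding was generated and stored.
    """
    if event["action"] not in ELIGIBLE_ACTIONS:
        return False  # ineligible interactions never reach inference
    embedding = embed(event["post_id"])  # stand-in for the LLM inference call
    member_store.setdefault(event["member_id"], []).append(
        (event["timestamp"], embedding)
    )
    return True

def query_embeddings(member_id, member_store, now):
    """Serve-time read: one lookup keyed by member, trimmed to the lookback."""
    return [
        embedding
        for ts, embedding in member_store.get(member_id, [])
        if now - ts <= LOOKBACK
    ]

# Hypothetical usage with a fake inference function.
inference_calls = []
def fake_embed(post_id):
    inference_calls.append(post_id)
    return [float(len(post_id))]

member_store = {}
now = datetime(2024, 1, 10)
process_event({"member_id": "m1", "post_id": "p1", "action": "react",
               "timestamp": now - timedelta(days=5)}, fake_embed, member_store)
process_event({"member_id": "m1", "post_id": "p2", "action": "impression",
               "timestamp": now}, fake_embed, member_store)
```

Note that the five-day-old interaction still produces a query embedding here, whereas a three-day window would have dropped it.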

Architecture

New: an interaction-driven embedding pipeline and a member-keyed query store. Reused: inference, retrieval, and item-embedding services.

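[Figure: Phase 2 architecture. Write path: member interacts with Post 1 → interaction stream + eligibility filter → LLM embedding inference → member-keyed query embedding store. Serve path: member visits feed → member-keyed query embedding store (7-day lookback) → GPU vector retrieval service, with the item embedding store → candidates.]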

Pros

  • Coverage jumped: the empty rate fell from 52.9% to 31.7% with the 7-day lookback and member-keyed lookup
  • One round-trip at serve time instead of two
  • Embeddings stay fresh, generated the moment a member interacts

Cons

  • Big increase in QPS on the inference service: every eligible interaction triggered an embedding call
  • New storage capacity needed for the member-keyed query store

Empty rate: 52.9% → 31.7%. Share of feed requests where Activity History returned zero candidates. Measured as 1-day pre-ramp vs. 1-day post-ramp on a single plugin.

Lookback window: 3 days → 7 days. How far back the system considers a member's interactions when generating retrieval candidates.
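For reference, the empty-rate metric and its relationship to coverage can be written down directly. This is an illustrative sketch; the function names and the per-request candidate counts are hypothetical.

```python
def empty_rate(candidate_counts):
    """Share of requests whose candidate list was empty."""
    if not candidate_counts:
        raise ValueError("need at least one request")
    empties = sum(1 for n in candidate_counts if n == 0)
    return empties / len(candidate_counts)

def coverage(rate):
    """Coverage is the complement of the empty rate."""
    return 1.0 - rate

# Hypothetical per-request candidate counts: two of four requests are empty.
sample_rate = empty_rate([0, 3, 0, 5])

# The Phase 1 empty rate of 52.9% corresponds to ~47% coverage.
phase1_coverage = coverage(0.529)
```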

Phase 3: Cutting inference cost

Phase 2 fixed coverage but created a new problem: the inference service was now being called on every eligible interaction, including for items that already had an embedding generated as part of the item-creation flow. That was wasted compute. The fix was to add an intermediate existence check. Before invoking the inference service, check whether the embedding already exists in the item-embedding store, and if it does, reuse it directly.
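The existence check amounts to a read-before-compute guard. A minimal sketch, assuming a dict in place of the item-embedding store and a hypothetical `infer` callable in place of the inference service:

```python
def get_or_create_embedding(post_id, item_embedding_store, infer):
    """Return (embedding, inferred).

    inferred is False when the embedding was reused from the item-embedding
    store and True when the inference service had to be called.
    """
    existing = item_embedding_store.get(post_id)
    if existing is not None:
        return existing, False  # short-circuit: one read, no compute
    return infer(post_id), True

# Hypothetical usage: count how often inference is actually invoked.
item_store = {"post_1": [0.123, 0.453, 0.873]}
inference_calls = []

def fake_infer(post_id):
    inference_calls.append(post_id)
    return [0.5, 0.5, 0.5]

reused = get_or_create_embedding("post_1", item_store, fake_infer)
computed = get_or_create_embedding("post_9", item_store, fake_infer)
```

Only the post missing from the store triggers an inference call in this example. Whether a freshly inferred embedding is also written back to the item-embedding store is not specified, so the sketch leaves that out.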

Architecture

Identical shape to Phase 2, with a short-circuit lookup against the item embedding store before any inference call.

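[Figure: Phase 3 architecture. Write path: member interacts with Post 1 → interaction stream + eligibility filter → existence check against the item embedding store (embedding exists? → skip inference, short-circuit) → LLM inference only if needed → member-keyed query embedding store. Serve path: member visits feed → member-keyed query embedding store → GPU vector retrieval service, with the item embedding store → candidates.]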

Pros

  • ~3× drop in inference QPS with no loss of coverage
  • Short-circuit logic is cheap: one extra read, no compute

Cons

  • Slight increase in read load on the item-embedding store

LLM inference QPS: ~3× reduction. Measured across three production fabrics over a 5-day window post-ramp. Average call rate fell roughly 3×.

Coverage: no regression. Empty rate stayed at the Phase 2 level (~31.7%). The cost optimization was lossless.

Impact

Activity History is now one of a few "high-quality" sources for out-of-network content recommendations on LinkedIn's Feed (Majority Member Experience). The pipelines and infrastructure built along the way are being reused to power upcoming supervised item-to-item content retrieval experiments.