A new real-time recommendation source for LinkedIn's feed that surfaces content based on a member's most recent interactions on the platform.
Members interact with content on the feed constantly: clicks, reactions, dwell time, comments. Each of these is a strong signal about what the member finds interesting right now. But the feed's existing retrieval sources were either too coarse or too stale to capture that signal in real time. A member could engage deeply with a particular topic, refresh their feed seconds later, and see content unrelated to what they had just been reading.
The opportunity was to turn a fresh interaction into a recommendation almost immediately: see what the member just engaged with, find content semantically similar to it, and surface that as the next candidate. We called this Activity History, and it became a new first-stage retrieval source for the feed.
The primitive is straightforward: generate an embedding for the post, then find its nearest neighbors. An LLM converts a piece of content into a dense vector. Two pieces of content that are semantically similar end up close to each other in vector space. Cosine similarity measures that closeness.
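The primitive can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats and uses a brute-force scan in place of a real vector index; the toy vectors and the names `cosine_similarity` and `nearest_neighbors` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, item_embeddings, k=2):
    """Brute-force top-k by cosine similarity (illustrative, not scalable)."""
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in item_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings; real embeddings are much higher-dimensional.
ITEMS = {
    "post_ml": [0.9, 0.1, 0.0],
    "post_ml_2": [0.8, 0.2, 0.1],
    "post_cooking": [0.0, 0.1, 0.9],
}

top = nearest_neighbors([1.0, 0.0, 0.0], ITEMS, k=2)
top_ids = [item_id for item_id, _ in top]
```

Here the two machine-learning posts rank ahead of the cooking post because their vectors point in nearly the same direction as the query. A production system would replace the linear scan with an approximate or accelerated index.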
Everything downstream of that primitive is the engineering work of making it operate at scale, in real time, with high coverage and low cost. The next three sections walk through how the system evolved across three iterations, each motivated by a problem the previous one exposed.
The first version of Activity History was an item-to-item lookup. When a member visited the feed, the system fetched the IDs of posts the member had interacted with in the last three days, looked up each post's pre-computed query embedding from a key-value store, and sent the embedding to a GPU-accelerated vector retrieval service. The service returned the nearest items in embedding space, which became the candidates we surfaced.
This version reused the existing embedding inference service, the existing key-value store, and the existing vector retrieval service. The new piece was a lookup path keyed by recent post IDs.
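The serve-time flow of this first version can be sketched roughly as follows. This is an in-memory approximation under stated assumptions: dictionaries stand in for the key-value store and the vector retrieval service, and `phase1_candidates`, the tuple layout of `interactions`, and the sample data are all hypothetical. Skipping posts with no stored embedding is also an assumption about how a missing key would be handled.

```python
import math

THREE_DAYS_SECONDS = 3 * 24 * 60 * 60


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def phase1_candidates(member_id, interactions, post_embeddings, item_index, now, k=2):
    """Item-to-item lookup keyed by the member's recent post IDs."""
    cutoff = now - THREE_DAYS_SECONDS
    # Step 1: IDs of posts the member interacted with inside the lookback window.
    recent_post_ids = [
        post_id
        for (mid, post_id, ts) in interactions
        if mid == member_id and ts >= cutoff
    ]
    candidates = []
    for post_id in recent_post_ids:
        # Step 2: one key-value read per post for its pre-computed embedding.
        query = post_embeddings.get(post_id)
        if query is None:
            continue  # assumed behavior: no embedding means no candidates
        # Step 3: nearest items in embedding space (brute force stands in
        # for the vector retrieval service).
        scored = sorted(
            ((cosine(query, vec), item_id)
             for item_id, vec in item_index.items()
             if item_id != post_id),
            reverse=True,
        )
        candidates.extend(item_id for _, item_id in scored[:k])
    # Deduplicate while preserving rank order.
    return list(dict.fromkeys(candidates))


NOW = 1_000_000
INTERACTIONS = [
    ("m1", "p1", NOW - 3600),                # recent, has an embedding
    ("m1", "p2", NOW - 4 * 24 * 60 * 60),    # outside the three-day window
    ("m1", "p3", NOW - 60),                  # recent, but no stored embedding
]
POST_EMBEDDINGS = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
ITEM_INDEX = {"p1": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.3]}

result = phase1_candidates("m1", INTERACTIONS, POST_EMBEDDINGS, ITEM_INDEX, NOW)
```

Note that the number of key-value reads grows with the number of recent posts, and any post without a stored embedding contributes nothing.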
Coverage was the dominant problem in Phase 1. The fix had two parts: re-key the query-embedding store by member ID instead of post ID, and generate the embedding at the moment a member interacts, so the data is fresh and the lookup at serve time is a single read keyed off the member.
That required a new nearline streaming pipeline: consume the interaction event stream, filter for eligible interactions, call the LLM inference service to generate the embedding, and write it to the member-keyed store. The lookback window was also extended from three days to seven.
New: an interaction-driven embedding pipeline and a member-keyed query store. Reused: inference, retrieval, and item-embedding services.
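A minimal sketch of that nearline pipeline is shown below, assuming events arrive as dictionaries and the store is an in-memory map. `ELIGIBLE_ACTIONS`, `fake_embed`, `MemberQueryStore`, and `process_event` are hypothetical stand-ins; in particular, `fake_embed` is a placeholder for the LLM inference service and produces meaningless vectors.

```python
SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60

# Assumed eligibility rule; the real filter is not specified in the text.
ELIGIBLE_ACTIONS = {"click", "reaction", "comment"}


def fake_embed(text):
    """Placeholder for the LLM inference call (not a real embedding)."""
    return [float(len(text)), float(text.count(" ") + 1)]


class MemberQueryStore:
    """In-memory stand-in for the member-keyed query-embedding store."""

    def __init__(self):
        self._rows = {}

    def append(self, member_id, post_id, embedding, ts):
        self._rows.setdefault(member_id, []).append((ts, post_id, embedding))

    def read(self, member_id, now):
        """Single read keyed by member, limited to the seven-day lookback."""
        cutoff = now - SEVEN_DAYS_SECONDS
        return [
            (post_id, emb)
            for ts, post_id, emb in self._rows.get(member_id, [])
            if ts >= cutoff
        ]


def process_event(event, post_text, store, embed=fake_embed):
    """Filter one interaction event, embed the post, write under the member."""
    if event["action"] not in ELIGIBLE_ACTIONS:
        return False
    text = post_text.get(event["post_id"])
    if text is None:
        return False
    store.append(event["member_id"], event["post_id"], embed(text), event["ts"])
    return True


NOW = 2_000_000
POST_TEXT = {"p1": "vector search at scale", "p2": "hiring update"}
EVENTS = [
    {"member_id": "m1", "post_id": "p1", "action": "reaction", "ts": NOW - 30},
    {"member_id": "m1", "post_id": "p2", "action": "impression", "ts": NOW - 20},
    {"member_id": "m1", "post_id": "p2", "action": "click", "ts": NOW - 8 * 24 * 60 * 60},
]

store = MemberQueryStore()
written = [process_event(e, POST_TEXT, store) for e in EVENTS]
fresh = store.read("m1", NOW)
```

At serve time the feed issues one `read` per member instead of one lookup per post, which is the point of re-keying the store.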
Phase 2 fixed coverage but created a new problem: the inference service was now being called on every eligible interaction, including for items that already had an embedding generated as part of the item-creation flow. That was wasted compute. The fix was to add an intermediate existence check. Before invoking the inference service, check whether the embedding already exists in the item-embedding store, and if it does, reuse it directly.
The resulting architecture is identical in shape to Phase 2, with a short-circuit lookup against the item-embedding store before any inference call.
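The existence check amounts to a lookup-before-compute pattern, sketched below under simple assumptions: the item-embedding store is a dictionary, and `CountingEmbedder` is a hypothetical stand-in for the inference service that counts how often it is invoked. Whether a newly generated embedding is written back to the item-embedding store is not stated in the text, so this sketch does not write it back.

```python
class CountingEmbedder:
    """Stand-in for the inference service that records how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        return [float(len(text))]  # placeholder vector, not a real embedding


def get_or_create_embedding(post_id, post_text, item_embedding_store, embed):
    """Reuse an existing item embedding if present; otherwise run inference."""
    existing = item_embedding_store.get(post_id)
    if existing is not None:
        return existing  # short-circuit: no inference call
    return embed(post_text)


# "p1" already has an embedding from the item-creation flow; "p2" does not.
item_embedding_store = {"p1": [0.25, 0.75]}
embedder = CountingEmbedder()

reused = get_or_create_embedding("p1", "existing post", item_embedding_store, embedder)
calls_after_reuse = embedder.calls

generated = get_or_create_embedding("p2", "brand new", item_embedding_store, embedder)
calls_after_generate = embedder.calls
```

The call counter makes the saving visible: inference runs only for the post that has no stored embedding.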
Activity History is now one of a few "high-quality" sources for out-of-network content recommendations on LinkedIn's Feed (Majority Member Experience). The pipelines and infrastructure built along the way are being reused to power upcoming supervised item-to-item content retrieval experiments.