Getting Started with the GitHub Copilot SDK — Part 7: Infinite Sessions & Context Compaction

This is Part 7 of my hands-on series on the GitHub Copilot SDK for .NET. The
companion code lives on GitHub at
egarim/GettingStartedWithGithubCopilotSDK
— each numbered folder is a demo you can run on its own.

Today we tackle the wall every chat app eventually hits:

The context window is finite. Your conversation isn't.

The problem

Every LLM has a maximum number of tokens it can hold in context. Send enough
messages and you run out of room. The model starts forgetting, or the request just
fails.

For a real production chat — a support bot, an agent that runs all day — that's a
dealbreaker. You can't tell users "sorry, this conversation is full."

The SDK's answer is Infinite Sessions.

How compaction works

When the context fills up, the SDK compacts the conversation: it summarizes the
older messages, drops the now-redundant tokens, and keeps the key information. The
session keeps going as if nothing happened.

  Message 1 ─┐
  Message 2  │  context fills up...
  ...        │
  Message N ─┘
       │
       ▼  threshold reached
  SessionCompactionStartEvent
  → SDK summarizes old messages, removes the redundant tokens
  SessionCompactionCompleteEvent { Success, TokensRemoved }
       │
       ▼
  Message N+1 works normally

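Most of the time it runs in the background, and the user just keeps typing. The
session only pauses if usage reaches the hard limit (BufferExhaustionThreshold,
covered in the next section) before background compaction has caught up.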
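To make the idea concrete, here is a toy simulation of threshold-triggered compaction. It is a sketch under loud assumptions, not the SDK's algorithm: the 200-token window, the 0.8 threshold, the word-count "tokenizer", and the CountTokens and Compact helpers are all hypothetical, and the summary is a placeholder string rather than anything a model wrote.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only. Every name and number here is hypothetical, not SDK behavior.
const int ContextWindow = 200;   // pretend the model can hold 200 tokens
const double Threshold = 0.8;    // compact once 80% of the window is in use
const int KeepRecent = 2;        // the newest messages survive verbatim

// Crude token estimate: one token per whitespace-separated word.
int CountTokens(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

// Replace everything except the newest messages with one short summary line.
// Returns the number of tokens reclaimed.
int Compact(List<string> history, int keepRecent)
{
    if (history.Count <= keepRecent) return 0;
    int before = history.Sum(m => CountTokens(m));
    int oldCount = history.Count - keepRecent;
    string summary = $"[summary of {oldCount} earlier messages]"; // stand-in for a real summary
    history.RemoveRange(0, oldCount);
    history.Insert(0, summary);
    return before - history.Sum(m => CountTokens(m));
}

var history = new List<string>();
int compactions = 0;

for (int i = 1; i <= 30; i++)
{
    history.Add($"message {i}: " + string.Join(" ", Enumerable.Repeat("word", 10)));
    int used = history.Sum(m => CountTokens(m));

    if (used >= ContextWindow * Threshold)
    {
        int removed = Compact(history, KeepRecent);
        compactions++;
        Console.WriteLine($"after message {i}: compacted, removed {removed} tokens");
    }
}

Console.WriteLine($"{compactions} compactions, {history.Sum(m => CountTokens(m))} tokens in use");
```

The conversation keeps growing, yet the tokens in use never reach the window size, because each compaction swaps the older messages for a much shorter summary.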

Turning it on

You enable it per session, on SessionConfig, via InfiniteSessionConfig:

var session = await client.CreateSessionAsync(new SessionConfig
{
    InfiniteSessions = new InfiniteSessionConfig
    {
        Enabled = true,
        BackgroundCompactionThreshold = 0.005,  // 0.5% → start compacting in the background
        BufferExhaustionThreshold     = 0.01    // 1%   → block and compact before continuing
    }
});

Two thresholds, both expressed as a fraction of the context window:

BackgroundCompactionThreshold — when to quietly start compacting while the
user keeps going.
BufferExhaustionThreshold — the hard stop: block, compact, then accept the
next message.

Those 0.005 and 0.01 values are absurdly low, on purpose. They make the demo
compact almost immediately so you can watch it. In production, use the defaults
(or values high enough that compaction only starts once most of the window is in
use). You don't want to compact every other message.
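To see just how low the demo values are, it helps to turn the fractions into token counts. The 128,000-token window below is an assumed figure for illustration only (the real size depends on the model), and TriggerPoint is a hypothetical helper, not an SDK call.

```csharp
using System;

// Assumed window size, for illustration only; real sizes depend on the model.
const int contextWindowTokens = 128_000;

// Hypothetical helper: converts a threshold fraction into a token count.
int TriggerPoint(int windowTokens, double fraction) =>
    (int)Math.Round(windowTokens * fraction);

int background = TriggerPoint(contextWindowTokens, 0.005);
int exhaustion = TriggerPoint(contextWindowTokens, 0.01);

Console.WriteLine($"background compaction starts at about {background} tokens");
Console.WriteLine($"blocking compaction starts at about {exhaustion} tokens");
```

Under that assumption the demo starts compacting at roughly 640 tokens and blocks at roughly 1,280, which a single verbose answer can easily exceed.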

Watching it happen

Compaction fires two events. Subscribe with session.On and you can see exactly when
the SDK kicks in and how much it reclaimed:

session.On(evt =>
{
    if (evt is SessionCompactionStartEvent)
        Console.WriteLine("  * Compaction started!");

    if (evt is SessionCompactionCompleteEvent c)
        Console.WriteLine($"  OK Success: {c.Data.Success}, tokens removed: {c.Data.TokensRemoved}");
});

var a1 = await session.SendAndWaitAsync(new MessageOptions
{
    Prompt = "Tell me a long story about a dragon. Be very detailed."
});

Ask for something big and verbose, and with those tiny thresholds you'll see the
compaction line print while the model is still answering normally. That's the whole
point: the work happens out of band.

Proving it's actually compaction

Easy to fool yourself here. So the demo does the inverse to establish a baseline —
turn compaction off and confirm zero events fire:

var compactionEvents = new List<SessionEvent>();

var session = await client.CreateSessionAsync(new SessionConfig
{
    InfiniteSessions = new InfiniteSessionConfig { Enabled = false }
});

session.On(evt =>
{
    if (evt is SessionCompactionStartEvent or SessionCompactionCompleteEvent)
        compactionEvents.Add(evt);
});

await session.SendAndWaitAsync(new MessageOptions { Prompt = "What is 2+2?" });
Console.WriteLine($"  Compaction events: {compactionEvents.Count}"); // expected: 0

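Enabled = false → the counter stays at 0. Now you know the events from the
previous step were triggered by the Infinite Sessions configuration, not by
something else in the pipeline. For a stricter comparison, send the same long
dragon prompt here instead of a short one.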

The production shape

Put it together — infinite session, streaming, an interactive loop — and you've got
the skeleton of a real chat:

var compactionCount = 0;

await using var session = await client.CreateSessionAsync(new SessionConfig
{
    Streaming = true,
    InfiniteSessions = new InfiniteSessionConfig
    {
        Enabled = true,
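        BackgroundCompactionThreshold = 0.005,  // demo value; raise it (or use the default) in production
        BufferExhaustionThreshold     = 0.01    // demo value; raise it (or use the default) in production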
    }
});

session.On(evt =>
{
    if (evt is SessionCompactionCompleteEvent c)
        Console.WriteLine($"\n  OK compaction #{++compactionCount} — removed {c.Data.TokensRemoved} tokens");
});

while (true)
{
    Console.Write($"  You [{compactionCount} compactions]: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;

    await session.SendAsync(new MessageOptions { Prompt = input });
    // ...stream the AssistantMessageDeltaEvent deltas, wait on SessionIdleEvent
}

Chat as long as you want. The prompt shows a running compaction counter, the deltas
stream in real time, and the SDK manages memory underneath you. The user experience
never breaks — which is exactly how it should feel.
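The "stream the deltas, wait for idle" comment in that loop hides a small pattern worth spelling out: send without waiting, print deltas as they arrive, and complete a per-turn TaskCompletionSource when the idle signal shows up. The sketch below shows only that pattern. It does not use the SDK: the handlers list, the (Kind, Text) tuples, and FakeSendAsync are hypothetical stand-ins for session.On, the AssistantMessageDeltaEvent and SessionIdleEvent types, and session.SendAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Stand-in for the session's event stream. Events are (Kind, Text) tuples here,
// not the real SDK event types. Everything in this block is hypothetical.
var handlers = new List<Action<(string Kind, string Text)>>();

// Fake send: returns immediately, then raises events from a background task,
// ending with an "idle" event once the reply is complete.
Task FakeSendAsync(string prompt)
{
    _ = Task.Run(async () =>
    {
        foreach (var chunk in new[] { "You ", "said: ", prompt })
        {
            await Task.Delay(10);
            foreach (var h in handlers) h(("delta", chunk));
        }
        foreach (var h in handlers) h(("idle", ""));
    });
    return Task.CompletedTask;
}

var reply = new StringBuilder();
TaskCompletionSource? turnDone = null;

// One handler for the whole session: print deltas as they stream in,
// and release the waiting loop when the idle signal arrives.
handlers.Add(evt =>
{
    if (evt.Kind == "delta") { Console.Write(evt.Text); reply.Append(evt.Text); }
    if (evt.Kind == "idle") turnDone?.TrySetResult();
});

foreach (var input in new[] { "hello", "goodbye" })
{
    reply.Clear();
    // A fresh completion source per turn, created before sending.
    turnDone = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
    await FakeSendAsync(input);
    await turnDone.Task;   // resumes only after the idle event
    Console.WriteLine();
}
```

Creating the completion source before sending matters: if it were created afterwards, a fast reply could signal idle before anything was waiting for it.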

Takeaways

Context windows are finite. Infinite Sessions make conversations effectively
unlimited by compacting old messages.
Enable it via InfiniteSessionConfig on SessionConfig; tune with
BackgroundCompactionThreshold and BufferExhaustionThreshold (fractions of the
window).
Watch SessionCompactionStartEvent / SessionCompactionCompleteEvent —
TokensRemoved tells you how much was reclaimed.
The demo's tiny thresholds are for seeing it. In production, use the defaults.
Compaction normally runs in the background, so the user keeps typing. The session
only blocks if usage reaches BufferExhaustionThreshold.

Next up, in Part 8 — Skill Loading & Configuration,
we stop managing memory and start extending the model itself — loading Skills to give
your agent new capabilities.

Want to follow along? You'll need the .NET 10 SDK and GitHub Copilot access. Then
run dotnet run --project 07.CompactionDemo from the course repo.