Closing the Loop (Part 2): So Far, So Good — and Yes, It’s Token Hungry

I wrote my previous article about closing the loop for agentic development earlier this week, although the ideas themselves have been evolving for several days. This new piece is simply a progress report: how the approach is working in practice, what I’ve built so far, and what I’m learning as I push deeper into this workflow.

Short version: it’s working.
Long version: it’s working really well — but it’s also incredibly token-hungry.

Let’s talk about it.

A Familiar Benchmark: The Activity Stream Problem

Whenever I want to test a new development approach, I go back to a problem I know extremely well: building an activity stream.

An activity stream is basically the engine of a social network — posts, reactions, notifications, timelines, relationships. It touches everything:

  • Backend logic
  • UI behavior
  • Realtime updates
  • State management
  • Edge cases everywhere

I’ve implemented this many times before, so I know exactly how it should behave. That makes it the perfect benchmark for agentic development. If the AI handles this correctly, I know the workflow is solid.

This time, I used it to test the closing-the-loop concept.

The Current Setup

So far, I’ve built two main pieces:

  1. An MCP-based project
  2. A Blazor application implementing the activity stream

But the real experiment isn’t the app itself — it’s the workflow.

Instead of manually testing and debugging, I fully committed to this idea:

The AI writes, tests, observes, corrects, and repeats — without me acting as the middleman.

So I told Copilot very clearly:

  • Don’t ask me to test anything
  • You run the tests
  • You fix the issues
  • You verify the results

To make that possible, I wired everything together:

  • Playwright MCP for automated UI testing
  • Serilog logging to the file system
  • Screenshot capture of the UI during tests
  • Instructions to analyze logs and fix issues automatically

So the loop becomes:

write → test → observe → fix → retest

And honestly, I love it.
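As a mental model, the loop can be sketched as plain control flow. Everything below is a stub under my own assumptions, not Copilot's actual internals:

```csharp
using System;

// A hedged sketch of the loop as plain control flow — the steps are stubs,
// not Copilot's actual code.
int attempt = 0;
const int maxAttempts = 5;
bool passed = false;

bool RunTests() => attempt >= 3;      // stub: "tests pass after the 3rd fix"
void CollectEvidence() { }            // read Serilog output + screenshots
void ApplyFix() { }                   // the agent rewrites code here

while (attempt++ < maxAttempts && !passed)
{
    passed = RunTests();              // test
    if (passed) break;                // loop closed
    CollectEvidence();                // observe
    ApplyFix();                       // fix, then the loop retests
}

Console.WriteLine(passed);            // True
```

The point of the sketch: I never appear inside the loop body. The agent only escalates to me when `maxAttempts` runs out.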

My Surface Is Working. I’m Not Touching It.

Here’s the funny part.

I’m writing this article on my MacBook Air.

Why?

Because my main development machine — a Microsoft Surface laptop — is currently busy running the entire loop by itself.

I told Copilot to open the browser and actually execute the tests visually. So it’s navigating the UI, filling forms, clicking buttons, taking screenshots… all by itself.

And I don’t want to touch that machine while it’s working.

It feels like watching a robot doing your job. You don’t interrupt it mid-task. You just observe.

So I switched computers and thought: “Okay, this is a perfect moment to write about what’s happening.”

That alone says a lot about where this workflow is heading.

Watching the Loop Close

Once everything was wired together, I let it run.

The agent:

  • Writes code
  • Runs Playwright tests
  • Reads logs
  • Reviews screenshots
  • Detects issues
  • Fixes them
  • Runs again

Seeing the system self-correct without constant intervention is incredibly satisfying.

In traditional AI-assisted development, you often end up exhausted:

  • The AI gets stuck
  • You explain the issue
  • It half-fixes it
  • You explain again
  • Something else breaks

You become the translator and debugger for the model.

With a self-correcting loop, that burden drops dramatically. The system can fail, observe, and recover on its own.

That changes everything.

The Token Problem (Yes, It’s Real)

There is one downside: this workflow is extremely token hungry.

Last month I used roughly 700% more tokens than usual. This month, and it’s only around February 8–9, I’ve already used about 200% of my normal limits.

Why so expensive?

Because the loop never sleeps:

  • Test execution
  • Log analysis
  • Screenshot interpretation
  • Code rewriting
  • Retesting
  • Iteration

Every cycle consumes tokens. And when the system is autonomous, those cycles happen constantly.

Model Choice Matters More Than You Think

Another important detail: not all models consume tokens equally inside Copilot.

Some models count as:

  • 3× usage
  • 1× usage
  • 0.33× usage
  • 0× usage

For example:

  • Some Anthropic models are extremely good for testing and reasoning
  • But they can count as 3× token usage
  • Others are cheaper but weaker
  • Some models (like GPT-4 Mini or GPT-4o in certain Copilot tiers) count as 0× toward limits

At some point I actually hit my token limits and Copilot basically said: “Come back later.”

It should reset in about 24 hours, but in the meantime I switched to the 0× token models just to keep the loop running.

The difference in quality is noticeable.

The heavier models are much better at:

  • Debugging
  • Understanding logs
  • Self-correcting
  • Complex reasoning

The lighter or free models can still work, but they struggle more with autonomous correction.

So model selection isn’t just about intelligence — it’s about token economics.
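Those token economics are easy to make concrete with a little arithmetic. A hedged sketch, using the multiplier tiers listed above (the model names are placeholders, not real Copilot SKUs):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hedged sketch: how per-model multipliers turn raw request counts into
// effective usage. Multiplier values mirror the tiers mentioned above;
// the model names are placeholders, not actual Copilot models.
var multipliers = new Dictionary<string, double>
{
    ["heavy-reasoning"] = 3.0,
    ["standard"] = 1.0,
    ["light"] = 0.33,
    ["included"] = 0.0,
};

double EffectiveUsage(IEnumerable<(string Model, int Requests)> work) =>
    work.Sum(w => w.Requests * multipliers[w.Model]);

// 100 loop iterations on a 3x model cost as much as 300 on a 1x model,
// while iterations on a 0x model cost nothing against the quota.
Console.WriteLine(EffectiveUsage(new[] { ("heavy-reasoning", 100) })); // 300
Console.WriteLine(EffectiveUsage(new[] { ("included", 100) }));        // 0
```

An autonomous loop multiplies request counts, and the multiplier then multiplies that again, which is exactly why heavy models burn through limits so fast.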

Why It’s Still Worth It

Yes, this approach consumes more tokens.

But compare that to the alternative:

  • Sitting there manually testing
  • Explaining the same bug five times
  • Watching the AI fail repeatedly
  • Losing mental energy on trivial fixes

That’s expensive too — just not measured in tokens.

I would rather spend tokens than spend mental fatigue.

And realistically:

  • Models get cheaper every month
  • Tooling improves weekly
  • Context handling improves
  • Local and hybrid options are evolving

What feels expensive today might feel trivial very soon.

MCP + Blazor: A Perfect Testing Ground

So far, this workflow works especially well for:

  • MCP-based systems
  • Blazor applications
  • Known benchmark problems

Using a familiar problem like an activity stream lets me clearly measure progress. If the agent can build and maintain something complex that I already understand deeply, that’s a strong signal.

Right now, the signal is positive.

The loop is closing. The system is self-correcting. And it’s actually usable.

What Comes Next

This article is just a status update.

The next one will go deeper into something very important:

How to design self-correcting mechanisms for agentic development.

Because once you see an agent test, observe, and fix itself, you don’t want to go back to manual babysitting.

For now, though:

The idea is working. The workflow feels right. It’s token hungry. But absolutely worth it.

Closing the loop isn’t theory anymore — it’s becoming a real development style.

 

Structured RAG for Unknown and Mixed Languages

How I stopped my multilingual activity stream from turning RAG into chaos

In the previous article (RAG with PostgreSQL and C# (pros and cons) | Joche Ojeda) I explained how naïve RAG breaks when you run it over an activity stream.

Same UI language.
Totally unpredictable content language.
Spanish, Russian, Italian… sometimes all in the same message.

Humans handle that fine.
Vector retrieval… not so much.

This is the “silent failure” scenario: retrieval looks plausible, the LLM sounds confident, and you ship nonsense.

So I had to change the game.

The Idea: Structured RAG

Structured RAG means you don’t embed raw text and pray.

You add a step before retrieval:

  • Extract a structured representation from each activity record
  • Store it as metadata (JSON)
  • Use that metadata to filter, route, and rank
  • Then do vector similarity on a cleaner, more stable representation

Think of it like this:

Unstructured text is what users write.
Structured metadata is what your RAG system can trust.

Why This Fix Works for Mixed Languages

The core problem with activity streams is not “language”.

The core problem is: you have no stable shape.

When the shape is missing, everything becomes fuzzy:

  • Who is speaking?
  • What is this about?
  • Which entities are involved?
  • Is this a reply, a reaction, a mention, a task update?
  • What language(s) are in here?

Structured RAG forces you to answer those questions once, at write-time, and save the answers.

PostgreSQL: Add a JSONB Column (and Keep pgvector)

We keep the previous approach (pgvector) but we add a JSONB column for structured metadata.

ALTER TABLE activities
ADD COLUMN rag_meta jsonb NOT NULL DEFAULT '{}'::jsonb;

-- Optional: if you store embeddings per activity/chunk
-- you keep your existing embedding column(s) or chunk table.

Then index it.

CREATE INDEX activities_rag_meta_gin
ON activities
USING gin (rag_meta);

Now you can filter with JSON queries before you ever touch vector similarity.

A Proposed Schema (JSON Shape You Control)

The exact schema depends on your product, but for activity streams I want at least:

  • language: detected languages + confidence
  • actors: who did it
  • subjects: what object is involved (ticket, order, user, document)
  • topics: normalized tags
  • relationships: reply-to, mentions, references
  • summary: short canonical summary (ideally in one pivot language)
  • signals: sentiment/intent/type if you need it

Example JSON for one activity record:

{
  "schemaVersion": 1,
  "languages": [
    { "code": "es", "confidence": 0.92 },
    { "code": "ru", "confidence": 0.41 }
  ],
  "actor": {
    "id": "user:42",
    "displayName": "Joche"
  },
  "subjects": [
    { "type": "ticket", "id": "ticket:9831" }
  ],
  "topics": ["billing", "invoice", "error"],
  "relationships": {
    "replyTo": "activity:9912001",
    "mentions": ["user:7", "user:13"]
  },
  "intent": "support_request",
  "summary": {
    "pivotLanguage": "en",
    "text": "User reports an invoice calculation error and asks for help."
  }
}

Notice what happened here: the raw multilingual chaos got converted into a stable structure.

Write-Time Pipeline (The Part That Feels Expensive, But Saves You)

Structured RAG shifts work to ingestion time.

Yes, it costs tokens.
Yes, it adds steps.

But it gives you something you never had before: predictable retrieval.

Here’s the pipeline I recommend:

  1. Store raw activity (as-is, don’t lose the original)
  2. Detect language(s) (fast heuristic + LLM confirmation if needed)
  3. Extract structured metadata into your JSON schema
  4. Generate a canonical “summary” in a pivot language (often English)
  5. Embed the summary + key fields (not the raw messy text)
  6. Save JSON + embedding

The key decision: embed the stable representation, not the raw stream text.

C# Conceptual Implementation

I’m going to keep the code focused on the architecture. Provider details are swappable.

Entities

public sealed class Activity
{
    public long Id { get; set; }
    public string RawText { get; set; } = "";
    public string UiLanguage { get; set; } = "en";

    // JSONB column in Postgres
    public string RagMetaJson { get; set; } = "{}";

    // Vector (pgvector) - store via your pgvector mapping or raw SQL
    public float[] RagEmbedding { get; set; } = Array.Empty<float>();

    public DateTimeOffset CreatedAt { get; set; }
}

Metadata Contract (Strongly Typed in Code, Stored as JSONB)

public sealed class RagMeta
{
    public int SchemaVersion { get; set; } = 1;
    public List<DetectedLanguage> Languages { get; set; } = new();
    public ActorMeta Actor { get; set; } = new();
    public List<SubjectMeta> Subjects { get; set; } = new();
    public List<string> Topics { get; set; } = new();
    public RelationshipMeta Relationships { get; set; } = new();
    public string Intent { get; set; } = "unknown";
    public SummaryMeta Summary { get; set; } = new();
}

public sealed class DetectedLanguage
{
    public string Code { get; set; } = "und";
    public double Confidence { get; set; }
}

public sealed class ActorMeta
{
    public string Id { get; set; } = "";
    public string DisplayName { get; set; } = "";
}

public sealed class SubjectMeta
{
    public string Type { get; set; } = "";
    public string Id { get; set; } = "";
}

public sealed class RelationshipMeta
{
    public string? ReplyTo { get; set; }
    public List<string> Mentions { get; set; } = new();
}

public sealed class SummaryMeta
{
    public string PivotLanguage { get; set; } = "en";
    public string Text { get; set; } = "";
}

Extractor + Embeddings

You need two services:

  • Metadata extraction (LLM fills the schema)
  • Embeddings (Microsoft.Extensions.AI) for the stable text

public interface IRagMetaExtractor
{
    Task<RagMeta> ExtractAsync(Activity activity, CancellationToken ct);
}

Then the ingestion pipeline:

using System.Text.Json;
using Microsoft.Extensions.AI;

public sealed class StructuredRagIngestor
{
    private readonly IRagMetaExtractor _extractor;
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embeddings;

    public StructuredRagIngestor(
        IRagMetaExtractor extractor,
        IEmbeddingGenerator<string, Embedding<float>> embeddings)
    {
        _extractor = extractor;
        _embeddings = embeddings;
    }

    public async Task ProcessAsync(Activity activity, CancellationToken ct)
    {
        // 1) Extract structured JSON
        RagMeta meta = await _extractor.ExtractAsync(activity, ct);

        // 2) Create stable text for embeddings (summary + keywords)
        string stableText =
            $"{meta.Summary.Text}\n" +
            $"Topics: {string.Join(", ", meta.Topics)}\n" +
            $"Intent: {meta.Intent}";

        // 3) Embed stable text (pass the token as a named argument;
        //    the second positional parameter is EmbeddingGenerationOptions)
        var emb = await _embeddings.GenerateAsync(new[] { stableText }, cancellationToken: ct);
        float[] vector = emb.First().Vector.ToArray();

        // 4) Save into activity record
        activity.RagMetaJson = JsonSerializer.Serialize(meta);
        activity.RagEmbedding = vector;

        // db.SaveChangesAsync(ct) happens outside (unit of work)
    }
}

This is the core move: you stop embedding chaos and start embedding structure.

Query Pipeline: JSON First, Vectors Second

When querying, you don’t jump into similarity search immediately.

You do:

  1. Parse the user question
  2. Decide filters (actor, subject type, topic)
  3. Filter with JSONB (fast narrowing)
  4. Then do vector similarity on the remaining set

Example: filter by topic + intent using JSONB:

SELECT id, raw_text
FROM activities
WHERE rag_meta @> '{"intent":"support_request"}'::jsonb
  AND rag_meta->'topics' ? 'invoice'
ORDER BY rag_embedding <=> @query_embedding
LIMIT 20;

That “JSON first” step is what keeps multilingual streams from poisoning your retrieval.

Tradeoffs (Because Nothing Is Free)

Structured RAG costs more at write-time:

  • more tokens
  • more latency
  • more moving parts

But it saves you at query-time:

  • less noise
  • better precision
  • more predictable answers
  • debuggable failures (because you can inspect metadata)

In real systems, I’ll take predictable and debuggable over “cheap but random” every day.

Final Thought

RAG over activity streams is hard because activity streams are messy by design.

If you want RAG to behave, you need structure.

Structured RAG is how you make retrieval boring again.
And boring retrieval is exactly what you want.

In the next article, I’ll go deeper into the exact pipeline details: language routing, mixed-language detection, pivot summaries, chunk policies, and how I made this production-friendly without turning it into a token-burning machine.

Let the year begin 🚀

RAG with PostgreSQL and C# (pros and cons)

Happy New Year 2026 — let the year begin

Happy New Year 2026 🎉

Let’s start the year with something honest.

This article exists because something broke.

I wasn’t trying to build a demo.
I was building an activity stream — the kind of thing every social or collaborative system eventually needs.

Posts.
Comments.
Reactions.
Short messages.
Long messages.
Noise.

At some point, the obvious question appeared:

“Can I do RAG over this?”

That question turned into this article.

The Original Problem: RAG over an Activity Stream

An activity stream looks simple until you actually use it as input.

In my case:

  • The UI language was English
  • The content language was… everything else

Users were writing:

  • Spanish
  • Russian
  • Italian
  • English
  • Sometimes all of them in the same message

Perfectly normal for humans.
Absolutely brutal for naïve RAG.

I tried the obvious approach:

  • embed everything
  • store vectors
  • retrieve similar content
  • augment the prompt

And very quickly, RAG went crazy.

Why It Failed (And Why This Matters)

The failure wasn’t dramatic.
No exceptions.
No errors.

Just… wrong answers.

Confident answers.
Fluent answers.
Wrong answers.

The problem was subtle:

  • Same concept, different languages
  • Mixed-language sentences
  • Short, informal activity messages
  • No guarantee of language consistency

In an activity stream:

  • You don’t control the language
  • You don’t control the structure
  • You don’t even control what a “document” is

And RAG assumes you do.

That’s when I stopped and realized:

RAG is not “plug-and-play” once your data becomes messy.

So… What Is RAG Really?

RAG stands for Retrieval-Augmented Generation.

The idea is simple:

Retrieve relevant data first, then let the model reason over it.

Instead of asking the model to remember everything, you let it look things up.

Search first.
Generate second.

Sounds obvious.
Still easy to get wrong.

The Real RAG Pipeline (No Marketing)

A real RAG system looks like this:

  1. Your data lives in a database
  2. Text is split into chunks
  3. Each chunk becomes an embedding
  4. Embeddings are stored as vectors
  5. A user asks a question
  6. The question is embedded
  7. The closest vectors are retrieved
  8. Retrieved content is injected into the prompt
  9. The model answers

Every step can fail silently.

Tokenization & Chunking (The First Trap)

Models don’t read text.
They read tokens.

This matters because:

  • prompts have hard limits
  • activity streams are noisy
  • short messages lose context fast

You usually don’t tokenize manually, but you do choose:

  • chunk size
  • overlap
  • grouping strategy

In activity streams, chunking is already a compromise — and multilingual content makes it worse.
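To make the size/overlap trade-off concrete, here is a minimal sketch assuming character-based chunks (real pipelines chunk by tokens; characters just show the mechanics):

```csharp
using System;
using System.Collections.Generic;

// Hedged sketch: character-based chunking with overlap. Real pipelines chunk
// by tokens, not characters; characters stand in to show the mechanics.
static IEnumerable<string> Chunk(string text, int chunkSize, int overlap)
{
    if (chunkSize <= overlap)
        throw new ArgumentException("chunkSize must be larger than overlap");

    for (int start = 0; start < text.Length; start += chunkSize - overlap)
        yield return text.Substring(start, Math.Min(chunkSize, text.Length - start));
}

foreach (var c in Chunk("abcdefghij", chunkSize: 4, overlap: 2))
    Console.WriteLine(c); // abcd, cdef, efgh, ghij, ij
```

Notice the trailing fragment "ij": short activity messages are often nothing but such fragments, which is why chunking them is already a compromise.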

Embeddings in .NET (Microsoft.Extensions.AI)

In .NET, embeddings are generated using Microsoft.Extensions.AI.

The important abstraction is:

IEmbeddingGenerator<TInput, TEmbedding>

This keeps your architecture:

  • provider-agnostic
  • DI-friendly
  • survivable over time

Minimal Setup

dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI

Creating an Embedding Generator

using OpenAI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.OpenAI;

var client = new OpenAIClient("YOUR_API_KEY");

IEmbeddingGenerator<string, Embedding<float>> embeddings =
    client.AsEmbeddingGenerator("text-embedding-3-small");

Generating a Vector

var result = await embeddings.GenerateAsync(
    new[] { "Some activity text" });

float[] vector = result.First().Vector.ToArray();

That vector is what drives everything that follows.

⚠️ Embeddings Are Model-Locked (And Language Makes It Worse)

Embeddings are model-locked.

Meaning:

Vectors from different embedding models cannot be compared.

Even if:

  • the dimension matches
  • the text is identical
  • the provider is the same

Each model defines its own universe.

But here’s the kicker I learned the hard way:

Multilingual content amplifies this problem.

Even with multilingual-capable models:

  • language mixing shifts vector space
  • short messages lose semantic anchors
  • similarity becomes noisy

In an activity stream:

  • English UI
  • Spanish content
  • Russian replies
  • Emoji everywhere

Vector distance starts to mean “kind of related, maybe”.

That’s not good enough.
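To make "vectors from different models cannot be compared" concrete, here is the measure similarity search actually computes. As I understand it, pgvector's cosine distance operator (<=>) returns 1 minus this value:

```csharp
using System;

// Hedged sketch: cosine similarity, the measure behind cosine-distance
// vector search. The number is only meaningful between vectors produced
// by the SAME embedding model.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Different models rarely even agree on dimensions.");

    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 1, 0 })); // 1
Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 0, 1 })); // 0
```

The math is indifferent to where the vectors came from. Feed it vectors from two different models and it still returns a number between -1 and 1; it just no longer means anything.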

PostgreSQL + pgvector (Still the Right Choice)

Despite all that, PostgreSQL with pgvector is still the right foundation.

Enable pgvector

CREATE EXTENSION IF NOT EXISTS vector;

Chunk-Based Table

CREATE TABLE doc_chunks (
    id            bigserial PRIMARY KEY,
    document_id   bigint NOT NULL,
    chunk_index   int NOT NULL,
    content       text NOT NULL,
    embedding     vector(1536) NOT NULL,
    created_at    timestamptz NOT NULL DEFAULT now()
);

Technically correct.
Architecturally incomplete — as I later discovered.

Retrieval: Where Things Quietly Go Wrong

SELECT content
FROM doc_chunks
ORDER BY embedding <=> @query_embedding
LIMIT 5;

This query decides:

  • what the model sees
  • what it ignores
  • how wrong the answer will be

When language is mixed, retrieval looks correct — but isn’t.

Classic example: Moscow

  • Spanish: Moscú
  • Italian: Mosca
  • “Mosca” in Spanish: 🪰 a fly

So for a Spanish speaker, “Mosca” looks like it should mean insect (which it does), but it’s also the Italian name for Moscow.

Why RAG Failed in This Scenario

Let’s be honest:

  • Similar ≠ relevant
  • Multilingual ≠ multilingual-safe
  • Short activity messages ≠ documents
  • Noise ≠ knowledge

RAG didn’t fail because the model was bad.
It failed because the data had no structure.

Why This Article Exists

This article exists because:

  • I tried RAG on a real system
  • With real users
  • Writing in real languages
  • In real combinations

And the naïve RAG approach didn’t survive.

What Comes Next

The next article will not be about:

  • embeddings
  • models
  • APIs

It will be about structured RAG.

How I fixed this by:

  • introducing structure into the activity stream
  • separating concerns in the pipeline
  • controlling language before retrieval
  • reducing semantic noise
  • making RAG predictable again

In other words:
How to make RAG work after it breaks.

Final Thought

RAG is not magic.

It’s:

search + structure + discipline

If your data is chaotic, RAG will faithfully reflect that chaos — just with confidence.

Happy New Year 2026 🎆

If you’re reading this:
Happy New Year 2026.

Let’s make this the year we stop trusting demos
and start trusting systems that survived reality.

Let the year begin 🚀

Creating an activity stream using Postgres 15

An activity stream is a data format used to represent a list of recent activities performed by an individual or group on a social network, web application, or other platform. It typically includes information such as the type of activity (e.g., posting a status update, commenting on a post), the person or entity performing the activity, and any associated objects or targets (e.g., a photo or link). Activity streams can be used to track user behavior, personalize content recommendations, and facilitate social interactions between users.

An activity stream typically consists of the following parts:

  1. Actor: The person or entity that initiates the action.
  2. Verb: The action being taken.
  3. Object: The thing on which the action is taken.
  4. Target: The thing to which the action is directed.
  5. Time: The time at which the action occurred.
  6. Context: Any additional information about the action, such as the location or device used to perform it.
  7. Metadata: Additional information about the action, such as the user’s preferences or the permissions required to perform it.

Activity streams can be used to represent data from any system, and there is no direct relationship between the stream of activities and the associated objects.
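A hedged illustration of those seven parts applied to a single activity. The field names and identifier format are illustrative, not a standard:

```csharp
using System;

// Hedged illustration: one activity ("a user comments on a photo") mapped to
// the seven parts listed above. Field names are illustrative only.
var activity = new
{
    Actor = "user:42",                                    // who acted
    Verb = "comment",                                     // the action
    Object = "comment:9001",                              // the thing acted on
    Target = "photo:77",                                  // what it was directed at
    Time = DateTimeOffset.UtcNow,                         // when it happened
    Context = new { Device = "mobile", Location = "13.69,-89.21" }, // how/where
    Metadata = new { Visibility = "followers" }           // extra information
};

Console.WriteLine($"{activity.Actor} {activity.Verb} {activity.Object}");
```

The tables that follow store exactly this shape: the actor, object, and target live in an object storage table, while the activity row records the verb plus references to them.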

With a basic understanding of what an activity stream is, we can leverage PostgreSQL as a database storage to implement one. PostgreSQL is particularly suitable for activity streams due to its built-in support for JSON columns, which can store data with flexible schemas, and its GIS functionality, which makes it easy to filter activities based on location.

For this project, I have chosen to use Postgres 15 with GIS extensions, as well as the DBeaver Community Edition for managing the database. The GIS extensions are especially useful for this project since we want to display only activities that occurred around specific geographical points.

Let’s begin our coding journey with the creation of an object storage in PostgreSQL. The object storage will have a column to store the object type and a JSON column to store the complete data of the object being stored.

CREATE DATABASE ActivityStream;

After creating the database, the next step is to install the PostGIS extension using the following query.

CREATE EXTENSION IF NOT EXISTS postgis; -- Enable PostGIS extension

 

CREATE TABLE objectstorage (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    latitude DECIMAL(9,6) NOT NULL,
    longitude DECIMAL(9,6) NOT NULL,
    location GEOMETRY(Point, 4326), -- 4326 is the SRID for WGS 84, a common coordinate system for GPS data
    object_type TEXT NOT NULL,
    object_data JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);


CREATE OR REPLACE FUNCTION update_location() RETURNS TRIGGER AS $$
BEGIN
    NEW.location := ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;


CREATE TRIGGER set_location
    BEFORE INSERT OR UPDATE
    ON objectstorage
    FOR EACH ROW
    EXECUTE FUNCTION update_location();

This query creates a table named objectstorage with columns for id, latitude, longitude, location, object_type, object_data, created_at, and updated_at. The id column is a primary key and generates a random UUID as its default value. The latitude and longitude columns store decimal values for geographic coordinates. The location column stores a geometry object of type Point using the WGS 84 coordinate system with SRID 4326. The object_type column stores the type of the object being stored, and the object_data column stores the complete data for the object in JSONB format. The created_at and updated_at columns store timestamps for when the row was created and last updated, respectively.

Additionally, this query creates a trigger function named update_location() that is triggered when a row is inserted or updated in the objectstorage table. The function updates the location column based on the values in the latitude and longitude columns using the ST_SetSRID() and ST_MakePoint() functions from PostGIS. The ST_SetSRID() function sets the coordinate system for the point, and the ST_MakePoint() function creates a point geometry object from the latitude and longitude values. The function returns the updated row.

To simplify our database interactions, we’ll create UPSERT functions as needed. Here’s an example of an UPSERT function we can use for the objectstorage table.

CREATE OR REPLACE FUNCTION upsert_objectstorage(
    p_id UUID, 
    p_latitude DECIMAL(9,6), 
    p_longitude DECIMAL(9,6),
    p_object_type TEXT,
    p_object_data JSONB
) RETURNS VOID AS $$
BEGIN
    -- Try to update the existing row
    UPDATE objectstorage SET
        latitude = p_latitude,
        longitude = p_longitude,
        location = ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326),
        object_type = p_object_type,
        object_data = p_object_data,
        updated_at = CURRENT_TIMESTAMP
    WHERE id = p_id;
    
    -- If no row was updated, insert a new one
    IF NOT FOUND THEN
        INSERT INTO objectstorage (id, latitude, longitude, location, object_type, object_data)
        VALUES (p_id, p_latitude, p_longitude, ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326), p_object_type, p_object_data);
    END IF;
END;
$$ LANGUAGE plpgsql;

Below is the code for the “activity” table, which is the central piece of an activity stream system. It includes a trigger function that updates the “location” column using PostGIS.

CREATE TABLE activity (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    verb TEXT NOT NULL,
    actor_id UUID NOT NULL REFERENCES objectstorage(id),
    object_id UUID NOT NULL REFERENCES objectstorage(id),
    target_id UUID REFERENCES objectstorage(id),
    latitude DECIMAL(9,6) NOT NULL,
    longitude DECIMAL(9,6) NOT NULL,
    location GEOMETRY(Point, 4326) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE FUNCTION update_activity_location() RETURNS TRIGGER AS $$
BEGIN
    NEW.location := ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_activity_location
    BEFORE INSERT OR UPDATE
    ON activity
    FOR EACH ROW
    EXECUTE FUNCTION update_activity_location();

 

Now the UPSERT function for the activity table

 

CREATE OR REPLACE FUNCTION upsert_activity(
    p_id UUID,
    p_verb TEXT,
    p_actor_id UUID,
    p_object_id UUID,
    p_target_id UUID,
    p_latitude DECIMAL(9,6),
    p_longitude DECIMAL(9,6)
) RETURNS VOID AS $$
BEGIN
    -- Try to update the existing row
    UPDATE activity SET
        verb = p_verb,
        actor_id = p_actor_id,
        object_id = p_object_id,
        target_id = p_target_id,
        latitude = p_latitude,
        longitude = p_longitude,
        location = ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326),
        updated_at = CURRENT_TIMESTAMP
    WHERE id = p_id;
    
    -- If no row was updated, insert a new one
    IF NOT FOUND THEN
        INSERT INTO activity (id, verb, actor_id, object_id, target_id, latitude, longitude, location)
        VALUES (p_id, p_verb, p_actor_id, p_object_id, p_target_id, p_latitude, p_longitude, ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326));
    END IF;
END;
$$ LANGUAGE plpgsql;



To avoid serialization issues and redundant code, we’ll modify our queries to return JSON arrays. We’ll add a new column named “self” to the activity table and create a trigger that saves the current activity values in JSON format.

 

ALTER TABLE activity ADD COLUMN self JSONB; -- jsonb, to match the ::jsonb cast in the trigger below

CREATE OR REPLACE FUNCTION update_activity_self() RETURNS TRIGGER AS $$
BEGIN
    NEW.self = json_build_object(
        'id', NEW.id,
        'verb', NEW.verb,
        'actor_id',NEW.actor_id,
        'actor', (SELECT object_data FROM objectstorage WHERE id = NEW.actor_id),
        'object_id',NEW.object_id,
        'object', (SELECT object_data FROM objectstorage WHERE id = NEW.object_id),
        'target_id',NEW.target_id,
        'target', (SELECT object_data FROM objectstorage WHERE id = NEW.target_id),
        'latitude', NEW.latitude,
        'longitude', NEW.longitude,
        'created_at', NEW.created_at,
        'updated_at', NEW.updated_at
    )::jsonb;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_self_trigger
    BEFORE INSERT OR UPDATE ON activity
    FOR EACH ROW
    EXECUTE FUNCTION update_activity_self();

CREATE OR REPLACE FUNCTION get_activities_by_distance_as_json(
    p_lat NUMERIC,
    p_long NUMERIC,
    p_distance INTEGER,
    p_page_num INTEGER,
    p_page_size INTEGER
) 
RETURNS JSON
AS $$
DECLARE
    activities_json JSON;
BEGIN
    SELECT json_agg(a.self) INTO activities_json
    FROM (
        SELECT a.self
        FROM activity a
        WHERE ST_DWithin(location::geography, ST_SetSRID(ST_Point(p_long, p_lat), 4326)::geography, p_distance)
        ORDER BY created_at DESC
        LIMIT p_page_size
        OFFSET (p_page_num - 1) * p_page_size
    ) a;
    
    RETURN activities_json;
END;
$$ LANGUAGE plpgsql;
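From application code, the function above can be called like any scalar-returning SQL. A hedged sketch using Npgsql (the Npgsql package and the connection string are my assumptions; the article itself stays at the SQL level):

```csharp
using System;
using Npgsql;

// Hedged sketch: calling get_activities_by_distance_as_json from C#.
// Connection string is a placeholder.
await using var conn = new NpgsqlConnection("Host=localhost;Database=activitystream");
await conn.OpenAsync();

await using var cmd = new NpgsqlCommand(
    "SELECT get_activities_by_distance_as_json(@lat, @long, @distance, @page, @size)", conn);
cmd.Parameters.AddWithValue("lat", 13.6929m);
cmd.Parameters.AddWithValue("long", -89.2182m);
cmd.Parameters.AddWithValue("distance", 5000); // meters, because of the ::geography cast
cmd.Parameters.AddWithValue("page", 1);
cmd.Parameters.AddWithValue("size", 20);

// Returns a JSON array of activity "self" documents, or null for an empty page.
var json = await cmd.ExecuteScalarAsync() as string;
Console.WriteLine(json ?? "[]");
```

Because the function already aggregates with json_agg, the client receives one ready-to-serve JSON document per page, with no per-row serialization work.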

 

An activity stream without follow functionality would defeat its main purpose: keeping track of other actors’ activities without constantly visiting their profile pages.

Here is the code for the follow functionality:

CREATE TABLE follow (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    follower_id UUID NOT NULL REFERENCES objectstorage(id),
    followee_id UUID NOT NULL REFERENCES objectstorage(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (follower_id, followee_id) -- required so follow_user's unique_violation handler can fire
);

CREATE OR REPLACE FUNCTION follow_user(
    p_follower_id UUID,
    p_followee_id UUID
) RETURNS VOID AS $$
BEGIN
    -- Try to insert a new row into the follow table
    -- If the row already exists, do nothing
    BEGIN
        INSERT INTO follow (follower_id, followee_id)
        VALUES (p_follower_id, p_followee_id);
    EXCEPTION WHEN unique_violation THEN
        RETURN;
    END;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION unfollow_user(
    p_follower_id UUID,
    p_followee_id UUID
) RETURNS VOID AS $$
BEGIN
    -- Delete the row from the follow table where the follower_id and followee_id match
    DELETE FROM follow
    WHERE follower_id = p_follower_id AND followee_id = p_followee_id;
END;
$$ LANGUAGE plpgsql;
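As an aside, PostgreSQL 9.5 and later offer a more compact way to express the same "insert if absent" logic. Assuming a unique constraint exists on (follower_id, followee_id), the body of follow_user could be reduced to a single statement (a sketch, shown for comparison rather than as a tested drop-in replacement):

```sql
-- Requires a unique constraint on (follower_id, followee_id)
INSERT INTO follow (follower_id, followee_id)
VALUES (p_follower_id, p_followee_id)
ON CONFLICT (follower_id, followee_id) DO NOTHING;
```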

 

To create an activity stream, we need to first identify the actors that we are following. To accomplish this, we can define a function that takes an ID of an object from our object storage and retrieves the IDs of all the actors that are being followed by that object.

Here’s the function code:

CREATE OR REPLACE FUNCTION get_following_ids(p_user_id UUID)
RETURNS UUID[] AS $$
DECLARE
  following_ids UUID[];
BEGIN
  SELECT ARRAY_AGG(followee_id) INTO following_ids
  FROM follow
  WHERE follower_id = p_user_id;
  
  RETURN following_ids;
END;
$$ LANGUAGE plpgsql;
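A symmetric helper for the opposite question, who follows a given actor, may also come in handy. This is a hypothetical addition that the rest of the tutorial does not depend on:

```sql
CREATE OR REPLACE FUNCTION get_follower_ids(p_user_id UUID)
RETURNS UUID[] AS $$
BEGIN
  -- Same shape as get_following_ids, but filtered on the followee side
  RETURN (SELECT ARRAY_AGG(follower_id) FROM follow WHERE followee_id = p_user_id);
END;
$$ LANGUAGE plpgsql;
```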


 

Now that we have obtained the list of actors we are following, the next step is to retrieve their activities. However, this can be a challenging task due to two reasons: first, using a relational database could result in complex joins that could slow down the data retrieval process; second, the actors we are following might have produced a large number of activities, and retrieving them all at once could potentially overload the server. To address these issues, we will introduce pagination to our queries to ensure efficient and scalable data retrieval.
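The pagination arithmetic used throughout these functions is simply (p_page_num - 1) * p_page_size as the offset. A small Python sketch (a hypothetical helper mirroring the SQL, with 1-based page numbers) makes it concrete:

```python
def page_bounds(page_num: int, page_size: int) -> tuple[int, int]:
    """Return (limit, offset) matching the SQL LIMIT/OFFSET arithmetic."""
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must be >= 1")
    return page_size, (page_num - 1) * page_size

# Emulate paging over 25 activities, 10 per page: page 3 holds the last 5.
activities = list(range(25))
limit, offset = page_bounds(3, 10)
page = activities[offset:offset + limit]
```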

 

CREATE OR REPLACE FUNCTION get_activities_by_following(p_page_num INTEGER, p_page_size INTEGER, p_following_ids UUID[])
    RETURNS TABLE (
        id UUID,
        verb TEXT,
        actor_id UUID,
        object_id UUID,
        target_id UUID,
        latitude DECIMAL(9,6),
        longitude DECIMAL(9,6),
        location GEOMETRY(Point, 4326),
        self_data JSON,
        created_at TIMESTAMP WITH TIME ZONE,
        updated_at TIMESTAMP WITH TIME ZONE
    ) AS $$
BEGIN
    RETURN QUERY
    SELECT a.id, a.verb, a.actor_id, a.object_id, a.target_id, a.latitude, a.longitude, a.location, a."self" , a.created_at, a.updated_at
    FROM activity a
    WHERE a.actor_id = ANY(p_following_ids)
    ORDER BY a.created_at DESC
    LIMIT p_page_size
    OFFSET (p_page_num - 1) * p_page_size;
END;
$$ LANGUAGE plpgsql;

 

We need a function that takes the result produced by the get_activities_by_following function, and converts it into a JSON array.

 

CREATE OR REPLACE FUNCTION get_activities_by_following_as_json(p_page_num INTEGER, p_page_size INTEGER, p_user_id UUID)
RETURNS JSON AS $$
DECLARE
    -- get_following_ids already returns UUID[], so we can assign it directly
    following_ids UUID[] := get_following_ids(p_user_id);
BEGIN
    RETURN (SELECT json_agg(self_data) FROM get_activities_by_following(p_page_num, p_page_size, following_ids));
END;
$$ LANGUAGE plpgsql;

 

To demonstrate our activity stream system, we need some sample data. Let's create five users and have them post ads into our objectstorage table.

 

--create users and activities

SELECT upsert_objectstorage(
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', -- object ID 1
    59.9311, -- latitude
    30.3609, -- longitude
    'user', -- object type
    '{"name": "Alice", "age": 27, "email": "alice@example.com", "picture_url": "https://example.com/pictures/alice.jpg"}' -- object data in JSON format
);
SELECT upsert_objectstorage(
    'cc7ebda2-019c-4387-925c-352f7e1f0b10', -- object ID 2
    59.9428, -- latitude
    30.3071, -- longitude
    'user', -- object type
    '{"name": "Bob", "age": 33, "email": "bob@example.com", "picture_url": "https://example.com/pictures/bob.jpg"}' -- object data in JSON format
);

SELECT upsert_objectstorage(
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4c', -- object ID 3
    59.9375, -- latitude
    30.3086, -- longitude
    'user', -- object type
    '{"name": "Charlie", "age": 42, "email": "charlie@example.com", "picture_url": "https://example.com/pictures/charlie.jpg"}' -- object data in JSON format
);

SELECT upsert_objectstorage(
    '34f6c0a5-5d5e-463f-a2cf-11b7529a92a1', -- object ID 4
    59.9167, -- latitude
    30.25, -- longitude
    'user', -- object type
    '{"name": "Dave", "age": 29, "email": "dave@example.com", "picture_url": "https://example.com/pictures/dave.jpg"}' -- object data in JSON format
);

SELECT upsert_objectstorage(
    '8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', -- object ID 5
    59.9391, -- latitude
    30.3158, -- longitude
    'user', -- object type
    '{"name": "Eve", "age": 25, "email": "eve@example.com", "picture_url": "https://example.com/pictures/eve.jpg"}' -- object data in JSON format
);

--create ads

-- Bob's ad
SELECT upsert_objectstorage(
'f6c7599e-8161-4d54-82ec-faa13bb8cbf7', -- object ID
59.9428, -- latitude (near Saint Petersburg)
30.3071, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Vintage bicycle, good condition", "ad_type": "sale", "picture_url": "https://example.com/pictures/bicycle.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'cc7ebda2-019c-4387-925c-352f7e1f0b10', -- actor ID (Bob)
'f6c7599e-8161-4d54-82ec-faa13bb8cbf7', -- object ID (Bob's ad)
NULL, -- target ID (no target)
59.9428, -- latitude (near Saint Petersburg)
30.3071 -- longitude (near Saint Petersburg)
);

-- Charlie's ad
SELECT upsert_objectstorage(
'41f7c558-1cf8-4f2b-b4b4-4d4e4df50843', -- object ID
59.9375, -- latitude (near Saint Petersburg)
30.3086, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Smartphone, unlocked", "ad_type": "sale", "picture_url": "https://example.com/pictures/smartphone.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'99875f15-49ee-4e6d-b356-cbab4f4e4a4c', -- actor ID (Charlie)
'41f7c558-1cf8-4f2b-b4b4-4d4e4df50843', -- object ID (Charlie's ad)
NULL, -- target ID (no target)
59.9375, -- latitude (near Saint Petersburg)
30.3086 -- longitude (near Saint Petersburg)
);


-- Dave's ad
SELECT upsert_objectstorage(
'c3dd7b47-1bba-4916-8a6a-8b5f2b50ba88', -- object ID
59.9139, -- latitude (near Saint Petersburg)
30.3341, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Vintage camera, working condition", "ad_type": "exchange", "picture_url": "https://example.com/pictures/camera.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'34f6c0a5-5d5e-463f-a2cf-11b7529a92a1', -- actor ID (Dave)
'c3dd7b47-1bba-4916-8a6a-8b5f2b50ba88', -- object ID (Dave's ad)
NULL, -- target ID (no target)
59.9139, -- latitude (near Saint Petersburg)
30.3341 -- longitude (near Saint Petersburg)
);

-- Eve's ad
SELECT upsert_objectstorage(
'3453f3c1-296d-47a5-a5a5-f5db5ed3f3b3', -- object ID
59.9375, -- latitude (near Saint Petersburg)
30.3141, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Plants, various types", "ad_type": "want", "picture_url": "https://example.com/pictures/plants.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', -- actor ID (Eve)
'3453f3c1-296d-47a5-a5a5-f5db5ed3f3b3', -- object ID (Eve's ad)
NULL, -- target ID (no target)
59.9375, -- latitude (near Saint Petersburg)
30.3141 -- longitude (near Saint Petersburg)
);

-- Alice's ad
SELECT upsert_objectstorage(
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02', -- new object ID for Alice's ad
    59.9311, -- latitude
    30.3609, -- longitude
    'ad', -- object type
    '{"description": "Used bicycle, good condition", "ad_type": "sell", "picture_url": "https://example.com/pictures/bicycle.jpg"}' -- ad data in JSON format
);

SELECT upsert_activity(
    gen_random_uuid(), -- activity ID
    'post', -- verb
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', -- actor ID (Alice)
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02', -- object ID (Alice's ad)
    NULL, -- target ID (no target)
    59.9311, -- latitude
    30.3609 -- longitude
);
-- Charlie's second ad
SELECT upsert_objectstorage(
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4d', -- new object ID for Charlie's ad
    59.9375, -- latitude
    30.3086, -- longitude
    'ad', -- object type
    '{"description": "Books, various genres", "ad_type": "sell", "picture_url": "https://example.com/pictures/books.jpg"}' -- ad data in JSON format
);

SELECT upsert_activity(
    gen_random_uuid(), -- activity ID
    'post', -- verb
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4c', -- actor ID (Charlie)
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4d', -- object ID (Charlie's ad)
    NULL, -- target ID (no target)
    59.9428, -- latitude
    30.3071 -- longitude
);
-- Bob's second ad
SELECT upsert_objectstorage(
    'cc7ebda2-019c-4387-925c-352f7e1f0b12', -- new object ID for Bob's ad
    59.9428, -- latitude
    30.3071, -- longitude
    'ad', -- object type
    '{"description": "Vintage record player, needs repair", "ad_type": "exchange", "picture_url": "https://example.com/pictures/record_player.jpg"}' -- ad data in JSON format
);
SELECT upsert_activity(
    gen_random_uuid(), -- activity ID
    'post', -- verb
    'cc7ebda2-019c-4387-925c-352f7e1f0b10', -- actor ID (Bob)
    'cc7ebda2-019c-4387-925c-352f7e1f0b12', -- object ID (Bob's ad)
    NULL, -- target ID (no target)
    59.9428, -- latitude
    30.3071 -- longitude
);

 

Now that we have created objects and activities, every user's activity stream is still empty, because a stream only shows the activities of actors that the user follows. So the next step is to establish follow relationships between our users.

 

-- Follow data

-- Eve and Alice follow themselves, so their own posts appear in their streams
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', '8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', 'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01');
-- Eve and Alice follow Bob, Charlie, and Dave
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', 'cc7ebda2-019c-4387-925c-352f7e1f0b10');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', 'cc7ebda2-019c-4387-925c-352f7e1f0b10');
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', '99875f15-49ee-4e6d-b356-cbab4f4e4a4c');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', '99875f15-49ee-4e6d-b356-cbab4f4e4a4c');
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', '34f6c0a5-5d5e-463f-a2cf-11b7529a92a1');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', '34f6c0a5-5d5e-463f-a2cf-11b7529a92a1');


 

It's time to test our activity stream. First, let's get the objects that Alice is following:

SELECT get_following_ids('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01'); -- the objects Alice follows

Here is the result:

get_following_ids                                                                                                                                    
-----------------------------------------------------------------------------------------------------------------------------------------------------+
{
b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01,
cc7ebda2-019c-4387-925c-352f7e1f0b10,
99875f15-49ee-4e6d-b356-cbab4f4e4a4c,
34f6c0a5-5d5e-463f-a2cf-11b7529a92a1
}

Now let's get the activities of the objects that Alice is following. We'll request page 1 with 10 records per page:

SELECT * FROM get_activities_by_following(1, 10, get_following_ids('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01'));

Here is the result (trailing columns trimmed for readability):

id                                  |verb|actor_id                            |object_id                           |target_id|
------------------------------------+----+------------------------------------+------------------------------------+---------+
f905356f-2fe3-4f51-b6de-d2cd107f46b8|post|cc7ebda2-019c-4387-925c-352f7e1f0b10|f6c7599e-8161-4d54-82ec-faa13bb8cbf7|         |
43a92964-5bcd-4096-93bc-e5e87c76455e|post|99875f15-49ee-4e6d-b356-cbab4f4e4a4c|41f7c558-1cf8-4f2b-b4b4-4d4e4df50843|         |
69ec53ac-bbaa-4c36-81ef-8764647d7914|post|34f6c0a5-5d5e-463f-a2cf-11b7529a92a1|c3dd7b47-1bba-4916-8a6a-8b5f2b50ba88|         |
de6b052f-8a84-4b37-9920-9f76cbb539d4|post|b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01|b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02|         |
3c35544a-3ee0-47ae-bddc-1017127ff4d6|post|99875f15-49ee-4e6d-b356-cbab4f4e4a4c|99875f15-49ee-4e6d-b356-cbab4f4e4a4d|         |
e76dcbb9-56c4-46d8-bb42-2f67dec4c5aa|post|cc7ebda2-019c-4387-925c-352f7e1f0b10|cc7ebda2-019c-4387-925c-352f7e1f0b12|         |


 

Now let's make this better and get the activities in JSON format:

SELECT * FROM get_activities_by_following_as_json(1, 2, 'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01');

And here is the result:

[
   {
      "id":"43a92964-5bcd-4096-93bc-e5e87c76455e",
      "verb":"post",
      "actor":{
         "age":42,
         "name":"Charlie",
         "email":"charlie@example.com",
         "picture_url":"https://example.com/pictures/charlie.jpg"
      },
      "object":{
         "ad_type":"sale",
         "description":"Smartphone, unlocked",
         "picture_url":"https://example.com/pictures/smartphone.jpg"
      },
      "target":null,
      "actor_id":"99875f15-49ee-4e6d-b356-cbab4f4e4a4c",
      "latitude":59.937500,
      "longitude":30.308600,
      "object_id":"41f7c558-1cf8-4f2b-b4b4-4d4e4df50843",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   },
   {
      "id":"f905356f-2fe3-4f51-b6de-d2cd107f46b8",
      "verb":"post",
      "actor":{
         "age":33,
         "name":"Bob",
         "email":"bob@example.com",
         "picture_url":"https://example.com/pictures/bob.jpg"
      },
      "object":{
         "ad_type":"sale",
         "description":"Vintage bicycle, good condition",
         "picture_url":"https://example.com/pictures/bicycle.jpg"
      },
      "target":null,
      "actor_id":"cc7ebda2-019c-4387-925c-352f7e1f0b10",
      "latitude":59.942800,
      "longitude":30.307100,
      "object_id":"f6c7599e-8161-4d54-82ec-faa13bb8cbf7",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   }
]
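Because the function returns plain JSON, any client can consume the stream without knowing the underlying schema. A small Python sketch shows one way to do it; the payload below is an abridged copy of the result above, and the field names match the keys built by the trigger's json_build_object:

```python
import json

# Abridged sample of the get_activities_by_following_as_json output
payload = """[
  {"id": "43a92964-5bcd-4096-93bc-e5e87c76455e", "verb": "post",
   "actor": {"name": "Charlie"},
   "object": {"description": "Smartphone, unlocked"},
   "target": null}
]"""

activities = json.loads(payload)
headlines = [
    f'{a["actor"]["name"]} {a["verb"]}ed: {a["object"]["description"]}'
    for a in activities
]
```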

 

And now, before I go, here is a bonus: this query returns all the activities that happened around a specific geolocation.



SELECT get_activities_by_distance_as_json(59.9343, 30.3351, 1600, 1, 10);

Here are the results. All those places are near my home :)

         "name":"Charlie",
         "email":"charlie@example.com",
         "picture_url":"https://example.com/pictures/charlie.jpg"
      },
      "object":{
         "ad_type":"sale",
         "description":"Smartphone, unlocked",
         "picture_url":"https://example.com/pictures/smartphone.jpg"
      },
      "target":null,
      "actor_id":"99875f15-49ee-4e6d-b356-cbab4f4e4a4c",
      "latitude":59.937500,
      "longitude":30.308600,
      "object_id":"41f7c558-1cf8-4f2b-b4b4-4d4e4df50843",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   },
   {
      "id":"e5e26df0-e58f-4b25-96c1-5b3460beb0c8",
      "verb":"post",
      "actor":{
         "age":25,
         "name":"Eve",
         "email":"eve@example.com",
         "picture_url":"https://example.com/pictures/eve.jpg"
      },
      "object":{
         "ad_type":"want",
         "description":"Plants, various types",
         "picture_url":"https://example.com/pictures/plants.jpg"
      },
      "target":null,
      "actor_id":"8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54",
      "latitude":59.937500,
      "longitude":30.314100,
      "object_id":"3453f3c1-296d-47a5-a5a5-f5db5ed3f3b3",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   },
   {
      "id":"de6b052f-8a84-4b37-9920-9f76cbb539d4",
      "verb":"post",
      "actor":{
         "age":27,
         "name":"Alice",
         "email":"alice@example.com",
         "picture_url":"https://example.com/pictures/alice.jpg"
      },
      "object":{
         "ad_type":"sell",
         "description":"Used bicycle, good condition",
         "picture_url":"https://example.com/pictures/bicycle.jpg"
      },
      "target":null,
      "actor_id":"b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01",
      "latitude":59.931100,
      "longitude":30.360900,
      "object_id":"b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   }
]

 

In conclusion, this article provided a step-by-step guide to building an activity stream system on top of PostgreSQL. It covered the object storage table, the activity table, follow functionality, and pagination to handle the large volume of data users generate. It also showed how the PostGIS extension enables geographical search, and how JSON columns in PostgreSQL can store complex data structures. By following this guide, developers can build their own activity stream systems with PostgreSQL and integrate them into their applications.

You can find the complete code for this tutorial here:

https://github.com/egarim/PostgresActivityStream