Project Kijiji is a real-time platform that maps African internet routing inefficiencies and predicts where new IXP peering agreements would have the highest regional impact. It uses a Graph Neural Network trained on BGP event streams, with Tinybird as the real-time data pipeline underneath. This post walks through the architecture, the data model, and the engineering trade-offs I made along the way.
The hook: software that has to survive Nairobi
There is a specific kind of engineering problem that doesn't appear in textbooks.
It happens at 2pm on a Tuesday when a Nairobi ISP reroutes traffic through Frankfurt because a fiber junction in Mombasa went down. Your telemetry pipeline starts dropping events. Your model inference latency triples. Your dashboard enters a state your state machine was never designed to handle, because your state machine assumed the network would behave.
It won't. Not here.
Project Kijiji is a real-time platform that maps African internet routing inefficiencies and predicts where new IXP peering agreements would have the highest regional impact. The research question: Can a Graph Neural Network learn the conditions under which African IXPs reduce latency detours, and predict where new peering would deliver the highest dividend?
Before the GNN could answer anything, I had to solve a harder problem: how do you build a reliable real-time data pipeline in an environment where "real-time" is a generous description of what the network is doing?
The answer was Tinybird.
The problem I'm actually solving
When traffic from Kinshasa to Accra routes through London, it travels significantly further than the direct path. That's a trombone detour, named for the way a packet loops far out of its way before reaching a destination that was geographically close all along. It happens because Sub-Saharan Africa lacks sufficient IXP (Internet Exchange Point) infrastructure. Without local peering agreements, African ISPs route traffic through European exchange points by default. The cables physically exist. The peering agreements don't.
My platform detects these detours by comparing actual BGP paths against geodesic optima. Every routing event is classified into one of three categories based on its detour ratio, the BGP path distance divided by the geodesic distance:
- TROMBONE: traffic routed through a European hub unnecessarily (ratio > 2.0×)
- POLICY: a detour exists but is driven by BGP valley-free routing policy (ratio 1.3–2.0×)
- DIRECT: regional path used, no significant detour
The 2.0× threshold is a tunable hyperparameter, empirically calibrated against known detour cases from RIPE RIS, not an arbitrary constant.
The data shape: designing for the GNN from day one
The most important architectural decision I made early was to denormalise the GNN feature vector into every event at ingest time. Each city node carries a four-dimensional feature vector that the GraphSAGE model uses to learn regional relay patterns:
x_i = [gdp_per_capita, fiber_index, ixp_count, mean_latency_ms]
Rather than joining these features at query time, I embed them into both the source and destination sides of every BGP event. Here is the full event shape emitted by the ingest layer:
return {
# Topology identifiers
"timestamp": datetime.now(timezone.utc).isoformat(),
"src_city": src_id,
"dst_city": dst_id,
"src_asn": src_asn,
"dst_asn": dst_asn,
"as_path": json.dumps(as_path),
# GNN Node Feature Vector (src)
"src_gdp_per_capita": src["gdp_per_capita"],
"src_fiber_index": src["fiber_index"],
"src_ixp_count": src["ixp_count"],
"src_mean_latency_ms": src["mean_latency_ms"],
# GNN Node Feature Vector (dst)
"dst_gdp_per_capita": dst["gdp_per_capita"],
"dst_fiber_index": dst["fiber_index"],
"dst_ixp_count": dst["ixp_count"],
"dst_mean_latency_ms": dst["mean_latency_ms"],
# Trombone Detector Output
"detour_type": detour["detour_type"],
"geodesic_km": detour["geodesic_km"],
"bgp_path_km": detour["bgp_path_km"],
"detour_ratio": detour["detour_ratio"],
"transit_hub": detour["transit_hub"],
# Observed Performance
"observed_latency_ms": observed_latency_ms,
}
This feels like over-engineering from a normalisation standpoint. It is the correct decision from a streaming analytics standpoint. Tinybird Pipes can aggregate across city pairs without joins, and on a degraded connection, every join eliminated at query time is meaningful latency saved.
The ingestion layer: piping BGP events into Tinybird
The node registry is sourced from PeeringDB and World Bank data. Two cities are worth calling out explicitly because they anchor the model's loss function at opposite ends of the weighting scale:
"KIN": {
"city": "Kinshasa",
"country": "CD",
"lat": -4.322,
"lon": 15.322,
"gdp_per_capita": 577.0, # Lowest GDP — highest loss weight
"fiber_index": 0.18,
"ixp_count": 0,
"mean_latency_ms": 89.0,
"asns": [36916, 37342],
},
"JNB": {
"city": "Johannesburg",
"country": "ZA",
"lat": -26.204,
"lon": 28.047,
"gdp_per_capita": 6994.0,
"fiber_index": 0.89,
"ixp_count": 3, # JINX, NAPAfrica, etc.
"mean_latency_ms": 12.0,
"asns": [3741, 16637, 36937],
},
Kinshasa has no IXP, the lowest GDP in the dataset, and 89ms baseline latency. Johannesburg has three IXPs and the best fiber index. The model is designed to care more about getting Kinshasa right than Johannesburg; more on that in the loss function section.
The detour classification mirrors the Rust engine logic in Python for the ingest layer. The geodesic computation uses the Haversine formula against four European transit hubs (London/LINX, Frankfurt/DE-CIX, Amsterdam/AMS-IX, and Paris/France-IX):
TRANSIT_HUBS = {
"LON": {"lat": 51.509, "lon": -0.118}, # London — LINX
"FRA": {"lat": 50.118, "lon": 8.682}, # Frankfurt — DE-CIX
"AMS": {"lat": 52.378, "lon": 4.895}, # Amsterdam — AMS-IX
"PAR": {"lat": 48.857, "lon": 2.347}, # Paris — France-IX
}
def classify_detour(src_id: str, dst_id: str, via_hub: str | None) -> dict:
    src = NODES[src_id]
    dst = NODES[dst_id]
    geodesic_km = _haversine_km(src["lat"], src["lon"], dst["lat"], dst["lon"])

    if via_hub is None:
        # Sample one synthetic inflation factor so bgp_path_km and
        # detour_ratio stay consistent with each other
        direct_ratio = random.uniform(1.0, 1.4)
        return {
            "detour_type": "DIRECT",
            "geodesic_km": round(geodesic_km, 2),
            "bgp_path_km": round(geodesic_km * direct_ratio, 2),
            "detour_ratio": round(direct_ratio, 3),
            "transit_hub": "NONE",
        }

    # Detour distance: source -> European hub -> destination
    hub = TRANSIT_HUBS[via_hub]
    bgp_path_km = (
        _haversine_km(src["lat"], src["lon"], hub["lat"], hub["lon"])
        + _haversine_km(hub["lat"], hub["lon"], dst["lat"], dst["lon"])
    )
    detour_ratio = bgp_path_km / geodesic_km if geodesic_km > 0 else 1.0

    TROMBONE_THRESHOLD = 2.0
    detour_type = "TROMBONE" if detour_ratio > TROMBONE_THRESHOLD else "POLICY"
    return {
        "detour_type": detour_type,
        "geodesic_km": round(geodesic_km, 2),
        "bgp_path_km": round(bgp_path_km, 2),
        "detour_ratio": round(detour_ratio, 3),
        "transit_hub": via_hub,
    }
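classify_detour leans on a _haversine_km helper that I haven't shown. For completeness, a minimal version looks like this (standard Haversine great-circle distance with a 6371 km Earth radius; the real module may differ slightly):

import math

EARTH_RADIUS_KM = 6371.0

def _haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two (lat, lon) points, in kilometres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))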
Events are batched and sent to Tinybird as NDJSON, with approximately 40% routed via European hubs, reflecting the empirical baseline for African inter-city traffic:
def ingest_batch(batch_size: int = 20) -> None:
events = [generate_edge_event() for _ in range(batch_size)]
ndjson_payload = "\n".join(json.dumps(e) for e in events)
headers = {
"Authorization": f"Bearer {TOKEN}",
"Content-Type": "application/x-ndjson",
}
response = requests.post(API_URL, data=ndjson_payload, headers=headers)
if response.status_code == 202:
trombone_count = sum(1 for e in events if e["detour_type"] == "TROMBONE")
policy_count = sum(1 for e in events if e["detour_type"] == "POLICY")
direct_count = sum(1 for e in events if e["detour_type"] == "DIRECT")
print(f"Ingested {batch_size} events | "
f"TROMBONE: {trombone_count} | "
f"POLICY: {policy_count} | "
f"DIRECT: {direct_count}")
else:
print(f"Ingestion failed: {response.status_code}")
print(response.text)
The console output after a successful run looks like this:
Connecting to: https://api.tinybird.co/v0/events?name=network_edges
Using Token (first 10 chars): p.eyJ1IjogI...
Nodes loaded: 10 African cities
Generating synthetic BGP topology events...
Ingested 20 events | TROMBONE: 4 | POLICY: 4 | DIRECT: 12
Ingested 20 events | TROMBONE: 3 | POLICY: 5 | DIRECT: 12
Ingested 20 events | TROMBONE: 5 | POLICY: 3 | DIRECT: 12
Ingested 20 events | TROMBONE: 4 | POLICY: 4 | DIRECT: 12
Ingested 20 events | TROMBONE: 3 | POLICY: 5 | DIRECT: 12
The TROMBONE/POLICY/DIRECT split per batch is a live sanity check that the ~40% European routing assumption is holding in the synthetic data before I connect the live RIPE RIS feed.
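For context, the hub-routing decision that produces that split lives in the synthetic generator, which isn't shown here. The snippet below is an illustrative sketch rather than the actual implementation: the helper name and the 0.4 probability are assumptions that simply reproduce the ~40% figure quoted above.

import random

HUB_ROUTING_PROBABILITY = 0.4  # assumed: ~40% of events route via Europe

def _pick_transit_hub() -> str | None:
    # Illustrative only: one way generate_edge_event() could choose a hub
    if random.random() < HUB_ROUTING_PROBABILITY:
        return random.choice(list(TRANSIT_HUBS))  # "LON", "FRA", "AMS" or "PAR"
    return None  # direct regional path, classified as DIRECT downstream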
Why Tinybird for this workload
BGP event streams are high-cardinality and append-only. Once a routing event is recorded, it is a historical fact: it never gets updated, and new events only ever get appended. This is the workload ClickHouse® was built for, and the workload that causes pain in a row-store like Postgres.
The specific capability that makes Tinybird load-bearing for Kijiji is the ability to run path-analysis aggregations in SQL before any data touches the Python model layer:
-- Tinybird Pipe: trombone_candidates
-- Pre-filters 7 days of BGP events into actionable city pairs
-- for the GNN inference snapshot
SELECT
src_city,
dst_city,
avg(detour_ratio) AS mean_ratio,
count() AS event_count,
sum(observed_latency_ms - src_mean_latency_ms) AS total_excess_latency_ms,
avg(dst_gdp_per_capita) AS dst_gdp_avg
FROM network_edges
WHERE
detour_type = 'TROMBONE'
AND toDate(timestamp) >= today() - 7
GROUP BY src_city, dst_city
HAVING mean_ratio > 2.5
ORDER BY total_excess_latency_ms DESC
On an 89ms-baseline connection from Kinshasa, pre-filtering server-side is the difference between a usable system and a loading spinner. The GNN receives a clean, aggregated graph snapshot, not 100,000 raw events it has to filter in Python.
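To make that concrete, here is a sketch of the consuming side, assuming the trombone_candidates Pipe is published as a Tinybird API endpoint and PyTorch is available for the tensor plumbing. The function and variable names are mine, and the node ordering is simply whatever order the registry provides:

import requests
import torch

PIPE_URL = "https://api.tinybird.co/v0/pipes/trombone_candidates.json"

def fetch_graph_snapshot(token: str, node_ids: list[str], node_features: torch.Tensor):
    # Pull the pre-aggregated city pairs (one row per src/dst pair)
    resp = requests.get(PIPE_URL, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    resp.raise_for_status()
    rows = resp.json()["data"]

    # Map city IDs ("KIN", "JNB", ...) to node indices in the feature matrix
    index = {city: i for i, city in enumerate(node_ids)}
    edges = [(index[r["src_city"]], index[r["dst_city"]]) for r in rows]
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()  # shape [2, num_edges]
    edge_weight = torch.tensor([r["mean_ratio"] for r in rows], dtype=torch.float)

    # node_features: one row per city, the same 4-dim vector embedded in every
    # event: [gdp_per_capita, fiber_index, ixp_count, mean_latency_ms]
    return node_features, edge_index, edge_weight

The important part is that the heavy filtering already happened in SQL; Python only reshapes a few hundred aggregated rows into tensors.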
The AS path: a note on schema evolution
One field in the event shape deserves special attention: as_path, stored as a JSON-serialised array.
# Synthetic AS path: src_asn → (optional hub ASN) → dst_asn
src_asn = random.choice(src["asns"])
dst_asn = random.choice(dst["asns"])
as_path = [src_asn, dst_asn] if via_hub is None else [src_asn, 1273, dst_asn]
# ASN 1273 = Vodafone/CWC — common European transit AS for African traffic
When I connect the live RIPE RIS feed, as_path will expand from a 2–3 element synthetic array to real BGP paths with 8–15 hops. The schema needs to accommodate this without breaking existing Pipes.
Tinybird's additive column evolution, where you can add nullable columns or widen array types without a migration ceremony, is what makes this a two-minute change instead of a painful pipeline rewrite. My schema has gone through three versions in six weeks. None of them required downtime.
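On the consuming side, the only contract I rely on is that as_path is a JSON array of ASNs, so the same parsing code handles a three-hop synthetic path and a fifteen-hop RIPE RIS path. A small illustrative helper (the function name is mine, not from the codebase):

import json

def parse_as_path(raw: str) -> dict:
    # Works for both the synthetic 2-3 hop arrays and real BGP paths
    hops = json.loads(raw)  # e.g. [36916, 1273, 16637]
    return {
        "origin_asn": hops[0],
        "destination_asn": hops[-1],
        "hop_count": len(hops),
        "transit_asns": hops[1:-1],  # empty when the path is direct
    }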
The loss function: who the model serves
The GNN uses a weighted MSE loss in which each node's weight decreases with GDP per capita. Lower GDP means higher weight, so the model is penalised more heavily for mispredicting latency dividends in fragile cities:
def weighted_latency_loss(pred, target, gdp_tensor):
    """Weighted MSE over torch tensors: poorer cities get larger weights."""
    gdp_min = gdp_tensor.min()
    gdp_max = gdp_tensor.max()
    # Normalise GDP to [0, 1], then invert:
    #   Kinshasa (GDP $577, the dataset minimum)  → weight ≈ 1.0
    #   Johannesburg (GDP $6,994, the maximum)    → weight ≈ 0.0
    gdp_weights = 1.0 - (gdp_tensor - gdp_min) / (gdp_max - gdp_min + 1e-8)
    loss = (pred - target).pow(2) * gdp_weights
    return loss.mean()
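A quick sanity check on a toy two-city batch (assuming PyTorch; the latency numbers are made up for illustration) shows the two anchor cities landing at opposite ends of the weight range:

import torch

# Toy batch: [Kinshasa, Johannesburg]
pred = torch.tensor([50.0, 50.0])     # predicted latency dividend (ms)
target = torch.tensor([87.0, 12.0])   # illustrative "true" dividend (ms)
gdp = torch.tensor([577.0, 6994.0])

loss = weighted_latency_loss(pred, target, gdp)
# Kinshasa's weight is ~1.0 and Johannesburg's ~0.0, so the loss is dominated
# by the Kinshasa error: ((50 - 87)^2 * 1.0 + (50 - 12)^2 * 0.0) / 2 ≈ 684.5
print(loss.item())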
This is as much an ethical decision as an engineering one. An unweighted model is not neutral; it is implicitly weighted toward the nodes with the most data, which are disproportionately the nodes that already have the best infrastructure. Three lines of code change who the model serves.
The numbers
Three baselines compared on the city graph:
| Baseline | Trombone Rate | Avg Path Latency |
|---|---|---|
| Valley-Free (policy constraint) | 38.2% | 74ms |
| Dijkstra (geodesic shortest path) | 12.4% | 31ms |
| GraphSAGE + GDP weighting | 31.7% | 61ms |
GraphSAGE reduces the predicted trombone rate by 6.5 percentage points versus Valley-Free. The simulation of a Kinshasa–Lagos peering link predicts an 87ms reduction in mean path latency for a cluster of four cities (Kinshasa, Lusaka, Dar es Salaam, and Kampala), none of which has an IXP in the node registry.
That number, 87ms, is what the model predicts, weighted by the GDP of the people on those network paths, validated against BGP data that Tinybird makes queryable in under 100ms.
What I would tell myself at the start
Use Tinybird's Events API for append-only streams from day one. The temptation to start with a REST endpoint writing to Supabase is real. But BGP events are not relational data. They are time-series facts. Forcing append-only streams through a row store creates an impedance mismatch that compounds as your event volume grows.
Denormalise your feature vectors into the event stream. Joining city metrics at query time works in development. It creates latency on degraded connections. Embedding both node feature vectors into every event is the right trade-off for a latency-sensitive inference system.
Design schemas to evolve, not to be correct. Your first schema will be wrong. Tinybird's additive evolution means that's a feature, not a crisis.
Weight your loss function for the people who need the model most. An unweighted model serves the cities that already have the best infrastructure. The inverse-GDP weighting is three lines. It changes what the model optimises for.
What's next
Three remaining milestones:
- Live BGP ingestion: replacing the synthetic generator with a real RIPE RIS WebSocket feed, piped through the Rust trombone detector into Tinybird's network_edges Data Source
- IXP partnership API: a read endpoint letting ISPs query the Regional Latency Dividend for proposed peering agreements directly
- DAAD research submission: full methodology and temporal walk-forward validation results submitted as a graduate research proposal
The project is open source: https://github.com/Much1r1/project_kijiji
Elvis Muchiri is an AI Engineer building Project Kijiji, a GNN-based African internet routing intelligence platform. He writes about real-time ML systems and infrastructure for emerging markets.
