Data & Platform Engineering

Data platforms, built end‑to‑end — federation to pixels.

I think as a Data Engineer

Given an open-ended goal, I scope it, choose the architecture, and ship the whole stack — the data layer and the interface people actually use.

Storage & query: GCS · AWS S3 · ClickHouse · Trino → Backend: FastAPI → Serve: viewer SPA · Superset · Teams
My real stack — storage & query engines → one backend → the surfaces people use.

01 — Approach

The goal was set.
The path was mine to find.

01

A broad mandate

Ingest the data, learn the schemas, build dashboards — a starting point, but the capability teams truly needed was still unclear.

02

Embed & discover

Worked beside the people closest to the data to find where it actually hurt.

03

Decide & ship — solo

Architecture, stack, deployment, roadmap, vision: every call, mine.

Give me a direction, not a spec.

I turn ambiguity into infrastructure — and own every decision in between.

02 — Impact

What changed because the work exists.

<0s

To surface any case

Self-serve — no archive digging, no bespoke query.

0

Requests through a gatekeeper

Engineers answer their own data questions.

0%

Hands-free QA reporting

Results flow to dashboards & Teams on their own.

~6mo

Empty repo → daily driver

Solo, from nothing to a tool teams open daily.

03 — The platform

A debugging & analytics platform, from scratch.

Locked-away telemetry → a self-serve tool teams open every day. ~6 months, solo.

Zero-build frontend

Hand-bundled vanilla JS — near-zero maintenance.

Feature "fences"

Every feature removes in one pass.

Graceful degradation

A missing piece never kills the page.

Federation → ClickHouse

Denormalized hot paths for speed.

Cache + warmer

Compute right-sized to refresh cadence.

Adaptive queries

Fan-out reclassifies heavy/light at runtime.
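
One way to picture runtime heavy/light reclassification is a small classifier that keeps a rolling window of observed latencies per source and relabels a source when its median crosses a threshold. This is a minimal sketch under stated assumptions: the class name, `threshold_s`, and `window` are hypothetical, and the platform's real rule may differ.

```python
from collections import defaultdict, deque
from statistics import median


class AdaptiveClassifier:
    """Label each query source 'heavy' or 'light' from recent latencies."""

    def __init__(self, threshold_s: float = 1.0, window: int = 5) -> None:
        self.threshold_s = threshold_s  # hypothetical cut-off, in seconds
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, source: str, latency_s: float) -> None:
        """Store one observed latency; the oldest sample falls out of the window."""
        self._samples[source].append(latency_s)

    def classify(self, source: str) -> str:
        """Unseen sources start as 'light'; the median resists one-off spikes."""
        samples = self._samples.get(source)
        if not samples:
            return "light"
        return "heavy" if median(samples) > self.threshold_s else "light"

    def plan(self, sources: list[str]) -> dict[str, list[str]]:
        """Split a fan-out into two lanes, preserving input order."""
        lanes: dict[str, list[str]] = {"light": [], "heavy": []}
        for source in sources:
            lanes[self.classify(source)].append(source)
        return lanes


clf = AdaptiveClassifier(threshold_s=1.0, window=3)
for latency in (0.2, 0.3, 0.25):
    clf.record("clickhouse", latency)
for latency in (2.5, 3.0, 0.4):
    clf.record("trino", latency)
lanes = clf.plan(["clickhouse", "trino", "new_source"])
```

Because the window is bounded, a source that was slow a while ago drifts back to the light lane once its recent calls are fast again.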

Shipped into one tool
Archive extraction · Media transcoding · Cross-source search · SSE / NDJSON streaming · Interactive analytics · AI-assisted insights · Self-serve sampling · RBAC & audit

04 — Inside the platform

Four engines under one roof.

On-demand extraction, federated search, live analytics, and self-serve sampling — one FastAPI backend behind them all.

Archive extraction

Production .tar / .tar.gz / .zip pulled from cloud storage and unpacked on demand.

Unstructured → structured: .tar.gz → extract → frames (.png) · video (.mp4) · radar (.rlf) · meta (.json) · logs (.log)
ffmpeg · NumPy · SciPy · matplotlib · base64 inline
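
The unpack-and-route step can be sketched with the standard library alone. This is an illustration, not the platform's code: the function names and the `"other"` bucket are hypothetical, the suffix-to-kind mapping is taken from the labels above, and the archive bytes are assumed to have already been fetched from cloud storage.

```python
import io
import tarfile
import zipfile
from pathlib import PurePosixPath

# Suffix-to-kind mapping, taken from the labels in the section above.
KIND_BY_SUFFIX = {
    ".png": "frames",
    ".mp4": "video",
    ".rlf": "radar",
    ".json": "meta",
    ".log": "logs",
}


def list_archive_members(data: bytes, filename: str) -> list[str]:
    """Return the file names inside a .tar, .tar.gz, or .zip held in memory."""
    buffer = io.BytesIO(data)
    if filename.endswith(".zip"):
        with zipfile.ZipFile(buffer) as archive:
            return [i.filename for i in archive.infolist() if not i.is_dir()]
    if filename.endswith((".tar", ".tar.gz")):
        # Mode "r:*" lets tarfile detect gzip compression on its own.
        with tarfile.open(fileobj=buffer, mode="r:*") as archive:
            return [m.name for m in archive.getmembers() if m.isfile()]
    raise ValueError(f"unsupported archive type: {filename}")


def group_by_kind(names: list[str]) -> dict[str, list[str]]:
    """Bucket member names by file suffix; unknown suffixes go to 'other'."""
    groups: dict[str, list[str]] = {}
    for name in names:
        kind = KIND_BY_SUFFIX.get(PurePosixPath(name).suffix.lower(), "other")
        groups.setdefault(kind, []).append(name)
    return groups
```

Listing members before extracting anything keeps the on-demand path cheap: only the files a user actually opens need to be decoded or transcoded.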

Session & entity search

Resolve a user, org, or device, then stream its sessions — federated across engines.

user · org · device → ClickHouse · Trino → NDJSON stream
ClickHouse · Trino · NDJSON / SSE · adaptive fan-out
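
Streaming sessions as NDJSON means emitting one JSON document per line as rows arrive, rather than building one large response. A minimal sketch, assuming each engine hands back an iterable of row dicts; the `engine` tag and the `session_id` field are hypothetical.

```python
import json
from typing import Iterable, Iterator


def ndjson_stream(sources: dict[str, Iterable[dict]]) -> Iterator[str]:
    """Yield one JSON line per row, tagging each row with its source engine."""
    for engine, rows in sources.items():
        for row in rows:
            # Compact separators keep each line small on the wire.
            yield json.dumps({"engine": engine, **row}, separators=(",", ":")) + "\n"


lines = list(
    ndjson_stream(
        {
            "clickhouse": [{"session_id": "a1"}],
            "trino": iter([{"session_id": "b2"}]),
        }
    )
)
```

In a FastAPI backend, a generator like this could be handed to `StreamingResponse` with a media type such as `application/x-ndjson`, so the client can start rendering the first sessions before the slower engine has answered.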

Analytics

Interactive dashboards — data rate, distributions, tiered quality, compare mode, AI insights.

ECharts · μPlot · pandas · Claude insights

Sampler

Filter, count, and pull every matching archive with one generated command.

device · date range · firmware → 1,240 matches → $ curl … | xargs gsutil cp
Python · gsutil / GCS · aws / S3
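
Generating the command from the chosen filters can be sketched as plain string building with careful shell quoting. Everything specific here is an assumption for illustration: the endpoint URL is hypothetical, it is assumed to return one object URI per line, and the filter names are made up. It does not reproduce the elided command shown above.

```python
import shlex
from urllib.parse import urlencode


def build_sampler_command(base_url: str, filters: dict[str, str], dest: str) -> str:
    """Compose a one-line shell command that lists matches and copies each one."""
    # Sort the filters so the same selection always yields the same command.
    query = urlencode(sorted(filters.items()))
    list_url = f"{base_url}?{query}"
    return (
        f"curl -s {shlex.quote(list_url)}"
        f" | xargs -I{{}} gsutil cp {{}} {shlex.quote(dest)}"
    )


command = build_sampler_command(
    "https://viewer.example/api/sample",  # hypothetical endpoint
    {"firmware": "2.4.1", "device": "dev-42"},
    "./samples/",
)
```

Quoting the URL matters because `?` and `&` would otherwise be interpreted by the shell when the command is pasted into a terminal.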
Structured — the pipeline

Scheduled ETL → aggregates

daily · cron · docker: sources → Trino · ELT → ClickHouse → BI

Trino federates the sources; a daily, Dockerized batch job lands rollups in ClickHouse for the dashboards.
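
The rollup step itself is a group-and-aggregate over raw rows. The sketch below shows the shape of that transformation in plain Python, assuming hypothetical column names (`ts`, `device`, `bytes`); in the pipeline described above the equivalent work would run as a query against Trino, with the result written to ClickHouse.

```python
from collections import defaultdict
from datetime import datetime


def daily_rollup(rows: list[dict]) -> list[dict]:
    """Aggregate raw events into one row per (day, device)."""
    buckets = defaultdict(lambda: {"events": 0, "bytes": 0})
    for row in rows:
        day = datetime.fromisoformat(row["ts"]).date().isoformat()
        bucket = buckets[(day, row["device"])]
        bucket["events"] += 1
        bucket["bytes"] += row["bytes"]
    # Keys are unique, so sorting compares only the (day, device) tuples.
    return [
        {"day": day, "device": device, **totals}
        for (day, device), totals in sorted(buckets.items())
    ]


rollup = daily_rollup(
    [
        {"ts": "2024-05-01T09:00:00", "device": "dev-1", "bytes": 100},
        {"ts": "2024-05-01T17:30:00", "device": "dev-1", "bytes": 50},
        {"ts": "2024-05-02T08:00:00", "device": "dev-2", "bytes": 10},
    ]
)
```

Precomputing these aggregates once a day is what lets the dashboards read small, denormalized tables instead of scanning raw telemetry on every page load.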

Unstructured — on demand

Media & signal

frames · video · radar I/Q → FFT

Image frames · video · radar / IQ signal · run logs · nested JSON blobs.
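
The I/Q → FFT step can be illustrated with NumPy, which is already part of the stack listed above. This is a generic sketch of a windowed magnitude spectrum, not the platform's signal chain: the function name, the Hann window, and the sample rate are assumptions.

```python
import numpy as np


def iq_spectrum(iq: np.ndarray, sample_rate_hz: float):
    """Return (frequencies, magnitudes) for complex I/Q samples, DC-centred."""
    window = np.hanning(len(iq))  # taper the edges to reduce spectral leakage
    spectrum = np.fft.fftshift(np.fft.fft(iq * window))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(iq), d=1.0 / sample_rate_hz))
    return freqs, np.abs(spectrum)


# Synthetic test tone at -120 Hz, sampled at 1 kHz for one second.
fs = 1000.0
t = np.arange(1000) / fs
tone = np.exp(2j * np.pi * -120.0 * t)
freqs, magnitude = iq_spectrum(tone, fs)
peak_hz = freqs[np.argmax(magnitude)]
```

Because I/Q samples are complex, the spectrum distinguishes positive from negative frequencies, which is why the shifted axis runs from -fs/2 up to just below +fs/2.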

05 — Process automation

QA results that report themselves.

A pipeline I built so quality reaches the team with no human in the loop.

Robot test → ingest → ClickHouse → Superset dashboards → Teams notification → view sessions in the viewer
When a run finishes, results land in ClickHouse and refresh Superset — then Teams alerts the channel, linking straight to the exact session in the viewer.
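
The final notification step amounts to building a message that deep-links to the session. A minimal sketch, assuming a simple text-style webhook payload and a hypothetical viewer URL scheme (`/sessions/<run_id>`); the payload format the real webhook expects may differ, for example an Adaptive Card.

```python
from urllib.parse import quote


def build_teams_alert(run_id: str, passed: int, failed: int, viewer_base: str) -> dict:
    """Build a simple webhook payload that links to the session in the viewer."""
    base = viewer_base.rstrip("/")
    # Percent-encode the run id so slashes or spaces cannot break the link.
    session_url = f"{base}/sessions/{quote(run_id, safe='')}"
    status = "PASSED" if failed == 0 else "FAILED"
    return {
        "text": (
            f"QA run {run_id} {status}: {passed} passed, {failed} failed. "
            f"[Open session]({session_url})"
        )
    }


alert = build_teams_alert("run/42", passed=118, failed=2, viewer_base="https://viewer.example/")
```

Keeping the builder free of network calls makes it easy to test; posting the resulting dict to the channel's webhook is a separate, one-line step.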

06 — Roles as lenses

One person, six disciplines.

Data Analyst

Telemetry → tracked quality metrics.

Data Engineer

Pipelines & ClickHouse aggregates.

Platform Engineer

Federation, caching, streaming APIs.

Data Architect

Schema & safe, additive change.

Full-Stack / UI-UX

The SPA — layout, interaction, motion.

Process Improvement

Self-serve tooling that kills bottlenecks.

07 — Stack

Industry-standard tools, end to end.

08 — Contact

Let's talk about hard, ambiguous data problems.

I do my best work where the scope is unclear and the decisions are mine to make.