Data & Platform Engineering

I build data platforms, from raw data to the screen.

I think as a Data Engineer

Given an open-ended goal, I find the real problem underneath it, then design and own the system that solves it. I shape each solution around how people actually work, so their jobs get easier.

Storage · query: GCS · AWS S3 · ClickHouse · Trino → Backend: FastAPI → Serve: viewer SPA · Superset · Teams
My real stack: storage and query engines feed one backend that powers the tools people use.

01 · Approach

The goal was set.
The path was mine to find.

01

A broad mandate

Ingest the data, learn the schemas, build dashboards. A starting point, but the capability teams truly needed was still unclear.

02

Embed & discover

Worked beside the people closest to the data to find where it actually hurt.

03

Decide and ship it, solo

Architecture, stack, deployment, roadmap, vision: every call, mine.

Give me a direction, not a spec.

I turn ambiguity into infrastructure, and own every decision in between.

02 · Impact

What changed because the work exists.

~0%

Faster to root-cause

No more downloading archives, unzipping, and guessing where the fault might be. The platform surfaces the exact failing cases, and the sessions behind a complaint, so you inspect the right ones, not a random sample.

0+

Sources, one interface

Postgres, MongoDB, Elasticsearch, cloud storage, SharePoint and Jira, all reachable through one interface.

0

Roles rely on it

Adopted across engineering, product, and customer-facing teams.

0

Manual steps in QA

Test results reach the dashboards and the team's channel on their own.

One tool across 5 device families · 20+ modules · shipped solo in ~6 months.

03 · The platform

A debugging & analytics platform, from scratch.

Telemetry that used to be locked away is now a self-serve tool teams open every day. Built solo in about six months.

Zero-build frontend

Hand-bundled vanilla JS with no framework, so there is almost nothing to maintain.

Feature "fences"

Each feature is walled off behind its own markers, so any one can be removed in a single pass without touching the rest.
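A minimal sketch of how marker-based fences can make a feature removable in one pass. It is an illustration, not the platform's actual code: the marker format (`// <feature:NAME>` and `// </feature:NAME>`) and the function name `remove_feature` are assumptions.

```python
import re


def remove_feature(source: str, name: str) -> str:
    """Delete every block fenced by the (hypothetical) markers for `name`."""
    fence = re.compile(
        rf"^[ \t]*// <feature:{re.escape(name)}>[ \t]*\n"   # opening marker line
        rf".*?"                                             # fenced body, shortest match
        rf"^[ \t]*// </feature:{re.escape(name)}>[ \t]*\n?",  # closing marker line
        flags=re.DOTALL | re.MULTILINE,
    )
    return fence.sub("", source)


# Hypothetical bundle with one fenced feature.
bundle = """\
initViewer();
// <feature:sampler>
initSampler();
// </feature:sampler>
initSearch();
"""

stripped = remove_feature(bundle, "sampler")
print(stripped)
```

Because the markers carry the feature name, removing a feature that is not present leaves the source unchanged.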

Graceful degradation

If a dependency or module fails to load, the rest of the page keeps working.

Federation to ClickHouse

Moved hot-path queries off live federation into denormalized ClickHouse tables, so common lookups stay fast.

Cache and warmer

Disk-backed result cache, pre-warmed on the pipeline's refresh cadence, so we recompute only when the data changes.
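One way to sketch this idea, under stated assumptions: results are stored on disk under a key derived from the query text plus a data-version tag, so a new pipeline refresh naturally invalidates old entries. The names `DiskCache`, `warm`, and the version strings are hypothetical, and the real cache may be keyed differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable


class DiskCache:
    """Result cache on disk, keyed by query text plus a data-version tag."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, query: str, data_version: str) -> Path:
        digest = hashlib.sha256(f"{data_version}\n{query}".encode()).hexdigest()
        return self.root / f"{digest}.json"

    def get_or_compute(self, query: str, data_version: str,
                       compute: Callable[[str], Any]) -> Any:
        path = self._path(query, data_version)
        if path.exists():                      # cache hit: no recompute
            return json.loads(path.read_text())
        result = compute(query)                # cache miss: run the query once
        path.write_text(json.dumps(result))
        return result


def warm(cache: DiskCache, queries: Iterable[str], data_version: str,
         compute: Callable[[str], Any]) -> None:
    """Pre-fill the cache for known queries, e.g. right after a refresh."""
    for query in queries:
        cache.get_or_compute(query, data_version, compute)


calls: list[str] = []


def fake_query(query: str) -> dict:
    """Stand-in for a real query; records each time it actually runs."""
    calls.append(query)
    return {"query": query, "rows": 3}


cache = DiskCache(Path(tempfile.mkdtemp()))
warm(cache, ["sessions_by_device"], "refresh-1", fake_query)
hit = cache.get_or_compute("sessions_by_device", "refresh-1", fake_query)
miss = cache.get_or_compute("sessions_by_device", "refresh-2", fake_query)
print(len(calls))
```

In this sketch the query runs once during warming, is served from disk afterwards, and runs again only when the version tag changes.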

Adaptive queries

Search fan-out splits work into chunks and reclassifies each query as heavy or light at runtime.
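A rough sketch of chunked fan-out with runtime reclassification. The policy here is an assumption made for illustration: a chunk counts as heavy when it returns more than `heavy_rows` rows, heavy chunks halve the next chunk size, and light chunks double it. The real classifier may use latency or other signals.

```python
from typing import Callable, Iterator, Sequence


def adaptive_fan_out(
    keys: Sequence[int],
    run_chunk: Callable[[Sequence[int]], list],
    start_size: int = 4,
    heavy_rows: int = 100,
    min_size: int = 1,
    max_size: int = 64,
) -> Iterator[tuple]:
    """Query keys chunk by chunk, resizing chunks based on observed cost."""
    size = start_size
    position = 0
    while position < len(keys):
        chunk = keys[position:position + size]
        rows = run_chunk(chunk)
        label = "heavy" if len(rows) > heavy_rows else "light"
        yield label, list(chunk), rows
        position += len(chunk)
        # Reclassify at runtime: shrink after a heavy chunk, grow after a light one.
        if label == "heavy":
            size = max(min_size, size // 2)
        else:
            size = min(max_size, size * 2)


def fake_source(chunk: Sequence[int]) -> list:
    """Stand-in backend: keys below 4 return 50 rows each, the rest one row."""
    return [key for key in chunk for _ in range(50 if key < 4 else 1)]


plan = [(label, chunk) for label, chunk, _ in adaptive_fan_out(range(10), fake_source)]
print(plan)
```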

Shipped into one tool
Archive extraction · Media transcoding · Cross-source search · SSE / NDJSON streaming · Interactive analytics · AI-assisted insights · Self-serve sampling · RBAC & audit

04 · Inside the platform

Four engines under one roof.

On-demand extraction, cross-source search, live analytics, and self-serve sampling, all on one FastAPI backend.

Archive extraction

Production .tar / .tar.gz / .zip pulled from cloud storage and unpacked on demand.

Unstructured → structured: .tar.gz → extract → frames (.png) · video (.mp4) · radar (.rlf) · meta (.json) · logs (.log)
ffmpeg · NumPy · SciPy · matplotlib · base64 inline
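A minimal sketch of on-demand unpacking for the archive types named above, using only the Python standard library. The function name `unpack`, the member names in the demo, and the path-traversal guard are assumptions for illustration. Fetching from cloud storage is left out.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path


def _safe_target(root: Path, member_name: str) -> Path:
    """Resolve a member path and refuse anything that escapes `root`."""
    target = (root / member_name).resolve()
    if root not in target.parents:
        raise ValueError(f"unsafe member path: {member_name}")
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


def unpack(archive: Path, dest: Path) -> list[str]:
    """Unpack a .zip, .tar, or .tar.gz into `dest`; return the file names."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    extracted: list[str] = []
    if zipfile.is_zipfile(str(archive)):
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                _safe_target(root, info.filename).write_bytes(zf.read(info))
                extracted.append(info.filename)
    elif tarfile.is_tarfile(str(archive)):
        with tarfile.open(str(archive), "r:*") as tf:  # "r:*" auto-detects gzip
            for member in tf:
                if not member.isfile():
                    continue
                data = tf.extractfile(member).read()
                _safe_target(root, member.name).write_bytes(data)
                extracted.append(member.name)
    else:
        raise ValueError(f"unsupported archive: {archive}")
    return sorted(extracted)


# Demo: build a small .tar.gz with hypothetical members, then unpack it.
work = Path(tempfile.mkdtemp())
sample = work / "session.tar.gz"
with tarfile.open(str(sample), "w:gz") as tf:
    for name, payload in [("meta.json", b"{}"), ("logs/run.log", b"ok\n")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

names = unpack(sample, work / "out")
print(names)
```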

Session & entity search

Look up a user, org, or device, then stream its sessions from across every source.

user · org · device → ClickHouse · Trino → NDJSON stream
ClickHouse · Trino · NDJSON / SSE · adaptive fan-out
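A small sketch of what NDJSON streaming means here: one JSON object per line, emitted as soon as each row is available. The source names, the `session_id` field, and the function `ndjson_stream` are hypothetical, and plain lists stand in for real query results.

```python
import json
from typing import Iterable, Iterator


def ndjson_stream(sources: dict[str, Iterable[dict]]) -> Iterator[str]:
    """Yield one compact JSON line per row, tagged with the source it came from."""
    for name, rows in sources.items():
        for row in rows:
            yield json.dumps({"source": name, **row}, separators=(",", ":")) + "\n"


# Stand-in results from two backends.
sources = {
    "clickhouse": [{"session_id": "s1"}],
    "trino": [{"session_id": "s2"}],
}

lines = list(ndjson_stream(sources))
print("".join(lines), end="")
```

In a FastAPI app, a generator like this could be handed to `StreamingResponse` with `media_type="application/x-ndjson"`, so the client can render rows as they arrive instead of waiting for the full result.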

Analytics

Interactive dashboards for data rate, quality tiers, comparisons, and AI-written insights.

ECharts · μPlot · pandas · Claude insights

Sampler

Filter, count, and pull every matching archive with one generated command.

device · date range · firmware → 1,240 matches → $ curl … | xargs gsutil cp
Python · gsutil / GCS · aws / S3

Structured · pipeline

A scheduled ETL pipeline

daily · cron · docker: sources → Trino · ELT → ClickHouse → BI

Trino federates the sources; a daily, Dockerized batch job lands rollups in ClickHouse for the dashboards.

Unstructured · on demand

Media & signal

frames · video radar I/Q → FFT

Image frames · video · radar / IQ signal · run logs · nested JSON blobs.
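To illustrate the I/Q → FFT step, here is a sketch that finds the dominant frequency bin of a complex signal. It uses a plain discrete Fourier transform from the standard library so it runs anywhere. In practice `numpy.fft.fft` computes the same transform far faster. The synthetic tone and all names are assumptions, not real radar data.

```python
import cmath
import math


def dft(samples: list[complex]) -> list[complex]:
    """Naive O(n^2) discrete Fourier transform of complex I/Q samples."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


# Synthetic I/Q signal: a single complex tone sitting exactly on bin 3.
n = 16
tone_bin = 3
iq = [cmath.exp(2j * math.pi * tone_bin * t / n) for t in range(n)]

spectrum = [abs(c) for c in dft(iq)]          # magnitude per frequency bin
peak_bin = max(range(n), key=spectrum.__getitem__)
print(peak_bin)
```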

05 · Process automation

QA results that report themselves.

A pipeline I built so quality reaches the team with no human in the loop.

Robot test → Ingest → ClickHouse → Superset (dashboards) · notify (view sessions) · Teams (informs the team)
When a test run finishes, results land in ClickHouse, the dashboards refresh, and a Teams message in the team's channel links straight to the session.

06 · Roles as lenses

A problem-solver first.

I start with the problem, not a job title. Solving it well usually means owning the whole thing: the data, the platform, the interface, the deployment, and what it costs to run. Each part is sized to what the company needs now and built to scale later.

Data Analyst

Turning raw data into quality metrics teams can track.

Data Engineer

Building the pipelines and aggregates that keep the numbers fresh.

Platform Engineer

Federation, caching, and streaming APIs that make data quick to reach.

Data Architect

Designing schemas and changing them without breaking what already exists.

Full-Stack / UI-UX

The app itself: its layout, interactions, and motion.

Process Improvement

Self-serve tools that remove the manual bottlenecks.

07 · Stack

Industry-standard tools, end to end.

08 · Contact

Let's talk about hard, ambiguous data problems.

I do my best work where the scope is unclear and the decisions are mine to make.