Selected Work

Projects

Things I've built — mostly at the intersection of engineering and something I care about. Click any project to explore it in depth.

01

TFT Meta Mind

RAG-powered TFT meta chatbot

A RAG-powered chatbot that delivers real-time TFT meta insights, combining automated data pipelines with LLM-driven analysis and a polished conversational interface.

Python Streamlit Google Gemini 2.5 Flash ChromaDB Playwright Docker AWS Lightsail GitHub Actions APScheduler
02

Valorant Shop Checker

Your daily store, without the launch

Check your Valorant daily store from any device — without launching the game.

React 19 + TypeScript 5.9 Vite 7 Tailwind CSS 4 React Router 7 Python 3.12 + FastAPI httpx Riot OAuth 2.0 valorant-api.com Docker + AWS Lightsail GitHub Actions
03

Suzuki Intonation Trainer

Real-time pitch feedback for violin

Real-time pitch feedback for Suzuki violin students — listen, play, and see how accurate your intonation is, note by note.

React 18 + TypeScript Vite 6 Tailwind CSS v4 Zustand v5 Web Audio API + Pitchy VexFlow 5 Canvas API Recharts React Router v7 Vercel + vite-plugin-pwa

RAG-powered TFT meta chatbot

TFT Meta Mind

A RAG-powered chatbot that delivers real-time TFT meta insights, combining automated data pipelines with LLM-driven analysis and a polished conversational interface.

About the Project

TFT Meta Mind scrapes competitive TFT data daily, transforms it into natural language documents, embeds them into a vector database, and serves them through a Gemini-powered chatbot that understands the difference between asking about a specific unit, a team composition, or a broader strategy question.

Users interact through a dark-themed Streamlit interface with glassmorphic stat cards, session-based chat history, and pre-filled starter questions for discoverability.

Demo

Asking TFT Meta Mind about the current meta and getting grounded, data-backed answers.

Architecture & Design Decisions

Five-step daily pipeline with failure isolation

Each stage (scrape → generate docs → ingest to ChromaDB → ingest YouTube content → cleanup) runs independently. If scraping fails, it retries after 30 minutes. If ingestion fails, cleanup is skipped to preserve data for manual recovery. This keeps the system resilient without overengineering a full job queue.
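The failure-isolation rules can be sketched roughly like this. Stage names and the retry/skip behavior follow the description above; the code itself is illustrative, not the project's actual implementation:

```python
import time

def run_pipeline(stages, retry_delay_s=1800):
    """Run pipeline stages in order with simple failure isolation:
    a failed 'scrape' is retried once after a delay, and 'cleanup'
    is skipped when 'ingest' failed, preserving data for recovery."""
    status = {}
    for name, fn in stages:
        if name == "cleanup" and status.get("ingest") == "failed":
            status[name] = "skipped"
            continue
        try:
            fn()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
            if name == "scrape":
                time.sleep(retry_delay_s)  # 30 minutes in production
                try:
                    fn()
                    status[name] = "ok"
                except Exception:
                    pass
    return status
```

Each stage is just a callable, so the same runner covers all five steps without a job queue.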

Smart retrieval routing over pure embedding search

Rather than sending every question through the same vector similarity search, the chatbot classifies questions by keyword regex and adjusts which document types to prioritize — unit analysis chunks for unit questions, video guide chunks for strategy questions, a mix for comp questions. This hybrid approach significantly improved answer relevance compared to naive RAG.
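A minimal sketch of the routing idea, with hypothetical regex patterns and document-type names (the real routing table is more detailed):

```python
import re

# Hypothetical routing table: question patterns mapped to the
# document types the vector search should prioritize. Order matters:
# comp questions often mention units, so comp patterns are checked first.
ROUTES = [
    (re.compile(r"\b(comps?|compositions?|team)\b", re.I),
     ["comp_analysis", "unit_analysis"]),
    (re.compile(r"\b(units?|champions?|carry)\b", re.I),
     ["unit_analysis"]),
    (re.compile(r"\b(strategy|guide|positioning|econ)\b", re.I),
     ["video_guide"]),
]

def route_question(question: str) -> list[str]:
    """Classify a question by keyword regex and return which chunk
    types to prioritize; fall back to searching everything."""
    for pattern, doc_types in ROUTES:
        if pattern.search(question):
            return doc_types
    return ["unit_analysis", "comp_analysis", "video_guide"]
```

The returned list then filters or weights the vector similarity search, so "what comps use this unit" pulls comp chunks first rather than unit chunks.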

Deterministic chunk IDs via MD5 hashing

Every chunk gets an ID derived from MD5(type + date + chunk_index), which means re-running the pipeline upserts instead of duplicating data. This was critical for a daily-refresh system — without it, ChromaDB would accumulate duplicate entries on every pipeline run.
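The ID scheme is easy to reproduce; a sketch assuming the key is built from the three fields named above:

```python
import hashlib

def chunk_id(doc_type: str, date: str, chunk_index: int) -> str:
    """Deterministic chunk ID: the same (type, date, index) always
    hashes to the same ID, so re-running the pipeline upserts
    instead of duplicating data."""
    key = f"{doc_type}:{date}:{chunk_index}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Passing these IDs to ChromaDB's `upsert` overwrites existing entries on each daily run instead of appending new ones.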

Header-based semantic chunking

Documents are split on ## markdown headers (with a secondary ### split for oversized chunks). This keeps each comp or unit analysis as a single coherent chunk rather than breaking mid-sentence at a fixed token count. The tradeoff is occasional chunk size variance, but embedding quality is meaningfully better.
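A sketch of the header-based split with a secondary pass for oversized chunks (the character threshold is an illustrative assumption; the real pipeline may size chunks differently):

```python
import re

def chunk_by_headers(markdown: str, max_chars: int = 4000) -> list[str]:
    """Split a document on '## ' headers, then re-split any oversized
    chunk on '### ' headers, so each comp or unit analysis stays a
    single coherent chunk rather than breaking at a fixed token count."""
    out = []
    for chunk in re.split(r"(?m)^(?=## )", markdown):
        chunk = chunk.strip()
        if not chunk:
            continue
        if len(chunk) > max_chars:
            out.extend(c.strip() for c in re.split(r"(?m)^(?=### )", chunk)
                       if c.strip())
        else:
            out.append(chunk)
    return out
```

The lookahead split keeps the header line attached to its own chunk, which preserves context for the embedding model.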

Two-tier YouTube processing

YouTube's transcript API blocks cloud provider IPs, so transcripts are fetched locally, processed through Gemini for structured extraction, and committed as JSON files. The Lightsail pipeline then ingests those pre-processed files. A pragmatic workaround that avoids introducing a proxy layer just for one data source.

Next.js __NEXT_DATA__ extraction

Instead of intercepting XHR requests from tactics.tools (fragile and race-condition-prone), the scraper extracts server-rendered data directly from the __NEXT_DATA__ script tag. More reliable and simpler to maintain.
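The extraction itself is only a few lines; a sketch using a regex over the raw page HTML (a real implementation might prefer an HTML parser, but the script tag's `id` makes a regex workable):

```python
import json
import re

def extract_next_data(html: str) -> dict:
    """Pull server-rendered state out of a Next.js page's
    __NEXT_DATA__ script tag instead of intercepting XHR calls."""
    match = re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
        html,
        re.DOTALL,
    )
    if not match:
        raise ValueError("__NEXT_DATA__ script tag not found")
    return json.loads(match.group(1))
```

Because the data is embedded in the initial HTML, the scraper never races against client-side fetches.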

Lazy-loaded Riot Data Dragon lookups

Unit/item/trait IDs from the raw data are resolved to human-readable names via Riot's CDN. The lookup table is fetched once per session, cached in memory, and falls back to raw IDs on failure — so the system never crashes over a cosmetic lookup.

Deployment

Fully containerized with Docker Compose. Two services share persistent named volumes: Streamlit (512MB) serves the web UI, and Pipeline (1024MB) runs the daily scheduled pipeline via APScheduler.

GitHub Actions CI/CD deploys on every push to master: SSH into Lightsail, pull, rebuild containers, prune old images. Simple and predictable.

What I learned

RAG quality is mostly a data problem, not a model problem. The biggest improvements to answer quality came from better document generation and chunking strategy — not from prompt engineering the LLM. Converting raw nested JSON into well-structured natural language documents, and keeping semantic units together during chunking, mattered far more than tweaking the system prompt.

Hybrid retrieval beats pure vector search for domain-specific applications. Adding keyword-based routing on top of embedding similarity was a straightforward change that noticeably improved relevance. Pure embedding search struggled to distinguish between "tell me about a unit" and "what comps use this unit" — the routing layer solved that.

Infrastructure constraints drive architecture. The YouTube two-tier pipeline, the deterministic chunk IDs, the failure-isolated pipeline steps — none of these were planned upfront. They emerged from hitting real problems (IP blocking, duplicate data, transient scrape failures) and solving them pragmatically. The best architecture decisions in this project were reactions to production issues, not predictions.

Playwright for scraping modern SPAs is powerful but heavy. It handles JavaScript-rendered content that requests + BeautifulSoup can't touch, but the memory footprint is significant. Allocating 1024MB to the pipeline container (vs 512MB for Streamlit) was a direct consequence of running headless Chromium in the pipeline.

Daily pipelines need to be idempotent. Any step that runs on a schedule will eventually run twice, fail halfway through, or process stale data. Building every stage to be safely re-runnable (via upsert semantics, deterministic IDs, and conditional cleanup) saved significant debugging time in production.

Future Enhancements

Experiment with other datastores and embedding models, such as Gemini's Embedding 2, to support images and video

Move to a paid plan or OpenRouter for higher token limits instead of relying solely on Gemini 2.5 Flash

Add ads for monetization and scale from a small user base (myself and friends) to a broader public audience

Support multi-region data (KR, EUW) for cross-meta comparisons

Integrate augment and item builder recommendations based on current patch data

Tech Stack

Python

Core application logic and data pipelines

Streamlit

Web UI with glassmorphic dark theme

Google Gemini 2.5 Flash

LLM for chat and document generation

ChromaDB

Vector database for embedded document chunks

Playwright

Headless browser scraping of SPAs

Docker

Containerized deployment with Compose

AWS Lightsail

Hosting for containers and persistent volumes

GitHub Actions

CI/CD pipeline for automated deploys

APScheduler

Daily pipeline scheduling

Your daily store, without the launch

Valorant Shop Checker

Check your Valorant daily store from any device — without launching the game.

About the Project

I'm a Valorant player, and one of my biggest pain points with the game was its shop. To buy a skin you want, such as the Kuronami Vandal or Soulstrife Knife (please show up in my shop soon), you either have to buy it when its bundle is released or hope it appears as one of the four items in your daily shop. The daily shop is also different for each player and only visible in-game, meaning you have to launch the full game client just to take a peek at what the RNGods have in store for you.

I wanted to solve this pain point for myself and my friends, who also wanted an easy way to check their personal shops outside of the game. The ideal end state was an app that runs constantly and sends email or Discord notifications once the skin(s) you want finally appear in your shop.

What started as a "this should be straightforward" side project turned into one of the more educational builds I've done, largely because Riot Games' auth system had other plans for me.

The app is a classic SPA + API setup: a React frontend on Vercel talks to a FastAPI backend running in a Docker container on AWS Lightsail, which in turn proxies authenticated requests to Riot's APIs.

Demo

Authenticating via the paste-URL flow and viewing the daily store rotation.

Architecture & Design Decisions

The paste-URL auth flow

Riot's OAuth system enforces redirect_uri=http://localhost/redirect with no option for custom redirect URIs. I tried three other approaches first — server-side credential auth (blocked by Cloudflare/captcha), OAuth redirect through the backend (impossible with the localhost restriction), and cross-origin cookies (blocked by mobile Safari). The solution: the user signs in on Riot's official login page, copies the resulting localhost URL from their address bar, and pastes it back into the app. It's unconventional, but it's the only approach that I could get working reliably for a deployed web app against Riot's OAuth, and it has the security benefit that user credentials never touch my server.
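With the implicit grant, the tokens arrive in the fragment of that localhost URL, so the backend only ever sees tokens, never credentials. A sketch of parsing the pasted URL (the parameter names follow the standard implicit-grant shape and are an assumption here):

```python
from urllib.parse import urlparse, parse_qs

def parse_pasted_redirect(url: str) -> dict[str, str]:
    """Extract OAuth tokens from the localhost redirect URL the user
    pastes back into the app. Implicit-grant tokens live in the URL
    fragment (after '#'), which browsers never send to any server."""
    fragment = urlparse(url).fragment
    params = {k: v[0] for k, v in parse_qs(fragment).items()}
    if "access_token" not in params:
        raise ValueError("No access_token found in pasted URL")
    return params
```

Validating up front also gives the UI a clear error when a user pastes the wrong thing (e.g. the login page URL instead of the redirect).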

localStorage over cookies

After the cookie approach failed on mobile browsers with strict third-party cookie policies, I switched to localStorage + Authorization: Bearer headers. It works consistently across every platform I tested.

In-memory session store

For an MVP with a small user base and 3-hour access token TTLs, I chose simplicity over durability. Sessions live in a thread-safe Python dict. No database, no Redis — just restart and users re-authenticate. I have no plans to scale this project to a larger user base; it exists as a convenience so my friends and I can check our shops even without access to our personal setups.

Startup asset caching

Riot's storefront API returns raw UUIDs — no skin names, no images, no tier info. I cache all metadata from valorant-api.com into O(1) lookup dictionaries when the server starts, so every request resolves instantly without downstream API calls.
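A sketch of building one such lookup at startup; the field names mirror valorant-api.com's skins payload but should be treated as an assumption here:

```python
def build_skin_index(skins: list[dict]) -> dict[str, dict]:
    """Index skin metadata by skin-level UUID so each storefront UUID
    from Riot resolves in O(1), with no downstream API call per request."""
    index = {}
    for skin in skins:
        for level in skin.get("levels", []):
            # Riot's storefront returns level UUIDs; normalize casing
            # so lookups are case-insensitive.
            index[level["uuid"].lower()] = {
                "name": skin["displayName"],
                "icon": skin.get("displayIcon"),
            }
    return index
```

Building the index once at startup also means a valorant-api.com outage can't take down an already-running server.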

sessionStorage for mobile tab-switch resilience

On mobile, switching tabs to complete Riot login kills the JS context. I persist the login UI stage in sessionStorage so users don't lose their place in the flow.

Deployment

Frontend deployed on Vercel. Backend runs as a Docker container on AWS Lightsail. GitHub Actions CI/CD auto-deploys via SSH + Docker rebuild on every push.

What I learned

Mobile is a different platform. Things I took for granted on desktop — cross-origin cookies, tab persistence, consistent localStorage behavior — all broke in subtle ways on mobile Safari. The sessionStorage fix for tab switches was a small code change but a big UX improvement that I wouldn't have caught without testing on real devices.

Caching strategy matters more than you think. The difference between resolving skin data per-request vs. at startup was the difference between a sluggish app and an instant one. Pre-loading asset data also made the app resilient to valorant-api.com downtime — a dependency I didn't want to be fragile.

Type safety across the stack pays off. Running mypy in strict mode on the backend and TypeScript strict on the frontend caught integration bugs before they reached runtime. The Pydantic response models on the backend are effectively the same shape as the TypeScript interfaces on the frontend, which made the API contract clear and self-documenting.

CI/CD is worth the upfront investment. Setting up GitHub Actions to auto-deploy the backend via SSH + Docker rebuild seemed like overkill for a personal project, but it paid for itself almost immediately. One gotcha: Docker's layer caching caused stale deployments where my code changes weren't reflected — adding --no-cache to the build step was a small but critical fix.

Tech Stack

React 19 + TypeScript 5.9

Frontend SPA with strict type safety

Vite 7

Build tooling with fast HMR

Tailwind CSS 4

Utility-first styling

React Router 7

Client-side routing

Python 3.12 + FastAPI

Async backend API with Pydantic models

httpx

Async HTTP client for Riot API proxying

Riot OAuth 2.0

Implicit grant flow with paste-URL workaround

valorant-api.com

Community-maintained skin, bundle, and tier metadata

Docker + AWS Lightsail

Containerized backend hosting

GitHub Actions

CI/CD with SSH deploy and Docker rebuild

Real-time pitch feedback for violin

Suzuki Intonation Trainer

Real-time pitch feedback for Suzuki violin students — listen, play, and see how accurate your intonation is, note by note.

About the Project

I started playing violin in middle school and relied heavily on the Suzuki repertoire during my early stages of learning. One of the hardest parts of learning a string instrument is developing accurate intonation — unlike piano, there are no frets or keys to guide your fingers. Teachers catch what they can in a weekly lesson, but practicing alone means bad habits go uncorrected for days.

I wanted something that acts like a patient, always-available practice companion. Load the piece you're working on, hit start, and get immediate visual feedback on whether each note is sharp, flat, or in tune. The key difference from a generic tuner: it knows the score and follows along as you play, so it can judge accuracy in context — not just "you played an A" but "you played the A in measure 3 and it was 8 cents sharp."

The Suzuki repertoire was a natural fit. The pieces are standardized worldwide, progress in difficulty, and almost every violin student knows them. Structured tracking across a known curriculum makes progress data meaningful.

Demo

Playing through a Suzuki piece with real-time pitch feedback and color-coded notation.

Architecture & Design Decisions

Audio pipeline

Microphone → Web Audio AnalyserNode (FFT size 4096) → requestAnimationFrame loop (~30fps) → Pitchy autocorrelation → { frequency, clarity } → PitchProcessor → { noteName, octave, cents } → Zustand store → React UI. Mic configuration disables AGC, echo cancellation, and noise suppression — these are tuned for speech and actively distort instrument audio.
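The frequency-to-note step is the mathematically interesting part of that chain. The math is language-agnostic, so here is a Python sketch of the conversion the PitchProcessor stage performs (the app itself is TypeScript):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_note(freq: float, a4: float = 440.0) -> tuple[str, int, float]:
    """Convert a detected frequency to (noteName, octave, cents).
    Cents measure the deviation from the nearest equal-tempered
    note: 100 cents = one semitone."""
    # Signed distance from A4 in fractional semitones
    semitones = 12 * math.log2(freq / a4)
    nearest = round(semitones)
    cents = (semitones - nearest) * 100
    midi = 69 + nearest  # MIDI number of the nearest tempered note
    return NOTE_NAMES[midi % 12], midi // 12 - 1, cents
```

This is also where "8 cents sharp" in the project description comes from: a pitch 8/1200 of an octave above A4 reports as A4 at +8¢.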

Score-following state machine

Detected pitch matches expected note → start sustain timer. Sustain exceeds beat duration × tolerance → record hit with averaged cents → advance cursor. Detected pitch matches next expected note → record miss on current → skip ahead. Silence for 300ms → reset detection state. Timing-based advancement is critical for pieces like the Twinkle variations, where "A, A, A, A" must register as four separate notes rather than one held note.

Three Zustand stores split by lifecycle

TrainerStore (ephemeral) — live pitch data, playback state, current piece. SessionStore (ephemeral) — note results and summary stats for the current session. ProgressStore (persisted) — long-term history, per-piece trends, practice streaks. Only the progress store serializes to localStorage. High-frequency pitch data stays ephemeral — no serialization overhead on the hot path.

Structured score data

Each piece is a structured JSON file with title, key, time signature, and measures containing notes with pitch, MIDI number, duration, string, and finger. Loaded via import.meta.glob for automatic code-splitting.

Four-tier intonation color system

Green (±5¢) — In tune. Yellow (±15¢) — Slightly off. Orange (±30¢) — Off. Red (>30¢) — Very off. Applied consistently across the pitch gauge, real-time graph, sheet music coloring, and post-session heatmap so the feedback language is always the same.
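The mapping is a simple threshold ladder over absolute cents deviation; a Python sketch of the shared function the four views would draw their colors from:

```python
def intonation_color(cents: float) -> str:
    """Map a cents deviation to the four-tier feedback color,
    using the absolute value so sharp and flat read identically."""
    deviation = abs(cents)
    if deviation <= 5:
        return "green"    # in tune
    if deviation <= 15:
        return "yellow"   # slightly off
    if deviation <= 30:
        return "orange"   # off
    return "red"          # very off
```

Centralizing the thresholds in one function is what keeps the gauge, graph, sheet music, and heatmap speaking the same feedback language.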

Deployment

Deployed on Vercel with vite-plugin-pwa for offline support. HTTPS (required for microphone access via getUserMedia) comes free with Vercel deployment. The PWA is installable on any device for practice rooms without WiFi.

What I learned

Pitch detection is harder than it sounds. My first attempt used FFT peak-picking — it was wildly inaccurate on violin. The instrument's rich harmonic series meant the algorithm would frequently lock onto an overtone instead of the fundamental. Switching to autocorrelation (via the Pitchy library) was a night-and-day improvement. I also learned that browser audio defaults (noise suppression, automatic gain control) are enemies of musical pitch detection — they're tuned for voice frequencies and actively distort instrument audio.

Score following is a surprisingly deep problem. The naive approach ("did the player play the right note?") breaks immediately on repeated notes — common in Suzuki Book 1. Timing-based heuristics solved most cases, but edge cases around tempo variation and missed notes required careful state machine design. A tolerance multiplier on beat duration keeps things forgiving without being sloppy. I considered Dynamic Time Warping but found the sequential matcher sufficient for the structured Suzuki repertoire.

Real-time rendering demands escape hatches from React. The cents deviation graph needed Canvas, and the sheet music cursor needed direct DOM manipulation. React's reconciliation cycle is too slow for 30+ fps visual updates. Learning when to step outside the framework — refs, Canvas, direct element attribute updates — was key to keeping the UI responsive while still using React for the structural UI.

VexFlow has a learning curve. Rendering proper music notation — beaming, accidentals, key signatures, multi-system layouts — required deep understanding of VexFlow's rendering pipeline. Getting responsive layout (measures-per-system recalculated via ResizeObserver on container resize) took significant iteration. The payoff is notation that looks professional, not toy-like.

PWA for musicians makes sense. Violin practice happens in music rooms, studios, and practice spaces — often without reliable WiFi. Making the app installable and offline-capable via service workers means students can use it anywhere.

Future Enhancements

Expand to Suzuki Books 2–4 with automatic difficulty progression

Add rhythm detection alongside pitch for complete performance feedback

Recording functionality — allow the user to listen back to their recording with the intonation graph

Add login and authentication for individual users to store their own personal progress

Tech Stack

React 18 + TypeScript

Type safety across audio, score, and UI layers

Vite 6

Fast HMR, native ESM, import.meta.glob for score loading

Tailwind CSS v4

CSS-first config with @theme for the custom color system

Zustand v5

Three stores split by lifecycle; persist middleware for progress

Web Audio API + Pitchy

Autocorrelation handles violin harmonics better than FFT

VexFlow 5

Professional SVG music rendering (beaming, key sigs, accidentals)

Canvas API

60fps cents graph without React reconciliation overhead

Recharts

Progress dashboard line/bar charts

React Router v7

Layout routes with lazy loading

Vercel + vite-plugin-pwa

HTTPS (required for mic access) + offline support