Projects

Production AI Systems

Systems designed and operated at enterprise scale inside large engineering organizations. Described at a high level; source is proprietary.

LLM Evaluation Framework

Designed and operated the evaluation framework for generative AI products in production. Defined the methodology for retrieval quality (Recall@k, Precision@k, MRR) and generation quality (groundedness, faithfulness). The framework was built on Databricks and Spark and adopted as standard practice across AI product teams.
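The retrieval metrics named above have standard definitions. The sketch below illustrates them; it is not the proprietary framework's code. It assumes each query has a ranked list of unique retrieved document IDs and a set of relevant IDs, and it uses the common convention that Precision@k divides by k.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the k result slots occupied by relevant documents."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    scores = [reciprocal_rank(retrieved, relevant) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    ranked = ["d3", "d1", "d7", "d2"]
    gold = {"d1", "d2"}
    print(recall_at_k(ranked, gold, 3))     # one of two relevant docs in top 3
    print(precision_at_k(ranked, gold, 3))  # one relevant doc in three slots
    print(mean_reciprocal_rank([(ranked, gold), (["d4", "d8"], {"d4"})]))
```

Recall@k rewards finding every relevant document. Precision@k penalizes filling the context window with noise. MRR only looks at how early the first relevant hit appears, which is why the three are usually reported together.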

RAG Pipeline Infrastructure

Architected production RAG pipelines connecting LLMs to enterprise knowledge sources. Designed embedding workflows, vector search infrastructure, and retrieval optimization for high-stakes enterprise workloads where hallucination risk is unacceptable.
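At its core, vector search ranks stored embeddings by their similarity to a query embedding. The sketch below uses brute-force cosine similarity over toy 2-D vectors and is not the production infrastructure. It assumes embeddings are already computed. Production systems typically replace the linear scan with an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Exact (brute-force) search: score every stored vector, keep the best k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy index: document ID -> embedding.
    index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    print(top_k([1.0, 0.1], index, 2))
```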

Agentic Workflow Platform

Designed and deployed agentic AI systems in production: multi-turn orchestration, function calling, and tool-use patterns at enterprise scale across AWS and Azure. Built the infrastructure enabling LLMs to execute multi-step tasks autonomously against internal systems.

Internal Developer Platform

Led the engineering platform serving hundreds of developers across build, ship, and operate workflows. Designed secure-by-default CI/CD pipelines, containerization strategies, and IaC tooling that drove the shift from monolithic deployments to microservices.

Full background on CV →

Live Demos

Running systems with public source. These are live; you can interact with them or read the code.

edge-ai-agent-lab: MCP Worker
Live · MCP

Live Cloudflare Worker implementing the Model Context Protocol (MCP). It exposes three tools (time_now, worker_info, and echo) that demonstrate MCP server patterns, a Workers AI binding, and AI deployment on Cloudflare's edge network. A reference implementation for agentic tool-use architecture. Source on GitHub.
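MCP servers speak JSON-RPC 2.0. Clients discover tools with `tools/list` and invoke them with `tools/call`. The sketch below is in Python for illustration. The real Worker runs on Cloudflare's JavaScript runtime and its code is on GitHub. The tool bodies, descriptions, and the `handle_request` helper are hypothetical, and the `initialize` handshake and transport layer are omitted.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Tuple

ToolFn = Callable[[Dict[str, Any]], str]


def time_now(args: Dict[str, Any]) -> str:
    # Hypothetical behavior: current UTC time in ISO 8601 format.
    return datetime.now(timezone.utc).isoformat()


def worker_info(args: Dict[str, Any]) -> str:
    # Hypothetical behavior: describe this server and the tools it exposes.
    return json.dumps({"server": "mcp-sketch", "tools": sorted(TOOLS)})


def echo(args: Dict[str, Any]) -> str:
    return str(args.get("text", ""))


# Tool registry: name -> (description, JSON Schema for arguments, implementation).
TOOLS: Dict[str, Tuple[str, Dict[str, Any], ToolFn]] = {
    "time_now": ("Current UTC time", {"type": "object", "properties": {}}, time_now),
    "worker_info": ("Server metadata", {"type": "object", "properties": {}}, worker_info),
    "echo": (
        "Echo the input text",
        {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
        echo,
    ),
}


def _error(req_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one JSON-RPC 2.0 request to the MCP tool methods."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}
    if method == "tools/list":
        tools = [
            {"name": name, "description": desc, "inputSchema": schema}
            for name, (desc, schema, _) in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            return _error(req_id, -32602, f"Unknown tool: {name}")  # invalid params
        _, _, fn = TOOLS[name]
        text = fn(params.get("arguments") or {})
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": text}]}}
    return _error(req_id, -32601, f"Method not found: {method}")


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "echo", "arguments": {"text": "hello"}}}
    print(json.dumps(handle_request(call)))
```

The registry keeps each tool's schema next to its implementation, so `tools/list` and `tools/call` cannot drift apart.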

TrustClaw: Autonomous Email Summarization Agent
Live · Claude

Autonomous email-summarization agent that integrates a Gmail webhook with Claude for AI-powered summarization, proxies all dependencies through JFrog Artifactory, and deploys to Vercel. Includes a full Artifactory supply-chain audit trail across 900+ packages and a 5-point DX friction report. Demonstrates forward-deployed AI architecture with enterprise-grade dependency governance.

AI-Vic: Conversational AI on the Edge
Live · Workers AI

A portfolio assistant grounded in resume, project, and background content and deployed as a Cloudflare Worker using Workers AI and Llama 3.3 70B. Demonstrates system prompt grounding, edge inference, rate limiting, and conversational UX patterns without a RAG layer. Use the chat widget on this site to interact with it live.
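The description does not say which rate-limiting algorithm the assistant uses. The sketch below shows one common choice, a fixed-window counter per client, and should be read as an assumption. The class name and the injectable clock are hypothetical, and the clock exists only to make the example testable. In-memory state like this is not shared across Worker instances, so a deployed version would need a shared store.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_s`-second window."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # client ID -> (window index, requests counted in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_s)
        seen_window, count = self._counts.get(client_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so reset this client's counter
        if count >= self.limit:
            self._counts[client_id] = (window, count)
            return False
        self._counts[client_id] = (window, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=2, window_s=60)
    print([limiter.allow("client-1") for _ in range(3)])  # third call is rejected
```

Fixed windows are cheap but allow short bursts at window boundaries. A sliding window or token bucket smooths this at the cost of more state.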

Open Source

Public repos covering RAG evaluation, LLM benchmarking, agentic systems, and AI foundations.

RAG Evaluation Lab
RAG · Eval

Fully offline, beginner-friendly lab for evaluating RAG systems. Includes synthetic datasets, embeddings, vector search, retrieval metrics (Recall@k, Precision@k, MRR), and groundedness scoring, built for reproducible RAG evaluation without API dependencies.

OpenAI Foundations
OpenAI

Tutorials and demos for building real-world applications with OpenAI APIs. Four progressive labs move from API basics to production patterns, covering function calling, RAG, embeddings, and multimodal applications.

Chatbot Evaluation System
LLM-as-Judge

Black-box pairwise evaluation baseline for comparing chatbot versions using LLM-as-judge. Supports fake and edge judge paths, audit metadata, and CI integration, with a roadmap toward conversation-level multi-turn evaluation.
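A pairwise LLM-as-judge comparison asks a judge which of two responses to the same prompt is better, then aggregates verdicts into win rates. The sketch below is not the repository's code. It assumes a judge is any function `(prompt, first, second) -> "A" | "B" | "tie"`, and `fake_judge` is a hypothetical deterministic stub of the kind useful in CI. Judging both orderings and treating disagreement as a tie is one common guard against position bias, included here as an assumption.

```python
from typing import Callable, Dict, List, Literal, Tuple

Verdict = Literal["A", "B", "tie"]
# A judge sees (prompt, first_response, second_response) and names the better one.
Judge = Callable[[str, str, str], Verdict]


def fake_judge(prompt: str, first: str, second: str) -> Verdict:
    """Hypothetical deterministic stand-in for an LLM judge: prefers the longer answer."""
    if len(first) == len(second):
        return "tie"
    return "A" if len(first) > len(second) else "B"


def compare(judge: Judge, prompt: str, resp_a: str, resp_b: str) -> Verdict:
    """Judge both orderings; if the verdicts disagree, call it a tie."""
    forward = judge(prompt, resp_a, resp_b)
    swapped = judge(prompt, resp_b, resp_a)
    # Map the swapped verdict back to the original A/B labels.
    backward: Verdict = {"A": "B", "B": "A", "tie": "tie"}[swapped]
    return forward if forward == backward else "tie"


def win_rates(judge: Judge, cases: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Share of (prompt, version_a_response, version_b_response) cases per verdict."""
    tally = {"A": 0, "B": 0, "tie": 0}
    for prompt, resp_a, resp_b in cases:
        tally[compare(judge, prompt, resp_a, resp_b)] += 1
    n = len(cases) or 1
    return {verdict: count / n for verdict, count in tally.items()}


if __name__ == "__main__":
    cases = [
        ("q1", "short", "a much longer answer"),
        ("q2", "same", "same"),
        ("q3", "longer answer", "tiny"),
    ]
    print(win_rates(fake_judge, cases))
```

Because the judge is just a function, swapping the stub for a real model call changes one argument and leaves the aggregation untouched.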

AI Operating System (AI-OS)
Agents · MCP

Enterprise AI operating system built on LangGraph and MCP that orchestrates multi-agent workflows, tool integrations, and agentic task execution. Demonstrates production patterns for Claude-powered agents in enterprise environments.

LATAM GenAI Lakehouse Benchmark
LATAM · Spark

Lakehouse-native evaluation framework measuring regional Spanish LLM performance (El Salvador vs. Peru) using Delta tables, Spark, and Databricks. Applies the Bronze/Silver/Gold (medallion) data architecture to LLM benchmarking at scale.
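The Bronze/Silver/Gold pattern layers data by refinement. Bronze holds raw rows as ingested, Silver holds validated and normalized rows, and Gold holds aggregates ready for reporting. The project implements these layers as Delta tables with Spark. The sketch below uses plain Python lists only to show the layering, and the field names (`region`, `model`, `score`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_silver(bronze: List[Row]) -> List[Row]:
    """Validate raw rows: drop incomplete ones, normalize region, cast score to float."""
    silver: List[Row] = []
    for row in bronze:
        try:
            region = str(row["region"]).strip().lower()
            model = str(row["model"]).strip()
            score = float(row["score"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed raw row; a real pipeline might quarantine it instead
        silver.append({"region": region, "model": model, "score": score})
    return silver


def to_gold(silver: List[Row]) -> Dict[Tuple[str, str], float]:
    """Aggregate clean rows into a mean score per (region, model)."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for row in silver:
        groups[(row["region"], row["model"])].append(row["score"])
    return {key: mean(scores) for key, scores in groups.items()}


if __name__ == "__main__":
    bronze = [
        {"region": "El Salvador ", "model": "m1", "score": "0.8"},
        {"region": "el salvador", "model": "m1", "score": 0.6},
        {"region": "Peru", "model": "m1", "score": 0.9},
        {"region": "Peru", "model": "m1"},                   # missing score
        {"region": "Peru", "model": "m1", "score": "n/a"},   # unparseable score
    ]
    print(to_gold(to_silver(bronze)))
```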

Graduate Research

Projects completed during the UC Berkeley Master of Information and Data Science (MIDS) program, 2021–2023.

ML System Engineering & MLOps
MLOps

End-to-end ML platform built on Kubernetes and microservices, including containerized model serving, automated retraining pipelines, CI/CD for ML, and production monitoring. Stack: Kubernetes, Docker, FastAPI, MLflow.

Machine Learning at Scale: Flight Delay Prediction
Spark

Distributed ML pipeline predicting flight delays across 30M+ records using MapReduce, Hadoop, and Apache Spark on Databricks. Applied ensemble methods (GBT, Random Forest) with feature engineering on temporal and weather data.

Machine Learning: Understanding Hate Crime Patterns
TensorFlow

Applied linear regression and TensorFlow to identify socioeconomic and demographic predictors of hate crime rates across U.S. counties. Surfaced statistically significant correlations to inform policy research.

Data Engineering: Location Recommendations with NoSQL
NoSQL

Multi-database recommendation engine using Neo4j (graph traversal), MongoDB (document store), and Redis (caching) to generate personalized store location suggestions at low latency.

Data Analysis: NFL Big Data Bowl
EDA

Exploratory data analysis on NFL tracking data using Python, NumPy, and Pandas. Analyzed player movement patterns and derived game-level insights from raw positional data.

Statistical Analysis: Movie Revenue Regression Study
Stats

Designed and executed a research study on movie revenue predictors using OLS regression, hypothesis testing, and diagnostic analysis to identify drivers of box office performance.

Data Visualization: Travel Guide Reimagined
Tableau

Interactive Tableau dashboard reimagining travel data as a visual guide, layering geographic, seasonal, and sentiment data to surface non-obvious destination insights.

Capstone: enRoute, Running Route Safety App
iOS

iOS app leveraging real-time safety data and ML-based route scoring to recommend safe running routes. Full mobile and backend stack, built as the UC Berkeley MIDS capstone project.
