Projects

Production AI Systems

Systems designed and operated at enterprise scale inside large engineering organizations. Described at a high level; source is proprietary.

LLM Evaluation Framework

Designed and operated the evaluation framework for generative AI products in production. Defined the methodology for retrieval quality (Recall@k, Precision@k, MRR) and generation quality (groundedness, faithfulness). Built on Databricks and Spark, the framework was adopted as standard practice across AI product teams.
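For readers unfamiliar with the retrieval metrics named above, here is a minimal sketch of their standard definitions in plain Python. It is illustrative only and is not the proprietary framework's code: the function names, the document IDs, and the convention of dividing Precision@k by k even when fewer than k results come back are all assumptions made for this example.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top k slots filled by relevant documents (assumed: divide by k)."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical query: four retrieved document IDs, two of which are relevant.
ranked_docs = ["d3", "d1", "d7", "d2"]
relevant_docs = {"d1", "d2"}

recall = recall_at_k(ranked_docs, relevant_docs, k=3)
precision = precision_at_k(ranked_docs, relevant_docs, k=3)
mrr = mean_reciprocal_rank([(ranked_docs, relevant_docs), (["a", "b"], {"a"})])
```

The sketch assumes each ranked list contains no duplicate IDs. At production scale the same per-query logic would typically be expressed as a distributed aggregation rather than Python loops.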

RAG Pipeline Infrastructure

Architected production RAG pipelines connecting LLMs to enterprise knowledge sources. Designed embedding workflows, vector search infrastructure, and retrieval optimization for high-stakes enterprise workloads where hallucination risk is unacceptable.

Agentic Workflow Platform

Designed and deployed agentic AI systems in production: multi-turn orchestration, function calling, and tool-use patterns at enterprise scale across AWS and Azure. Built the infrastructure enabling LLMs to execute multi-step tasks autonomously against internal systems.

Internal Developer Platform

Led the engineering platform serving hundreds of developers across build, ship, and operate workflows. Designed secure-by-default CI/CD pipelines, containerization strategies, and IaC tooling that drove the shift from monolithic deployments to microservices.

Full background on CV →

Live Demos

Running systems with public source. You can interact with each one live or read its code.

Open Source

Public repos covering RAG evaluation, LLM benchmarking, agentic systems, and AI foundations.

Graduate Research

Projects completed during the UC Berkeley Master of Information and Data Science (MIDS) program, 2021–2023.

© 2026 Victor Ramirez