Company: NewsCorp (USA)
Role: Lead Consultant Engineer
Period: May 2023 to January 2024
Type: Consulting · Generative AI
Generative AI for the Wall Street Journal
Production Python/Go APIs for GenAI use cases at WSJ: LLM evaluation, vector retrieval, session management, and the first screen-reader experiment aimed at making WSJ articles accessible.
LLMs
Falcon
LLaMA
Pinecone
FastAPI
DynamoDB
Accessibility
4 major LLMs evaluated for production
3 vector stores benchmarked
1st WSJ screen-reader experiment for accessibility
The brief
NewsCorp wanted to move generative AI from research demos into production capabilities for the Wall Street Journal — document understanding, retrieval-augmented generation, and session-aware AI interactions that could meet real production SLAs.
What I built
LLM evaluation and selection
- Led the design and development of Python/Go APIs tailored to Generative AI business use cases including document understanding and retrieval-augmented generation.
- Evaluated and benchmarked four LLMs (Raven, Falcon-7B, LLaMA, and Code Llama), assessing latency, cost, output stability, and hallucination risk for real-world deployment.
- Experimented with fine-tuning open-source LLMs and diffusion models (Stable Diffusion) using LoRA, a memory-efficient low-rank adaptation technique.
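A minimal sketch of how latency and output stability can be compared across candidate models, not the harness used at WSJ. It assumes each model is wrapped in a plain `generate(prompt) -> str` callable; `benchmark`, `echo_model`, and the "stability" definition (share of repeated runs that return the most common output) are illustrative choices, and cost and hallucination scoring are left out.

```python
import statistics
import time
from collections import Counter
from typing import Callable, Dict, List


def benchmark(generate: Callable[[str], str], prompts: List[str], runs: int = 3) -> Dict[str, float]:
    """Time a text-generation callable and score how repeatable its outputs are."""
    latencies: List[float] = []
    stability_scores: List[float] = []
    for prompt in prompts:
        outputs: List[str] = []
        for _ in range(runs):
            start = time.perf_counter()
            outputs.append(generate(prompt))
            latencies.append(time.perf_counter() - start)
        # Stability (assumed definition): share of runs that produced
        # the most common output for this prompt.
        most_common_count = Counter(outputs).most_common(1)[0][1]
        stability_scores.append(most_common_count / runs)
    return {
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "stability": statistics.mean(stability_scores),
    }


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model client; always deterministic.
    return prompt.upper()


if __name__ == "__main__":
    report = benchmark(echo_model, ["Summarise this article.", "List the key figures."])
    print(report)
```

Swapping `echo_model` for a wrapper around each real model gives one comparable report per candidate, as long as the same prompts and run count are used for all of them.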
Retrieval systems for production
- Designed vector-based retrieval systems using Pinecone, FAISS, and ChromaDB, implementing cosine similarity and retrieval strategies aligned with production SLAs.
- Introduced cosine-similarity search as the retrieval step for the RAG pipelines, keeping lookups efficient.
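For readers unfamiliar with the retrieval step, here is a small pure-Python sketch of cosine-similarity top-k search. It is a brute-force illustration only: the `top_k` helper, the toy two-dimensional vectors, and the document ids are made up, and a real deployment would delegate this to a vector store such as Pinecone, FAISS, or ChromaDB.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from an embedding model.
    index = {
        "markets": [1.0, 0.0],
        "tech": [0.0, 1.0],
        "mixed": [1.0, 1.0],
    }
    for doc_id, score in top_k([1.0, 0.0], index, k=2):
        print(doc_id, round(score, 4))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for text embeddings.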
Session-aware AI services
- Built FastAPI services with DynamoDB-backed session state, supporting long-running AI interactions.
- Deployed microservices with ECS and GitHub Actions, and used Lambda functions to offload processing from the core API server.
- Wrote unit and integration tests to validate AI pipelines and prevent silent regressions.
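The session-state idea can be sketched as below. This is an assumption-heavy illustration rather than the production service: an in-memory dictionary stands in for the DynamoDB table, the FastAPI layer is omitted, and `InMemorySessionStore`, `Session`, and the sliding 30-minute expiry are hypothetical names and choices. The injectable `clock` is there so the expiry logic can be unit tested without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Session:
    session_id: str
    expires_at: float
    turns: List[Dict[str, str]] = field(default_factory=list)


class InMemorySessionStore:
    """Stand-in for a DynamoDB-backed store (assumption: one item per session)."""

    def __init__(self, ttl_seconds: float = 1800.0, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions: Dict[str, Session] = {}

    def get(self, session_id: str) -> Optional[Session]:
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if session.expires_at <= self._clock():
            # Expired sessions are dropped lazily on read.
            del self._sessions[session_id]
            return None
        return session

    def append_turn(self, session_id: str, role: str, text: str) -> Session:
        now = self._clock()
        session = self.get(session_id)
        if session is None:
            session = Session(session_id=session_id, expires_at=now + self._ttl)
            self._sessions[session_id] = session
        session.turns.append({"role": role, "text": text})
        session.expires_at = now + self._ttl  # sliding expiry: each turn extends the session
        return session


if __name__ == "__main__":
    store = InMemorySessionStore()
    store.append_turn("s1", "user", "Summarise today's front page.")
    store.append_turn("s1", "assistant", "Here is a short summary.")
    print(len(store.get("s1").turns))
```

Keeping the store behind a small interface like this means the API handlers stay the same whether the backing store is a test double or a real table.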
Accessibility experimentation
- Led the first screen-reading experiment for Wall Street Journal articles, aimed at helping users with disabilities engage with WSJ content.
"The fun part of LLM work in production isn't the model. It's the dozen unglamorous services around it that decide whether the model ever gets to do anything useful."