Company: NewsCorp (USA)
Role: Lead Consultant Engineer
Period: May 2023 to January 2024
Type: Consulting · Generative AI

Generative AI for the Wall Street Journal

Production Python/Go APIs for GenAI use cases at WSJ — LLM evaluation, vector retrieval, session management, and a pioneering screen-reader experiment to make articles accessible.

LLMs · Falcon · LLaMA · Pinecone · FastAPI · DynamoDB · Accessibility

Major LLMs evaluated for production

Vector stores benchmarked

1st WSJ screen-reader experiment for accessibility

The brief

NewsCorp wanted to move generative AI from research demos into production capabilities for the Wall Street Journal — document understanding, retrieval-augmented generation, and session-aware AI interactions that could meet real production SLAs.

What I built

LLM evaluation and selection

Led the design and development of Python/Go APIs tailored to Generative AI business use cases including document understanding and retrieval-augmented generation.
Evaluated and benchmarked multiple LLMs (Raven, Falcon-7B, LLaMA, and Code Llama), assessing latency, cost, output stability, and hallucination risk for real-world deployment.
Experimented with fine-tuning open-source LLMs and diffusion models (Stable Diffusion) using LoRA, a memory-efficient low-rank adaptation technique.
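To make the evaluation criteria concrete, here is a minimal sketch of how latency and output stability could be measured for a single model. It is an illustration, not the harness used on the project: `benchmark_model`, `fake_model`, and the exact-match definition of stability are all assumptions, and cost and hallucination risk are left out because they need provider pricing and labeled reference answers.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark_model(
    generate: Callable[[str], str],
    prompts: List[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Time every call and check whether repeated calls agree.

    `generate` is any callable that maps a prompt to a completion.
    Stability here is the fraction of prompts whose repeated outputs
    are identical (a deliberately strict, hypothetical definition).
    """
    if not prompts:
        raise ValueError("at least one prompt is required")

    latencies: List[float] = []
    stable_prompts = 0
    for prompt in prompts:
        outputs = set()
        for _ in range(repeats):
            start = time.perf_counter()
            outputs.add(generate(prompt))
            latencies.append(time.perf_counter() - start)
        if len(outputs) == 1:
            stable_prompts += 1

    return {
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        "stability": stable_prompts / len(prompts),
    }


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model client; deterministic on purpose.
    return prompt.upper()


report = benchmark_model(fake_model, ["summarize this article", "list key entities"])
print(report)
```

Running the same function over each candidate model with a shared prompt set gives comparable numbers, which is the main point of the sketch.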

Retrieval systems for production

Designed vector-based retrieval systems using Pinecone, FAISS, and ChromaDB, implementing cosine similarity and retrieval strategies aligned with production SLAs.
Pioneered cosine-similarity search for efficient data retrieval in RAG pipelines.
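For readers unfamiliar with the technique, cosine-similarity retrieval can be sketched in a few lines of plain Python. This is a brute-force illustration over a toy index, not the production setup: the project used Pinecone, FAISS, and ChromaDB for the actual search, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings; real embeddings have hundreds of dimensions.
toy_index = {
    "markets": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.0],
    "tech": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a RAG pipeline the top-scoring documents would then be passed to the LLM as context. A vector store replaces the linear scan with an index so the lookup stays fast at scale.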

Session-aware AI services

Built FastAPI services with DynamoDB-backed session state, supporting long-running AI interactions.
Deployed microservices on ECS through GitHub Actions, and used Lambda functions to offload processing from the core API server.
Wrote unit and integration tests to validate AI pipelines and prevent silent regressions.
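The session-state idea can be sketched as a small store keyed by session ID with a sliding expiry. This is an assumption-heavy illustration: an in-memory dict stands in for the DynamoDB table, the FastAPI layer is omitted, and `Session`, `InMemorySessionStore`, and the 30-minute default TTL are hypothetical choices, not details from the project.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    session_id: str
    expires_at: float  # Unix timestamp after which the session is dead
    messages: List[dict] = field(default_factory=list)


class InMemorySessionStore:
    """Dict-backed stand-in for a key-value table such as DynamoDB."""

    def __init__(self, ttl_seconds: float = 1800.0) -> None:
        self._ttl = ttl_seconds
        self._items: Dict[str, Session] = {}

    def create(self, now: Optional[float] = None) -> Session:
        now = time.time() if now is None else now
        session = Session(session_id=uuid.uuid4().hex, expires_at=now + self._ttl)
        self._items[session.session_id] = session
        return session

    def get(self, session_id: str, now: Optional[float] = None) -> Optional[Session]:
        now = time.time() if now is None else now
        session = self._items.get(session_id)
        if session is None or session.expires_at <= now:
            # Drop expired entries lazily on read.
            self._items.pop(session_id, None)
            return None
        return session

    def append(self, session_id: str, role: str, content: str,
               now: Optional[float] = None) -> Session:
        now = time.time() if now is None else now
        session = self.get(session_id, now)
        if session is None:
            raise KeyError(f"unknown or expired session: {session_id}")
        session.messages.append({"role": role, "content": content})
        session.expires_at = now + self._ttl  # sliding expiry on activity
        return session


store = InMemorySessionStore(ttl_seconds=60.0)
session = store.create()
store.append(session.session_id, "user", "Summarize today's front page.")
print(len(store.get(session.session_id).messages))
```

Keeping the conversation history outside the API process is what lets a long-running interaction survive restarts and move between service instances.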

Accessibility experimentation

Led the first screen-reader experiment for Wall Street Journal articles, aimed at helping users with disabilities engage with WSJ content.

"The fun part of LLM work in production isn't the model. It's the dozen unglamorous services around it that decide whether the model ever gets to do anything useful."