Video poster

Pricelist Processing System

Developed an app leveraging GPT-4 Vision and custom YOLO models to digitize complex pricelist and menu images.

The Challenge

Manually transcribing restaurant menus or supplier pricelists from images into structured data is a slow, error-prone, and tedious task. The source images are often low-quality, taken at odd angles, and feature complex, multi-column layouts that cause traditional OCR tools to fail. A more intelligent, automated solution was needed to handle the visual complexity and variety of these real-world documents.

The Technical Solution

An end-to-end processing application was developed to turn messy menu photos into structured, ready-to-use data. The system uses a sophisticated, multi-stage AI pipeline to overcome the limitations of standard OCR.

  • 1. Semantic Menu Sectioning with YOLO: The core insight was that processing an entire complex menu at once was unreliable. To solve this, a custom YOLO object detection model was trained to first identify and crop logical sections from the menu (e.g., "Appetizers," "Main Courses"). This model was trained on a public dataset of over 2,000 real-world menu images, which I collected and annotated. (View the dataset on Roboflow Universe).
  • 2. Hybrid Vision-Text Extraction: Each cropped menu section was then processed individually for maximum accuracy.
    1. Azure OCR performed an initial text extraction on the image patch.
    2. Both the image patch and the raw OCR text were then fed to GPT-4 Vision. Providing both the visual context (the image) and the text improved the model's ability to correctly interpret items and prices, especially in noisy images.
  • 3. Structuring and Aggregation: GPT-4 Vision was prompted to return a structured JSON object for each menu section, containing the item name, price, and category. The system then aggregated the results from all sections into a single, complete JSON representation of the entire menu.
  • The Application: This pipeline was integrated into an internal web tool where a user could upload a menu image and, after a few moments of processing, download a perfectly structured Excel file, completely eliminating the need for manual data entry.

Results and Impact

This project successfully automated a highly manual workflow, demonstrating the power of a strategic, multi-stage AI approach.

  • Eliminated Manual Data Entry: The primary goal was achieved, freeing up staff from the tedious task of typing out menus by hand.
  • Superior Accuracy: The section-based, hybrid vision-text approach proved significantly more accurate than using a single AI model on the entire image, overcoming issues with complex layouts and poor image quality.
  • Practical, User-Focused Tool: The final output was a ready-to-use Excel file, fitting seamlessly into the existing business workflow.