Semantic Search with Hugging Face Embeddings

Overview

This project showcases a scalable, intelligent search and question-answering system that goes beyond traditional keyword lookups. It combines MongoDB’s native vector search with OpenAI’s GPT models to deliver a hybrid semantic search + AI assistant experience.

The result is a production-grade application that not only retrieves semantically relevant blog content but also generates direct, natural language answers from my own technical articles. This demonstrates how modern AI can be integrated into full-stack systems to create smart, user-friendly knowledge retrieval platforms.

How It Works: The Semantic Search Pipeline

The application runs on a multi-stage pipeline that I designed and implemented end-to-end:

  1. Content Preparation
    • My blog posts are stored as Markdown files.
    • Each file is processed, split into context-rich segments (chunks), and prepared for embedding.
  2. Vector Embeddings & Storage
    • A local Hugging Face model (all-MiniLM-L6-v2) converts each chunk into an embedding vector.
    • These embeddings are stored in MongoDB alongside the original content, indexed with a vector search index for fast similarity lookups.
  3. Search Modes
    • Semantic Search (/search): A user query is embedded and compared against the vector library, returning the top 5 most relevant chunks ranked by similarity.
    • AI-Powered Q&A (/ask):
      • MongoDB vector search retrieves the top 3 most relevant chunks.
      • These chunks are sent to OpenAI GPT, which generates a coherent, well-structured answer in natural language.
      • The answer is displayed with clear formatting, alongside the exact sources used.
  4. Frontend Experience
    • Built with React, Vite, and Tailwind CSS.
    • Clean, responsive UI with distinct displays for “search results” vs. “AI answers.”
    • Supports markdown rendering for GPT responses (lists, headings, emphasis).
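The chunking in step 1 can be sketched as a simple paragraph-based splitter. This is an illustrative sketch, not the project's actual implementation: the function name `chunkMarkdown` and the 1000-character cap are assumptions chosen for the example.

```javascript
// Minimal sketch of the chunking step: split a Markdown document into
// paragraph-based chunks capped at a maximum character length.
// chunkMarkdown and MAX_CHUNK_CHARS are illustrative names/values.
const MAX_CHUNK_CHARS = 1000;

function chunkMarkdown(markdown) {
  const paragraphs = markdown
    .split(/\n\s*\n/) // split on blank lines
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";
  for (const para of paragraphs) {
    // Start a new chunk when adding this paragraph would exceed the cap.
    if (current && current.length + para.length + 2 > MAX_CHUNK_CHARS) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Keeping chunks paragraph-aligned preserves context within each segment, which matters for retrieval quality (see the chunk-granularity lesson below).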
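Step 3's grounding of retrieved chunks into a GPT request can be sketched as simple prompt assembly. The function name `buildPrompt` and the exact instruction wording are assumptions for illustration, not the project's actual prompt.

```javascript
// Hedged sketch of grounding retrieved chunks into a GPT prompt.
// buildPrompt and the instruction text are illustrative assumptions.
function buildPrompt(question, chunks) {
  // Label each chunk so the model's answer can cite its sources.
  const context = chunks
    .map((c, i) => `[Source ${i + 1}: ${c.source}]\n${c.content}`)
    .join("\n\n");
  return (
    "Answer the question using only the context below. " +
    "If the context is insufficient, say so.\n\n" +
    `Context:\n${context}\n\nQuestion: ${question}`
  );
}
```

The resulting string would be sent as the user message in a chat-completion call, which keeps the model's answer grounded in the retrieved blog content.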

Features

  • Semantic Search: Retrieve posts based on meaning, not just keywords.
  • AI Assistant Mode: Ask natural language questions and receive GPT-generated answers grounded in my blog content.
  • Source Attribution: Every AI answer includes the source text chunks for transparency.
  • Confidence Filtering: Low-similarity results are discarded to ensure accuracy.
  • Production-Ready UI: Interactive, user-friendly React frontend with loading states and responsive design.
  • Cost-Efficient Hybrid Design: Combines free local embeddings with selective GPT calls — practical for scaling.
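The semantic ranking and confidence filtering above can be sketched with cosine similarity over stored embeddings. The helper names and the 0.6 cutoff are assumptions for illustration; the project's actual threshold and scoring live inside MongoDB's vector search.

```javascript
// Illustrative sketch of confidence filtering: rank stored embeddings by
// cosine similarity to the query embedding, drop low-similarity results,
// and return the top k. The 0.6 threshold is an assumed example value.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topKAboveThreshold(queryVec, docs, k = 5, threshold = 0.6) {
  return docs
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVec, doc.embedding) }))
    .filter((doc) => doc.score >= threshold) // confidence filtering
    .sort((a, b) => b.score - a.score)       // highest similarity first
    .slice(0, k);
}
```

Discarding results below the threshold is what keeps weakly related chunks out of both the search results and the GPT context.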

Tech Stack

  • Frontend: React, Vite, Tailwind CSS — responsive UI with markdown rendering for GPT answers.
  • Backend: Node.js + Express.js — structured API endpoints (/search, /ask) with clear separation of concerns.
  • Database: MongoDB — optimized vector indexes for fast semantic similarity queries.
  • AI/ML Pipeline: Hybrid approach combining local Hugging Face embeddings (all-MiniLM-L6-v2) with OpenAI GPT for natural language answer generation.
  • Deployment: Full-stack app deployed on Render (single service for frontend + backend).
  • Content Source: My own technical blog posts from MHD’s Tech Hub, stored as Markdown files.

What I Learned

This project provided hands-on experience with vector and semantic search, core technologies behind modern AI applications like search engines and recommendation systems. I learned to generate and manage text embeddings—turning natural language into numerical representations that capture meaning—and perform efficient similarity searches in a database.

A key lesson was optimizing search relevance: the size and granularity of text chunks significantly affect results, and fine-tuning this balance improves accuracy. I also gained practical experience with MongoDB’s vector search, creating optimized vector indexes and using the $vectorSearch aggregation operator for scalable similarity matching.
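A $vectorSearch aggregation stage like the one described above might look as follows. This is a hedged sketch: the index name "vector_index", the field path "embedding", and the candidate count are assumptions and would need to match your own Atlas vector index configuration.

```javascript
// Hedged sketch of a $vectorSearch aggregation pipeline.
// Index name, field path, and numCandidates are illustrative assumptions.
function buildVectorSearchPipeline(queryEmbedding, limit = 3) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",       // name of the Atlas vector search index
        path: "embedding",           // document field holding the embedding
        queryVector: queryEmbedding, // embedding of the user's query
        numCandidates: 100,          // candidates scanned before ranking
        limit,                       // top-k chunks to return
      },
    },
    {
      $project: {
        content: 1,
        source: 1,
        score: { $meta: "vectorSearchScore" }, // similarity score, usable for thresholding
      },
    },
  ];
}
```

The pipeline would be passed to `collection.aggregate(...)`, and the projected score is what a confidence threshold can be applied against.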

Performance optimization and confidence thresholding were essential: moving from a brute-force approach to indexed searches and tuning similarity thresholds eliminated irrelevant results, improving user experience. Overall, the project strengthened my AI/ML skills and full-stack engineering abilities, showing how to integrate sophisticated machine learning models into a production-ready web application with proper error handling and user feedback.

See It in Action

The best way to understand this project is to try it yourself. The live demo is fully functional, allowing you to ask questions based on the content of my blog.

"Share this project and encourage others to build something extraordinary!"
Mohamad Sabha