Explore Multilingual Search and RAG with the SEA‑LION Embedding Demo

Imagine being able to search and ask questions across your documents in English, Malay, Indonesian, Thai, Vietnamese, Filipino, and more—all from a simple browser interface.

The SEA‑LION Embedding Demo makes this possible. It’s a simple Docker‑based app that utilizes the SEA-LION embedding models to demonstrate:

  • Retrieval‑augmented generation (RAG) over your own documents
  • Multilingual semantic search (search by meaning, not just keywords)
  • Text similarity comparison between excerpts from different languages
  • All running on a clean Gradio web UI that you can share with teammates and stakeholders

Under the hood, the app combines:

  • A SEA‑LION embedding model (e.g. aisingapore/SEA-LION-ModernBERT-Embedding-300M)
  • ChromaDB as the vector database
  • An OpenAI‑compatible LLM endpoint (local or cloud)
  • A Docker‑based deployment that keeps everything self‑contained

This guide is for anyone who wants to try our embedding model or get a hands‑on feel for multilingual RAG quickly – evaluating LLM capabilities in Southeast Asian languages.

Before You Start: What You’ll Need

To follow this guide, you need:

  • A machine running:
    • Linux, or
    • macOS, or
    • Windows with WSL2
  • Docker
  • Access to an OpenAI‑compatible LLM endpoint, for example:
    • Ollama running a SEA‑LION LLM locally, or
    • SEA‑LION API with an API key, or
    • Another provider that supports the OpenAI-compatible API format
  • Ample free disk space and RAM for running models locally. ~2 – 3GB for embedding model, ~4 – 20GB for Ollama LLM, depending on choice of model/quantization.

For Windows users: we recommend running Docker inside WSL2 (Ubuntu or similar), rather than directly on Windows. This typically results in smoother performance and fewer networking/file‑system issues.

Step 1 – Set Up Docker

macOS & Linux

  1. Install Docker Desktop (macOS) or Docker Engine (Linux):
    https://www.docker.com/products/docker-desktop/
  2. Start Docker and make sure it’s running.
  3. Open a terminal and check:docker --versionYou should see a version number.

Windows (Recommended: WSL2 + Docker Desktop)

  1. Make sure WSL2 is enabled on your machine (Windows 10/11). If not, follow Microsoft’s official WSL2 instructions.
  2. Install Docker Desktop for Windows:
    https://www.docker.com/products/docker-desktop/
  3. In Docker Desktop settings:
    • Enable WSL2 integration and select your Ubuntu distro.
  4. Open your WSL2 terminal (Ubuntu) and check: docker --version

You’ll run all demo commands inside WSL2. This keeps your environment closer to Linux, which is what most ML tooling targets.

Step 2 – Get the SEA‑LION Embedding Demo

Clone or download the demo repository:

git clone https://github.com/aisingapore/sealion-embedding-demo.git
cd sealion-embedding-demo

Alternatively, you can:

  • Download the ZIP from GitHub
  • Unzip it
  • Use your terminal to cd into the extracted sealion-embedding-demo folder

Once you’re in the project folder, you’re ready to configure the LLM backend and start the app.

Step 3 – Choose an LLM Backend

The app requires an LLM endpoint that speaks the OpenAI API format. Two popular options:

Option A: Local SEA‑LION via Ollama (Great for Offline Prototyping)

  1. Install Ollama:
    https://ollama.com
  2. In your terminal, pull a SEA‑LION LLM model, for example: ollama pull aisingapore/Qwen-SEA-LION-v4-32B-IT
    More SEA‑LION models for Ollama:
    https://ollama.com/aisingapore?sort=newest
  3. Ollama runs an API at http://localhost:11434. From Docker, we’ll reach it via host.docker.internal.

Option B: SEA‑LION API (Managed, Cloud‑Hosted)

  1. Get an API key from the SEA‑LION Playground:
    https://playground.sea-lion.ai/key-manager
  2. Note your API key and preferred model name.

Other OpenAI‑compatible endpoints (vLLM, Bedrock Access Gateway, etc.) can also be used; details are in the repo README and SEA‑LION docs:
https://docs.sea-lion.ai/guides/inferencing

Step 3a (Optional) – Pre‑Download the SEA‑LION Embedding Model via Hugging Face Hub

By default, the demo will download the SEA‑LION embedding model inside the Docker container the first time you run it. This is perfectly fine, but the first run can be a bit slow and the model may need to be re‑downloaded if you rebuild containers frequently.

If you’d like more control—and potentially faster first runs—you can pre‑download the model on your machine using the Hugging Face Hub, and let Docker reuse that cache.

(i) Install the Hugging Face Client Library

If you already have Python installed, install the Hugging Face Hub client:

pip install huggingface_hub

You can find a friendly quick‑start guide here:
https://huggingface.co/docs/huggingface_hub/quick-start

(ii) Download the SEA‑LION Embedding Model

The default embedding model used by the demo is:
aisingapore/SEA-LION-ModernBERT-Embedding-300M

From your terminal, run:

hf download aisingapore/SEA-LION-ModernBERT-Embedding-300M

This command uses the Hugging Face Hub client to download the model into your local cache. For more details on model downloads, see:
https://huggingface.co/docs/hub/models-downloading#using-the-hugging-face-client-library

Typical cache locations are:

  • Linux / macOS: ~/.cache/huggingface/
  • Windows (WSL2): /home/<your-username>/.cache/huggingface/ inside WSL2

(iii) Let Docker Reuse the Cache

Your docker-compose.yml is already set up to mount a Hugging Face cache folder into the container:

volumes:
  - ${HF_CACHE_PATH:-~/.cache/huggingface}:/root/.cache/huggingface

You have two options:

  • Use the default (if your cache is at ~/.cache/huggingface):
    Do nothing; Docker will see and reuse the cache automatically.
  • Use a custom cache path:
    Set HF_CACHE_PATH in your .env file to point to the directory you downloaded your model to, for example:HF_CACHE_PATH=/home/your-username/path/to/another/folder

With this in place, when the app container needs aisingapore/SEA-LION-ModernBERT-Embedding-300M, it will find it in the mounted cache instead of downloading it from scratch.

Step 4 – Quick Configuration via .env

In the project root, copy the example config file:

cp .env.example .env

Open .env in a text editor and update just a few key lines, depending on your chosen backend.

Example: Ollama Setup

For a typical local Ollama + SEA‑LION setup, you might use:

EMBEDDING_MODEL=aisingapore/SEA-LION-ModernBERT-Embedding-300M

OPENAI_BASE_URL=http://host.docker.internal:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=aisingapore/Qwen-SEA-LION-v4-32B-IT

CHROMA_HOST=chromadb
CHROMA_PORT=8000

If you want to tune chunk sizes, retrieval parameters, or switch to SEA‑LION API or another provider, refer to the detailed explanations in the repository README:
https://github.com/aisingapore/sealion-embedding-demo

Step 5 – Start the Demo with Docker Compose

The demo is designed to run as two Docker services:

  • chromadb: a ChromaDB vector database
  • app: the Gradio web app using SEA‑LION embeddings and your chosen LLM backend

The docker-compose.yml (already included) looks like this:

services:
  chromadb:
    image: chromadb/chroma:1.5.4
    volumes:
      - chroma_data:/data
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD-SHELL", "bash -c 'exec 3<>/dev/tcp/localhost/8000 && echo -e \"GET /api/v2/heartbeat HTTP/1.0\\r\Host: localhost\\r\\\r\\" >&3 && cat <&3 | grep -q nanosecond'"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s

  app:
    build: .
    ports:
      - "7860:7860"
    volumes:
      - ${HF_CACHE_PATH:-~/.cache/huggingface}:/root/.cache/huggingface
      - ./documents:/app/documents
      - ./sample_data:/app/sample_data:ro

    env_file:
      - .env
    depends_on:
      chromadb:
        condition: service_healthy
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  chroma_data:

To start everything:

docker compose up --build -d

(or docker-compose up --build -d if your Docker still uses the older CLI)

Exclude the -d flag if you wish to view the logs, otherwise you may inspect them via container logs on Docker Desktop.

On first launch, Docker will:

  • Build the app image
  • Pull ChromaDB
  • Download Python dependencies
  • Download the SEA‑LION embedding model (if not already cached)

This can take a few minutes for the very first time. Subsequent starts will be much faster as certain steps are cached.

View the logs in your terminal via docker compose logs -f. Cancel it using Ctrl + C.

Step 6 – Open the Web App and Explore

When the app is ready, you’ll see log messages in your app Docker container including something like:

Running on local URL:  http://0.0.0.0:7860

Open your browser and go to:
http://localhost:7860

You should see the SEA‑LION Embedding Demo UI, typically with tabs such as:

  • Semantic Search
  • Cross‑Lingual Similarity
  • RAG Q&A
  • Document Management / Re‑index

Here’s what you can do immediately:

(i) Try Multilingual Semantic Search

  • Go to the Semantic Search tab.
  • Use one of the built‑in sample documents (already mounted into the container).
  • Type a query in English, Malay, Indonesian, Thai, etc.
  • See the most relevant chunks, with scores and snippets.
Screenshot of a multilingual semantic search interface showing a query input box, search results with rankings, scores, source documents, and text excerpts about various iconic dishes, such as nasi lemak.

(ii) Play with Cross‑Lingual Similarity

  • Go to the Similarity tab.
  • Enter two sentences in different languages.
  • Check how similar they are according to SEA‑LION embeddings.
Interface of the Cross-Lingual Similarity Explorer tool showcasing sentence comparison and cosine similarity evaluation.

(iii) Ask Questions with RAG Q&A

  • Open the RAG Q&A tab.
  • Ask a natural‑language question.
  • The app retrieves the top matching document chunks and passes them to your configured LLM.
  • You get grounded answers, plus visibility into which sources were used.

It’s a quick way to show RAG in action and to demo how SEA‑LION behaves in everyday queries.

Screenshot of a Q&A interface showing a user asking about festivals in Thailand, with information about Songkran, the Thai New Year festival.

Step 7 – Bring Your Own Documents

The real magic happens when you connect SEA‑LION to your own content.

In the project folder, you’ll see a documents/ directory. It’s mounted into the container, so any files you drop there become candidates for indexing.

Supported formats include:

  • .txt, .md, .rst
  • .yaml, .yml, .json, .csv, .xml
  • .html, .htm
  • .pdf, .docx

To index your content:

  1. Copy your files into the documents/ folder.
  2. In the web UI, go to the Document Management or Re‑index tab.
  3. Click the button to re‑index.
Screenshot of a document management interface displaying indexed documents with their details, including document name, chunk count, last modified date, and source folder. A status update indicates 'Re-index complete.'

In this example, ricefarmer_id.md was added to documents folder and added to the database

The app will:

  • Read your files
  • Split them into chunks
  • Generate SEA‑LION embeddings
  • Store them in your local ChromaDB instance

You can now use Semantic Search and RAG Q&A directly on your own data.
This makes it incredibly easy to build internal demos and proof‑of‑concepts for use cases like:

  • Knowledge base search
  • Policy / SOP Q&A
  • Multilingual content discovery

For removal of files from your database, type the document name or use a wildcard to remove multiple documents.

Screenshot of a Document Management interface showing sections for managing indexed documents, including options to remove a document named 'sea.txt' and displaying status updates about removed chunks.

Step 8 – Stop and Restart

To stop the app, run the following command in the app’s root folder:

docker compose down

This stops the containers but keeps:

  • Your indexed vectors in the chroma_data Docker volume
  • Your documents in the documents/ folder on your machine

So the next time you want to use the demo, it’s as simple as:

docker compose up -d

and then visiting http://localhost:7860 again.

Where to Go Next

The SEA‑LION Embedding Demo is intentionally simple, so you can focus on exploring the quality of multilingual search and RAG rather than wrestling with infrastructure.

From here, you might want to: