Explore Multilingual Search and RAG with the SEA‑LION Embedding Demo
Imagine being able to search and ask questions across your documents in English, Malay, Indonesian, Thai, Vietnamese, Filipino, and more—all from a simple browser interface.
The SEA‑LION Embedding Demo makes this possible. It’s a simple Docker‑based app that utilizes the SEA-LION embedding models to demonstrate:
- Retrieval‑augmented generation (RAG) over your own documents
- Multilingual semantic search (search by meaning, not just keywords)
- Text similarity comparison between excerpts from different languages
- All running on a clean Gradio web UI that you can share with teammates and stakeholders
Under the hood, the app combines:
- A SEA‑LION embedding model (e.g. aisingapore/SEA-LION-ModernBERT-Embedding-300M)
- ChromaDB as the vector database
- An OpenAI‑compatible LLM endpoint (local or cloud)
- A Docker‑based deployment that keeps everything self‑contained
This guide is for anyone who wants to try our embedding model or get a hands‑on feel for multilingual RAG quickly – evaluating LLM capabilities in Southeast Asian languages.
- The code repository can be found here:
https://github.com/aisingapore/sealion-embedding-demo - Explore AI Singapore’s model collections here:
https://huggingface.co/aisingapore/collections - SEA‑LION documentation:
https://docs.sea-lion.ai
Before You Start: What You’ll Need
To follow this guide, you need:
- A machine running:
- Linux, or
- macOS, or
- Windows with WSL2
- Docker
- Access to an OpenAI‑compatible LLM endpoint, for example:
- Ollama running a SEA‑LION LLM locally, or
- SEA‑LION API with an API key, or
- Another provider that supports the OpenAI-compatible API format
- Ample free disk space and RAM for running models locally. ~2 – 3GB for embedding model, ~4 – 20GB for Ollama LLM, depending on choice of model/quantization.
For Windows users: we recommend running Docker inside WSL2 (Ubuntu or similar), rather than directly on Windows. This typically results in smoother performance and fewer networking/file‑system issues.
Step 1 – Set Up Docker
macOS & Linux
- Install Docker Desktop (macOS) or Docker Engine (Linux):
https://www.docker.com/products/docker-desktop/ - Start Docker and make sure it’s running.
- Open a terminal and check:
docker --versionYou should see a version number.
Windows (Recommended: WSL2 + Docker Desktop)
- Make sure WSL2 is enabled on your machine (Windows 10/11). If not, follow Microsoft’s official WSL2 instructions.
- Install Docker Desktop for Windows:
https://www.docker.com/products/docker-desktop/ - In Docker Desktop settings:
- Enable WSL2 integration and select your Ubuntu distro.
- Open your WSL2 terminal (Ubuntu) and check:
docker --version
You’ll run all demo commands inside WSL2. This keeps your environment closer to Linux, which is what most ML tooling targets.
Step 2 – Get the SEA‑LION Embedding Demo
Clone or download the demo repository:
git clone https://github.com/aisingapore/sealion-embedding-demo.git cd sealion-embedding-demo
Alternatively, you can:
- Download the ZIP from GitHub
- Unzip it
- Use your terminal to cd into the extracted sealion-embedding-demo folder
Once you’re in the project folder, you’re ready to configure the LLM backend and start the app.
Step 3 – Choose an LLM Backend
The app requires an LLM endpoint that speaks the OpenAI API format. Two popular options:
Option A: Local SEA‑LION via Ollama (Great for Offline Prototyping)
- Install Ollama:
https://ollama.com - In your terminal, pull a SEA‑LION LLM model, for example:
ollama pull aisingapore/Qwen-SEA-LION-v4-32B-IT
More SEA‑LION models for Ollama:
https://ollama.com/aisingapore?sort=newest - Ollama runs an API at
http://localhost:11434. From Docker, we’ll reach it viahost.docker.internal.
Option B: SEA‑LION API (Managed, Cloud‑Hosted)
- Get an API key from the SEA‑LION Playground:
https://playground.sea-lion.ai/key-manager - Note your API key and preferred model name.
Other OpenAI‑compatible endpoints (vLLM, Bedrock Access Gateway, etc.) can also be used; details are in the repo README and SEA‑LION docs:
https://docs.sea-lion.ai/guides/inferencing
Step 3a (Optional) – Pre‑Download the SEA‑LION Embedding Model via Hugging Face Hub
By default, the demo will download the SEA‑LION embedding model inside the Docker container the first time you run it. This is perfectly fine, but the first run can be a bit slow and the model may need to be re‑downloaded if you rebuild containers frequently.
If you’d like more control—and potentially faster first runs—you can pre‑download the model on your machine using the Hugging Face Hub, and let Docker reuse that cache.
(i) Install the Hugging Face Client Library
If you already have Python installed, install the Hugging Face Hub client:
pip install huggingface_hub
You can find a friendly quick‑start guide here:
https://huggingface.co/docs/huggingface_hub/quick-start
(ii) Download the SEA‑LION Embedding Model
The default embedding model used by the demo is:aisingapore/SEA-LION-ModernBERT-Embedding-300M
From your terminal, run:
hf download aisingapore/SEA-LION-ModernBERT-Embedding-300M
This command uses the Hugging Face Hub client to download the model into your local cache. For more details on model downloads, see:
https://huggingface.co/docs/hub/models-downloading#using-the-hugging-face-client-library
Typical cache locations are:
- Linux / macOS:
~/.cache/huggingface/ - Windows (WSL2):
/home/<your-username>/.cache/huggingface/inside WSL2
(iii) Let Docker Reuse the Cache
Your docker-compose.yml is already set up to mount a Hugging Face cache folder into the container:
volumes:
- ${HF_CACHE_PATH:-~/.cache/huggingface}:/root/.cache/huggingface
You have two options:
- Use the default (if your cache is at
~/.cache/huggingface):
Do nothing; Docker will see and reuse the cache automatically. - Use a custom cache path:
Set HF_CACHE_PATH in your .env file to point to the directory you downloaded your model to, for example:HF_CACHE_PATH=/home/your-username/path/to/another/folder
With this in place, when the app container needs aisingapore/SEA-LION-ModernBERT-Embedding-300M, it will find it in the mounted cache instead of downloading it from scratch.
Step 4 – Quick Configuration via .env
In the project root, copy the example config file:
cp .env.example .env
Open .env in a text editor and update just a few key lines, depending on your chosen backend.
Example: Ollama Setup
For a typical local Ollama + SEA‑LION setup, you might use:
EMBEDDING_MODEL=aisingapore/SEA-LION-ModernBERT-Embedding-300M OPENAI_BASE_URL=http://host.docker.internal:11434/v1 OPENAI_API_KEY=ollama LLM_MODEL=aisingapore/Qwen-SEA-LION-v4-32B-IT CHROMA_HOST=chromadb CHROMA_PORT=8000
If you want to tune chunk sizes, retrieval parameters, or switch to SEA‑LION API or another provider, refer to the detailed explanations in the repository README:
https://github.com/aisingapore/sealion-embedding-demo
Step 5 – Start the Demo with Docker Compose
The demo is designed to run as two Docker services:
chromadb: a ChromaDB vector databaseapp: the Gradio web app using SEA‑LION embeddings and your chosen LLM backend
The docker-compose.yml (already included) looks like this:
services:
chromadb:
image: chromadb/chroma:1.5.4
volumes:
- chroma_data:/data
ports:
- "8000:8000"
healthcheck:
test: ["CMD-SHELL", "bash -c 'exec 3<>/dev/tcp/localhost/8000 && echo -e \"GET /api/v2/heartbeat HTTP/1.0\\r\Host: localhost\\r\\\r\\" >&3 && cat <&3 | grep -q nanosecond'"]
interval: 10s
timeout: 5s
retries: 3
start_period: 5s
app:
build: .
ports:
- "7860:7860"
volumes:
- ${HF_CACHE_PATH:-~/.cache/huggingface}:/root/.cache/huggingface
- ./documents:/app/documents
- ./sample_data:/app/sample_data:ro
env_file:
- .env
depends_on:
chromadb:
condition: service_healthy
extra_hosts:
- "host.docker.internal:host-gateway"
volumes:
chroma_data:
To start everything:
docker compose up --build -d
(or docker-compose up --build -d if your Docker still uses the older CLI)
Exclude the -d flag if you wish to view the logs, otherwise you may inspect them via container logs on Docker Desktop.
On first launch, Docker will:
- Build the app image
- Pull ChromaDB
- Download Python dependencies
- Download the SEA‑LION embedding model (if not already cached)
This can take a few minutes for the very first time. Subsequent starts will be much faster as certain steps are cached.
View the logs in your terminal via docker compose logs -f. Cancel it using Ctrl + C.
Step 6 – Open the Web App and Explore
When the app is ready, you’ll see log messages in your app Docker container including something like:
Running on local URL: http://0.0.0.0:7860
Open your browser and go to:
http://localhost:7860
You should see the SEA‑LION Embedding Demo UI, typically with tabs such as:
- Semantic Search
- Cross‑Lingual Similarity
- RAG Q&A
- Document Management / Re‑index
Here’s what you can do immediately:
(i) Try Multilingual Semantic Search
- Go to the Semantic Search tab.
- Use one of the built‑in sample documents (already mounted into the container).
- Type a query in English, Malay, Indonesian, Thai, etc.
- See the most relevant chunks, with scores and snippets.

(ii) Play with Cross‑Lingual Similarity
- Go to the Similarity tab.
- Enter two sentences in different languages.
- Check how similar they are according to SEA‑LION embeddings.

(iii) Ask Questions with RAG Q&A
- Open the RAG Q&A tab.
- Ask a natural‑language question.
- The app retrieves the top matching document chunks and passes them to your configured LLM.
- You get grounded answers, plus visibility into which sources were used.
It’s a quick way to show RAG in action and to demo how SEA‑LION behaves in everyday queries.

Step 7 – Bring Your Own Documents
The real magic happens when you connect SEA‑LION to your own content.
In the project folder, you’ll see a documents/ directory. It’s mounted into the container, so any files you drop there become candidates for indexing.
Supported formats include:
- .txt, .md, .rst
- .yaml, .yml, .json, .csv, .xml
- .html, .htm
- .pdf, .docx
To index your content:
- Copy your files into the documents/ folder.
- In the web UI, go to the Document Management or Re‑index tab.
- Click the button to re‑index.

In this example, ricefarmer_id.md was added to documents folder and added to the database
The app will:
- Read your files
- Split them into chunks
- Generate SEA‑LION embeddings
- Store them in your local ChromaDB instance
You can now use Semantic Search and RAG Q&A directly on your own data.
This makes it incredibly easy to build internal demos and proof‑of‑concepts for use cases like:
- Knowledge base search
- Policy / SOP Q&A
- Multilingual content discovery
For removal of files from your database, type the document name or use a wildcard to remove multiple documents.

Step 8 – Stop and Restart
To stop the app, run the following command in the app’s root folder:
docker compose down
This stops the containers but keeps:
- Your indexed vectors in the chroma_data Docker volume
- Your documents in the documents/ folder on your machine
So the next time you want to use the demo, it’s as simple as:
docker compose up -d
and then visiting http://localhost:7860 again.
Where to Go Next
The SEA‑LION Embedding Demo is intentionally simple, so you can focus on exploring the quality of multilingual search and RAG rather than wrestling with infrastructure.
From here, you might want to:
- Dive into the full README for further configuration, alternative LLM backends, and environment variables:
https://github.com/aisingapore/sealion-embedding-demo - Explore more about SEA‑LION models in the docs:
https://docs.sea-lion.ai - Get API keys and try the managed SEA‑LION API:
https://playground.sea-lion.ai/key-manager
