SEA-LION v3: 128K Context Length and 70B Models
We are excited to announce the release of two new variants of SEA-LION v3, our latest large language models tailored specifically for Southeast Asian languages. Built on Meta's Llama 3.1 and further trained on SEA-LION's data, these variants have strong capabilities in handling the diverse linguistic and cultural nuances inherent to the Southeast Asian region.
New SEA-LION v3 Variants
1. SEA-LION v3 8B
| | |
| --- | --- |
| Architecture | Based on Llama 3.1 8B |
| Parameters | 8 billion |
| Context Length / Performance | Features a large context length of 128K tokens, enabling the model to handle extensive and complex dialogues effectively. |
| Use Case | Ideal for applications requiring deep contextual understanding and long-form content processing. |
2. SEA-LION v3 70B
| | |
| --- | --- |
| Architecture | Based on Llama 3.1 70B |
| Parameters | 70 billion |
| Context Length / Performance | Our largest model to date (as of Dec 2024), also with a 128K-token context length, offering superior performance compared to its predecessors and contemporaries. |
| Use Case | Suited for high-demand environments where advanced reasoning and comprehensive language comprehension are essential. |
Technical Enhancements in SEA-LION v3
Continued Pre-Training
Both variants underwent continued pre-training from Llama 3.1 on an additional 200 billion tokens of Southeast Asian data. This extensive training deepens the models' understanding of regional languages and cultural contexts, yielding significant performance gains in languages such as Thai, Vietnamese, Tamil, and Indonesian.
Post-Training
Both variants then underwent supervised fine-tuning (SFT) in two stages:
- Stage 1: Focuses on math and reasoning instructions using approximately 9.5 million instructions, predominantly in English.
- Stage 2: Emphasizes chat and instruction-following tasks with around 7.3 million instructions, including a substantial portion in Southeast Asian languages.
This fine-tuning process, combined with model merging techniques, ensures that SEA-LION v3 maintains its superior performance while mitigating issues like catastrophic forgetting.
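The post does not specify which merging method was used, so as an illustration only, here is a minimal sketch of linear weight interpolation, one common merging technique, between two same-architecture checkpoints. The checkpoint paths and the 0.5 mixing weight are placeholders, not SEA-LION's actual recipe.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint paths; the post does not name the merged checkpoints.
model_a = AutoModelForCausalLM.from_pretrained("ckpts/stage1-sft", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("ckpts/stage2-sft", torch_dtype=torch.bfloat16)

alpha = 0.5  # placeholder mixing weight
state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Linear interpolation of every matching parameter tensor.
merged = {name: (1 - alpha) * state_a[name] + alpha * state_b[name] for name in state_a}

model_a.load_state_dict(merged)
model_a.save_pretrained("ckpts/merged")
```

Interpolating between checkpoints in weight space like this is one simple way to blend the strengths of two fine-tuning runs while limiting drift from either, which is consistent with the catastrophic-forgetting mitigation described above.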
Multilingual Proficiency
SEA-LION v3 supports up to 13 languages, including newly added Javanese and Sundanese. This multilingual capability ensures that our models can cater to a wide array of Southeast Asian languages, fostering greater accessibility and usability across the region. In addition, our experiments show modest cross-lingual transfer, which helps languages that are less well represented in digital data.
Training Infrastructure
- Hardware: AWS p5e.48xlarge and SingTel HGX-100 instances equipped with Nvidia H200 and H100 GPUs respectively, orchestrated with MosaicML Composer.
- Training Duration: The 8B variant was trained for approximately 136 hours, and the 70B variant for approximately 495 hours.
- Configuration: Both models were trained with bfloat16 precision, the decoupled_adamw optimizer, and a global batch size of 512.
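As a rough illustration of how those settings map onto MosaicML Composer, here is a minimal training sketch. The bfloat16 precision, DecoupledAdamW optimizer, and 512 global batch size come from the configuration above; the stand-in checkpoint, toy data, learning rate, and duration are placeholders the post does not specify.

```python
from torch.utils.data import DataLoader
from composer import Trainer
from composer.models import HuggingFaceModel
from composer.optim import DecoupledAdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tiny stand-in checkpoint so the sketch runs on modest hardware;
# the actual runs continued pre-training from Llama 3.1 8B / 70B.
checkpoint = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token
model = HuggingFaceModel(AutoModelForCausalLM.from_pretrained(checkpoint), tokenizer=tokenizer)

# Toy causal-LM batches: padded input_ids double as labels.
texts = ["Selamat pagi, apa khabar?", "Sawasdee khrap, sabai dee mai?"] * 256
enc = tokenizer(texts, return_tensors="pt", padding=True)
dataset = [
    {"input_ids": i, "attention_mask": m, "labels": i.clone()}
    for i, m in zip(enc["input_ids"], enc["attention_mask"])
]

# Global batch size of 512, as stated above; in a multi-GPU run this is
# per-device batch size x device count x gradient accumulation steps.
train_dataloader = DataLoader(dataset, batch_size=512)

# DecoupledAdamW per the configuration above; the LR is a placeholder.
optimizer = DecoupledAdamW(model.parameters(), lr=1e-5)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    precision="amp_bf16",  # bfloat16 mixed precision; use "fp32" on CPU
    max_duration="1ep",    # placeholder duration
)
trainer.fit()
```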
Evaluation Metrics
SEA-LION v3 variants have been rigorously evaluated using both English and Southeast Asian benchmarks:
- English Evaluation: Utilizes tasks from the Open LLM Leaderboard v2, including MMLU-PRO, MUSR, and others.
- Southeast Asian Evaluation: Employs SEA-HELM metrics covering sentiment analysis, toxicity detection, causal reasoning, and more, tailored to regional languages.
On these (admittedly narrow) benchmarks, SEA-LION v3 outperforms many open-source models, including larger ones such as Llama 3.3 70B Instruct, establishing new standards for regional AI capabilities. See our Leaderboard for details.
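For readers who want to reproduce the English-side numbers, here is a minimal sketch using EleutherAI's lm-evaluation-harness Python API. The leaderboard task names and the full Hugging Face model ID (including the aisingapore organization prefix) are our assumptions, not taken from the post; verify both against your installed harness version and the model card.

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness)
import lm_eval

# Assumed model ID and Open LLM Leaderboard v2 task names; verify locally.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=aisingapore/Llama-SEA-LION-v3-8B-IT,dtype=bfloat16",
    tasks=["leaderboard_mmlu_pro", "leaderboard_musr"],
    batch_size=8,
)
print(results["results"])
```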
Accessibility and Availability
All SEA-LION v3 variants are open source and freely available for research and commercial use. Developers and enterprises can access the models immediately on platforms such as Hugging Face, Kaggle, and Ollama; a quick-start loading sketch follows the lists below.
Hugging Face
Gemma-SEA-LION-v3-9B
Gemma-SEA-LION-v3-9B-IT
Gemma-SEA-LION-v3-9B-IT-GGUF
Llama-SEA-LION-v3-8B
Llama-SEA-LION-v3-8B-IT
Llama-SEA-LION-v3-8B-IT-GGUF
Llama-SEA-LION-v3-70B
Llama-SEA-LION-v3-70B-IT
Llama-SEA-LION-v3-70B-IT-GGUF
Ollama
Gemma-SEA-LION-v3-9B-IT
Llama-SEA-LION-v3-8B-IT
Llama-SEA-LION-v3-70B-IT
Kaggle
Gemma-SEA-LION-v3-9B
Gemma-SEA-LION-v3-9B-IT
Llama-SEA-LION-v3-8B
Llama-SEA-LION-v3-8B-IT
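To get started, here is a minimal sketch for loading one of the instruction-tuned checkpoints with Hugging Face transformers. The aisingapore organization prefix and the chat-template usage are our assumptions based on the listings above, so check the model card for the exact ID and prompt format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed full model ID (org prefix not stated in the post); see the model card.
model_id = "aisingapore/Llama-SEA-LION-v3-8B-IT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the training precision noted above
    device_map="auto",
)

# Llama 3.1-based IT models ship a chat template; apply it rather than raw text.
messages = [{"role": "user", "content": "Apa khabar? Terangkan SEA-LION dalam satu ayat."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The GGUF builds listed above target llama.cpp-compatible runtimes, and the Ollama entries can be pulled with `ollama pull` using the tags published on the Ollama library page.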
Partner Models
We are also happy to share a few models built by our partners and collaborators in the region:
| Country | Partner Models |
| --- | --- |
| Indonesia | |
| Thailand | |
Acknowledgments
We extend our gratitude to our partners and collaborators across Southeast Asia.

We are also grateful for the support of the Infocomm Media Development Authority (IMDA) of Singapore.
