SEA-LION v3.5 and Updated v3: Enhanced Language Models for Southeast Asia
We are proud to launch SEA-LION v3.5, our first set of hybrid reasoning models trained on Southeast Asian data. Mode selection is managed through the tokenizer’s chat template, allowing a single model to handle both complex reasoning tasks and general text generation.
Built on our SEA-LION v3, SEA-LION v3.5 is explicitly enhanced for reasoning tasks through the inclusion of thinking blocks, continuing our mission to create language models that understand and respond with greater cultural awareness and depth across Southeast Asia.
At the same time, we have also updated SEA-LION v3, incorporating additional data and the post-training techniques developed for v3.5.
New SEA-LION v3.5 Variants
| | Llama-SEA-LION-v3.5-8B-R | Llama-SEA-LION-v3.5-70B-R |
|---|---|---|
| Architecture | Based on Llama 3.1 8B | Based on Llama 3.1 70B |
| Parameters | 8 billion | 70 billion |
| Context Length | 128K | 128K |
| Performance | Cost-efficient, low-latency applications where regional language support and advanced reasoning are critical. | Superior performance compared to its contemporaries; suited for knowledge-intensive tasks where advanced reasoning and nuanced language comprehension are essential. |
Updated SEA-LION v3
| | Llama-SEA-LION-v3-8B-IT | Llama-SEA-LION-v3-70B-IT | Gemma-SEA-LION-v3-9B-IT |
|---|---|---|---|
| Architecture | Based on Llama 3.1 8B | Based on Llama 3.1 70B | Based on Gemma 2 9B |
| Parameters | 8 billion | 70 billion | 9 billion |
| Context Length | 128K | 128K | 8192 |
| Performance | Ideal for applications requiring contextual understanding and long-form content processing. | Superior performance with results comparable to GPT-4o, outperforming models such as DeepSeek R1 and GPT-4o-mini. | Best-performing small (<10B) model, ideal for very lightweight tasks. |
Qualitative updates for SEA-LION v3.5
Technical Specifications
Post-Training
Both reasoning models, SEA-LION-R, were trained with an additional series of supervised fine-tuning (SFT) stages atop our existing SEA-LION-IT models, which have demonstrated top performance on SEA-HELM. Multiple stages of SFT were conducted, culminating in a final tune on 1.5M distilled reasoning traces from DeepSeek-R1 across multiple SEA languages such as Indonesian, Tamil, Thai, Tagalog and Vietnamese. This allows SEA-LION-R to provide more nuanced, elaborate and complete responses.
A distinctive feature of SEA-LION-R is its dynamic reasoning toggle. By default, the model operates in a detailed reasoning mode, guiding users through step-by-step solutions. Users retain full control and can switch reasoning off through the chat template configuration, enabling concise interactions suited to straightforward queries (see the sketch below). During tuning, reasoning and non-reasoning data were incorporated simultaneously, resulting in a versatile model adaptable to varied user needs.
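As a concrete illustration, here is a minimal sketch of toggling reasoning through the chat template with Hugging Face transformers. The `thinking_mode` keyword and its values are our assumption about the template's toggle; consult the model card for the exact argument names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/Llama-SEA-LION-v3.5-8B-R"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Jelaskan mengapa langit berwarna biru."}]

# Assumed toggle: extra kwargs to apply_chat_template are forwarded to the
# Jinja chat template; "thinking_mode" and its values may differ from the
# actual template -- verify against the model card before use.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    thinking_mode="on",   # "off" yields concise, non-reasoning answers
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```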
We also scaled our instruction set up to 30M instructions (trained over roughly a month on a single node for the 70B), incorporating the latest open-source data alongside multiple rounds of synthetic aggregation and rewriting, improving the quality of responses and steering the model towards our region’s unique cultural diversity and history. The set comprises curated, publicly available open-source data; synthetic generations from stronger models; and handwritten instructions centred on Southeast Asian culture (particularly from Project SEALD), general multilingual instruction-following, and chat prompt-response pairs.
SEA-LION-R training uniquely emphasises region-specific data aggregation and synthetic instruction generation, with multiple refinement cycles and model merging to strengthen multilingual proficiency and reasoning. This ensures that SEA-LION v3.5 delivers strong performance across both complex and general-purpose tasks while mitigating issues such as catastrophic forgetting.
Multilingual Proficiency
SEA-LION v3.5 builds on SEA-LION v3’s existing multilingual capabilities, supporting 13 languages across Southeast Asia – including regional languages like Javanese and Sundanese. By leveraging this strong foundation, SEA-LION v3.5 ensures broader accessibility and usability, empowering diverse communities and use cases throughout the region.
Evaluation Metrics
SEA-LION v3.5 variants have been rigorously evaluated using both English and Southeast Asian benchmarks:
- English Evaluation: Utilises tasks from the Open LLM Leaderboard v2, including IFEval, MMLU-PRO, MuSR, and others.
- Southeast Asian Evaluation: Employs SEA-HELM, which covers languages such as Filipino, Indonesian, Tamil, Thai, and Vietnamese, and includes tasks such as summarisation, toxicity detection, SEA-IFEval, SEA-MTBench, and more.
Based on SEA-HELM’s holistic evaluation suite tailored for the SEA region:
- SEA-LION v3.5 is on par with GPT-4o and DeepSeek-R1 671B, and outperforms models such as GPT-4o-mini, Llama 3.3 70B Instruct, and Qwen 2.5 72B, setting a new standard for regional AI reasoning capabilities.
- The updated SEA-LION v3 achieves results comparable to GPT-4o, and outperforms models such as DeepSeek R1, GPT-4o-mini, Llama 3.3 70B Instruct, and Qwen 2.5 72B.
See our Leaderboard for details.
Limitations: SEA-HELM was designed to provide quantitative, aggregated evaluations of model performance on specific tasks; the current metrics do not fully capture the qualitative improvements, namely the more nuanced, elaborate and complete responses, that come with the new reasoning models. SEA-HELM will be continuously improved with new and better metrics to more accurately capture the performance of all model types.
Accessibility and Availability
SEA-LION v3.5 and v3 variants are open-source and freely available for research and commercial use. Developers and enterprises can immediately access the models on platforms such as Hugging Face, AWS Bedrock and GCP Vertex with availability on Kaggle and Ollama rolling out in the coming days. You can also interact with our models on our Playground and on Telegram!
Hugging Face
Llama-SEA-LION-v3.5-8B-R
Llama-SEA-LION-v3.5-8B-R-GGUF
Llama-SEA-LION-v3.5-70B-R
Llama-SEA-LION-v3.5-70B-R-GGUF
Llama-SEA-LION-v3-8B-IT
Llama-SEA-LION-v3-70B-IT
Gemma-SEA-LION-v3-9B-IT
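For local experimentation, the GGUF builds listed above can be run with llama-cpp-python. A minimal sketch, assuming a Q4_K_M quantization exists in the repo (browse the -GGUF repo's file list for the actual filenames):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Downloads a quantized build directly from the Hugging Face repo.
# The filename glob is an assumption; check which files the repo ships.
llm = Llama.from_pretrained(
    repo_id="aisingapore/Llama-SEA-LION-v3.5-8B-R-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Tóm tắt lịch sử của ASEAN."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```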
Ollama
Llama-SEA-LION-v3.5-8B-R
Llama-SEA-LION-v3.5-70B-R
Llama-SEA-LION-v3-8B-IT
Llama-SEA-LION-v3-70B-IT
Gemma-SEA-LION-v3-9B-IT
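Once the Ollama rollout completes, the models should also be reachable from the official Ollama Python client. A sketch under that assumption; the model tag below is a guess until the listings are live (`ollama list` shows the exact names):

```python
import ollama  # pip install ollama; requires a running Ollama server

# Model tag is assumed; replace with the name shown by `ollama list`.
response = ollama.chat(
    model="llama-sea-lion-v3.5-8b-r",
    messages=[{"role": "user", "content": "Ano ang kahulugan ng bayanihan?"}],
)
print(response["message"]["content"])
```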
AWS Bedrock
GCP Vertex, and Kaggle [Coming Soon]
Other releases:
Along with SEA-LION v3.5, we are also thrilled to share the release of:
- The SEA-LION paper
- The first part of SEA-PILE v2: over 120 billion tokens of diverse, multilingual content from across Southeast Asia, open for all to build and fine-tune on.
Conclusion:
The SEA-LION team is continuously working to improve and uplift the AI community in SEA. We acknowledge that SEA-LION v3 currently outperforms SEA-LION v3.5 on our leaderboard, but this is just the start. We encourage users to try out our models, and we welcome feedback that will benefit the community. Do stay tuned for our upcoming updates and releases!
Acknowledgments
We extend our gratitude to our partners and collaborators across Southeast Asia.
AI Singapore is a national programme supported by the National Research Foundation, Singapore and hosted by the National University of Singapore. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore, or the National University of Singapore.
We are also grateful for the support of the Infocomm Media Development Authority (IMDA) of Singapore.
