5th Languages Summit: Southeast Asia and India Power Up to Supercharge AI

On 22 July 2025, the 5th Languages Summit took place in Bangalore, India, jointly hosted by AI Singapore and Google. The event at Google Ananta brought together a vibrant community of innovators to shape the next chapter of AI in Southeast Asia and India.
Overview of the Summit
The 5th Languages Summit brought together AI pioneers from Southeast Asia and India to tackle some of the most critical challenges in AI model training: Western-centric biases, data scarcity, and infrastructure hurdles, each examined in a Southeast Asian and Indian context. The summit focused on three key areas: democratizing AI access, creating culturally sensitive evaluation methods, and unveiling next-generation large language models and AI agents.
AI model training struggles with complex data preparation, Western-centric biases, and a lack of culturally relevant datasets in many parts of the world, including Asia, alongside infrastructure and evaluation challenges. The Vertex Model Development Service (VMDS) addresses these gaps with an open-source, scalable framework that supports language-sensitive data synthesis, automated cluster management, and tailored experimentation. Also highlighted were Google’s advancements in agent-based systems, including Agent Space, the low-code Agent Designer, and Google ADK for secure enterprise development. Google also introduced Gemma 3n, a new multimodal, multilingual model that supports 140 languages and 32,000-token contexts. It features a “Matryoshka Transformer” architecture for flexible inference speeds, memory-efficient per-layer embeddings, KV cache sharing, and a novel residual connection called “Laurel” that boosts inference speed and efficiency.
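To give a feel for how a "Matryoshka"-style design can trade quality for speed, here is a minimal sketch in plain Python. It is not Gemma 3n's implementation: the function names (`make_ffn`, `ffn_forward`, `multiply_adds`), the toy sizes, and the choice to nest only the feed-forward hidden units are all assumptions made for illustration. The idea it shows is that a smaller sub-model reuses a prefix of the larger model's weights, so one set of parameters can serve several inference budgets.

```python
import random


def make_ffn(d_model, d_hidden, seed=0):
    """Create toy feed-forward weights (hypothetical helper, random values)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_model)]
    w_out = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_hidden)]
    return w_in, w_out


def ffn_forward(x, w_in, w_out, width):
    """Run the feed-forward block using only the first `width` hidden units."""
    if not 1 <= width <= len(w_out):
        raise ValueError("width must be between 1 and the full hidden size")
    d_model = len(x)
    # Project into the hidden space, touching only the first `width` columns.
    hidden = [
        max(0.0, sum(x[i] * w_in[i][j] for i in range(d_model)))  # ReLU
        for j in range(width)
    ]
    # Project back to the model dimension using the matching first `width` rows.
    return [
        sum(hidden[j] * w_out[j][k] for j in range(width))
        for k in range(d_model)
    ]


def multiply_adds(d_model, width):
    """Count multiply-adds for the two projections at a given width."""
    return 2 * d_model * width


# One shared set of weights, two inference budgets.
w_in, w_out = make_ffn(d_model=8, d_hidden=16)
x = [0.5, -1.0, 0.25, 0.0, 1.0, -0.5, 0.75, 0.1]

small_out = ffn_forward(x, w_in, w_out, width=4)   # nested sub-model
full_out = ffn_forward(x, w_in, w_out, width=16)   # full model

print("small cost:", multiply_adds(8, 4))
print("full cost:", multiply_adds(8, 16))
```

In this toy setup the narrow path does a quarter of the arithmetic of the full path while sharing the same stored weights. In a trained model, the nested widths would be optimized jointly so that the smaller slices remain useful on their own.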
Highlights from the Summit
Presented by the Indian Team
- EkStep Foundation discussed their work on Voice AI in India. They believe Voice AI in Indic languages is a powerful tool to provide conversational access for citizens with low literacy or digital skills. The organization is involved in initiatives like “Jan Ki Baat” and “Assisted Language Learning” to empower people through AI in areas such as e-commerce, banking, agriculture, and government services.
- Josh Talks, a content platform, aims to inspire and upskill young people in India by providing content in over 10 vernacular languages. The majority of their audience is non-English speaking, and this strategy has led to significant viewership growth from that demographic. Their work in data collection is integral to their business model and their goal of reaching the “next billion users.”
- Sarvam AI, a Generative AI startup, was chosen by the Government of India to build the country’s first homegrown, sovereign large language model (LLM) under the IndiaAI Mission. The company is developing models that are fluent in multiple Indian languages, support voice-based tasks, and are built to be secure and scalable for the entire population. They are collaborating with AI4Bharat at IIT Madras to create three different model variants.
- SoketAI outlined “Project Agni,” an initiative to build India’s first open-source, 120-billion-parameter foundational model. Soket AI Labs, a startup chosen by the IT Ministry, is leading the project. The model is being optimized for India’s diverse languages and will be used in key sectors like defense, healthcare, and education. The company has also released the “Bhasha” series of datasets to support the development of AI models for Indian languages. They previously created “Pragna-1B,” an open-source multilingual model.
Presented by our SEA Partners
- Aquarium Update: Aquarium, a regional platform co-developed by AI Singapore, Google, and Project SEALD partners, was presented to attendees. It is designed to combat language data scarcity in Southeast Asia and beyond. Key updates included a dashboard view of datasets, access to over 300 datasets for various training stages, and an AI assistant that supports dataset searches in local languages.
- SEA-HELM: Developed by AI Singapore, SEA-HELM (Southeast Asian Holistic Evaluation of Language Models) is an evaluation framework and leaderboard that assesses large language models in Southeast Asian languages. The latest version supports seven languages: Indonesian, Tamil, Thai, Vietnamese, Filipino, Malay, and Burmese. It uses culturally relevant, professionally translated datasets to provide comprehensive metrics.
- BASAibu: The BASAibu platform empowers local language communities by creating a community-driven model to advance AI for underrepresented languages. The platform, which has been piloted in Indonesia with over 5 million users, sources high-quality data and engages users in civic issues and policy dialogue.
- WangchanLION-v3: A new large-scale, open-source Thai LLM was released. This model has a pretraining dataset of 47.4 billion tokens and was created using a Thai-specific data cleaning pipeline to ensure high quality.
Summit Attendees
The 5th Languages Summit brought together a dynamic and passionate group of AI experts, researchers, and practitioners.
- VISTEC: Sarana Nutanong
- BASAibu: Alissa Stern, Ita Ibnu, Ni Nyoman Clara Listya Dewi
- IIT Madras: Mitesh M. Khapra
- EkStep Foundation: Santosh Kevlani
- Digital India Bhashini Division (DIBD) and India AI: Amitabh Nag
- Josh Talks: Supriya Paul and Shobhit Banga
- ARTPARK: Prasanta Kumar Ghosh
- Sarvam AI: Sumanth Doddapaneni and Abhigyan Raman
- SoketAI: Abhishek Upperwal
To all our amazing attendees: thank you for your active participation! Your insights and contributions are directly helping us build a more representative AI landscape for Southeast Asia and India, and we can’t wait to see what we can achieve together.
Closing / Call to Action
The journey to build a more representative AI for Southeast Asia and India is just beginning, and you can be a part of it.
To learn more and get involved, visit the Project SEALD webpage. You can also contact seald@aisingapore.org to join the Aquarium journey or learn more.
