Archival AI
Overview
In Southeast Asia, vast archives of historical photographs remain under-annotated and inaccessible. These images, rich with cultural and historical data, are often labeled with only basic information, leaving their deeper stories untold. Singapore’s National Archives alone holds over six million images, yet most have only superficial descriptions in their databases. This lack of detailed metadata makes it extremely difficult for researchers and the public to discover and engage with this vital visual heritage. Furthermore, existing AI models, predominantly trained on Western data, often fail to interpret the unique cultural and historical nuances of these images, providing generic or inaccurate descriptions.
Our solution, Archival AI, is a collaborative platform that bridges this gap by pairing advanced AI with human expertise. It transforms how we document and interact with visual heritage, making it more discoverable and meaningful for everyone.
Solution
Archival AI uses Vision-Language Models (VLMs), including AI Singapore’s SEA-LION v4, to automatically generate initial annotations for historical photographs. These AI-generated drafts, which include descriptive captions, keyword tags, and contextual information, are then passed to a human-in-the-loop workflow in which historians, archivists, and community members review, fact-check, and enrich the annotations with their specialized knowledge.
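The draft-then-review flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the platform's actual data model: the `Annotation` fields, the `Status` values, and the `review` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"        # produced by the model, not yet reviewed
    REVIEWED = "reviewed"  # checked and possibly edited by a person


@dataclass
class Annotation:
    # Hypothetical record layout: caption, tags, and context mirror the
    # three kinds of draft content named in the text.
    image_id: str
    caption: str
    tags: list[str] = field(default_factory=list)
    context: str = ""
    author: str = "model"
    status: Status = Status.DRAFT


def review(draft, reviewer, *, caption=None, add_tags=(), context=None):
    """Return a new reviewed annotation; the draft itself is left unchanged."""
    # Merge tags, dropping duplicates while keeping first-seen order.
    merged_tags = list(dict.fromkeys([*draft.tags, *add_tags]))
    return Annotation(
        image_id=draft.image_id,
        caption=caption if caption is not None else draft.caption,
        tags=merged_tags,
        context=context if context is not None else draft.context,
        author=reviewer,
        status=Status.REVIEWED,
    )


# Example: an archivist enriches a model draft with a more specific tag.
draft = Annotation("postcard-001", "A street with shophouses.", ["street", "shophouse"])
final = review(draft, "archivist-1", add_tags=["Straits Eclectic", "street"])
```

Returning a new record rather than editing the draft in place keeps the model's original output available for later comparison with the human revision.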
Our approach is distinctive in combining practical application with fundamental research. Key features include:
- Human-AI Collaboration: We combine the speed and scale of AI with the precision and interpretive depth of human experts, ensuring that annotations are both rich and accurate.
- Cultural AI Benchmarking: We are developing the first public benchmark dataset specifically designed to evaluate and improve the “cultural intelligence” of AI models on Southeast Asian heritage imagery.
- Multilingual Support: The platform provides annotations in multiple languages, including English, Chinese, and Malay, to broaden public access.
- Open and Collaborative: We are building Archival AI as an open-source tool and are actively partnering with libraries, archives, and museums to co-develop a platform that meets their needs.
This human-centric approach not only scales up heritage documentation but also creates a valuable, structured knowledge base for future research. Behind the scenes, the platform uses AWS to deploy several open-source models, alongside the SEA-LION API, to generate descriptive text. Every version of each AI annotation and human revision is saved to grow our cultural knowledge base, which will be published as datasets for improving and testing the cultural understanding of current AI models.
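One way to keep every version of an annotation, as described above, is an append-only log that can later be exported as a dataset. The sketch below is a minimal in-memory illustration under stated assumptions: the `Revision` and `History` names, the field layout, and the JSON Lines export format are hypothetical and are not taken from the platform itself.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Revision:
    image_id: str
    version: int
    source: str   # "model" or a reviewer identifier (hypothetical convention)
    caption: str


class History:
    """Append-only log: earlier revisions are never overwritten."""

    def __init__(self):
        self._log = []

    def add(self, image_id, source, caption):
        # The version number counts prior revisions of the same image.
        version = 1 + sum(r.image_id == image_id for r in self._log)
        rev = Revision(image_id, version, source, caption)
        self._log.append(rev)
        return rev

    def latest(self, image_id):
        # Raises ValueError if the image has no revisions yet.
        return max(
            (r for r in self._log if r.image_id == image_id),
            key=lambda r: r.version,
        )

    def to_jsonl(self):
        # One JSON object per line, in insertion order.
        return "\n".join(
            json.dumps(asdict(r), ensure_ascii=False) for r in self._log
        )


history = History()
history.add("postcard-001", "model", "A street.")
history.add("postcard-001", "archivist-1", "A street of shophouses.")
```

Because both the model draft and the human revision stay in the log, the same export can serve as training data and as a record of how far the model's output was from the expert's.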
Conclusion
Our prototype for Archival AI won First Place (Public Sector) in the 2025 AI Singapore Pan-SEA AI Challenge, demonstrating the feasibility and promise of our approach. The award also earned us US$10,000 in AWS cloud credits to support continued development. Building on this success, we have recently been awarded the Heritage Research Grant (HRG) from the Singapore National Heritage Board (S$118,500), which will fund full-scale development of the platform over the next two years.
We currently partner with NUS Libraries, which has provided access to its extensive postcard collection for our pilot phase. Our initial trials have already shown that the system can uncover granular details that were previously unindexed in archival records, from architectural features such as Straits Eclectic pilasters and Fujian “swallow-tail” roof ridges to social practices such as laundry poles protruding from upper windows.
Moving forward, we plan to:
- Expand our partnerships with more museums, archives, and libraries across Southeast Asia to co-develop and deploy the platform.
- Publish our annotated datasets as open-source benchmarks to drive global research into more culturally aware AI.
- Incorporate community contributions to capture and preserve local stories and intangible heritage connected to the images.
Our long-term vision is for Archival AI to become the standard tool for heritage institutions across the region, modernizing how we preserve, share, and connect with our rich cultural heritage.
About the team
- Lin Du (Principal Investigator): Assistant Professor jointly appointed in the Departments of Chinese Studies and Japanese Studies at the National University of Singapore. She completed her PhD at the Department of Asian Languages and Cultures at UCLA. Lin holds an MA from the Regional Studies East Asia Program at Harvard University and a BA in Chinese Language and Literature from Peking University. Her pioneering work in machine learning has been published in the ACM Journal on Computing and Cultural Heritage (JOCCH), and her contributions to humanities research are published or forthcoming in the Journal of Chinese Cinemas and Asia Pacific Perspectives.
- Niharika Shrivastava (Co-Investigator): Senior Research Engineer at the Applied AI Centre of the Singapore Institute of Technology (SIT). She is an expert in building complex LLM and multimodal AI systems and has extensive experience in developing AI-driven platforms.
