Startnet India

Sarvam AI Launches India’s First Homegrown Multilingual Language Model for 10 Languages

The 2-billion-parameter model targets token inefficiency and poor data quality in Indian language computing, and outperforms larger international models on the benchmarks cited below.
By startnet, October 25, 2024

India’s artificial intelligence landscape has reached a significant milestone with Sarvam AI’s launch of Sarvam-1, the country’s first homegrown multilingual large language model (LLM). The model, built specifically for Indian languages, represents a major breakthrough in making advanced AI technology accessible to India’s diverse linguistic population.

Developed with 2 billion parameters, Sarvam-1 supports 10 major Indian languages – Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu – alongside English. The model was built from scratch using domestic AI infrastructure powered by NVIDIA H100 Tensor Core GPUs, in collaboration with key partners including NVIDIA, Yotta, and AI4Bharat.

The model addresses two critical challenges in Indian language computing: token inefficiency and poor data quality. Sarvam-1's tokenizer achieves fertility rates (the average number of tokens produced per word) of 1.4 to 2.1, compared with the 4-8 tokens per word that existing models require for Indian languages. Fewer tokens per word means shorter sequences for the same text, which leads to faster processing and more efficient language handling.
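
To make the fertility metric concrete, here is a minimal sketch of how tokens per word could be measured. It is an illustration only: `fertility` and `char_pair_tokenize` are hypothetical names, and the stand-in tokenizer simply cuts each word into 2-character pieces. It is not Sarvam-1's tokenizer, and the sketch assumes words are separated by whitespace.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def char_pair_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into 2-character pieces."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return pieces


# Illustrative Hindi sentences; any list of strings works here.
sample = ["नमस्ते दुनिया", "यह एक परीक्षण है"]
print(f"fertility: {fertility(sample, char_pair_tokenize):.2f} tokens per word")
```

Any callable that maps a string to a list of tokens can be passed as `tokenize`, so the same function could be used to compare a real tokenizer against this toy one on the same text.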

On the performance front, Sarvam-1 has posted strong results across several benchmarks. The model achieved an accuracy of 86.11 on the TriviaQA benchmark across Indic languages, substantially outperforming Llama-3.1 8B's score of 61.47. On IndicGenBench, which covers cross-lingual tasks, it achieved an average chrF++ score of 46.81 on Flores for English-to-Indic translation.
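
For readers unfamiliar with chrF++, the metric scores a translation by its overlap with a reference in character n-grams and word n-grams, combined into an F-score that weights recall more heavily than precision. The sketch below is a simplified sentence-level version written for illustration; `chrf_pp` and `ngrams` are hypothetical names, the defaults (character order 6, word order 2, beta 2) follow the commonly cited chrF++ settings, and the handling of short strings is a simplification. It is not the evaluation code behind the 46.81 figure, and its scores will not match standard tooling exactly.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(items: Sequence, n: int) -> Counter:
    """Count all contiguous n-grams in a sequence."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def chrf_pp(hypothesis: str, reference: str, char_order: int = 6,
            word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale."""
    precisions: List[float] = []
    recalls: List[float] = []
    # Character n-grams ignore spaces; word n-grams use whitespace-split tokens.
    units = [
        (list(hypothesis.replace(" ", "")), list(reference.replace(" ", "")), char_order),
        (hypothesis.split(), reference.split(), word_order),
    ]
    for hyp_items, ref_items, max_n in units:
        for n in range(1, max_n + 1):
            hyp, ref = ngrams(hyp_items, n), ngrams(ref_items, n)
            if not hyp or not ref:
                continue  # simplification: skip orders longer than either string
            overlap = sum((hyp & ref).values())
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(chrf_pp("the cat sat on the mat", "the cat sat on a mat"))
```

A hypothesis identical to its reference scores 100, and one that shares no characters with it scores 0.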

The model’s training corpus, Sarvam-2T, comprises approximately 2 trillion tokens, with content distributed across supported languages. Hindi constitutes about 20% of the dataset, while other languages share the remaining portion equally. The dataset also includes substantial English and programming language content, enabling strong performance across both monolingual and multilingual tasks.
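
As a rough illustration of what that split implies, the sketch below divides the corpus by language. It rests on an assumption the article does not confirm: that the roughly 2 trillion tokens cover only the 10 Indic languages, with English and code counted separately. The function name `token_budget` is hypothetical, and the output is back-of-the-envelope arithmetic, not published per-language figures.

```python
from typing import Dict, List

TOTAL_TOKENS = 2_000_000_000_000  # "approximately 2 trillion tokens" per the article
HINDI_PERCENT = 20                # "about 20%" per the article
OTHER_LANGUAGES: List[str] = [
    "Bengali", "Gujarati", "Kannada", "Malayalam", "Marathi",
    "Oriya", "Punjabi", "Tamil", "Telugu",
]


def token_budget(total: int, hindi_percent: int, others: List[str]) -> Dict[str, int]:
    """Split a token total: a fixed percentage for Hindi, the rest shared equally."""
    hindi = total * hindi_percent // 100
    per_language = (total - hindi) // len(others)
    budget = {"Hindi": hindi}
    budget.update({name: per_language for name in others})
    return budget


# Assumption: English and code are excluded from the 2 trillion total.
budget = token_budget(TOTAL_TOKENS, HINDI_PERCENT, OTHER_LANGUAGES)
for name, tokens in budget.items():
    print(f"{name:<10} ~{tokens / 1e9:.0f}B tokens")
```

Under that assumption, Hindi would account for about 400 billion tokens and each of the other nine languages for about 178 billion.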

Key Statistics:

  • 2 billion parameters in the model
  • 10 Indian languages supported plus English
  • Tokenizer fertility of 1.4-2.1 tokens per word
  • 4-6 times faster inference speed compared to larger models
  • 86.11 accuracy on the TriviaQA benchmark across Indic languages
  • $41 million Series A funding secured in December 2023

The launch comes at a crucial time for India’s GenAI market, which is projected to grow at a CAGR of 48% between 2023 and 2030, potentially becoming a $17 billion opportunity. This development has significant implications for the Indian startup ecosystem, particularly in democratizing AI access across language barriers and establishing India’s capability to develop sophisticated AI models domestically.

Sarvam-1’s launch marks a turning point in India’s AI journey, demonstrating that carefully curated training data can yield superior performance even with modest parameter counts. As the model becomes available on Hugging Face, it opens new possibilities for developers and businesses to create language-inclusive applications, potentially transforming how millions of Indians interact with technology in their preferred languages.

Tags: AI Innovation, Indian languages, LLM, multilingual model, Natural Language Processing, Sarvam-1, token efficiency
© 2025 Startnet Ventures Private Limited. All Rights Reserved.
