Through its collaboration with Google on Project Vaani, the Bengaluru-based research university Indian Institute of Science is now open-sourcing its first set of speech data, comprising over 4,000 hours across 38 languages.
“Out of the 125 languages we committed to supporting, 75 had a zero data corpus which means the amount of data corpus available in digital form to AI researchers was zero. In the 4,000 hours of speech data that has been put out, for a few languages, it’s the first such instance that digital data has been made available. We could expect innovations in these zero corpus languages now,” Gupta said, speaking about the challenges arising from the diversity of Indian languages.
The company hosted Google I/O Connect for developers in India for the first time, in Bengaluru on Wednesday. Senior Google executives including Ambarish Kenghe – Vice President, Product, Google Pay; Rahul Sukthankar – VP, Google Research; Will Grannis – VP & CTO, Google Cloud; Mathew McCollough – VP, Product, Android Developer; and Una Kravets – Developer Relations Engineer, were among the speakers.
Google also unveiled a slate of AI tools and technologies for India to support innovation among local developers. Noting that India already has more than 60 generative AI startups, Kenghe said that, to help developers build AI-powered products, Google is making its large language model accessible through the PaLM API, MakerSuite, and features on Vertex AI.
Google is also releasing Open Buildings data on over 200 million buildings in the country to help organisations plan infrastructure projects.