The Gemini 1.5 Flash-8B, the newest member of Google's Gemini family of artificial intelligence (AI) models, is now generally available for production use. Google announced the general availability of the model on Thursday, noting that it is a smaller and faster version of the Gemini 1.5 Flash model introduced at Google I/O. Its smaller size gives it lower inference latency and more efficient output generation. Notably, the tech giant stated that the Flash-8B AI model offers "the lowest cost per intelligence of any Gemini model."
The Gemini 1.5 Flash-8B is now generally available
In a developer blog post, the Mountain View-based tech giant detailed the new AI model. The Gemini 1.5 Flash-8B evolved from the Gemini 1.5 Flash AI model, which focused on faster processing and more efficient output generation. According to the company, Google DeepMind developed this even smaller and faster version of the model over the past few months.
Despite its smaller size, the model "almost matches" the 1.5 Flash's performance on multiple benchmarks, the tech giant claims. These include chat, transcription, and long-context language translation.
One of the main advantages of the AI model is its cost effectiveness. Google said Gemini 1.5 Flash-8B will offer the lowest token price in the Gemini family. Developers will pay $0.0375 (approx. Rs. 3) for one million input tokens, $0.15 (approx. Rs. 12.5) for one million output tokens, and $0.01 (approx. Rs. 0.8) for one million tokens on cached prompts.
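To see how these rates translate into per-request costs, here is a minimal sketch of the arithmetic. It uses the per-million-token prices quoted above; the function name and token counts are illustrative, not part of any Gemini API:

```python
# Per-token rates (USD), derived from the quoted prices per million tokens:
# $0.0375 input, $0.15 output, $0.01 cached input.
INPUT_RATE = 0.0375 / 1_000_000
OUTPUT_RATE = 0.15 / 1_000_000
CACHED_RATE = 0.01 / 1_000_000

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens * INPUT_RATE
            + output_tokens * OUTPUT_RATE
            + cached_tokens * CACHED_RATE)

# Example: a request with 10,000 input tokens and 2,000 output tokens
# costs roughly $0.000675 at these rates.
cost = estimate_cost(10_000, 2_000)
```

At these prices, even a million such requests would cost on the order of a few hundred dollars, which is the point of Google's "lowest cost per intelligence" positioning.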
Additionally, Google is doubling the rate limits of the 1.5 Flash-8B AI model. Developers can now send up to 4,000 requests per minute (RPM) while using this model. Explaining the decision, the technology giant stated that the model is suitable for simple, high-volume tasks. Developers who want to try the model can do so for free through Google AI Studio and the Gemini API.
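A 4,000 RPM ceiling means a client that spreads its traffic evenly should leave at least 60 / 4,000 = 0.015 seconds between requests. A minimal client-side pacing sketch, assuming a caller-supplied `send` function (the helper below is illustrative and not part of the Gemini API):

```python
import time

RPM_LIMIT = 4000                 # requests per minute quoted for 1.5 Flash-8B
MIN_INTERVAL = 60.0 / RPM_LIMIT  # minimum seconds between requests (0.015 s)

def paced(requests, send, interval=MIN_INTERVAL):
    """Send each request in order, sleeping as needed so the
    average request rate stays at or below the RPM limit."""
    results = []
    for req in requests:
        start = time.monotonic()
        results.append(send(req))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return results
```

Production clients would typically also handle HTTP 429 responses with backoff, but simple even pacing like this is often enough for the batch-style, high-volume workloads the model is aimed at.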