2025 is anticipated to be the year AI gets real, bringing specific, tangible benefits to the enterprise.
However, according to a new State of AI Development Report from AI development platform Vellum, we're not quite there yet: Just 25% of enterprises have deployed AI into production, and roughly a quarter of those have yet to see measurable impact.
This seems to indicate that many enterprises have not yet identified viable use cases for AI, keeping them (at least for now) in a pre-build holding pattern.
"This reinforces that it's still pretty early days, despite all the hype and discussion that's been happening," Akash Sharma, Vellum CEO, told VentureBeat. "There's a lot of noise in the industry, new models and model providers coming out, new RAG techniques; we just wanted to get a lay of the land on how companies are actually deploying AI to production."
Enterprises must identify specific use cases to see success
Vellum interviewed more than 1,250 AI developers and builders to get a true sense of what's happening in the AI trenches.
According to the report, the majority of companies not yet in production are at various stages of their AI journeys: building out and evaluating strategies and proofs of concept (PoC) (53%), beta testing (14%) and, at the lowest level, talking to users and gathering requirements (7.9%).
By far, enterprises are most focused on building document parsing and analysis tools and customer service chatbots, according to Vellum. But they are also interested in applications incorporating natural-language analytics, content generation, recommendation systems, code generation and automation, and research automation.
Developers report competitive advantage (31.6%), cost and time savings (27.1%) and higher user adoption rates (12.6%) as the biggest impacts they've seen so far. Interestingly, though, 24.2% have yet to see any meaningful impact from their investments.
Sharma emphasized the importance of prioritizing use cases from the very start. "We've anecdotally heard from people that they just want to use AI for the sake of using AI," he said. "There's an experimental budget associated with that."
While this makes Wall Street and investors happy, it doesn't mean AI is actually contributing anything, he pointed out. "Something generally everyone should be thinking about is, 'How do we find the right use cases?' Usually, once companies are able to identify those use cases, get them into production and see a clear ROI, they get more momentum, they get past the hype. That results in more internal expertise, more investment."
OpenAI still at the top, but a mixture of models will be the future
When it comes to models used, OpenAI maintains the lead (no surprise there), notably with GPT-4o and GPT-4o mini. But Sharma pointed out that 2024 offered more options, either directly from model creators or through platform solutions like Azure or AWS Bedrock. Providers hosting open-source models such as Llama 3.2 70B, including Groq, Fireworks AI and Together AI, are gaining traction, too.
"Open-source models are getting better," said Sharma. "Closed-source competitors to OpenAI are catching up in terms of quality."
Ultimately, though, enterprises aren't going to stick with just one model; they will increasingly lean on multi-model systems, he forecast.
"People will choose the best model for each task at hand," said Sharma. "While building an agent, you might have multiple prompts, and for each individual prompt the developer will want to get the best quality, lowest cost and lowest latency, and that may or may not come from OpenAI."
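To make the per-prompt trade-off concrete, here is a minimal sketch in Python of how a developer might route each prompt to a model. It is an illustration only, not a method described in the Vellum report: the model names, quality scores, prices and latencies are all hypothetical, and the rule used (cheapest model that clears a quality and latency bar) is just one of many possible policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    quality: float      # hypothetical eval score in [0, 1]
    cost_per_1k: float  # hypothetical USD per 1,000 tokens
    latency_ms: float   # hypothetical median latency in milliseconds


def pick_model(candidates, min_quality, max_latency_ms):
    """Return the cheapest candidate that meets the quality and latency bars."""
    eligible = [
        m for m in candidates
        if m.quality >= min_quality and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise ValueError("no model satisfies the constraints")
    # Cheapest first; latency breaks ties.
    return min(eligible, key=lambda m: (m.cost_per_1k, m.latency_ms))


# Hypothetical candidates; none of these numbers come from the report.
CANDIDATES = [
    ModelProfile("large-closed-model", 0.92, 0.0100, 900),
    ModelProfile("small-closed-model", 0.81, 0.0006, 350),
    ModelProfile("hosted-open-model", 0.85, 0.0009, 200),
]

# Each prompt inside an agent carries its own (min_quality, max_latency_ms).
PROMPT_REQUIREMENTS = {
    "classify_intent": (0.80, 400),
    "draft_reply": (0.90, 1500),
}

routing = {
    prompt: pick_model(CANDIDATES, quality, latency).name
    for prompt, (quality, latency) in PROMPT_REQUIREMENTS.items()
}
print(routing)
```

With these made-up numbers, the lightweight classification prompt lands on the small model while the drafting prompt needs the large one, which is the kind of mix Sharma describes.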
Similarly, the future of AI is undoubtedly multimodal, with Vellum seeing a surge in adoption of tools that can handle a variety of tasks. Text is the undisputed top use case, followed by file creation (PDFs or Word), images, audio and video.
Also, retrieval-augmented generation (RAG) is a go-to when it comes to information retrieval, and more than half of developers are using vector databases to simplify search. Top open-source and proprietary options include Pinecone, MongoDB, Qdrant, Elasticsearch, pgvector, Weaviate and Chroma.
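For readers unfamiliar with what a vector database does in a RAG pipeline, the following Python sketch shows the core idea: documents are stored as vectors, and the ones closest to the query vector are pulled into the prompt. This is a toy, in-memory illustration under stated assumptions; the three-dimensional "embeddings" are hand-written and hypothetical, and it does not use the API of any product named above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k document keys whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Hypothetical hand-written embeddings; a real system would get these
# from an embedding model and store them in a vector database.
INDEX = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.2, 0.9],
}

query = [0.8, 0.3, 0.0]  # hypothetical embedding of the user's question
retrieved = top_k(query, INDEX, k=2)

# The retrieved text is then placed into the prompt as grounding context.
prompt = "Answer using only this context:\n" + "\n".join(retrieved)
print(prompt)
```

Production vector databases add indexing structures so the search stays fast over millions of vectors, but the retrieve-then-prompt flow is the same.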
Everyone's getting involved (not just engineering)
Interestingly, AI is moving beyond just IT and becoming democratized across enterprises (akin to the old "it takes a village"). Vellum found that while engineering was most involved in AI projects (82.3%), it is being joined by leadership and executives (60.8%), subject matter experts (57.5%), product teams (55.4%) and design departments (38.2%).
This is largely due to the ease of use of AI (as well as the general excitement around it), Sharma noted.
"This is the first time we're seeing software being developed in a very, very cross-functional way, especially because prompts can be written in natural language," he said. "Traditional software usually tends to be more deterministic. This is non-deterministic, which brings more people into the development fold."
Still, enterprises continue to face big challenges, notably around AI hallucinations and prompts; model speed and performance; data access and security; and getting buy-in from important stakeholders.
At the same time, while more non-technical users are getting involved, there is still a lack of pure technical expertise in-house, Sharma pointed out. "The way to connect all the different moving parts is still a skill that not that many developers have today," he said. "So that's a common challenge."
However, many existing challenges can be overcome with tooling, meaning platforms and services that help developers evaluate complex AI systems, Sharma pointed out. Developers can build tooling internally or use third-party platforms or frameworks; however, Vellum found that nearly 18% of developers are defining prompts and orchestration logic without any tooling at all.
Sharma pointed out that "lack of technical expertise becomes easier when you have proper tooling that can guide you through the development journey." In addition to Vellum, frameworks and platforms used by survey participants include LangChain, LlamaIndex, Langfuse, CrewAI and Voiceflow.
Evaluations and ongoing monitoring are critical
Another way to overcome common issues (including hallucinations) is to perform evaluations, or use specific metrics to test the correctness of a given response. "But despite that, [developers] are not doing evals as consistently as they should be," said Sharma.
Particularly when it comes to advanced agentic systems, enterprises need solid evaluation processes, he said. AI agents have a high degree of non-determinism, Sharma pointed out, as they call external systems and perform autonomous actions.
"People are trying to build fairly advanced systems, agentic systems, and that requires a large number of test cases and some sort of automated testing framework to make sure it performs reliably in production," said Sharma.
While some developers are taking advantage of automated evaluation tools, A/B testing and open-source evaluation frameworks, Vellum found that more than three-quarters are still doing manual testing and reviews.
"Manual testing just takes time, right? And the sample size in manual testing is usually much lower than what automated testing can do," said Sharma. "There might be a challenge in just the awareness of techniques, how to do automated, at-scale evaluations."
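As a rough picture of what an automated evaluation looks like in practice, the Python sketch below runs a fixed set of test cases through a model and reports a pass rate. Everything in it is an assumption made for illustration: `fake_model` is a hypothetical stand-in for a real model call, and the "contains the expected answer" check is a deliberately simple metric, not one prescribed by Vellum or any evaluation framework.

```python
def fake_model(prompt):
    """Hypothetical stand-in for a real model call; returns canned answers."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
        # Deliberately wrong, to show how a failure is surfaced.
        "Who wrote Hamlet?": "Hamlet was written by Christopher Marlowe.",
    }
    return canned.get(prompt, "")


# Each test case pairs a prompt with a string the answer must contain.
TEST_CASES = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "Shakespeare"),
]


def contains_expected(response, expected):
    """Simple correctness metric: case-insensitive substring match."""
    return expected.lower() in response.lower()


def run_evals(model, cases, metric):
    """Run every case through the model and summarize the results."""
    results = [(prompt, metric(model(prompt), expected)) for prompt, expected in cases]
    passed = sum(1 for _, ok in results if ok)
    return {
        "pass_rate": passed / len(results),
        "failures": [prompt for prompt, ok in results if not ok],
    }


report = run_evals(fake_model, TEST_CASES, contains_expected)
print(report)
```

Because the loop is mechanical, the same harness can run hundreds or thousands of cases on every prompt or model change, which is the sample-size advantage over manual review that Sharma points to.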
Ultimately, he emphasized the importance of embracing a mix of systems that work symbiotically, from cloud to application programming interfaces (APIs). "Consider treating AI as just a tool in the toolkit and not the magical solution for everything," he said.
source: https://venturebeat.com/ai/early-days-for-ai-only-25-of-enterprises-have-deployed-few-reap-rewards/


