“Our goal is not to be the ‘best’ vector search engine in the world. There are many vector search engines on the market right now – we want to be the most useful one; one that really solves a problem,” says Barcelona-based Nuclia CTO Ramon Navarro. Together with co-founder and CEO Eudald Camprubí, Navarro is building an AI-powered search engine and database designed for unstructured data that enables multilingual, multi-format searches not just for information, but for “concepts” using natural language processing (NLP). This week, Nuclia landed a $5.4 million seed round from Crane Venture Partners and Elaia, and open-sourced its new cloud-native database, NucliaDB (albeit cautiously, under the strong copyleft GPLv3 licence).
We make Nuclia our latest “One to Watch”: a startup that really excites us for its potential.
The company offers an end-to-end Nuclia API, able to connect to any data source and automatically index its content regardless of format or language, that allows developers to build AI-powered search functions on unstructured data. Underpinning this is the freely available NucliaDB database, which allows users to deploy their own vectorization and normalization algorithms while providing storage, indexing, and querying.
(Commercially, the idea is that users will use NucliaDB to store all their unstructured data, pay for the API and also, if desired, pay for NucliaDB-as-a-Service, hosted on a multicloud infrastructure.)
“Where is the data, what format is it in and in what language: it’s a nightmare for most companies when trying to index and access this information,” says Camprubí, pointing out that 80 to 90% of any organization’s data is unstructured and spread across different sources: “We connect these data sources, we index the information, and if it’s a video, we transcribe the video. And then we run our algorithms to extract named entities first. So we automatically detect people’s names, organization names, dates, quantities, lots of different things – then we index all the text, we index all the paragraphs,” the Nuclia CEO notes in a call this week.
He adds: “And then once we have everything, we vectorize the information, that is to say, we transform the text into vectors, into numbers. We store everything in NucliaDB, which is the database we created, which offers vector search as well as text search, paragraph search and fuzzy search…”
(NucliaDB is written in Rust and Python and built on the Tantivy library. It is designed to run on Kubernetes, with eventually consistent transactions based on the Nats.io architecture and support for TiKV and Redis.)
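As a rough illustration of the flow Camprubí describes (index paragraphs, turn text into vectors, then offer vector search alongside fuzzy search), here is a minimal Python sketch. It is not NucliaDB's API: `ToyIndex`, `vectorize` and the sample paragraphs are hypothetical names invented for this example, and a simple word-count vector stands in for the learned embeddings a real system would use.

```python
import math
import re
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index; not how NucliaDB is implemented."""

    def __init__(self):
        self.paragraphs = []
        self.vectors = []
        self.vocabulary = set()

    def add(self, paragraph):
        """Index one paragraph: store its text, its vector and its terms."""
        vector = vectorize(paragraph)
        self.paragraphs.append(paragraph)
        self.vectors.append(vector)
        self.vocabulary.update(vector)

    def vector_search(self, query, k=1):
        """Return up to k (paragraph, score) pairs ranked by cosine similarity."""
        q = vectorize(query)
        scored = sorted(
            ((cosine(q, v), i) for i, v in enumerate(self.vectors)),
            key=lambda pair: (-pair[0], pair[1]),
        )
        return [(self.paragraphs[i], s) for s, i in scored[:k] if s > 0]

    def fuzzy_search(self, term, cutoff=0.8):
        """Return paragraphs containing an indexed term close to a misspelled one."""
        close = get_close_matches(term.lower(), sorted(self.vocabulary), n=3, cutoff=cutoff)
        return [p for p, v in zip(self.paragraphs, self.vectors) if any(c in v for c in close)]


index = ToyIndex()
index.add("The quarterly report covers revenue in Barcelona")
index.add("The video transcript discusses vector search engines")

best_match = index.vector_search("vector search")   # ranks the transcript paragraph first
typo_match = index.fuzzy_search("barcelonna")       # tolerates the misspelling
```

Swapping `vectorize` for a real embedding model is what would let a query match paragraphs by concept rather than by shared words; the ranking step would stay the same.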
There is growing demand for access to this unstructured organizational data, and Nuclia wants organizations to build their own search engines and other applications on top of its technology.
“I was always missing a tool that helped me with search. Search is an extremely complex problem; no one imagines how complex it is until you need to build a search engine,” says Navarro, adding, “We don’t know what kind of software is going to be built on Nuclia, we can’t imagine how far people will go.
“But the more freedom you can give developers or companies to build their own system, the better – I’m not going to know and understand customer issues better than they do.”
Nuclia’s mixed open/proprietary model
Commercially, Nuclia’s “understanding API” will remain proprietary.
“People will pay by consuming [for] this understanding API and the training API…. So all the information you put into the database will be useful for training accurate models and for creating more information specifically for the database,” says Navarro, noting that making Nuclia’s code open source will make it easier to find and attract talented developers, whether as employees or users. (Nuclia will not retain user data processed by its API, only statistics for accounting, and customers can use NucliaDB without using the API.)
The transparency offered by open source is essential, they say: “It’s not easy to find ex-Google search engineers, for example. Or it’s not easy to find people who really have a lot of experience in this world of search, and who can provide the value that we need right now. So for us, it’s good to show that we can do it, and that people can trust us, because we are a tool for developers. It’s hard to trust a tool if you can’t see what’s behind it.”
Nuclia Secures $5.4M Seed Funding
“Nuclia has built something amazing. Imagine being taken to the exact time in a video or podcast, or the exact block in a PDF or presentation, that contains the content you’re looking for. And then go deeper, looking not only for content, but also for concepts,” says Aneel Lakhani, Venture Capital Partner at Crane, an early-stage venture capital fund, in a press release, adding: “We believe that the explosion of unstructured data like audio and video will only continue.
“Nuclia is poised to support how engineers embed search into their applications and services and how modern enterprises unlock insights from unstructured data that simply isn’t accessible today.”
The scale of what Nuclia is trying to accomplish is impressive – and indeed, when discussing the company’s long-term ambitions, CEO Camprubí refers to both Elastic and Algolia as lodestars.
“Our efforts are not about building something super smart. It’s to do something super useful. And sometimes in tech, it’s hard, because you’re so excited about tech that you end up building something super complex for developers to use,” Navarro says. “We don’t want to break the wall, we are not trying to fix everything. We want to have a tool so that people with words, with the way they speak, can find any material or information that is within their knowledge – even if it’s spoken, even if it’s written on an image, or even if it is a PDF or Word document or other. And we’re super focused on this problem, to try to solve this problem.”
Investing in developers – and users
Nuclia will use its new investment to expand the team.
Arguably unusually, says Camprubí, in addition to traditional domain-expert developers with search experience, Nuclia is also looking to attract “citizen developers” who may not have coding experience, but who build systems and applications using low-code or no-code tools: “Our [focus in] coming months, in addition to offering this technology to pure developers and stabilizing everything, is also to start approaching these other [citizen] developers, who are a huge and growing market, and to offer Nuclia to them as well,” he says.
Although Nuclia is a fully remote organization when it comes to hiring talent, the company still has a strong sense of place, both in terms of the tech community and its origins in Barcelona.
Navarro explains how the founders’ experience was vital in creating Nuclia: “We come from a wide world of eDiscovery on the Internet, building very large search engines with Elastic and many different search engines – it gave us all the knowledge and all the experience to understand what the pains of this system are.”
Nuclia is the third company on which Camprubí and Navarro, who have known each other since childhood, have collaborated. They founded Iskra.cat, a technology agency, together 11 years ago, before both working at Onna.com, as COO and CTO respectively, during its early years: “It’s the best partnership, because I’m super techy and Eudald is super good at business and marketing. Because explaining what vector searching is is something that seems simple when you know, but when you don’t know what it is, it’s really complex,” explains Navarro.
Flying the flag for Barcelona
Camprubí also says how proud they are of their Barcelona heritage and the fact that Nuclia is able to fly the flag for Barcelona’s tech start-ups. He is keen to show that local start-ups can bring something really unique, something that is not currently happening: “[In Spain] we kinda lack the concept of inventing, of writing a white paper to invent something new, that’s what we’re trying. And that’s why for us it’s good to be from Barcelona, because inventing wheels from Barcelona is not that easy. We don’t copy anything from outside.”
But the challenge lies not only within the local ecosystem, but also in the mindset of European buyers. Camprubí says that too often he sees organizations attracted to software developed outside their own country or region while neglecting local innovations: “‘American software is always better than European software’. This is the conviction of many administrations. I hope that projects like ours [and other] deep technology developed in Barcelona will help change this perception,” he concludes.