AI development is akin to the early wild-west days of open source: models are being built on top of each other, cobbled together with different elements from different places.
And, much like with open-source software, this presents problems when it comes to visibility and security: How can developers know that the foundational elements of pre-built models are trustworthy, secure and reliable?
To provide more of a nuts-and-bolts picture of AI models, software supply chain security company Endor Labs is today releasing Endor Labs Scores for AI Models. The new platform scores the more than 900,000 open-source AI models currently available on Hugging Face, one of the world's most popular AI hubs.
"Definitely we're at the beginning, the early stages," George Apostolopoulos, founding engineer at Endor Labs, told VentureBeat. "There's a huge challenge when it comes to the black box of models; it's risky to download binary code from the internet."
Scoring on four critical factors
Endor Labs' new platform uses 50 out-of-the-box metrics that score models on Hugging Face based on security, activity, quality and popularity. Developers don't have to have intimate knowledge of specific models; they can prompt the platform with questions such as "What models can classify sentiments?", "What are Meta's most popular models?" or "What is a popular voice model?"
The platform then tells developers how popular and secure models are and how recently they were created and updated.
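Endor has not published its scoring formulas, but the general idea of rolling Hugging Face model metadata (downloads, likes, last-modified date) into popularity and activity scores can be sketched in a few lines. The weighting below is entirely illustrative, not Endor's actual methodology:

```python
import math
from datetime import datetime, timezone

def score_model(downloads: int, likes: int, last_modified: str) -> dict:
    """Toy popularity/activity score for a Hugging Face model.

    The weights and cutoffs here are made up for illustration;
    Endor Labs' actual 50 metrics and formulas are not public.
    """
    # Popularity: log-scale downloads and likes so giant models don't dominate.
    popularity = min(10.0, math.log10(downloads + 1) + math.log10(likes + 1))

    # Activity: decays the longer the model goes without an update.
    updated = datetime.fromisoformat(last_modified).replace(tzinfo=timezone.utc)
    age_days = (datetime.now(timezone.utc) - updated).days
    activity = max(0.0, 10.0 - age_days / 90)

    return {"popularity": round(popularity, 1), "activity": round(activity, 1)}

print(score_model(downloads=1_200_000, likes=4_500, last_modified="2024-09-01"))
```

In practice the raw inputs would come from the Hugging Face Hub API rather than hard-coded arguments, and a real scorer would also fold in security signals such as weight-file formats and maintainer reputation.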
Apostolopoulos called security in AI models "complex and interesting." There are numerous vulnerabilities and risks, and models are susceptible to malicious code injection, typosquatting and compromised user credentials anywhere along the line.
"It's only a matter of time as these things become more widespread, we will see attackers all over the place," said Apostolopoulos. "There are so many attack vectors, it's difficult to gain confidence. It's important to have visibility."
Endor, which specializes in securing open-source dependencies, developed the four scoring categories based on Hugging Face data and the literature on known attacks. The company has deployed LLMs that parse, organize and analyze that data, and its new platform automatically and continuously scans for model updates or alterations.
Apostolopoulos said additional factors will be taken into account as Endor collects more data. The company will also eventually expand beyond Hugging Face to other platforms, including commercial providers such as OpenAI.
"We will have a bigger story about the governance of AI, which is becoming important as more people start deploying it," said Apostolopoulos.
AI is on a similar path to open-source development, but it's much more complicated
There are many parallels between the development of AI and the development of open-source software (OSS), Apostolopoulos pointed out. Both offer a multitude of options, as well as numerous risks. With OSS, software packages can introduce indirect dependencies that hide vulnerabilities.
Similarly, the vast majority of models on Hugging Face are based on Llama or other open-source options. "These AI models are pretty much dependencies," said Apostolopoulos.
AI models are typically built on, or are essentially extensions of, other models, with developers fine-tuning them to their specific use cases. This creates what he described as a "complex dependency graph" that is difficult to both manage and secure.
"At the bottom somewhere, five layers deep, there is this foundation model," said Apostolopoulos. Getting clarity and transparency can be difficult, and the data that is available can be convoluted and "quite painful" for people to read and understand. It's hard to determine what exactly is contained in model weights, and there are no cryptographic ways to ensure that a model is what it claims to be, is trustworthy as advertised and doesn't produce toxic content.
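The fine-tune lineage Apostolopoulos describes can be pictured as a parent-pointer graph: each model records the model it was trained from, and resolving provenance means walking that chain down to the base. The model names below (other than the Llama base) are hypothetical, invented purely to illustrate the idea:

```python
# Hypothetical fine-tune lineage: each model maps to the model it was
# derived from. All names except the Llama base are made up.
LINEAGE = {
    "acme/chat-assistant-v2": "acme/chat-assistant-v1",
    "acme/chat-assistant-v1": "community/llama-instruct-ft",
    "community/llama-instruct-ft": "community/llama-merged",
    "community/llama-merged": "meta-llama/Llama-2-7b",
}

def foundation_model(name: str) -> tuple[str, int]:
    """Walk the dependency chain down to the base (foundation) model."""
    depth = 0
    seen = set()
    while name in LINEAGE:
        if name in seen:  # guard against cycles in bad metadata
            raise ValueError(f"cycle at {name}")
        seen.add(name)
        name = LINEAGE[name]
        depth += 1
    return name, depth

base, layers = foundation_model("acme/chat-assistant-v2")
print(f"{base} ({layers} layers deep)")
```

A vulnerability or license problem anywhere along this chain propagates to every derived model, which is why a scoring platform has to resolve the whole graph, not just the leaf model a developer downloads.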
"Basic testing is not something that can be done lightly or easily," said Apostolopoulos. "The reality is there is very little and very fragmented information."
While it's convenient to download open source, it's also "extremely dangerous," as malicious actors can easily compromise it, he said.
For instance, common storage formats for model weights can allow arbitrary code execution (that is, an attacker who gains access can run any commands or code they please). This can be particularly dangerous for models built on older serialization formats used by PyTorch, TensorFlow and Keras, Apostolopoulos explained. Deploying models may also require downloading other code that is malicious or vulnerable (or that attempts to import dependencies that are). And installation scripts or repositories (as well as links to them) can be malicious.
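The code-execution risk in pickle-based weight formats (such as PyTorch's legacy .pt files) can be demonstrated with Python's standard library alone: `pickletools` lets you statically scan a pickle stream for opcodes that import or call objects, without ever loading it. This is a generic illustration of the attack class, not Endor's scanner:

```python
import pickle
import pickletools

# Pickle opcodes that can import or invoke arbitrary objects on load.
RISKY_OPS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def risky_opcodes(data: bytes) -> set[str]:
    """Scan a pickle stream for opcodes that can trigger code execution."""
    return {op.name for op, _, _ in pickletools.genops(data)} & RISKY_OPS

plain = pickle.dumps({"weights": [0.1, 0.2]})  # pure data, no callables
print(risky_opcodes(plain))                    # -> set()

class Payload:
    # Unpickling this object calls print("pwned") -- any callable would do.
    def __reduce__(self):
        return (print, ("pwned",))

evil = pickle.dumps(Payload())
print(sorted(risky_opcodes(evil)))
```

Safer formats such as safetensors avoid this class of attack entirely by storing raw tensor data with no executable payload, which is one reason weight-file format shows up as a security signal.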
Beyond security, there are numerous licensing obstacles, too: As with open-source software, models are governed by licenses, but AI introduces new complications because models are trained on datasets that have their own licenses. Today's organizations must be aware of the intellectual property (IP) used by models, as well as copyright terms, Apostolopoulos emphasized.
"One important aspect is how similar and different these LLMs are from traditional open source dependencies," he said. While both pull in outside sources, LLMs are more powerful, larger and made up of binary data.
Open-source dependencies get "updates and updates and updates," while AI models are "fairly static"; when they're updated, "you most likely won't touch them again," said Apostolopoulos.
"LLMs are just a bunch of numbers," he said. "They're much more complex to evaluate."
source: https://venturebeat.com/security/what-open-source-ai-models-should-your-enterprise-use-endor-labs-analyzes-them-all/


