Hugging Face Has Become a Malware Magnet

Hosting over 1 million AI models, the site is attracting cyberthreats

Hugging Face Has Become a Malware Magnet

* Watch the Video Blog *

The growth of Hugging Face is remarkable and mirrors the explosion of AI usage and business applications. This eight-year-old startup began as a teenage-focused chatbot app (hence the adorable name) but did a hard pivot in 2018 to become a platform for machine learning. It’s now raised $400 million to date and has been dubbed the Github for AI researchers. 

A graph with numbers and a lineDescription automatically generated

The number of AI models and datasets hosted on Hugging Face has tripled in the last year, now topping 1 million. Data scientists and other AI builders love to download models, test new ideas, and share their innovations with this open-source community.

But of course, there’s a catch. The popularity of the site, and general lack of AI governance in most organizations, make Hugging Face hugely attractive to bad actors, embedding malware and phishing users with fake corporate identities.

This week it was reported in Forbes, that researchers from multiple security firms have identified over 3,000 instances of malicious models. These malicious files can compromise users by injecting harmful instructions into downloaded models, making detection difficult.

In other cases, hackers have used tactics such as creating fake profiles posing as trusted companies, such as Meta or 23AndMe, to deceive users into downloading infected models. One specific incident involved a fake model posing as 23AndMe, which, when downloaded, searched for Amazon Web Services (AWS) passwords, potentially allowing attackers to misuse cloud resources.

Hugging Face has been making efforts to address security, but the sheer scale of the problem makes it challenging. The site has integrated commercial scanning tools into its platform, helping users detect malicious code before downloading any models, and has started verifying the profiles of major tech companies like OpenAI and Nvidia to increase trust in the platform.

However, security is significantly trailing behind the explosive growth of AI usage. Surveys of enterprises have found that while over 80% of enterprises are using or experimenting with AI applications, more than 90% feel they are unprepared for AI security challenges.

Hugging Face CTO Julien Chaumond acknowledged the changing dynamics of AI in the Forbes article. “For a long time, AI was a researcher’s field and the security practices were quite basic,” said Chaumond. “As our popularity grows, so does the number of potentially bad actors who may want to target the AI community.”

The rise in cyberattacks targeting AI models underscores the need for stronger security measures in the AI ecosystem. In fact, cybersecurity agencies from the U.S., the U.K., and Canada have jointly issued warnings, advising businesses to carefully scan any pre-trained models for harmful code. 

Pointing fingers at Hugging Face is unfair, and trying to prevent researchers from using the site would be misguided and simply won’t work. AI has far too much momentum, and for every concerned CISO, there are dozens of line-of-business owners chomping at the bit to develop and launch AI applications to keep them competitive.

This discovery highlights the broader challenge of securing AI adoption, a key concern for many enterprises. In the evolving landscape of AI, protecting models and ensuring the safety of cloud-based resources is crucial for preventing breaches that could undermine trust in AI technologies, not to mention causing major security and data breaches. As the stakes go up for professionals and vendors, companies like AppSOC are developing capabilities to provide defense-in-depth for AI assets across the ML-Ops and DevOps lifecycle. AppSOC has launched multiple capabilities to help organizations gain visibility and get ahead of this next generation of cyber threats. We recommend an incremental approach to help organizations begin this process with the following steps:

AI Discovery  

The first step is for businesses to identify, categorize, and monitor usage of AI models, datasets, notebooks, or other components that may be in use within the organization. AppSOC scans ML-Ops environments and provides a comprehensive inventory of all AI components, referencing a knowledge base of models from Hugging Face and other sources. This feature provides organizations with visibility and control over what models are deployed.

AI Security Posture Management  

AppSOC provides deep integration with LLM-Ops platforms and automatically detects dangerous misconfigurations while ensuring proper access controls and permissions. This can help detect model theft, asset leaks, data poisoning, malicious libraries, or software supply chain vulnerabilities. This helps organizations avoid threats like the ones posed by fake models masquerading as trusted brands (e.g., 23AndMe) that attempt to steal cloud credentials or misuse computational resources.

AI Model Scanning

Because models are frequently changing, AppSOC provides both pre-deployment, and runtime scans of models to detect embedded malware, serialization vulnerabilities, insecure formats, and many other threats. The platform also scans AI notebooks to detect API calls to third-party, or business-critical SaaS applications (such as Salesforce or Workday). AppSOC is the only vendor to connect the dots between AI-specific threats, and vulnerabilities that could be exploited in non-AI connected applications.

AI Runtime Defense

Once AI applications are in operation, it’s important to ensure they don’t become conduits for data leaks, hallucinations, or inappropriate content. AppSOC monitors both prompts and responses through APIs or inline agents and can detect prompt injection, jailbreaking, malicious code, or inadvertent leaks of sensitive data.

By integrating AppSOC, companies can ensure that they have full visibility over their AI model ecosystem and enforce security protocols at every stage, from model discovery to deployment. These features provide much-needed assurance that external models, especially those sourced from repositories like Hugging Face, are safe to use and do not expose the business to hidden cybersecurity risks.  

* Watch the Video Blog *