How to avoid the Microsoft GitHub goof that exposed 38TB of sensitive employee data

Microsoft’s AI research team accidentally exposed 38 terabytes of private data through a Shared Access Signature (SAS) link it published on a GitHub repository, according to a report by Wiz research that highlighted how CISOs can minimize the chances of this happening to them in the future.

Dubbed “robust-models-transfer,” the repository was meant to provide open-source code and AI models for image recognition, and the readers of the repository were provided a link to download the models from an Azure storage URL.

This URL allowed access to more than just open-source models, according to a Wiz blog post. It was configured to grant permissions to the entire storage account, exposing additional private data by mistake.

“The Azure storage account contained 38TB of additional data — including Microsoft employees’ personal computer backups,” Wiz said. “The backups contained sensitive personal data including passwords to Microsoft’s services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.”

The slipup, a misconfigured SAS link that allowed access to sensitive information, could have been easily avoided with an understanding of what exactly went wrong.

Misconfigured SAS tokens created risks

The Microsoft repository, meant to provide AI models for use in training code, instructed users to download a model data file through a SAS link and feed it into their scripts, Wiz noted. To do this, Microsoft developers used an Azure mechanism called “SAS tokens,” which create a shareable link granting access to data in an Azure Storage account that, upon inspection, would still seem completely private.

The token used by Microsoft not only exposed additional storage through an overly wide access scope, but was also misconfigured to grant “full control” permissions instead of read-only, enabling a potential attacker not just to view the private files but to delete or overwrite existing files as well.

In Azure, a SAS token is a signed URL granting customizable access to Azure Storage data, with permissions ranging from read-only to full control. It can cover a single file, container, or entire storage account, and the user can set an optional expiration time, even setting it to never expire.
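Because the scope, permissions, and expiry travel in the URL's query string, a shared link can be inspected before it is published. The following is a minimal Python sketch of such an inspection, not an official Azure tool. The function name describe_sas and the example URL are hypothetical, the permission table covers only the common letters, and the token type is guessed from which query fields are present (srt for an Account SAS, skoid for a User Delegation SAS), which is a heuristic.

```python
from urllib.parse import urlsplit, parse_qs

# Common permission letters carried in the `sp` field of a SAS token.
# This table is illustrative and not exhaustive.
PERMISSION_NAMES = {
    "r": "read", "w": "write", "d": "delete", "l": "list",
    "a": "add", "c": "create", "u": "update", "p": "process",
}


def describe_sas(url: str) -> dict:
    """Summarise the kind, permissions and expiry encoded in a SAS URL."""
    query = parse_qs(urlsplit(url).query)

    def first(key: str) -> str:
        return query.get(key, [""])[0]

    # Heuristic: Account SAS URLs carry `srt` (signed resource types),
    # User Delegation SAS URLs carry `skoid` (signed key object ID).
    if first("srt"):
        kind = "account"
    elif first("skoid"):
        kind = "user-delegation"
    else:
        kind = "service"

    permissions = [PERMISSION_NAMES.get(ch, ch) for ch in first("sp")]
    return {
        "kind": kind,
        "permissions": permissions,
        "expires": first("se") or None,  # None when the URL has no expiry field
    }


# Hypothetical URL for illustration only; the account name and signature are made up.
example_url = (
    "https://exampleaccount.blob.core.windows.net/"
    "?sv=2021-08-06&ss=b&srt=sco&sp=rwdl&se=2099-01-01T00:00:00Z&sig=REDACTED"
)
summary = describe_sas(example_url)
```

A summary showing an account-wide token with write and delete permissions and a far-future expiry is the combination the Wiz report warns about.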

The full-access configuration “is particularly interesting considering the repository’s original purpose: providing AI models for use in training code,” Wiz said. The format of the model data file intended for downloading is ckpt, a format produced by the TensorFlow library. “It’s formatted using Python’s Pickle formatter, which is prone to arbitrary code execution by design. Meaning, an attacker could have (also) injected malicious code into all the AI models in this storage account,” Wiz added.
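To illustrate why write access to pickle-formatted files matters, here is a deliberately harmless Python sketch of the general pickle behavior Wiz describes. It is not taken from the Microsoft repository or from any model file. The class name Payload is hypothetical, and the "attack" only evaluates an arithmetic expression.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form runs code when loaded."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments" and
        # performs that call during loading. A harmless expression is
        # used here; an attacker could name any importable callable.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes triggers the stored call. No Payload object comes back,
# only the result of the call made during unpickling.
result = pickle.loads(blob)
```

The point of the sketch is that simply loading an untrusted pickle is enough to run the embedded call, which is why overwritable model files are a supply-chain concern.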

SAS tokens are difficult to manage

The granularity of SAS tokens opens up risks of granting too much access. In the Microsoft GitHub case, the token granted full-control permissions on the entire account, forever.

Microsoft’s repository used an Account SAS token, one of three types of SAS tokens. The other two, Service SAS and User Delegation SAS, allow service (application) and user access, respectively.

Account SAS tokens are extremely risky as they are vulnerable in terms of permissions, hygiene, management, and monitoring, Wiz noted. Permissions on SAS tokens can grant high-level access to storage accounts either through excessive permissions or through wide access scopes.

Hygiene issues involve tokens having an expiry problem, where organizations use tokens with a very long (sometimes lifetime) expiry by default.

Account SAS tokens are also extremely hard to manage and revoke. “SAS tokens are created on the client side; therefore, it is not an Azure tracked activity, and the generated token is not an Azure object,” Wiz said. “There isn’t any official way to keep track of these tokens within Azure, nor to monitor their issuance, which makes it difficult to know how many tokens have been issued and are in active use.”

Recommendations include configuration hacks and monitoring

Wiz recommends avoiding external sharing of Account SAS tokens, given their lack of security and governance. If external sharing can’t be helped, a Service SAS with a stored access policy should be used instead, so that policies and revocation can be managed centrally.

For sharing content in a time-limited manner, a User Delegation SAS is the better fit, since its expiry is capped at seven days. Creating dedicated storage accounts can be a good practice too, in cases where external sharing is inevitable.
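A seven-day ceiling is easy to check mechanically before a link leaves the organization. The sketch below is one possible pre-publication check in Python, under stated assumptions: the names MAX_LIFETIME, parse_sas_time, and check_lifetime are hypothetical, the expiry is read from the se query field, and only a few common UTC timestamp forms are handled. A URL without an se field (for example, one whose expiry lives in a stored access policy) is reported as "unknown" rather than guessed at.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs

# Assumed policy ceiling, mirroring the seven-day cap discussed above.
MAX_LIFETIME = timedelta(days=7)


def parse_sas_time(value: str) -> datetime:
    """Parse a few common UTC timestamp forms seen in the `se` field."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised SAS timestamp: {value!r}")


def check_lifetime(url: str, now: datetime) -> str:
    """Return 'ok', 'too-long', or 'unknown' for the token's remaining lifetime."""
    query = parse_qs(urlsplit(url).query)
    expiry = query.get("se", [None])[0]
    if expiry is None:
        # The URL alone does not say when the token expires.
        return "unknown"
    remaining = parse_sas_time(expiry) - now
    return "too-long" if remaining > MAX_LIFETIME else "ok"
```

Passing the current time in as an argument, for example datetime.now(timezone.utc), keeps the function deterministic and easy to test in a CI step.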

Wiz also recommended tracking active SAS token usage by “enabling storage analytics logs” on storage accounts.

“The resulting logs will contain details of SAS token access, including the signing key and the permissions assigned,” Wiz said. “However, it should be noted that only actively used tokens will appear in the logs, and that enabling logging comes with extra charges — which might be costly for accounts with extensive activity.”

Azure Metrics can also be used to monitor SAS token usage in storage accounts for events up to 93 days. In addition, secrets-scanning tools can help detect leaked or over-privileged SAS tokens in artifacts and publicly exposed assets, according to Wiz.
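As a rough idea of what such a scan looks for, the Python sketch below searches text for Azure Storage URLs that carry a sig= signature field. It is a simplified illustration and not a substitute for a dedicated secrets scanner. The pattern and the function name find_sas_urls are assumptions made for this example, and the regular expression will miss tokens that are split across lines, shared without a hostname, or served from a custom domain.

```python
import re
from typing import List

# Matches https URLs on the standard Azure Storage endpoints that include
# a `sig=` query field, which is what makes a URL a usable SAS link.
SAS_URL_PATTERN = re.compile(
    r"https://[a-z0-9]+\.(?:blob|file|queue|table|dfs)\.core\.windows\.net"
    r"[^\s\"'<>]*[?&]sig=[^\s\"'<>&]+[^\s\"'<>]*",
    re.IGNORECASE,
)


def find_sas_urls(text: str) -> List[str]:
    """Return every substring of `text` that looks like an Azure SAS URL."""
    return SAS_URL_PATTERN.findall(text)


# Hypothetical README excerpt; the account name and signature are made up.
sample = (
    "Download the model from "
    "https://exampleaccount.blob.core.windows.net/models/a.ckpt"
    "?sv=2021-08-06&sp=r&sig=abc%3D and pass it to the script.\n"
    "Docs: https://exampleaccount.blob.core.windows.net/public/readme.txt\n"
)
hits = find_sas_urls(sample)
```

Run against repository contents or build artifacts before publishing, a check like this would flag the first URL in the sample and ignore the second, which has no signature.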
