Trend Micro Inc.

04/28/2025 | News release | Distributed by Public on 04/28/2025 02:09

NVIDIA Riva Vulnerabilities Leave AI-Powered Speech and Translation Services at Risk

However, even when all the certificate parameters are provided in the config.sh from the NVIDIA QuickStart package, the gRPC server only enforces a TLS/SSL connection and encrypts the traffic between the client and the server. This means the client can verify that the server is what it claims to be. But nothing verifies the client, so anyone can use the service. This behavior might create a false sense of security while the services remain exposed to everyone.
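The difference between server-only TLS and mutual TLS can be illustrated with a minimal sketch using Python's standard ssl module. This is not Riva's or the QuickStart package's configuration; the function name make_server_context and the file-path parameters are hypothetical and exist only for illustration.

```python
import ssl
from typing import Optional


def make_server_context(
    require_client_cert: bool,
    cert_file: Optional[str] = None,       # hypothetical path to the server certificate
    key_file: Optional[str] = None,        # hypothetical path to the server private key
    client_ca_file: Optional[str] = None,  # hypothetical CA bundle that signs client certs
) -> ssl.SSLContext:
    """Build a server-side TLS context, optionally requiring client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if cert_file:
        # The server presents this certificate, so clients can verify the server.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if require_client_cert:
        # Mutual TLS: the handshake fails unless the client presents a
        # certificate signed by a trusted CA.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca_file:
            ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


# Server-only TLS: traffic is encrypted, but any client may connect.
server_only = make_server_context(require_client_cert=False)
# Mutual TLS: the client must also prove its identity.
mutual = make_server_context(require_client_cert=True)

print("server-only verifies clients:", server_only.verify_mode != ssl.CERT_NONE)
print("mutual TLS verifies clients:", mutual.verify_mode == ssl.CERT_REQUIRED)
```

In the server-only case the context never asks the client for a certificate, which matches the behavior described above: the channel is encrypted, yet the caller remains anonymous.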

What about the other exposed ports? The Riva server internally communicates with the Triton Inference Server. In fact, it just translates API requests into a language that Triton Inference Server understands. Those ports expose the Triton Inference Server binary due to the container configuration:

  • REST API endpoint (default 8000)
  • gRPC API endpoint (default 8001)
  • HTTP metrics endpoint (default 8002) (only /metrics endpoint)

This makes the REST and gRPC Triton Inference Server APIs available to everyone. So even when the Riva server gRPC endpoint is successfully secured, it could still be bypassed completely by translating the requests into calls to the Triton Inference Server endpoints.
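Administrators who want to confirm whether these default ports are reachable on a host they manage could start from a sketch like the following. It only attempts a TCP connection and does not speak the Triton protocols; the function name check_open_ports and the placeholder host are assumptions made for illustration.

```python
import socket
from typing import Dict, Iterable

# Default Triton Inference Server ports listed above.
TRITON_DEFAULT_PORTS = {8000: "REST API", 8001: "gRPC API", 8002: "HTTP metrics"}


def check_open_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Return {port: True/False} depending on whether a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            # A successful connect means something is listening and reachable.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, or unreachable: treat as not exposed from here.
            results[port] = False
    return results


if __name__ == "__main__":
    host = "127.0.0.1"  # placeholder; replace with a host you administer
    for port, is_open in check_open_ports(host, TRITON_DEFAULT_PORTS).items():
        state = "reachable" if is_open else "closed or filtered"
        print(f"{port} ({TRITON_DEFAULT_PORTS[port]}): {state}")
```

Running the check from outside the trusted network is what matters here: a port that is reachable only from inside the container host is a much smaller concern than one reachable from the internet.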

Notably, some of the endpoints pose a significant security risk when Triton Inference Server is configured with advanced settings, as they might expose inherent flaws and previously disclosed vulnerabilities.

Many might dismiss these issues as a security problem on the user's side. However, we first need to understand the problem's scope and prevalence. To do that, we must describe how we first identified it: we previously wrote about the problem of insecure gRPC implementations.

Extending our previous research, we found 54 unique IP addresses with their NVIDIA Riva services exposed, all belonging to various cloud service providers. These findings led us to analyze the root cause of the problem.

Security best practices and recommendations

We recommend that all Riva service administrators check their configuration for unintended service exposure and ensure that they're running the latest version of the Riva framework. In addition to NVIDIA's best practices, consider implementing the following security measures:

  • Implement a secure API gateway and expose only intended gRPC or REST API endpoints. This helps prevent unauthorized access and protects back-end services.
  • Apply network segmentation by restricting access to the Riva server and Triton Inference Server to trusted networks. This helps minimize the attack surface and prevents unauthorized access from the internet.
  • Require strong authentication mechanisms and enforce role-based access control. Consider zero-trust approaches, such as identity-aware access, to ensure that only authenticated and authorized users, services, and devices can interact with Riva APIs.
  • Review and modify container settings to disable unnecessary services, remove unused ports, and restrict privileged execution. This prevents attackers from exploiting exposed services or misconfigurations.
  • Enable logging and monitoring on Riva and Triton Inference Server to detect unusual access patterns, anomalous activities, or potential abuse.
  • Consider rate limiting and API request throttling, particularly if gRPC or REST endpoints are exposed to external networks or integrated into environments where threat actors could attempt brute-force or DoS attacks.
  • Keep the Riva framework, Triton Inference Server, and dependencies up to date to mitigate known vulnerabilities and protect against newly discovered exploits.
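
As an illustration of the rate-limiting recommendation above, the following is a minimal token-bucket sketch. It is a generic technique, not a feature of Riva or Triton Inference Server; the class name TokenBucket and its parameters are hypothetical, and in practice this logic would usually live in an API gateway or proxy in front of the service.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable clock, useful for testing
        self.last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be throttled."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)  # illustrative limits only
    accepted = sum(bucket.allow() for _ in range(100))
    print(f"accepted {accepted} of 100 back-to-back requests")
```

A gateway would typically keep one bucket per client identity or source address, so a single abusive caller is throttled without affecting others.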