ingero: Ingero is a production-grade eBPF agent that traces GPU workloads from Linux kernel events through CUDA API calls to Python source lines. It performs causal root-cause analysis with under 2% overhead and requires no code changes. It also includes an MCP (Model Context Protocol) server for AI agent integration, so agents can diagnose problems in natural language.
conductor: Conductor is a platform developed at Netflix for orchestrating complex workflows across microservices; workflows can be defined in JSON or in code. Netflix discontinued official maintenance of the open-source project on December 13, 2023, and encouraged the community to continue development through forks.
Diagnose GPU stalls and latency spikes in training workloads
Coordinate complex business processes that span multiple microservices
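To make "workflows defined in JSON" concrete, here is a minimal sketch of a Conductor-style workflow definition built as a Python dict and serialized to JSON. It assumes the common Conductor definition fields (`name`, `version`, `schemaVersion`, `tasks`, and per-task `name`, `taskReferenceName`, `type`, `inputParameters`). The workflow and task names (`order_fulfillment`, `reserve_inventory`, `charge_payment`, `ship_order`) and the `orderId`/`reservationId` parameters are hypothetical. The sketch does not call any Conductor API. Registering and running the definition depends on the server or community fork you use, so check that fork's documentation for the exact schema.

```python
import json


def build_workflow_def() -> dict:
    """Return a Conductor-style workflow definition as a plain dict.

    All workflow/task names and parameters here are hypothetical examples.
    """
    tasks = [
        {
            # Each SIMPLE task is executed by a worker service polling for "name".
            "name": "reserve_inventory",
            "taskReferenceName": "reserve_inventory_ref",
            "type": "SIMPLE",
            "inputParameters": {"orderId": "${workflow.input.orderId}"},
        },
        {
            "name": "charge_payment",
            "taskReferenceName": "charge_payment_ref",
            "type": "SIMPLE",
            "inputParameters": {
                "orderId": "${workflow.input.orderId}",
                # Wire the previous task's output into this task's input.
                "reservationId": "${reserve_inventory_ref.output.reservationId}",
            },
        },
        {
            "name": "ship_order",
            "taskReferenceName": "ship_order_ref",
            "type": "SIMPLE",
            "inputParameters": {"orderId": "${workflow.input.orderId}"},
        },
    ]
    # Tasks in the list run in order; the orchestrator, not the services,
    # tracks progress and passes data between steps.
    return {
        "name": "order_fulfillment",
        "version": 1,
        "schemaVersion": 2,
        "tasks": tasks,
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow_def(), indent=2))
```

The design point is that each microservice only implements its own task, while the sequencing and data flow live in the declarative definition. That is what lets an orchestrator coordinate a multi-service business process without the services calling each other directly.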