ingero: Ingero is a production-grade eBPF agent that traces GPU workloads from Linux kernel events through CUDA API calls to Python source lines. It provides causal root cause analysis with under 2% overhead and no code changes. It includes an MCP server for AI agent integration, enabling natural language diagnosis.; trigger.dev: Trigger.dev is an open-source platform designed for building AI workflows and agents using TypeScript. It provides a robust environment for long-running tasks with built-in features like retries, queues, observability, and elastic scaling, eliminating typical serverless timeouts.
Diagnose GPU stalls and latency spikes in training workloads
Building and deploying long-running AI agents and complex workflows.