ingero
Active · ★ 84 · Apache-2.0 · Updated 2026-05-29
★ Trending · ★ Observability · ★ LLM Infra
eBPF-based GPU causal observability agent
Ingero is a production-grade eBPF agent that traces GPU workloads from Linux kernel events through CUDA API calls down to Python source lines. It performs causal root-cause analysis with under 2% overhead and requires no code changes. A bundled MCP server lets AI agents query it, enabling natural-language diagnosis.
#causal-tracing #cuda #cuda-graphs #ebpf #gpu #gpu-monitoring #gpu-observability #incident-response
01
Features
01 Full-stack GPU tracing from kernel to Python source
02 Causal root cause analysis with automated diagnosis
03 <2% overhead, zero code changes, single binary
04 MCP server for AI assistant integration
05 Multi-node fleet queries and browser dashboard
02
Compatibility
Linux Kernel: Linux 5.15+ with BTF (verified via docs)
NVIDIA Driver: NVIDIA 550+ (verified via docs)
CUDA: CUDA 11.x/12.x/13.x (verified via docs)
NVIDIA GPU: GH200/H100/A100/A10/RTX 4090/3090 (verified via docs)
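The kernel, BTF, and driver requirements above can be checked on a host before installing. The following is a minimal sketch, not something shipped with ingero: `version_ge` is a hypothetical helper name, the version comparison relies on GNU `sort -V`, and the BTF check assumes the usual `/sys/kernel/btf/vmlinux` location. It only reports findings and does not check the CUDA version or GPU model.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for the compatibility list (not part of ingero).

# Hypothetical helper: succeeds when version $1 >= version $2 (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Kernel: 5.15 or newer. Drop any distro suffix such as "-generic".
kernel="$(uname -r | cut -d- -f1)"
if version_ge "$kernel" "5.15"; then
  echo "kernel $kernel: ok"
else
  echo "kernel $kernel: older than 5.15"
fi

# BTF: kernels built with BTF normally expose this file (assumed location).
if [ -r /sys/kernel/btf/vmlinux ]; then
  echo "BTF: ok"
else
  echo "BTF: /sys/kernel/btf/vmlinux not found"
fi

# NVIDIA driver: 550 or newer, as reported by nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)"
  if [ -z "$driver" ]; then
    echo "NVIDIA driver: nvidia-smi returned no version"
  elif version_ge "$driver" "550"; then
    echo "NVIDIA driver $driver: ok"
  else
    echo "NVIDIA driver $driver: older than 550"
  fi
else
  echo "NVIDIA driver: nvidia-smi not found"
fi
```

The script exits normally even when a check fails, so it can be run on any machine to see which requirement is missing.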
03
Quick start
$ VERSION=0.10.0
$ curl -fsSL "https://github.com/ingero-io/ingero/releases/download/v${VERSION}/ingero_${VERSION}_linux_amd64.tar.gz" | tar xz
$ sudo mv ingero /usr/local/bin/
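For scripted installs, the same three steps can be wrapped so that a failed download does not leave a partial file in the working directory. This is a sketch under assumptions: `release_url`, `install_ingero`, and the `INGERO_INSTALL` guard are hypothetical names introduced here, while the URL pattern and the `ingero` binary name are taken from the commands above.

```shell
#!/usr/bin/env bash
# Staged install sketch; helper names are illustrative, not part of ingero.

# Build the release URL using the pattern from the quick start.
release_url() {
  local version="$1"
  echo "https://github.com/ingero-io/ingero/releases/download/v${version}/ingero_${version}_linux_amd64.tar.gz"
}

# Download and extract into a scratch directory, then install the binary.
install_ingero() {
  local version="$1" tmp
  tmp="$(mktemp -d)" || return 1
  if curl -fsSL "$(release_url "$version")" | tar xz -C "$tmp" && [ -f "$tmp/ingero" ]; then
    sudo install -m 0755 "$tmp/ingero" /usr/local/bin/ingero
  else
    echo "download or extraction failed for v${version}" >&2
    rm -rf "$tmp"
    return 1
  fi
  rm -rf "$tmp"
}

# Only touch the system when explicitly asked to.
if [ "${INGERO_INSTALL:-0}" = "1" ]; then
  install_ingero "${VERSION:-0.10.0}"
fi
```

Saved as a file (say `install.sh`, a name chosen here for illustration), it would run as `INGERO_INSTALL=1 VERSION=0.10.0 bash install.sh`. Without the guard variable it only defines the functions.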
04
Use cases
↳ Diagnose GPU stalls and latency spikes in training workloads
↳ Identify CUDA API bottlenecks and CPU/GPU contention
↳ Debug multi-node distributed training with fan-out queries
Comments
- Cameron Wilson, May 16, 2026
Causal tracing for GPU workloads gives insights that standard profilers miss
- Taylor Jackson, Apr 21, 2026
eBPF-based GPU observability is a unique approach, very low overhead
- Rowan Nguyen, Apr 4, 2026
The agent-based observability model is well suited to distributed ML training
- Emerson Patel, Mar 23, 2026
Used for debugging GPU memory issues in training runs, caught subtle allocation patterns