[ Product ] EdgeTelemetry · v.0.9

The control layer for autonomous data centers.

EdgeTelemetry ingests heterogeneous telemetry from GPUs, hosts, cooling, power, and network systems, normalizes it into a unified schema, and validates system readiness — turning weeks of fragmented onboarding into hours of automated validation.

weeks → hrs
GPU rack onboarding
Time from rack landing to operational status, automated end-to-end.
unified
Schema across vendors
One operational view across GPU, host, cooling, power, network telemetry.
claude-ready
Reasoning layer integration
Architected for autonomous diagnosis and remediation via Claude tool use.
→ The architecture

Five sources. One schema. One control plane.

Modern GPU and data center environments stream telemetry from a fragmented vendor stack — GPU drivers, host OS metrics, cooling sensors, power systems, network fabric. Each in its own schema, sample rate, and reliability profile.

EdgeTelemetry ingests them in real time, normalizes them into a single operational schema, validates system state against readiness criteria, and exposes everything to a reasoning layer that turns telemetry into decisions.

// pipeline
SOURCES
gpu_telemetry · host_metrics · cooling_sensors · power_systems · network_fabric
UNIFIED SCHEMA
normalize · validate · enrich · stamp_lineage
READINESS VALIDATION
check_thresholds · check_dependencies · gate_for_operations
REASONING LAYER (CLAUDE)
interpret_state · explain_anomaly · plan_remediation
ACTION
diagnose · remediate · escalate
[ 01 ] Ingestion

Heterogeneous telemetry, one pipeline.

Real-time ingestion from GPU drivers, host metrics, cooling, power, and network fabric. Resilient to source variability. No vendor lock-in.

[ 02 ] Normalization

One unified operational schema.

Normalization into a consistent schema across vendors and source types. Validation, enrichment, lineage tracking. Queryable in real time.

[ 03 ] Readiness validation

Automated rack onboarding.

Validates system state against readiness criteria before declaring operational. Catches misconfigurations before they cost cluster time.

[ 04 ] Reasoning layer

Claude-powered autonomous ops.

Architected for a reasoning layer (typically Claude) that interprets state, hypothesizes root causes, plans remediation, and escalates with full context.

The right time to invest in unified telemetry isn't after your first major incident. It's before the rack ever powers on. EdgeTelemetry exists because we got tired of building this layer from scratch on every engagement.
DehazeLabs · Engineering Team
gpu_data_centers Operators standing up new GPU clusters who can't afford weeks of manual onboarding per rack — and whose customers expect immediate operational readiness. See GPU rack onboarding automation →
colocation_providers Colos serving AI workloads where customer SLAs depend on telemetry transparency and validated infrastructure state.
hyperscaler_capacity_partners Partners building capacity for hyperscalers where audit-grade telemetry, validation, and lineage are contractual requirements.
industrial_data_environments Manufacturing and industrial environments where the same fragmentation problems apply — heterogeneous sensors, vendor schemas, validation requirements — and the agentic operations vision is the same.
[ FAQ ]

Common questions about EdgeTelemetry.

How does EdgeTelemetry reduce GPU rack onboarding from weeks to hours?
EdgeTelemetry automates the validation steps that previously required manual SRE effort: hardware discovery across GPU, host, cooling, power, and network systems; schema normalization across vendor formats; configuration verification against readiness criteria; and a structured handoff to operations once the rack passes all checks. What used to take weeks of back-and-forth between teams completes as an automated validation run that generates a signed readiness record.
What GPU vendors and hardware types does EdgeTelemetry support?
EdgeTelemetry handles heterogeneous environments: NVIDIA GPU telemetry, AMD GPU metrics, host OS metrics, IPMI/BMC-based power and thermal data, cooling sensor feeds, and network fabric telemetry. The unified schema layer normalizes across vendor formats so operators get a single operational view regardless of procurement mix.
How does the Claude-powered reasoning layer work inside EdgeTelemetry?
The reasoning layer receives unified telemetry state and applies Claude via tool use to interpret anomalies, generate root-cause hypotheses, and plan remediation. Execution happens against a constrained action set — restart services, adjust configurations, file tickets, escalate with context. Escalation thresholds are configurable so autonomous actions stay within operator-approved scope.
Is EdgeTelemetry a SaaS product or deployed in our environment?
EdgeTelemetry is deployed in your environment — on-premises or in your cloud account. Data center telemetry data stays inside your infrastructure perimeter. The product ships as a deployable control plane you operate, not a third-party SaaS that processes your operational data externally. This is a deliberate architectural choice given the sensitivity of infrastructure telemetry.
How is EdgeTelemetry different from existing monitoring tools like Grafana, Datadog, or vendor-native dashboards?
Monitoring tools visualize telemetry. EdgeTelemetry normalizes, validates, and acts on it. The distinction matters most at onboarding time (when validation gates need to pass before declaring a rack operational) and at incident time (when a reasoning layer needs to interpret cross-system state and execute remediation, not just alert). Most customers run EdgeTelemetry alongside existing monitoring — it feeds validated telemetry into whatever visualization layer they already use.
What does early access involve and who qualifies?
Early access is a co-deployment engagement where we configure EdgeTelemetry for your specific hardware and operational environment. Qualifying operators are typically standing up new GPU clusters, running colocation with AI workloads, or managing environments with audit-grade telemetry requirements. We're working with a small number of operators. The initial step is a 30-minute briefing to see if the environment fits.

→ Related reading

AI for data center operators → GPU rack onboarding automation → Agentic operations → Real-time SIEM pipeline → Data center & infrastructure AI → DehazeLabs vs in-house AI team →

Get a briefing on EdgeTelemetry.

EdgeTelemetry is currently deployed with select customers. We're working with a small number of additional operators on early access. If your environment fits, we'd like to talk.