What robot log formats do you work with?

Our primary format is MCAP, the default ROS 2 bag format (since the Iron release) and the format Foxglove developed and builds on. We also ingest ROS 1 bags (.bag) and can work with structured telemetry exports from proprietary platforms. If your fleet generates data in a format we haven't seen, we scope a one-week ingestion sprint before starting the main engagement.

Do you need real-time data or does this work on historical logs?

Both. The initial engagement typically starts with historical logs — you have incidents already; we build the pipeline that surfaces and labels them. Ongoing, we add a streaming ingestion layer so new incidents flow into the dataset automatically. Most teams get more value in month one from historical log processing than they expect.

What does 90 days actually deliver?

At 90-day close: a working incident detection pipeline against your MCAP data, a labeled FiftyOne dataset covering at least two incident types, CVAT annotation task exports ready for your labeling team, and a Foxglove layout for replay review. The pipeline is yours — we document it, hand it over, and scope ongoing work if you want us to run it.

Why Foxglove, FiftyOne, and CVAT — why not build custom tooling?

Because your ML team already knows these tools, or will need to learn them regardless of who builds the pipeline. Foxglove is where the robotics ecosystem converges for replay and debugging. FiftyOne is the standard for visual ML dataset management. CVAT is the leading open-source annotation platform. We assemble them into a production workflow — we don't build alternatives to them.

Physical AI Data Flywheel | Robot Log → Labeled Failure Dataset

Physical AI Data Flywheel — raw robot logs become labeled ML datasets.

Every robot failure is a training signal. Most robotics teams have the logs — they don't have the pipeline to find the incidents, surface what happened, and get structured labels into the hands of the ML team. We build that pipeline: MCAP ingestion, multi-signal incident detection, Claude-assisted triage, FiftyOne curation, and CVAT annotation handoff. 90 days from raw logs to a queryable failure dataset.

→ Why robot data is hard

A warehouse robot running 12 hours a day generates hundreds of gigabytes of sensor data. Most of it is uneventful. Somewhere in that stream is the 90-second window where localization degraded, recovery behaviors triggered, and the mission aborted — and that window contains more information about model failure modes than a thousand hours of clean operation.

The problem isn't data volume. It's that the failure signal is buried: AMCL covariance spikes on one topic, path deviation appears on another, and the behavior tree abort is logged on a third. Nobody is correlating them manually across a fleet of 50 robots. The incidents that reach the ML team are the ones dramatic enough for a human to notice, which means the subtle, high-signal failures go unlabeled.

The Physical AI Data Flywheel is the pipeline that surfaces the subtle ones.

How the pipeline works

1. Log ingestion and topic parsing

We ingest MCAP files (the default ROS 2 bag format, which Foxglove reads natively) and parse them into structured dataframes per topic. The target topics vary by robot type. For navigation failures they are AMCL pose, lidar scans, behavior tree logs, odometry, and velocity commands. We handle compressed images, point clouds, and custom message types. The ingestion layer normalizes timestamps to a common time base and joins topics by time into a single incident view.
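The timestamp normalization and topic join can be sketched in a few lines. This is an illustration, not our production ingestion code: it assumes messages have already been decoded from MCAP into (timestamp, value) pairs, and the function names, sample values, and 0.1 s tolerance are all hypothetical.

```python
from bisect import bisect_left


def to_seconds(stamp_ns, start_ns):
    """Convert an absolute nanosecond stamp to seconds since log start."""
    return (stamp_ns - start_ns) / 1e9


def nearest_join(base, other, tolerance_s):
    """For each (t, value) in base, attach the value from other whose
    timestamp is closest, or None if nothing lies within tolerance_s.
    Both inputs must be sorted by time."""
    other_times = [t for t, _ in other]
    joined = []
    for t, value in base:
        i = bisect_left(other_times, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other)]
        best = min(candidates, key=lambda j: abs(other_times[j] - t), default=None)
        if best is not None and abs(other_times[best] - t) <= tolerance_s:
            joined.append((t, value, other[best][1]))
        else:
            joined.append((t, value, None))
    return joined


# Hypothetical decoded records: (stamp in ns, scalar value).
START_NS = 1_700_000_000_000_000_000
amcl_raw = [(START_NS, 0.02), (START_NS + 1_000_000_000, 0.03), (START_NS + 2_000_000_000, 0.11)]
odom_raw = [(START_NS + 50_000_000, 0.50), (START_NS + 1_040_000_000, 0.48), (START_NS + 5_000_000_000, 0.0)]

amcl = [(to_seconds(ns, START_NS), v) for ns, v in amcl_raw]
odom = [(to_seconds(ns, START_NS), v) for ns, v in odom_raw]

# One row per AMCL sample, with the nearest odometry value if it is close enough.
joined = nearest_join(amcl, odom, tolerance_s=0.1)
```

Rows with no partner inside the tolerance keep a None rather than borrowing a stale value, so gaps in one topic stay visible in the incident view.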

2. Multi-signal incident detection

We monitor three or more independent signals simultaneously. An incident window is confirmed when at least two signals fire within a configurable time window of each other. This filters out single-sensor noise and sharply reduces false positives. For navigation failures, the signals are a localization covariance spike, plan deviation beyond a threshold, and Nav2 abort events. The combination is what matters; any single signal in isolation may be routine.
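A minimal sketch of the coincidence rule, assuming each detector has already emitted timestamped events. The SignalEvent type, the signal names, and the 30-second default are illustrative assumptions. The sketch measures the window from the first event in each group, which is one of several reasonable readings of "within a configurable time window of each other".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalEvent:
    signal: str  # hypothetical detector name, e.g. "covariance_spike"
    t: float     # seconds since log start


def confirm_incidents(events, window_s=30.0, min_signals=2):
    """Return (start, end, signals) for every group of events in which at
    least min_signals distinct signals fire within window_s of the first."""
    events = sorted(events, key=lambda e: e.t)
    incidents = []
    i = 0
    while i < len(events):
        # Events are sorted, so this is the run starting at i that fits the window.
        group = [e for e in events[i:] if e.t - events[i].t <= window_s]
        signals = {e.signal for e in group}
        if len(signals) >= min_signals:
            incidents.append((group[0].t, group[-1].t, sorted(signals)))
            i += len(group)
        else:
            i += 1  # a lone signal is treated as routine and skipped
    return incidents


events = [
    SignalEvent("covariance_spike", 182.0),
    SignalEvent("plan_deviation", 201.0),
    SignalEvent("nav2_abort", 298.0),
    SignalEvent("covariance_spike", 900.0),  # isolated spike, no second signal
]
incidents = confirm_incidents(events)
```

Here the covariance spike and the plan deviation land 19 seconds apart and confirm one incident, while the isolated spike at 900 s is ignored.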

→ Example incident detection (warehouse lidar failure)

T+3:02	AMCL covariance trace spikes to 5× the 30 s rolling baseline	/amcl_pose
T+3:21	Robot path deviation exceeds 0.5 m from planned route	/plan · /amcl_pose
T+4:30	Nav2 recovery behavior triggered (rotate_recovery)	/behavior_tree_log
T+4:58	Goal aborted after max recovery attempts	/behavior_tree_log
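The first event in the timeline above, a covariance trace rising past a multiple of its rolling baseline, could be detected along these lines. This is a sketch under stated assumptions: the function names are hypothetical, the 30 s window and 5× factor are taken from the example, and keeping spike samples out of the baseline is our illustrative choice. The trace helper relies on geometry_msgs/PoseWithCovariance carrying a row-major 6×6 covariance as 36 floats.

```python
from collections import deque


def covariance_trace(cov36):
    """Sum the diagonal of a row-major 6x6 covariance (36 floats)."""
    return sum(cov36[i * 7] for i in range(6))


def detect_spikes(samples, baseline_s=30.0, factor=5.0):
    """Return timestamps where the value exceeds factor times the mean of
    the preceding baseline_s seconds. samples: (t, value) sorted by t."""
    window = deque()
    total = 0.0
    spikes = []
    for t, value in samples:
        # Drop samples that have aged out of the rolling baseline.
        while window and t - window[0][0] > baseline_s:
            total -= window.popleft()[1]
        if window:
            baseline = total / len(window)
            if baseline > 0 and value > factor * baseline:
                spikes.append(t)
                continue  # assumption: spikes do not feed the baseline
        window.append((t, value))
        total += value
    return spikes


# Synthetic 1 Hz trace: steady at 0.02, then a jump at T+3:02 (182 s).
samples = [(float(t), 0.02) for t in range(182)] + [(182.0, 0.15), (183.0, 0.15)]
spikes = detect_spikes(samples)
```

Excluding spike samples from the baseline keeps a sustained degradation from quietly raising its own threshold.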

3. Claude triage — hypothesis, not verdict

We send structured telemetry metrics from the incident window to Claude: covariance values, scan statistics, diagnostic messages, the sequence and timing of signal events. Claude returns a triage hypothesis — what most likely caused this incident, confidence level with justification, investigation steps for the robotics engineer, and data gaps that would increase diagnostic confidence.

This is not a chatbot. Claude receives structured telemetry context and returns structured output that gets cached and served from the dashboard. The language is intentional: "triage hypothesis" and "investigation steps" — the system augments engineer judgment, it doesn't replace it.
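To make "structured in, structured out" concrete, here is a sketch of the request and response handling around the model call. The call itself is omitted, and everything named here is an assumption for illustration: the field names, the three confidence levels, and the hash-based cache key are not a description of our production schema.

```python
import hashlib
import json
from dataclasses import dataclass

CONFIDENCE_LEVELS = {"low", "medium", "high"}  # hypothetical scale


@dataclass
class TriageHypothesis:
    hypothesis: str
    confidence: str
    justification: str
    investigation_steps: list
    data_gaps: list


def build_request(incident):
    """Serialize incident metrics deterministically, so the same incident
    always yields the same request text and therefore the same cache key."""
    return json.dumps(incident, sort_keys=True)


def cache_key(request):
    return hashlib.sha256(request.encode("utf-8")).hexdigest()


def parse_response(raw):
    """Validate the model's JSON reply before it is cached or displayed."""
    data = json.loads(raw)
    fields = TriageHypothesis.__dataclass_fields__
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"triage response missing fields: {missing}")
    if data["confidence"] not in CONFIDENCE_LEVELS:
        raise ValueError(f"unexpected confidence: {data['confidence']!r}")
    return TriageHypothesis(**{name: data[name] for name in fields})


# Hypothetical incident metrics and a hypothetical well-formed reply.
incident = {
    "events": [["covariance_spike", 182.0], ["plan_deviation", 201.0]],
    "covariance_trace_peak": 0.15,
    "mean_scan_range_m": 2.1,
}
key = cache_key(build_request(incident))
sample_reply = json.dumps({
    "hypothesis": "Progressive lidar degradation",
    "confidence": "high",
    "justification": "Covariance rose while mean scan range fell.",
    "investigation_steps": ["Inspect the lidar lens for dust"],
    "data_gaps": ["No lidar driver diagnostics in the log"],
})
triage = parse_response(sample_reply)
```

Rejecting malformed replies at this boundary means the dashboard only ever serves hypotheses that carry a confidence level, investigation steps, and stated data gaps.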

→ Example Claude triage output

"The incident pattern is consistent with progressive lidar degradation beginning at T+3:00. The simultaneous increase in AMCL covariance trace and reduction in mean scan range suggests sensor occlusion, dust accumulation, or hardware dropout rather than an environmental obstacle: a real obstacle would shorten range readings in specific sectors, not reduce mean range uniformly. AMCL lost confidence in its particle distribution because the scan data no longer matched the map model, triggering the recovery cascade."

Confidence: High · model: claude-sonnet-4-6

4. Frame extraction and FiftyOne curation

We extract 50-100 frames from the MCAP spanning the incident window: baseline frames from before the incident, frames across each failure phase, and post-incident frames. Each frame is tagged with its phase (normal, lidar_degradation, localization_uncertainty, recovery_behavior, abort) and annotated with numeric metadata from the telemetry (covariance value, path deviation, scan dropout rate at that timestamp).
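Frame sampling and phase tagging can be sketched as follows. The phase names come from the list above; the boundary times, the even spacing, and the helper names are illustrative assumptions, and in practice the boundaries would come from the detected signal events.

```python
from bisect import bisect_right

# (phase start in seconds since log start, phase name); boundaries are hypothetical.
PHASES = [
    (0.0, "normal"),
    (180.0, "lidar_degradation"),
    (182.0, "localization_uncertainty"),
    (270.0, "recovery_behavior"),
    (298.0, "abort"),
]


def phase_at(t, phases=PHASES):
    """Return the phase whose start time is the latest one not after t."""
    starts = [start for start, _ in phases]
    i = bisect_right(starts, t) - 1
    return phases[max(i, 0)][1]


def sample_times(start_s, end_s, n):
    """Return n evenly spaced timestamps covering [start_s, end_s]."""
    step = (end_s - start_s) / (n - 1)
    return [start_s + k * step for k in range(n)]


# 61 frames from 30 s before the first symptom to 32 s after the abort.
frames = [{"t": t, "phase": phase_at(t)} for t in sample_times(150.0, 330.0, 61)]
```

Each dictionary is the seed of a dataset sample: the numeric telemetry for that timestamp (covariance value, path deviation, scan dropout rate) would be attached alongside the phase tag.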

CLIP embeddings run on all extracted frames. This enables visual similarity search — "find frames that look like this failure frame" — and surfaces clusters of visually similar incidents across the dataset. Teams that run this on months of historical logs regularly find incident patterns they didn't know existed.
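Once embeddings exist, "find frames that look like this failure frame" reduces to a nearest-neighbor query. The sketch below assumes the embeddings are already computed and uses toy 3-dimensional vectors with made-up frame ids; real CLIP vectors have hundreds of dimensions, and a dataset tool or vector index would replace this brute-force loop.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_id, embeddings, k=3):
    """Return the ids of the k frames closest to query_id by cosine similarity."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vector), frame_id)
        for frame_id, vector in embeddings.items()
        if frame_id != query_id
    ]
    scored.sort(reverse=True)
    return [frame_id for _, frame_id in scored[:k]]


# Toy embeddings keyed by hypothetical frame ids.
embeddings = {
    "robot07_f012": [0.9, 0.1, 0.0],
    "robot12_f044": [0.8, 0.2, 0.1],
    "robot03_f101": [0.0, 0.1, 0.9],
    "robot07_f300": [0.1, 0.9, 0.1],
}
neighbors = most_similar("robot07_f012", embeddings, k=2)
```

The nearest neighbor here comes from a different robot, which is the kind of cross-fleet match the similarity search is meant to surface.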

5. CVAT annotation handoff

The FiftyOne dataset exports to CVAT Image 1.1 format — a structured annotation task ready for the labeling team. Frame-level tags from the pipeline become CVAT labels. The labeling team opens structured tasks, not a folder of unlabeled images. This is where the flywheel closes: labeled data flows back to model training, which improves the behaviors that generated the incidents.
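For a sense of what the labeling team receives, here is a simplified sketch of image-level tags written in the CVAT for images 1.1 XML layout. It is hand-rolled for illustration only: a real export goes through FiftyOne's exporter and includes a meta block with task and label definitions, and the frame names, image sizes, and the choice of source="auto" are assumptions.

```python
import xml.etree.ElementTree as ET


def frames_to_cvat_xml(frames):
    """Build a minimal annotations document with one <image> per frame
    and the pipeline's phase tag as an image-level <tag>."""
    root = ET.Element("annotations")
    ET.SubElement(root, "version").text = "1.1"
    for index, frame in enumerate(frames):
        image = ET.SubElement(
            root,
            "image",
            id=str(index),
            name=frame["name"],
            width=str(frame["width"]),
            height=str(frame["height"]),
        )
        # Tags produced by the pipeline are marked as automatic, not manual.
        ET.SubElement(image, "tag", label=frame["phase"], source="auto")
    return ET.tostring(root, encoding="unicode")


# Hypothetical frames carrying the phase tags assigned during curation.
tagged_frames = [
    {"name": "robot07_000150.jpg", "width": 1280, "height": 720, "phase": "normal"},
    {"name": "robot07_000298.jpg", "width": 1280, "height": 720, "phase": "abort"},
]
xml_text = frames_to_cvat_xml(tagged_frames)
```

Because the phase arrives as a tag on each image, annotators start from a pre-sorted task and spend their time on boxes and corrections rather than triage.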

What Foxglove adds to the workflow

Foxglove is where engineers review incidents before labeling. We configure a panel layout that opens the incident replay at the fault window: raw lidar vs. corrupted lidar side by side, particle cloud scatter, AMCL covariance plot over time, and camera feed. The visual difference between clean and degraded sensor data is easy to see in that layout, and it gives each triage decision evidence an engineer can point to. We embed Foxglove replay directly in the dashboard so stakeholders can scrub through incidents without a ROS environment.

→ Reference deployment

We have run this pipeline in production for a robotics client — MCAP ingestion, multi-signal incident detection, Claude triage hypothesis, FiftyOne dataset curation, and CVAT annotation handoff. Client details are available under NDA. The demo environment (simulated warehouse failure, full Foxglove replay, Claude output, FiftyOne dataset) replicates the production pipeline end to end.

→ The flywheel part

The name matters. A one-time data cleanup project isn't a flywheel. The flywheel is what happens after the first 90 days.

New incidents generate new MCAP files. The detection pipeline runs on new logs automatically. New incidents flow into the FiftyOne dataset tagged and ready for review. The labeling team has a continuous stream of structured annotation tasks instead of periodic fire drills. Model retraining triggers when the dataset grows past a threshold. The model improves, which changes the failure mode distribution, which generates new training signal.

Most teams that engage us for the initial 90-day build extend into ongoing pipeline operations. The alternative is a one-time dataset that ages out of relevance as the robot software and operational environment evolve.

Robot failures you don't know about yet — buried in logs you're not processing.

Tell us about your fleet: robot type, log volume, how incidents are currently surfaced, and what your ML team is blocked on. We'll tell you whether the Physical AI Data Flywheel maps to your situation and what a 90-day engagement would look like.

AMRs	Warehouse navigation
Inspection	Industrial + infrastructure
Field robotics	Outdoor unstructured
Manipulation	Pick-and-place, assembly

timeline	90 days to first dataset
buyer	VP ML · CTO · Head of Platform
format	Fixed-scope phases

localization	AMCL covariance trace
navigation	Plan deviation · recovery count
perception	Lidar dropout · range distribution
manipulation	Force-torque anomalies · grasp success
system	Compute load · latency spikes
custom	Any published ROS topic
