Last month, we released the Foxglove SDK, our toolkit that enables you to supercharge your integration with Foxglove, making data visualization easier than ever, regardless of your robotics software stack.
Fig. 1 Berkeley_autolab_ur5 data visualized in Foxglove.
In this tutorial, we will demonstrate how easy it is to build a streaming visualization for a selected Open X-Embodiment dataset.
Open X-Embodiment is a dataset resulting from a collaboration among 21 institutions. The dataset demonstrates 527 skills and contains 2,419,193 episodes.
In the Open X-Embodiment Dataset Overview, you will find a list of all datasets, with a neat table showing what you can expect to see in each of them. Note that each dataset may contain different modalities that could be of interest when you explore the data.
The datasets in Open X-Embodiment don't have a fixed structure; what you will find in each one depends on the researchers who created it. Because of this, you will need to approach each dataset individually, as we do in this blog post.
A good entry point to Open X-Embodiment is this Jupyter notebook on Colab. We will base our data-fetching logic on this resource.
For this tutorial, I selected the berkeley_autolab_ur5 dataset. The data contains multiple camera feeds (including a depth camera!) and plenty of robot states, making it a fun demonstrator to work with.
In this tutorial, we assume that you have Python 3.9 or later installed on your machine. You will also need to install the following Python packages: tensorflow-datasets (which in turn requires tensorflow) and foxglove-sdk.
We will start our adventure by specifying some imports, global variables, and some code to fetch the dataset, based on the official Colab:
import tensorflow_datasets as tfds

DATASET = "berkeley_autolab_ur5"
TARGET_EPISODE = 40
CONTROL_RATE_HZ = 5  # Depends on the dataset!

def dataset2path(dataset_name):
    if dataset_name == "robo_net":
        version = "1.0.0"
    elif dataset_name == "language_table":
        version = "0.0.1"
    else:
        version = "0.1.0"
    return f"gs://gresearch/robotics/{dataset_name}/{version}"

b = tfds.builder_from_directory(builder_dir=dataset2path(DATASET))
ds = b.as_dataset(split="train[{}:{}]".format(TARGET_EPISODE, TARGET_EPISODE + 1))
episode = next(iter(ds))

print("Successfully loaded the dataset: ", DATASET)
assert "steps" in episode, "The dataset does not contain 'steps' key."
print(f"Number of steps in the episode: {len(episode['steps'])}")
In the above code, we ensure that we can fetch a dataset and that it contains an episode with some steps. When following this tutorial, you can freely select a dataset from the dataset table, but if you do, make sure you update the CONTROL_RATE_HZ variable, which we will use when streaming the data in the next steps.
The output we want to see when running this program is:
Successfully loaded the dataset: berkeley_autolab_ur5
Number of steps in the episode: 114
Note that fetching the data might take some time, and you might see some warnings that you can safely ignore:
2025-06-06 11:12:09.605146: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-06-06 11:12:09.608613: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-06-06 11:12:09.618830: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1749201129.634912 33730 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1749201129.640805 33730 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1749201129.657395 33730 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
With our episode loaded, let's now iterate through all of its steps and make sure the data we find interesting is there. We will write a function print_step_info:
def print_step_info(i, step):
    print(f"Step {i}:")
    print(f" image shape: {step['observation']['image'].shape}")
    print(f" hand_image shape: {step['observation']['hand_image'].shape}")
    print(f" image_with_depth shape: {step['observation']['image_with_depth'].shape}")
    print(
        f" natural language instruction: {step['observation']['natural_language_instruction']}"
    )
    print(f" Action rotation delta: {step['action']['rotation_delta']}")
    print(f" Action world vector: {step['action']['world_vector']}")
    print(f" Robot state: {step['observation']['robot_state']}")
And call it when we are iterating over the steps in our episode:
for i, step in enumerate(episode["steps"]):
    print_step_info(i, step)
The terminal output we should now see for each step looks like this:
Step 95:
image shape: (480, 640, 3)
hand_image shape: (480, 640, 3)
image_with_depth shape: (480, 640, 1)
natural language instruction: b'pick up the blue cup and put it into the brown cup. '
Action rotation delta: [0. 0. 0.]
Action world vector: [0. 0. 0.]
Robot state: [-2.975159 -1.2078403 1.7616024 -2.044398 -1.7078537 3.1259193
0.53834754 0.16683726 0.02349306 0.6345158 0.76873034 -0.07564076
0.02686722 1.
If you've made it this far, great! We are now sure we can access our episode data and start streaming it!
We are now ready to start streaming our data to Foxglove in real time. To begin, we will ensure that the robot in our dataset actually performs the assigned tasks.
Let's start by adding some imports that we will use in this section:
import tensorflow_datasets as tfds
import foxglove
from foxglove import Channel
from foxglove.schemas import (
RawImage,
)
from foxglove.channels import RawImageChannel
import time
Let’s start the Foxglove server using our SDK:
server = foxglove.start_server() # We can start it before fetching tfds dataset
Now, let’s define the schema for a language instruction and a corresponding channel:
language_instruction_schema = {
    "type": "object",
    "properties": {
        "text": {
            "type": "string",
        },
    },
}

language_instruction_chan = Channel(
    topic="/natural_language_instruction", schema=language_instruction_schema
)
Before we proceed, let's discuss some terminology. A topic is a named stream that messages are published on, a schema describes the structure of those messages, and a channel ties a topic to a schema so that Foxglove knows how to decode what we log. In short, we have created a schema for the language instruction that includes a text entry, and we will publish it on the /natural_language_instruction topic.
Now, let’s modify our initial iteration through the episode steps in the following way:
try:
    while True:
        for i, step in enumerate(episode["steps"]):
            print_step_info(i, step)

            # Publish the natural language instruction
            instruction_str = (
                step["observation"]["natural_language_instruction"]
                .numpy()
                .decode("utf-8")
            )
            instruction_msg = {"text": instruction_str}
            language_instruction_chan.log(instruction_msg)

            time.sleep(1 / CONTROL_RATE_HZ)
except KeyboardInterrupt:
    print("Keyboard interrupt received. Will stop the server.")
finally:
    server.stop()
    print("Server stopped.")
You can run the code now, and with the loop running continuously, we will open Foxglove and connect to our server by selecting the "Open connection" button in the dashboard.
Fig. 3 Connecting Foxglove to our server.
If everything went well, you will see in the top bar that Foxglove is connected to the local server at ws://localhost:8765 and that the current time is displayed. With Foxglove connected, we can now add a display for our natural language instruction, which will be shown in a Raw Messages panel.
Fig. 4 Adding a Raw Messages panel and displaying our instruction in it.
Great, now we can display our instruction inside a Raw Messages panel, but the text being static throughout the episode duration hardly makes for an exciting demo. Let’s view some camera feeds.
We will start by creating a RawImageChannel for the topic /image outside our loops:
image_chan = RawImageChannel(topic="/image")
Inside our iteration through the episode steps, we will now create a RawImage object and publish it on the channel we made in the previous code snippet:
# Publish the image
image_msg = RawImage(
    data=step["observation"]["image"].numpy().tobytes(),
    width=step["observation"]["image"].shape[1],
    height=step["observation"]["image"].shape[0],
    step=step["observation"]["image"].shape[1] * 3,  # Assuming RGB image
    encoding="rgb8",
)
image_chan.log(image_msg)

time.sleep(1 / CONTROL_RATE_HZ)
In this snippet, it's crucial that we correctly assign the step value of the RawImage, which specifies the row stride in bytes. For an rgb8 image, this means we multiply the width by 3 channels. If we were dealing with a depth image holding float32 values with the 32FC1 encoding, we would multiply the width by 4 bytes.
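To make the depth case concrete, here is a minimal sketch of publishing the image_with_depth feed, assuming its values are float32 and using /image_with_depth as an arbitrary topic name of our choosing:

# Created once, outside the loops (assumed topic name).
depth_chan = RawImageChannel(topic="/image_with_depth")

# Inside the step loop: cast to float32 so the bytes match the 32FC1 encoding.
depth = step["observation"]["image_with_depth"].numpy().astype("float32")
depth_msg = RawImage(
    data=depth.tobytes(),
    width=depth.shape[1],
    height=depth.shape[0],
    step=depth.shape[1] * 4,  # 4 bytes per float32 pixel, single channel
    encoding="32FC1",
)
depth_chan.log(depth_msg)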
Since our image is now being published, let’s add it to our visualization:
Fig. 5 Adding an Image panel.
Now, that's far more exciting! Remember that the dataset we selected also has the hand_image and image_with_depth feeds, so you have even more room to make things dynamic in your panels.
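For example, the wrist camera feed can be streamed in exactly the same way as /image; a sketch, using /hand_image as an arbitrary topic name:

# Created once, outside the loops (assumed topic name).
hand_image_chan = RawImageChannel(topic="/hand_image")

# Inside the step loop:
hand = step["observation"]["hand_image"].numpy()
hand_msg = RawImage(
    data=hand.tobytes(),
    width=hand.shape[1],
    height=hand.shape[0],
    step=hand.shape[1] * 3,  # RGB, 3 bytes per pixel
    encoding="rgb8",
)
hand_image_chan.log(hand_msg)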
Let's make our visualization even more exciting by adding two more elements: a frame transform for the robot's end effector and a plot of the gripper state.
Reading the dataset website, we can learn that the step/observation/robot_state contains the following information:
robot_state: np.ndarray((L, 15))
Of these 15 values per step, indices 6-8 hold the end-effector position (x, y, z), indices 9-12 its orientation as a quaternion (qx, qy, qz, qw), and index 13 the gripper state.
Having values of x, y, z, qx, qy, qz, and qw makes it trivial for us to create a transform. We will start by updating the imports:
import tensorflow_datasets as tfds
import foxglove
from foxglove import Channel
from foxglove.schemas import (
RawImage,
FrameTransform,
Vector3,
Quaternion,
)
from foxglove.channels import (
RawImageChannel,
FrameTransformChannel,
)
import time
Then, we can create the FrameTransformChannel:
transform_chan = FrameTransformChannel(topic="/tf")
Afterwards, we can craft a FrameTransform message and log it:
# Publish the end-effector transform
robot_state = step["observation"]["robot_state"].numpy()
transform_msg = FrameTransform(
    parent_frame_id="robot_base",
    child_frame_id="end_effector",
    translation=Vector3(
        x=float(robot_state[6]),
        y=float(robot_state[7]),
        z=float(robot_state[8]),
    ),
    rotation=Quaternion(
        x=float(robot_state[9]),
        y=float(robot_state[10]),
        z=float(robot_state[11]),
        w=float(robot_state[12]),
    ),
)
transform_chan.log(transform_msg)
The FrameTransform message is simple; we specify two frames and provide translation and rotation from the parent frame to the child frame. Since the measurement units in the dataset are in the metric system and rotations are represented as quaternions, we don't need to modify the data in any way to make it usable.
With the above change, we can add a 3D panel to our scene and visualize the transform:
Fig. 6 Frame Transforms in 3D panel.
Now, to plot a float value, we will create a custom schema:
float_schema = {
    "type": "object",
    "properties": {
        "value": {
            "type": "number",
            "format": "float",
        },
    },
}
And we will create a Channel object using this schema, just like we did for the natural language instruction at the beginning of this tutorial:
gripper_chan = Channel(
    topic="/gripper_state",
    schema=float_schema,
)
To log the gripper status, we can now do the following:
# Publish the gripper state
gripper_msg = {"value": float(robot_state[13])}
gripper_chan.log(gripper_msg)
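The same pattern works for any other scalar you might want to plot. As a sketch, assuming the first six entries of robot_state are the joint angles, an extra channel reusing float_schema (with /joint0 as an arbitrary topic name) could look like this:

# Created once, outside the loops, reusing the float schema (assumed topic name).
joint0_chan = Channel(topic="/joint0", schema=float_schema)

# Inside the step loop: plot the first joint angle alongside the gripper state.
joint0_chan.log({"value": float(robot_state[0])})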
After adding a Plot panel to Foxglove, we can drag and drop the gripper_state value from the Topics tab onto the panel to display the state as follows:
We are now approaching my favorite part of the SDK: recording data for future replay. It could not be any simpler:
filename = f"{DATASET}_episode_{TARGET_EPISODE}.mcap"
writer = foxglove.open_mcap(filename)
server = foxglove.start_server()
We construct a meaningful filename and create a writer object. That's it. From now on, all the data we log will also be saved to the MCAP file.
Before we run this code, we should comment out our while loop; after all, we want our MCAP to contain a single episode.
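Putting it together, the single-episode recording run reduces to something like this sketch, reusing only the pieces defined earlier:

filename = f"{DATASET}_episode_{TARGET_EPISODE}.mcap"
writer = foxglove.open_mcap(filename)
server = foxglove.start_server()

try:
    # One pass over the episode: everything we log is streamed live and
    # written to the MCAP file at the same time.
    for i, step in enumerate(episode["steps"]):
        print_step_info(i, step)
        # ... log the instruction, images, transform, and gripper state as above ...
        time.sleep(1 / CONTROL_RATE_HZ)
finally:
    server.stop()
    print("Server stopped.")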
We now invite you to experiment with the datasets, log even more data, and create your own layout.
As for me, here is my current setup for this dataset:
If you’d like to reuse my layout, or view the source code, please check out our tutorials repository.
In this tutorial, we discussed the basics of the Foxglove SDK in Python. The code we've built is a great starting point for you to explore Open X-Embodiment and other datasets. Here are some ideas on what you can do next (consider this homework):