NSF-DOE Vera C. Rubin Observatory | Data management
Key Moments
LSST will scan the sky nightly for 10 years, generating enormous volumes of data that are moved to four processing centers.
Key Insights
The Rubin Observatory will produce an enormous data load (about 20 TB per night) by taking 30-second exposures every night for a decade.
All captured data are routed to four processing centers to enable distributed processing and collaboration.
The data management approach builds on decades of experience from high-energy physics and prior sky surveys.
Cloud computing and distributed computing concepts evolved from earlier work, influencing current data pipelines and storage strategies.
Months of commissioning have been dedicated to finalizing the best data handling and pipelines for collaborators and public use.
Overview of the Rubin Observatory Mission
The Rubin Observatory is designed to systematically map the night sky over a ten-year period, capturing the cosmos in unprecedented detail. Each night, the telescope surveys the sky using a rapid cadence to build an ultra-wide, high-definition map of astronomical objects and phenomena. The mission aims to deliver continuous, repeatable observations that enable time-domain astronomy, allowing researchers to study changing objects such as supernovae, asteroids, and other transient events. This long-term, high-cadence approach requires reliable data handling to support ongoing scientific discovery.
Massive Data Volume and Storage Strategy
A single night of observations by the Rubin Observatory’s camera yields about 20 terabytes of data, underscoring the scale of modern astronomical surveys. With a ten-year operation horizon, the accumulated data volume becomes staggering, demanding robust storage, transfer, and processing capabilities. The project is designed to manage this flood of information efficiently, ensuring that raw and processed data remain accessible to the collaboration and, in time, to the broader scientific community. The high data rate drives thoughtful architecture for ingest, calibration, and long-term archival.
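To put that scale in context, a back-of-the-envelope estimate of the cumulative raw volume follows directly from the figures above. This sketch assumes the roughly 20 TB/night figure and observations every night for ten years; real uptime will be lower (weather, maintenance), so treat it as an upper bound:

```python
# Rough upper-bound estimate of total raw data over the survey lifetime.
# Assumes ~20 TB per night (figure from the text) and a full night of
# observing every night for 10 years -- a simplifying assumption.
TB_PER_NIGHT = 20
NIGHTS_PER_YEAR = 365
YEARS = 10

total_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
total_pb = total_tb / 1000  # decimal petabytes

print(f"~{total_tb:,} TB (~{total_pb:.0f} PB) of raw images over the survey")
```

Even this simplified estimate lands in the tens of petabytes for raw images alone, before calibrated products and catalogs are counted, which is why ingest, transfer, and archival architecture dominate the design.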
Distributed Data Processing and Center Collaboration
All data from the telescope are routed to four processing centers, creating a distributed computing ecosystem that enables parallel analysis and collaboration. Fermilab brings its expertise from high-energy physics experiments and previous sky surveys to help design data movement and pipelines that pull data from the telescope and distribute it effectively across sites. This distributed architecture supports scalable processing, quality control, and reproducibility, ensuring that collaborators can access and analyze the data efficiently regardless of their local infrastructure.
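As a purely illustrative sketch of distributing nightly data products across multiple sites, the toy example below assigns exposures to four centers in round-robin order. The center names, file naming, and round-robin policy are all assumptions for illustration, not the observatory's actual routing scheme:

```python
# Illustrative only: round-robin assignment of nightly exposures across
# four processing centers. Names and policy are hypothetical, not the
# observatory's real data-movement configuration.
from itertools import cycle

CENTERS = ["center-a", "center-b", "center-c", "center-d"]  # hypothetical

def assign_exposures(exposure_ids, centers):
    """Map each exposure ID to a center in round-robin order."""
    assignment = {}
    ring = cycle(centers)
    for exp_id in exposure_ids:
        assignment[exp_id] = next(ring)
    return assignment

exposures = [f"exp-{n:04d}" for n in range(8)]
plan = assign_exposures(exposures, CENTERS)
for exp_id, center in plan.items():
    print(exp_id, "->", center)
```

A real system would weight assignments by site capacity, network throughput, and data locality rather than pure rotation, but the sketch shows the basic shape of spreading a nightly workload across sites.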
Historical Context: Lessons from Early Surveys and Parallel Computing
The project acknowledges that cloud computing was not a factor when the first large astronomical surveys began in the late 1990s. At that time, physicists developed methods to run hundreds of thousands of jobs in parallel across distributed computing resources. Some of these early innovations later contributed to the cloud technologies we rely on today, including platforms connected to everyday devices. This lineage highlights how fundamental lessons in distributed processing, scheduling, and resource management continue to shape how Rubin handles data.
Commissioning Phase and Pipeline Optimization
During commissioning, the team has focused on finalizing the best ways to run and track all data from the telescope, with the goal of building an optimal pipeline for collaborators. This involves testing, benchmarking, and refining workflows to ensure efficiency, reliability, and traceability. The pipeline design is intentionally aligned with the actual imaging data that will be released for analysis, providing a practical, end-to-end pathway from capture to analysis that researchers can depend on as new data become available.
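The kind of traceability described above can be sketched as a pipeline runner that times each stage and records a simple provenance log. The stage names and data shapes here are illustrative assumptions, not Rubin's actual pipeline software:

```python
# Toy sketch of a tracked pipeline: each stage is timed and logged,
# illustrating end-to-end traceability. Stage names are illustrative,
# not the observatory's real processing stages.
import time

def ingest(data):
    return {"raw": data}

def calibrate(frame):
    frame["calibrated"] = True
    return frame

def catalog(frame):
    frame["sources"] = ["src-1", "src-2"]  # placeholder detections
    return frame

def run_pipeline(data, stages):
    """Run stages in order, recording (stage name, elapsed seconds)."""
    log = []
    result = data
    for stage in stages:
        start = time.perf_counter()
        result = stage(result)
        log.append((stage.__name__, time.perf_counter() - start))
    return result, log

frame, log = run_pipeline("night-0001", [ingest, calibrate, catalog])
for name, elapsed in log:
    print(f"{name}: {elapsed * 1e6:.0f} us")
```

Recording which stage produced which product, and how long it took, is the minimal ingredient behind the benchmarking and reproducibility goals the commissioning work targets.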
Impact on Collaboration and Open Science
Data management is not just a technical concern; it affects every researcher who relies on accurate, timely information. By building robust data handling and processing infrastructure, Rubin ensures that researchers across institutions and countries can collaborate effectively. The approach emphasizes reproducibility, accessibility, and transparency, enabling the broader community to benefit from the survey's findings. As images and data are released, the established pipelines help ensure that science proceeds smoothly, responsibly, and with maximum scientific return.