NSF-DOE Vera C. Rubin Observatory | Data management
Key Moments
LSST will scan the sky nightly for 10 years, generating enormous volumes of data that are moved to four processing centers.
Key Insights
The Rubin Observatory will produce an enormous data load (about 20 TB per night) by taking 30-second exposures every night for a decade.
All captured data are routed to four processing centers to enable distributed processing and collaboration.
The data management approach builds on decades of experience from high-energy physics and prior sky surveys.
Cloud computing and distributed computing concepts evolved from earlier work, influencing current data pipelines and storage strategies.
Months of commissioning have been dedicated to finalizing the best data handling and pipelines for collaborators and public use.
Overview of the Rubin Observatory Mission
The Rubin Observatory is designed to systematically map the night sky over a ten-year period, capturing the cosmos in unprecedented detail. Each night, the telescope surveys the sky using a rapid cadence to build an ultra-wide, high-definition map of astronomical objects and phenomena. The mission aims to deliver continuous, repeatable observations that enable time-domain astronomy, allowing researchers to study changing objects such as supernovae, asteroids, and other transient events. This long-term, high-cadence approach requires reliable data handling to support ongoing scientific discovery.
Massive Data Volume and Storage Strategy
A single night of observations by the Rubin Observatory’s camera yields about 20 terabytes of data, underscoring the scale of modern astronomical surveys. With a ten-year operation horizon, the accumulated data volume becomes staggering, demanding robust storage, transfer, and processing capabilities. The project is designed to manage this flood of information efficiently, ensuring that raw and processed data remain accessible to the collaboration and, in time, to the broader scientific community. The high data rate drives thoughtful architecture for ingest, calibration, and long-term archival.
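To put that scale in context, a back-of-the-envelope estimate of the cumulative raw volume follows directly from the figures above. This sketch assumes the roughly 20 TB/night figure and observations every night for ten years; real uptime will be lower (weather, maintenance), so treat it as an upper bound:

```python
# Rough upper-bound estimate of total raw data over the survey lifetime.
# Assumes ~20 TB per night (figure from the text) and a full night of
# observing every night for 10 years -- a simplifying assumption.
TB_PER_NIGHT = 20
NIGHTS_PER_YEAR = 365
YEARS = 10

total_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
total_pb = total_tb / 1000  # decimal petabytes

print(f"~{total_tb:,} TB (~{total_pb:.0f} PB) of raw images over the survey")
```

Even this simplified estimate lands in the tens of petabytes for raw images alone, before calibrated products and catalogs are counted, which is why ingest, transfer, and archival architecture dominate the design.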
Distributed Data Processing and Center Collaboration
All data from the telescope are routed to four processing centers, creating a distributed computing ecosystem that enables parallel analysis and collaboration. Fermilab brings its expertise from high-energy physics experiments and previous sky surveys to help design data movement and pipelines that pull data from the telescope and distribute it effectively across sites. This distributed architecture supports scalable processing, quality control, and reproducibility, ensuring that collaborators can access and analyze the data efficiently regardless of their local infrastructure.
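As a purely illustrative sketch of distributing nightly data products across multiple sites, the toy example below assigns exposures to four centers in round-robin order. The center names, file naming, and round-robin policy are all assumptions for illustration, not the observatory's actual routing scheme:

```python
# Illustrative only: round-robin assignment of nightly exposures across
# four processing centers. Names and policy are hypothetical, not the
# observatory's real data-movement configuration.
from itertools import cycle

CENTERS = ["center-a", "center-b", "center-c", "center-d"]  # hypothetical

def assign_exposures(exposure_ids, centers):
    """Map each exposure ID to a center in round-robin order."""
    assignment = {}
    ring = cycle(centers)
    for exp_id in exposure_ids:
        assignment[exp_id] = next(ring)
    return assignment

exposures = [f"exp-{n:04d}" for n in range(8)]
plan = assign_exposures(exposures, CENTERS)
for exp_id, center in plan.items():
    print(exp_id, "->", center)
```

A real system would weight assignments by site capacity, network throughput, and data locality rather than pure rotation, but the sketch shows the basic shape of spreading a nightly workload across sites.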
Historical Context: Lessons from Early Surveys and Parallel Computing
The project acknowledges that cloud computing was not a factor when the first large astronomical surveys began in the late 1990s. At that time, physicists developed methods to run hundreds of thousands of jobs in parallel across distributed computing resources. Some of these early innovations later contributed to the cloud technologies we rely on today, including platforms connected to everyday devices. This lineage highlights how fundamental lessons in distributed processing, scheduling, and resource management continue to shape how Rubin handles data.
Commissioning Phase and Pipeline Optimization
During commissioning, the team has focused on finalizing the best ways to run and track all data from the telescope, with the goal of building an optimal pipeline for collaborators. This involves testing, benchmarking, and refining workflows to ensure efficiency, reliability, and traceability. The pipeline design is intentionally aligned with the actual imaging data that will be released for analysis, providing a practical, end-to-end pathway from capture to analysis that researchers can depend on as new data become available.
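The kind of traceability described above can be sketched as a pipeline runner that times each stage and records a simple provenance log. The stage names and data shapes here are illustrative assumptions, not Rubin's actual pipeline software:

```python
# Toy sketch of a tracked pipeline: each stage is timed and logged,
# illustrating end-to-end traceability. Stage names are illustrative,
# not the observatory's real processing stages.
import time

def ingest(data):
    return {"raw": data}

def calibrate(frame):
    frame["calibrated"] = True
    return frame

def catalog(frame):
    frame["sources"] = ["src-1", "src-2"]  # placeholder detections
    return frame

def run_pipeline(data, stages):
    """Run stages in order, recording (stage name, elapsed seconds)."""
    log = []
    result = data
    for stage in stages:
        start = time.perf_counter()
        result = stage(result)
        log.append((stage.__name__, time.perf_counter() - start))
    return result, log

frame, log = run_pipeline("night-0001", [ingest, calibrate, catalog])
for name, elapsed in log:
    print(f"{name}: {elapsed * 1e6:.0f} us")
```

Recording which stage produced which product, and how long it took, is the minimal ingredient behind the benchmarking and reproducibility goals the commissioning work targets.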
Impact on Collaboration and Open Science
Data management is not just a technical concern; it affects every researcher who relies on accurate, timely information. By building robust data handling and processing infrastructure, Rubin ensures that researchers across institutions and countries can collaborate effectively. The approach emphasizes reproducibility, accessibility, and transparency, enabling the broader community to benefit from the survey's findings. As images and data are released, the established pipelines help ensure that science proceeds smoothly, responsibly, and with maximum scientific return.