LINCC Frameworks will develop state-of-the-art analysis techniques that can meet the scale and complexity demanded by the Vera C. Rubin Observatory Legacy Survey of Space and Time (Rubin LSST) data. This project is supported by Schmidt Futures, a philanthropic initiative founded by Eric and Wendy Schmidt as part of the Virtual Institute of Astrophysics (VIA).
Through interactions with the community, we will continually refine the plans, identify new opportunities to collaborate, coordinate with groups already working in these areas, and seek other areas where software infrastructure development could strongly impact community software development for LSST. For a more comprehensive description of technical projects, including technical documents, visit the team’s wiki page.
Scalable Spatial Analysis Framework
Over its lifetime, Rubin LSST will compile an unprecedented catalog of tens of billions of astrophysical objects. This wealth of data will enable astronomers to answer a range of scientific and statistical questions about our universe, such as:
- Understanding structure by analyzing the distribution of objects
- Modeling the changes of variable sources over time
- Allowing prompt localization of electromagnetic counterparts to gravitational wave sources
- Providing interpretable and information-rich cross-correlations against multi-wavelength datasets, including the Cosmic Microwave Background
Supporting these science questions requires key functionality in an analysis framework, including the ability to:
- Store and manipulate catalog data at scale
- Perform distributed computation over this data
- Use spatial structure within searches and statistical computation
- Interoperate with data from other surveys
- Access these catalogs without having to download them directly
The LINCC Frameworks team is developing the Large Survey DataBase (LSDB), an infrastructure to facilitate the analysis of large survey data based on efficient spatial partitioning. Driven by the initial use cases of catalog cross-matching and distributed time series analysis, the team is developing an end-to-end technology stack. The work includes coordinating with the broader astronomy community on standardized data formats, enabling cloud-based or HPC-based analysis, and developing a full suite of software tools.
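To illustrate why spatial partitioning helps with cross-matching, here is a minimal sketch in plain Python. It is not LSDB's API or its partitioning scheme: the function names, the fixed-size RA/Dec grid, and the toy catalogs are all assumptions made for illustration. The idea is only that each source is compared against sources in its own and neighboring cells rather than against the whole catalog.

```python
import math
from collections import defaultdict


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cell_of(ra, dec, cell_deg):
    """Map a position to a grid cell (hypothetical stand-in for a sky pixel)."""
    return (math.floor(ra / cell_deg), math.floor(dec / cell_deg))


def build_index(catalog, cell_deg):
    """Group row numbers of a catalog of (ra, dec) tuples by grid cell."""
    index = defaultdict(list)
    for row, (ra, dec) in enumerate(catalog):
        index[cell_of(ra, dec, cell_deg)].append(row)
    return index


def crossmatch(left, right, radius_deg, cell_deg=1.0):
    """Nearest neighbor in `right` within `radius_deg` for each `left` source.

    Simplifying assumptions: the radius is much smaller than the cell size,
    and sources lie away from the poles and from the RA = 0/360 wrap, so
    checking the 3x3 block of neighboring cells is sufficient.
    """
    index = build_index(right, cell_deg)
    matches = []
    for ra, dec in left:
        ci, cj = cell_of(ra, dec, cell_deg)
        best, best_sep = None, radius_deg
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for row in index.get((ci + di, cj + dj), ()):
                    sep = angular_separation_deg(ra, dec, *right[row])
                    if sep <= best_sep:
                        best, best_sep = row, sep
        matches.append(best)
    return matches


# Toy catalogs of (ra, dec) in degrees; match within 2 arcseconds.
left = [(10.0, 20.0), (150.0, -30.0), (200.0, 5.0)]
right = [(150.0002, -30.0001), (10.0001, 20.0001), (300.0, 60.0)]
matches = crossmatch(left, right, radius_deg=2.0 / 3600.0)
print(matches)  # [1, 0, None]
```

Because each cell can be processed independently, the same structure lends itself to distributed computation: cells can be assigned to different workers, with only the cell boundaries needing special care.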
Time-Domain Science Framework
Rubin is uniquely capable of monitoring the sky at unprecedented depths, detecting and characterizing the time-variability of tens of billions of astrophysical objects. Fast and efficient access to astronomical light curves (LCs) will enable the discovery of the most energetic events in the Universe and provide the first systematic characterization of variability within the Milky Way. Scientific discoveries we expect from this framework include:
- Identification of the pre-explosion outbursts from supernova progenitors, which are poorly understood and which hold the key to understanding the chemical enrichment of the intergalactic medium
- Understanding the evolution and death of stars by detecting the very rarest and the most distant transients that represent the most extreme ways stars die
- Probing the properties of dark matter by mapping the distribution of mass in the Milky Way using the kinematics of Cepheids and RR Lyrae
Supporting these science questions requires key functionality:
- A database or datastore for light curves that scales to the size of LSST data sets (10^10 sources)
- Algorithms to automatically search and analyze light curves, including measures of multi-band periods, detection of outbursts, generation of classification features, identification of changes in the state of a source, and measures of distance between light curves that account for phase and period variation
- A framework that scales user-defined algorithms to the size of LSST data
- Trained neural network architectures for classifying sparsely sampled time series data (including trusted training samples)
- A fast and scalable catalog cross-matching engine to match sources from the LSST with existing data sets to provide panchromatic information for detected sources
The LINCC Frameworks team is developing the TAPE library to provide a framework for scalable and automated time series analysis at Rubin data scales. Interoperability with the Large Survey DataBase (LSDB) will provide further scalability and functionality, such as catalog cross-matching. The initial use cases driving development are multi-band period finding for RR Lyrae stars, structure function calculation for active galactic nuclei, and CARMA modeling.
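As an example of the kind of per-object calculation such a framework would need to scale, the sketch below computes a basic first-order structure function for a single light curve: the root-mean-square magnitude difference between pairs of observations, binned by time lag. This is not TAPE's API; the function name and arguments are hypothetical, and corrections for photometric noise are left out.

```python
import math


def structure_function(times, mags, lag_edges):
    """Return a list of (lag_center, sf, n_pairs), one entry per lag bin.

    sf is sqrt(mean((m_j - m_i)**2)) over all observation pairs whose time
    separation falls in the bin [lag_edges[b], lag_edges[b + 1]).
    Bins with no pairs report sf = None.
    """
    n_bins = len(lag_edges) - 1
    sq_sum = [0.0] * n_bins
    counts = [0] * n_bins

    # Visit every unordered pair of observations once.
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            lag = abs(times[j] - times[i])
            for b in range(n_bins):
                if lag_edges[b] <= lag < lag_edges[b + 1]:
                    sq_sum[b] += (mags[j] - mags[i]) ** 2
                    counts[b] += 1
                    break

    result = []
    for b in range(n_bins):
        center = 0.5 * (lag_edges[b] + lag_edges[b + 1])
        sf = math.sqrt(sq_sum[b] / counts[b]) if counts[b] else None
        result.append((center, sf, counts[b]))
    return result


# A toy light curve that alternates between two magnitudes.
times = [0.0, 1.0, 2.0, 3.0]
mags = [0.0, 1.0, 0.0, 1.0]
sf = structure_function(times, mags, lag_edges=[0.5, 1.5, 2.5, 3.5])
print(sf)  # [(1.0, 1.0, 3), (2.0, 0.0, 2), (3.0, 1.0, 1)]
```

The pairwise loop is quadratic in the number of observations, which is acceptable for one light curve; the scaling challenge lies in applying a function like this across billions of sources.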
Scalable Faint Object Detection
Asteroids and comets are the remnants of the Solar System’s early assembly. Their history of accretion, collisions, and perturbation by existing and vanished giant planets is preserved in their orbital elements and size distributions. Rubin has the potential to discover the nearest (Near-Earth Objects; NEOs) and the most distant (Trans-Neptunian Objects; TNOs) asteroid populations, mapping the Solar System in unprecedented detail. The discoveries we expect from this framework include:
- Detection and impact probabilities for ~80% of NEOs with sizes over 140 m (the impact of which would cause devastation on a regional scale)
- The discovery of interstellar objects as they pass through the Solar System
- A 10-fold increase in the number of known TNOs, which can elucidate the evolution and origin of the Solar System
- The ability to find objects that originate from the inner Oort Cloud and potentially additional planets beyond 100 au (astronomical units)
The LINCC Frameworks team is currently working to scale the KBMOD algorithm – a shift-and-stack search that finds objects that may not be bright enough to detect in a single image. The goal is to enable the efficient detection of faint objects at Rubin’s data scales. Key technical challenges include:
- Scaling this approach to Rubin’s massive data volume
- Improving sensitivity and accuracy
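The shift-and-stack idea can be sketched in a few lines of plain Python. This is an illustrative toy, not KBMOD's implementation: the function names, the brute-force search over starting pixels and velocities, and the synthetic images are all assumptions. A real search would also need to account for per-pixel noise, the point-spread function, and masking, none of which are modeled here.

```python
def stack_along_trajectory(images, times, x0, y0, vx, vy):
    """Sum pixel values along a linear trajectory starting at (x0, y0).

    Velocities are in pixels per unit time. Returns (total, n_used), where
    n_used counts the images in which the trajectory stays on the detector.
    """
    total, used = 0.0, 0
    for image, t in zip(images, times):
        dt = t - times[0]
        x = int(round(x0 + vx * dt))
        y = int(round(y0 + vy * dt))
        if 0 <= y < len(image) and 0 <= x < len(image[0]):
            total += image[y][x]
            used += 1
    return total, used


def search(images, times, velocities):
    """Brute-force search; returns (total, x0, y0, vx, vy) of the best stack."""
    height, width = len(images[0]), len(images[0][0])
    best = None
    for vx, vy in velocities:
        for y0 in range(height):
            for x0 in range(width):
                total, used = stack_along_trajectory(images, times, x0, y0, vx, vy)
                # Only consider trajectories covered by every image.
                if used == len(images) and (best is None or total > best[0]):
                    best = (total, x0, y0, vx, vy)
    return best


# Four synthetic 6x6 images: a faint source (1.0) moving one pixel per step
# along row 2, plus one brighter noise spike (2.0) per image at unrelated spots.
times = [0, 1, 2, 3]
images = [[[0.0] * 6 for _ in range(6)] for _ in times]
for t in times:
    images[t][2][t + 1] += 1.0           # moving source at (x = t + 1, y = 2)
for t, (x, y) in enumerate([(5, 5), (0, 0), (5, 0), (0, 5)]):
    images[t][y][x] += 2.0               # noise spikes, brighter than the source

best = search(images, times, velocities=[(0, 0), (1, 0), (0, 1), (1, 1)])
print(best)  # (4.0, 1, 2, 1, 0)
```

In every single image the brightest pixel is a noise spike, yet the stack along the correct trajectory outscores all others because the source adds coherently while the spikes do not. The cost grows with the number of pixels times the number of candidate velocities, which is why scaling to Rubin's data volume is listed as a key challenge.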
Comprehensive Photo-Z Infrastructure
A key element in turning 2D images from LSST into a 3D view is determining distances to the astronomical objects studied. Scientifically, this 2D to 3D conversion is essential to map the expansion history of the Universe and the growth of cosmological structure as a function of time, which is the key to understanding dark energy, galaxy formation/evolution, and the physical processes that drive transient and variable phenomena. Photometric redshifts, or photo-zs, use the observed colors of the galaxies (and potentially other information) to produce an estimate of the distance to an object, which can be critically important information to understand the physics that may have given rise to it.
The development of photo-z estimation methods is an active field of research that gives rise to a number of software needs for application to Rubin data. The openly developed Redshift Assessment Infrastructure Layers (RAIL) software library was initiated by the Dark Energy Science Collaboration (DESC) to establish a unified framework for comprehensive photo-z development, validation, and optimization within the scope of cosmological analysis. LINCC Frameworks is working in collaboration or coordination with DESC as appropriate and with members of the other LSST Science Collaborations to extend RAIL to other extragalactic use cases beyond cosmology, including:
- The addition of new photo-z performance metrics
- Working with Rubin Observatory on the compatibility of RAIL with the Rubin Science Platform to enable key commissioning activities
- Extending the probabilistic representations of photo-z uncertainty, i.e., probability density functions, p(z)
- Improving uncertainty estimation for individual and ensemble photo-z distributions
- Generally investing effort into optimizations that improve the robustness of the software at the LSST scale
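To make the idea of a probabilistic photo-z representation concrete, the sketch below stores each p(z) as values on a shared redshift grid, derives point estimates from it, and averages individual PDFs into a naive ensemble distribution. This is not RAIL's API: the grid representation, the function names, and the Gaussian test PDFs are assumptions chosen for illustration, and simple averaging is only one of several ways to estimate an ensemble distribution.

```python
import math


def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k])
               for k in range(len(x) - 1))


def normalize(pdf, z_grid):
    """Rescale a gridded PDF so that it integrates to one."""
    area = trapezoid(pdf, z_grid)
    return [p / area for p in pdf]


def gaussian_pdf(z_grid, mean, sigma):
    """A normalized Gaussian p(z) sampled on the grid (toy stand-in)."""
    raw = [math.exp(-0.5 * ((z - mean) / sigma) ** 2) for z in z_grid]
    return normalize(raw, z_grid)


def summarize(pdf, z_grid):
    """Point estimates from a gridded p(z): (mean, mode)."""
    mean = trapezoid([p * z for p, z in zip(pdf, z_grid)], z_grid)
    mode = z_grid[max(range(len(pdf)), key=pdf.__getitem__)]
    return mean, mode


def stack(pdfs, z_grid):
    """Naive ensemble estimate: average the individual p(z) on the grid."""
    n = len(pdfs)
    return normalize([sum(column) / n for column in zip(*pdfs)], z_grid)


z_grid = [i * 0.01 for i in range(301)]          # z from 0 to 3
galaxy_a = gaussian_pdf(z_grid, mean=0.5, sigma=0.05)
galaxy_b = gaussian_pdf(z_grid, mean=1.2, sigma=0.10)

mean_a, mode_a = summarize(galaxy_a, z_grid)
ensemble = stack([galaxy_a, galaxy_b], z_grid)
ensemble_mean, _ = summarize(ensemble, z_grid)
```

Keeping the full p(z) rather than a single number preserves the uncertainty information for downstream analyses; the trade-off is storage, since every object carries one value per grid point.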
Throughout its development work, the Frameworks team watches for common problems and develops cross-project infrastructure that benefits a wide range of use cases. One example is the reusable Python project template (based on the Copier package), which reduces the overhead of setting up new astronomy projects. This template has been used throughout the LINCC Frameworks projects to provide baseline productionization capabilities such as continuous integration testing, style checking, and integration with readthedocs. For more information, see the Tech Talk.