Data storage, processing, and sharing are ongoing challenges, especially for research facilities. For Major Facilities funded by the National Science Foundation (NSF), processing data captured “at the edge,” sometimes in very remote locations, and disseminating it to their communities can be costly and difficult to maintain. Additionally, each Major Facility (MF) operates in a unique environment with its own structural and data needs.
In February 2021, CI Compass, the NSF Cyberinfrastructure Center of Excellence that supports and enhances the cyberinfrastructure (CI) of NSF Major Facilities (MFs), started an engagement with two MFs: the Seismological Facility for the Advancement of Geoscience (SAGE) and the Geodetic Facility for the Advancement of Geoscience (GAGE). The two facilities joined forces to develop a Common Cloud Platform (CCP) that brings their data streams, long-term repositories, and services together on a single platform, giving their communities easier and richer access. The goal of the CI Compass and SAGE/GAGE engagement was to explore options for designing and deploying a cloud platform for data processing, storage, and sharing with scientific communities worldwide.
Many public cloud providers offer off-site options to manage and process data. Given the large amounts of unique data MFs generate (often petabytes), cloud infrastructure is an important option for both primary and backup data storage and processing. However, which cloud provider to choose is far from obvious. To address this question, a Cloud working group was formed with members from SAGE, GAGE, and CI Compass.
“One of the major issues for our future cloud platform is determining where it can be deployed in a cost-effective and sufficiently capable environment. We engaged CI Compass to help us understand the current cloud operator landscape and which of the many, many services were good fits for our facility needs. We found this engagement particularly useful as CI Compass members are broadly knowledgeable about the differences between major facility needs versus the more business-oriented customers that commercial clouds are designed for,” said Chad Trabant, project manager for CCP.
Jarek Nabrzyski, co-principal investigator for CI Compass and director of the Center for Research Computing at the University of Notre Dame, led the cloud working group from the CI Compass side. Speaking about the work, Nabrzyski said, “I believe when choosing any cloud solution, it is important to understand the infrastructure needs and how a facility needs to manage data.” He continued, “Understanding the facility’s data collection, data management, and data distribution are all critical to finding potential solutions. We also need to understand potential risks associated with particular cloud providers, and develop solutions, processes, and governance models that will mitigate these risks.”
As part of the engagement, the working group defined its purpose as analyzing public and academic cloud ecosystems, assessing the benefits and risks of migrating the facilities’ data to the cloud, and focusing on risks and mitigations rather than recommending a specific cloud service.
One of the main concerns of the MFs was the financial impact of egress fees, which many cloud providers charge. While uploading data into a cloud provider’s storage may not incur large costs, downloading it can. Because the research data may be accessed by members of the scientific community worldwide, working in academic, governmental, and commercial organizations, egress fees apply whenever these users download large datasets from the cloud. As a result, periods of heavy downloading can cost tens of thousands of dollars per month, so keeping these costs predictable and under control is essential.
Reflecting on the different options in cloud structure, Nabrzyski said, “In general, moving to the cloud should be more affordable and effective. By working closely with the two Major Facilities we were able to better understand the challenges of these unpredictable costs in order to propose a more cost-effective and efficient solution.”
The working group took into account data volumes, identity management concerns and operations, container orchestration, software development and IT operations (DevOps), serverless computing, networking, bulk data and archive transfers, workforce development, cybersecurity compliance, cloud provider energy sources, and more.
In addition, the working group partnered with Internet2, a community with expertise in cloud solutions and research support, to further expand its knowledge base on behalf of the MFs. Internet2 provides technology solutions tailored to the needs of research and education institutions, and its members work with multiple public cloud providers to support the adoption and use of their services.
“This effort shows the power of different organizations working together across technical disciplines to enable research in the cloud,” said Bob Flynn, program manager for cloud infrastructure and platform services at Internet2.
Through research and conversations with cloud providers, the working group gathered information for the CCP team to consider when making its decision.
“What this working group was charged with was to look at the requirements of the on-premise data centers and identify what cloud providers they should consider and engage with,” Nabrzyski said. “Therefore, through many in-depth and collaborative conversations the Cloud working group was able to compile costs, benefits, challenges, and features that may be desired while migrating to the cloud.”
“CI Compass had the privilege to work with SAGE and GAGE at the time they were shaping their future cyberinfrastructure. As a project, we have learned more about the concerns of these Major Facilities and the processes they use for designing long-term solutions. Contributing to this process’s success has been very rewarding for our team. I am looking forward to our continued collaboration with SAGE and GAGE, as well as with other NSF Facilities,” said Ewa Deelman, PI of CI Compass and a Research Director at the University of Southern California’s Information Sciences Institute.
About CI Compass
CI Compass is funded by the NSF Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering under grant number 2127548. Its participating research institutions include the University of Southern California, Indiana University, Texas Tech University, the University of North Carolina at Chapel Hill, the University of Notre Dame, and the University of Utah.
To learn more about CI Compass, please visit ci-compass.org.
Christina Clark, Research Communications Specialist
CI Compass / Notre Dame Research / University of Notre Dame
email@example.com / 574.631.2665
ci-compass.org / @cicompass
Originally published by ci-compass.org on January 21, 2022.