Building a career path for research software engineers

From a $25 archaeologist’s trowel to a multibillion-dollar particle collider, the variety of tools used in scientific research is staggering. But if there’s one scientific instrument common to all disciplines, it’s the computer.

Computer software permeates every stage of the research process, from conducting literature reviews to analyzing data to typesetting journal articles. A 2017 survey of members of the US National Postdoctoral Association found that 95 percent of respondents reported using research software.

Yet the ones who code, test, and patch it often lack a defined career path. Research software is typically built by graduate students or postdoctoral researchers who focus on getting their code to work for the job at hand, often at the cost of scalability and sustainability. Critics of this approach say that it slows the advancement of science.

But this is starting to change. The past five years have witnessed the emergence of the research software engineer (RSE) as a distinct role in US universities. Combining software expertise with a deep knowledge of their scientific domains, RSEs are becoming an increasingly vital part of the scientific community.

“It’s not really the role that’s completely new,” says Ian Cosden, the director of Princeton’s Research Software Engineering group. “It’s the formality. It’s the awareness. It’s the title.”

The birth of a movement

The title traces its origins to a March 2012 workshop for scientists and software engineers at Queen’s College, Oxford. Hosted by the Software Sustainability Institute (SSI), a publicly funded British nonprofit founded two years earlier, the gathering aimed to unite scientists with trained programmers.

A breakout discussion at the workshop raised concerns that academic programmers lacked institutional support, a defined career track, and, crucially, a name. Later that year, five of the discussion’s participants collaborated on a conference paper titled “The Research Software Engineer.”

The paper struck a chord in the research software community, and the SSI began spearheading a campaign for recognition. The following year, it hosted a gathering that gave rise to a professional association for RSEs, now called the Society of Research Software Engineering. Affiliation has been expanding rapidly: In 2013, the society’s Slack workspace had 50 members. In 2018, that number was 1,272; at the time of writing, it’s 2,887.

In the past five years, the British RSE society took the movement global, hosting international conferences in Britain in 2016 and 2018 that spawned RSE groups in Germany, the Netherlands, the Nordic countries, Australia, New Zealand, and the United States.

Cosden serves on the Steering Committee of the US-RSE Association, which he helped found in 2018 and whose Slack workspace now boasts 780 members. He says that he “can’t say enough good things” about the existence of the UK group. “Knowing they were out there gave me so much confidence that we were on the right path.”

In 2016, Jeroen Tromp, the director of the Princeton Institute for Computational Science and Engineering (PICSciE), played a key role in creating Princeton’s RSE team, which is now about to hire its 11th member. He says modern research software is far too complex and fragile to be left solely to students or other researchers whose positions are transient.

“RSEs are professional software engineers,” says Tromp, a professor of geosciences and applied and computational mathematics at Princeton. “They are highly trained individuals with the ability to make transformative contributions to a research effort. They need to be treated and rewarded as such.”

Led by Cosden, Princeton’s RSE group contributes to research projects campuswide, in fields including genomics, protein sequencing, hydrology, applied mathematics, and high-energy physics.

“It creates a collaborative supportive environment,” says Tromp. “Not everyone can be an expert in all aspects of research computing, but as a team they collectively cover many topics.”

Princeton’s RSE team is one of a handful of centralized research software groups at US universities. Other schools that have adopted this model include Notre Dame, the University of Chicago, Harvard, MIT, the University of Washington, UC San Diego, and the University of Illinois at Urbana-Champaign. National laboratories like Oak Ridge and Sandia are also home to nascent RSE groups. 

Like many of these other universities, Princeton is also working to formalize its RSE training. The Princeton Institute for Computational Science and Engineering administers a graduate certificate program for students wishing to supplement their field of study with comprehensive instruction in scientific computing. In February, Princeton’s Graduate School approved the certificate as a formal credential, with the first ones being conferred in June.

Since 2018, PICSciE has provided RSEs and scientific programmers as mentors for its computing bootcamps, which train grad students and postdocs on computational tools and techniques for research, and its annual GPU Hackathons, which bring together experts from industry, academia, and national labs to collaborate on leveraging the speed and efficiency of graphics processing units for research software.

Co-sponsored by PICSciE and Princeton’s Center for Statistics and Machine Learning, the AI for Science Bootcamp will use instructors from NVIDIA, the US company that pioneered GPU technology, to train students on incorporating research AI into GPUs. It will take place online via Zoom on May 18 and 19. The next GPU hackathon, held in collaboration with NVIDIA and Oak Ridge National Laboratory, will run virtually from June 2 to June 10.

“There appears to be an insatiable demand for RSEs,” Tromp says. “My hope is to meet that need. Princeton is far ahead of its peers in this arena, but I suspect others will catch on fast.”

A sustainable future

The RSE movement’s proponents argue that research is best served when its software is developed and sustained over time by trained professionals holding secure jobs. Graduate students and postdocs may contribute, but relying solely on short-term programmers, they say, makes for short-term code.

“As soon as the PhD student leaves, the whole knowledge leaves and they have to start from scratch,” says Sandra Gesing, an associate research professor and computational scientist at the University of Notre Dame and another founding member of the US-RSE group. “That is so inefficient.”

“Software sustainability,” as it’s called, is particularly crucial to high-energy physics, where research projects can span decades. Beginning in 2027, the Large Hadron Collider is set to dramatically boost the amount of data it yields. Researchers expect the collider’s “exabyte era” to extend through the 2030s.

“We needed more structure, and not just grad students typing and then moving on,” says Peter Elmer, executive director and principal investigator for the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP), a software institute funded by the National Science Foundation to develop a sustainable cyberinfrastructure to meet high-energy physics’ computational and data-science challenges. “It’s not something we should always be improvising.” IRIS-HEP recently held a workshop on Software Sustainability & High Energy Physics, which led to recommendations for HEP software developers on training, software, and people.

Education forms a core element of IRIS-HEP’s mission. Working with the HEP Software Foundation, RSEs at the institute have led more than a dozen training events on software and computing skills over the past two years for about 1,000 students worldwide. IRIS-HEP also trains PhD students and postdocs at the Computational and Data Science for High Energy Physics school at Princeton, and it connects students and postdocs with mentors through the IRIS-HEP Fellows Program.

Advanced software skills are critical for those embarking on a career in high-energy physics, says Ianna Osborne, an RSE for IRIS-HEP at CERN.

“Pretty much everything runs on software,” she says. “We cannot afford to have people who are not engineers in some sense.”

Osborne, who has worked at CERN since 1997, studied physics and computer science at Novosibirsk State University. She says that her work at CERN requires deep knowledge of both domains.

“Knowledge of physics is essential to implementing the software … so that physicists can understand it,” she says. “You also need knowledge of what a computer is, from the high-level code to the assembler down to the hardware.”

Physicists are hardly alone in the need for RSEs. In recent decades, astronomy, genomics, and even the humanities have begun relying on more sophisticated data analysis tools. “The research landscape is changing,” says Gesing, whose research focuses on science gateways and interdisciplinary projects that span a variety of fields, including bioinformatics, physics, chemistry, and the social sciences. “We have so much more data and so many more novel instruments.”

Cosden predicts that in the coming years, RSEs will be seen as increasingly essential to science. “I see this as being such a difference maker,” he says. “We’re going to see this environment where researchers who can collaborate with RSEs are going to be able to do things that others are not.”

Gesing, for her part, hopes that the job title will become commonplace. “I hope that someday children when they look for jobs in high school will know what a research software engineer is,” she says.

Originally published by Eoin O'Carroll at iris-hep.org on May 12, 2021.