Big data's hidden cost

January 19, 2023


High performance computing has transformed how research works and our ability to make previously unthinkable discoveries. We’re able to model our future climate with unprecedented accuracy. We’re able to predict what a protein looks like from its genetic code. We even know what a black hole 55 million light-years away looks like.

But while few people would argue against such progress, it comes with a cost.

In 15 years of writing about medical research, I have found myself writing countless stories about genome-wide association studies, where researchers compare the DNA of potentially hundreds of thousands of people – patients and healthy ‘controls’ – to look for genetic variants that increase our risk of developing a particular disease. Never once did I find myself considering the environmental impact of such studies.

It turns out that it can be quite staggering.

Early this year, a team from Cambridge, together with colleagues at the Baker Institute in Melbourne, Australia, published research showing that a genome-wide association study (GWAS) trawling data from 500,000 participants registered to a biobank database would create a carbon footprint of 17.3kg of CO2e (carbon dioxide equivalent) for each genetic trait being studied.

But in fact, researchers would commonly look at thousands of traits. The same GWAS run for 1,000 traits would generate 17.3 tonnes of CO2e. That’s equivalent to 346 flights between Paris and London. (The researchers point out that upgrading the software used to the latest version would reduce this by three-quarters.)
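The arithmetic behind these figures is easy to check. The short Python sketch below uses only the numbers quoted above; the function name is made up for illustration, and the per-flight figure is simply what the article's own numbers imply, not a value taken from the study.

```python
# Figures quoted in the article.
KG_CO2E_PER_TRAIT = 17.3   # kg CO2e per genetic trait in the GWAS
FLIGHTS_EQUIVALENT = 346   # Paris-London flights quoted for 1,000 traits
UPGRADE_REDUCTION = 0.75   # "three-quarters" saving from the latest software


def gwas_footprint_tonnes(n_traits, kg_per_trait=KG_CO2E_PER_TRAIT):
    """Total footprint in tonnes of CO2e for a GWAS covering n_traits traits."""
    return n_traits * kg_per_trait / 1000  # 1,000 kg per tonne


total_tonnes = gwas_footprint_tonnes(1000)

# Implied footprint of one flight, derived from the two quoted numbers.
kg_per_flight = total_tonnes * 1000 / FLIGHTS_EQUIVALENT

# Footprint left after the software upgrade the researchers mention.
after_upgrade = total_tonnes * (1 - UPGRADE_REDUCTION)

print(f"1,000 traits: {total_tonnes:.1f} tonnes CO2e")
print(f"Implied per flight: {kg_per_flight:.0f} kg CO2e")
print(f"After upgrade: {after_upgrade:.3f} tonnes CO2e")
```

On these numbers, each flight in the comparison works out at roughly 50kg of CO2e, and the upgraded software would bring the 1,000-trait total down to a little over four tonnes.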

Output from a DNA sequencer (Credit: National Human Genome Research Institute)


At the start of 2020, Loic Lannelongue was in the middle of a PhD in health data science at Cambridge’s Department of Public Health and Primary Care. He was a computational biologist, using machine learning to predict how proteins interact in the human body. One of his collaborators was Jason Grealey, an academic based at the University of Melbourne, Australia. Lannelongue was following the bushfires tearing through Australia on the news – and hearing about them first hand from Grealey. This made him reflect on the climate emergency and the part we all play.

A few months earlier, Lannelongue had read about a study that equated training artificial intelligence (AI) to the carbon footprint of five cars over their lifetimes. He began to wonder what the impact of his own work was, and together with Grealey decided to work it out, expecting to find an online calculator that they could just plug their numbers into.

“We started thinking it would be a two-week project, a nice break from our PhD research,” says Lannelongue, “just figuring out what the carbon footprint of what we were doing was to get a number and probably tweeting about it. Except there was nothing out there.

“We realised that there was a massive gap, that computational scientists weren’t really thinking about their carbon footprint yet.”

Dr Loic Lannelongue

Since then, with the support of his supervisor, Dr Michael Inouye, Lannelongue has been spending half of his time working on this project, leading to the development of Green Algorithms, a simple online calculator that allows researchers to work out the carbon footprint of their computing work.
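To give a feel for the kind of calculation such a tool might perform, here is a rough Python sketch. It is not the Green Algorithms implementation: the function name and every input value below are hypothetical placeholders. It assumes only that energy use scales with running time, the power drawn by processor cores and memory, and the overhead of the data centre (often expressed as power usage effectiveness, or PUE), and that emissions scale with the carbon intensity of the local electricity grid.

```python
def estimate_footprint(runtime_hours, n_cores, watts_per_core,
                       memory_gb, watts_per_gb, pue, grid_g_co2e_per_kwh):
    """Return (energy in kWh, emissions in kg CO2e) for a single job.

    All inputs are assumptions supplied by the caller; none of the
    values used below come from the article.
    """
    # Power drawn by the cores and the memory, in watts.
    power_watts = n_cores * watts_per_core + memory_gb * watts_per_gb
    # Scale by running time and data-centre overhead, convert Wh to kWh.
    energy_kwh = runtime_hours * power_watts * pue / 1000
    # Convert energy to emissions using the grid's carbon intensity (g -> kg).
    emissions_kg = energy_kwh * grid_g_co2e_per_kwh / 1000
    return energy_kwh, emissions_kg


# Hypothetical overnight job: 16 cores and 64 GB of memory for 12 hours.
energy, emissions = estimate_footprint(
    runtime_hours=12, n_cores=16, watts_per_core=10,
    memory_gb=64, watts_per_gb=0.4, pue=1.5, grid_g_co2e_per_kwh=300,
)
print(f"{energy:.2f} kWh, {emissions:.2f} kg CO2e")
```

The point of the sketch is the structure of the estimate, not the numbers: doubling the running time, the number of cores or the grid's carbon intensity doubles the corresponding share of the footprint.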

This is not the first time the research community has turned the spotlight on its own practices. Some in the community have already been asking questions about the impact of flying across the globe to present their findings at scientific conferences, for example. Others have raised the issue of plastic and chemical waste and energy requirements from so-called ‘wet labs’ – that is, laboratories where experimental work takes place. Computer labs also have a significant impact: equipment needs updating and replacing every few years at a minimum, while even data storage itself requires energy.

2022 CTBT Science Diplomacy Symposium (Credit: The Official CTBTO Photostream)


And then there is the computing work itself, of which there is a phenomenal amount these days. To give you an idea of its scale, in 2020, the now-concluded US-based XSEDE (the Extreme Science and Engineering Discovery Environment – a virtual system to allow scientists to share computing resources, data and expertise) alone saw researchers use 9 billion compute hours, or 24 million hours per day.

“For powerful calculations, either you need a lot of cores – you basically plug together a lot of computers and they all do the work for you – or you need a lot of memory. Either way, this takes energy.”

Part of the problem, he says, is that computing can feel as if it comes at no cost. Research groups often have free access to high performance computing (HPC) facilities at their institution.

“When you first arrive as a PhD student, you’re like a kid in a candy store – you basically have unlimited computing power at your fingertips. It’s brilliant and it enables great research, so it definitely shouldn’t stop, but the problem is you just think it’s free.”

Dr Loic Lannelongue

He gives the example of a process in machine learning called hyperparameter tuning, which involves testing different configurations of your model to work out which works best. “You never know when you’ve hit the maximum. It just keeps getting better until at some point, you say, ‘Well I think I’ve made it as good as I can’.

“But let’s say you’re at the end of day and you think, ‘Who knows, maybe I could just keep it running overnight. Maybe I’ll get that extra half a percent of accuracy. It doesn’t cost anything and no one’s using the computers’. But actually, there is a cost – there’s a carbon cost.”
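One common way to avoid the open-ended overnight run is to decide in advance when to stop. The Python sketch below shows a simple "patience" rule: stop once several trials in a row fail to improve on the last meaningful gain by a set margin. It is an illustration only, not something Lannelongue prescribes; the function name, the scores and the thresholds are all made up, with the margin set to the "half a percent" from the example above.

```python
def tune_with_patience(scores, min_gain=0.005, patience=3):
    """Consume trial scores in order and stop when gains dry up.

    Stops after `patience` consecutive trials that fail to beat the last
    meaningful improvement by at least `min_gain`.
    Returns (best score seen, number of trials actually run).
    """
    best = float("-inf")    # best score seen so far
    anchor = float("-inf")  # score at the last meaningful improvement
    stale = 0               # consecutive trials without a meaningful gain
    trials_run = 0
    for score in scores:
        trials_run += 1
        best = max(best, score)
        if score >= anchor + min_gain:
            anchor = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # further trials are unlikely to be worth the energy
    return best, trials_run


# Hypothetical accuracies from successive hyperparameter configurations.
trial_scores = [0.80, 0.85, 0.88, 0.881, 0.882, 0.883, 0.90, 0.91]
best, used = tune_with_patience(trial_scores)
print(f"Stopped after {used} of {len(trial_scores)} trials, best = {best}")
```

A rule like this trades a small chance of missing a late improvement for a bounded amount of computing, which is the kind of judgement call the quote describes.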

What he wants is not to limit research, but to cut computational waste, “to get people to think: ‘Do I really need to do that? Probably not.’”

The source of this news is the University of Cambridge.
