August 11, 2022

Dell Technologies Interview: Getting ‘more science per pound’ at Durham University’s COSMA HPC service

July 7, 2022

[SPONSORED CONTENT] In this interview, Dr. Alastair Basden of Durham University in the UK discusses the latest activities of the university’s COSMA HPC service as it tests and integrates new high-performance technologies on the path to exascale. A Dell Technologies HPC and AI Center of Excellence, the organization is committed to generating “more science per pound” from its memory-intensive HPC infrastructure, Basden said. He also updates us on the cosmological work of its scientists, including filling in the remaining gaps in the Big Bang theory.

Doug Black: Hello everyone, I’m Doug Black, editor at insideHPC, and today in our series of interviews with Dell Technologies, we’re with Dr. Alastair Basden, COSMA HPC Services Manager at Durham University in the UK. COSMA stands for ‘cosmology machine’, and cosmology is the science of the origin and development of the universe, the Big Bang theory and all that. The COSMA HPC service in Durham includes a farm of HPC systems, and the facility is a Dell Technologies HPC and AI Center of Excellence. Welcome, Alastair.

Alastair Basden: Thanks.

Black: I understand that Durham is on the exascale path and you are continually testing new technologies. Tell us about some of the technologies you’re considering as you move toward this HPC milestone.

Basden: We test a number of things that are either new technologies or at the cutting edge. We are still very interested in CPU technologies; we were one of the first HPC installations a few years ago to get AMD EPYC chips. And we have advanced “Milan” chips (AMD)… We’re also very interested in HPC fabrics, so we’re testing with Rockport Networks right now, which is a 16-core switchless network fabric. So very soon we will be upgrading half of one of our clusters to use the Rockport network. We also look at BlueField technologies and DPU technologies, and see how these can improve our science codes.

A lot of our investigations are really about how we can get more science out of the machines that we have, or, when we’re designing future machines, how we can get the most value for money in terms of science per pound. And we’re also interested in carbon savings, so we have a water-cooled system; our latest system uses on-chip direct liquid cooling.

COSMA (credit: University of Durham)

And we’re also very interested in composability. So we have a composable GPU system. This allows us to have a number of physical GPUs, which we can then, with one click, simply move between different servers. So if there’s a job going on that requires a few GPUs, we can provide them. At the same time, there may be jobs that don’t need GPUs, so they can run on servers without any. And then at some point we might have a number of tasks that each require a set of GPUs, and we just distribute them evenly across the system, that sort of thing. So we are also interested in this type of technology.

Black: OK! Great. Please bring us up to date on the cosmology work you and the organization are doing and the role of HPC in the future.

Basden: Our facility is funded by one of the research councils in the UK, and the research remit of that council is cosmology, particle physics, astronomy, nuclear physics, black holes and all that kind of thing. So our system here in Durham is what we call a memory-intensive service. It has a large amount of memory per core; our current system has one terabyte of RAM per node. And one of the major workloads that we run on it is cosmology simulation. We start a simulation with the Big Bang and propagate the universe forward through time. By adjusting different input parameters and different models (things like dark matter and dark energy and so on), we try to match what we get in simulation with what astronomers see in telescopes in real life. By doing a lot of statistical analysis, we are then able to fine-tune the input parameters of the models to get a better idea of how the universe is made.

Black: Out of curiosity, would you say that, as your work progresses, the Big Bang theory is more and more validated, or are there holes in this theory?

Basden: It’s not that there are holes in it. It’s validated. There are just pieces that we don’t understand. There are always these unknowns. We don’t really understand what 50-75% of the universe is made of, this dark matter. We don’t really understand what it is, where it comes from, how it interacts, so learning more and more about the universe is the key.

Black: Okay, so the premise is valid, and you’re filling in the picture, I see. Now can you give us a profile of the Dell server and cluster technologies you have implemented?

Basden: So we currently have two generations of systems that are supplied by Dell. We have a system that was installed in 2018, … series servers (Dell EMC PowerEdge C6420). And we have 500 nodes of that. Then our most recent system was installed last year, in 2021. It is, again, similar servers, four servers in 2U. And these are Dell C6525 series (servers) with AMD EPYC chips inside. These are the ones with one terabyte of RAM per node.

We also have a smaller test cluster of 24 nodes, which we use to test new technologies. So that’s where the BlueField testing is done, and that’s where the Rockport (Networks) testing took place. And we’re looking at converting half of COSMA to the Rockport fabric. So in terms of Dell kit, they provide all the servers for that, and we also use Dell hardware for our storage. So we have multi-petabyte Lustre file systems, which, again, run on Dell hardware.

Black: Ok, and tell us about your Center of Excellence partnership with Dell and the value of it to the organization and (to) the development of your HPC infrastructure.

Basden: One of the key things that (the partnership) gives us is access to Dell engineers, which allows us to get an idea of what’s to come. That’s one of the reasons we were involved with Rockport before they went public. (Dell) put us in touch with Rockport through this Center of Excellence. And then we were able to start testing this type of kit. So it really gives us an early look at new technologies, interesting new technologies, things that are useful to us. And by doing that, we’re able to (give) feedback to Dell on how useful this kind of kit is and where we’d like future technologies to take us. So it’s a two-way street with benefits for both parties.

Black: OK. Well, it was a pleasure. We’ve been with Dr. Alastair Basden from COSMA HPC at Durham University in the UK. Alastair, thank you very much.