Imperial College Research Software Community Newsletter - December 2024

This month’s edition is our 72nd, marking the milestone of 6 full years of publishing our monthly research software community newsletter! Things have changed quite a bit in that period, but it’s good to see that the software we featured as Research Software of the Month in our first three editions continued to be updated for several years and to develop a growing following on GitHub - an excellent advert for sustainable research software. While things are now winding down for many of us before the holidays, December has been a busy month: several members of Imperial’s research software community and Research Computing Service attended the annual Computing Insight UK conference in Manchester in early December. Imperial was well represented in the conference’s student Cluster Challenge, with participants from Imperial involved in two of the teams. It was also great to see the conference’s Jacky Pallas Memorial Award go to Lisa Lampunio from Imperial’s Nuclear Engineering Group in Mechanical Engineering - check out the news section below for more details.

The research software community committee would like to wish everyone all the best for the holidays and new year. We hope you have an enjoyable holiday and a chance to take a break. Thanks to everyone for being part of our community - it wouldn’t exist without you!

Dates for your diary

Research Computing at Imperial

This month, in our series highlighting members of the College community helping to support research computing, we hear from Emily Lumley:

I work within the Research Computing Service (RCS) as the Research Engagement Lead. In this position, I engage with the Imperial community of users and future users of our services and applications, helping people access training and establishing groups that can work together and help each other. I started this role around 6 months ago, so I am still relatively new; I have been using this time to gain a better understanding of the role of RCS within Imperial and of our products and services. I have organised several training events and reinstated the RCS newsletter, which is now published monthly. Through these initiatives, I have gained a better understanding of our current community. I hope to follow up on this work in the new year and get to know more Imperial colleagues, their needs for research computing and how I can bring the community together more cohesively.

My background is in Chemistry (via a BA in Drama!), in which I did a PhD and some smaller research projects in organic chemistry on the border of biology. However, I realised (or was quite brutally told) that my skills were more suited to project management, and this is what I have since pursued. I worked for 5+ years as the project manager for a large European Centre of Excellence (15 partners across 6 countries) in Computational Biomedicine using high-performance computing (HPC). During this time, I learnt a lot of background information and got to know some of the wider community and their concerns. I then moved to Imperial College London as a programme manager for a bioengineering group for two years, before applying for and being offered my current position. I consider this my first intentional career move (having moved around a lot); previously, I have enjoyed the community and engagement activities of my roles, and I am fascinated and awed by the work that takes place within the RCS.

If you have a project you want to shout about, a community of practice you are interested in being part of or setting up, or just want more information on our services, please get in contact with me, and don’t forget to subscribe to the RCS newsletter.

Research Software of the Month

This month, our Research Software of the Month is Auto-CORPus:

How do you train an accurate language model? With reliable data. However, not all domains have such data available: this is where Auto-CORPus comes in. Auto-CORPus has been in development since early 2020, initially through 7 separate MRes/MSc projects and subsequently through a collaboration between Imperial and the Universities of Nottingham and Leicester. It takes in documents that are messy (to a machine), such as HTML files of papers that are not available as structured XML files, and standardises and structures them into a machine-readable, BioC-compliant JSON format. The resulting documents can then be used to train natural language processing algorithms.

Auto-CORPus is written in Python and is a command line executable. For each article it returns a cleaned and structured JSON file with the main text, where each paragraph is assigned an identifier from the Information Artefact Ontology based on the section of the document it came from (introduction, methods, supplementary details, etc.). It is able to process a full-text article in well under a second, and its output has since been used to create text corpora for large-scale omics, to create transformer-based entity recognition systems for metabolites and enzymes, and to understand the concepts associated with cognitive ability. It is also the central processing software for the AI-assisted literature review work package of the €12M EU-Horizon and UKRI co-funded CoDiet project.
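To give a feel for how this kind of output might be consumed downstream, here is a minimal sketch that groups paragraph text by section label. It is not taken from the Auto-CORPus codebase. It assumes the general BioC JSON layout (a collection with "documents", each holding "passages" with "infons" and "text"), and the infon key name "iao_name_1", the section labels and the sample text are illustrative assumptions; check the key names in your own output files before relying on them.

```python
from collections import defaultdict


def group_passages_by_section(collection, section_key="iao_name_1"):
    """Group passage text by section label in a BioC-style JSON collection.

    `section_key` is an assumed infon key; adjust it to match the keys
    actually present in your files.
    """
    grouped = defaultdict(list)
    for document in collection.get("documents", []):
        for passage in document.get("passages", []):
            # Passages without the expected infon are collected separately.
            label = passage.get("infons", {}).get(section_key, "unlabelled")
            grouped[label].append(passage.get("text", ""))
    return dict(grouped)


# Illustrative in-memory sample (not real Auto-CORPus output).
sample = {
    "documents": [
        {
            "id": "example-article",
            "passages": [
                {"offset": 0,
                 "infons": {"iao_name_1": "introduction section"},
                 "text": "Metabolites are small molecules."},
                {"offset": 33,
                 "infons": {"iao_name_1": "methods section"},
                 "text": "Samples were analysed by NMR."},
                {"offset": 62,
                 "infons": {"iao_name_1": "methods section"},
                 "text": "Spectra were processed in batches."},
                {"offset": 97, "infons": {}, "text": "Untagged paragraph."},
            ],
        }
    ],
}

sections = group_passages_by_section(sample)
for label, texts in sections.items():
    print(f"{label}: {len(texts)} paragraph(s)")
```

For a file on disk, the same function could be applied to the result of `json.load`, which makes it straightforward to pull out, say, only the methods paragraphs when assembling a training corpus.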

With the help of the Open-Source Booster programme, the Auto-CORPus code was revamped, made pip installable (with PyPI distribution on its way), and updated to work with the latest configuration of the PubMed Central website. You can read the publication describing Auto-CORPus v1 here and see the GitHub repository for documentation. Auto-CORPus is available under a GPL-3.0 license.

RSE Bytes

News

Computing Insight UK 2024

As highlighted in this month’s intro, several members of our community were in attendance at CIUK 2024. We’d like to congratulate Lisa Lampunio from the Nuclear Engineering Group in the Department of Mechanical Engineering at Imperial on winning this year’s Jacky Pallas Memorial Award. In recognition of the award, winners are given a slot in the main conference programme and Lisa delivered a fascinating presentation entitled “Advanced Modelling and Simulation for the Analysis of Novel Radiation Detection Technology and Thermal Fatigue Phenomena within the Nuclear Energy Sector”, looking at her work to develop improved multiphysics modelling and simulation methods to support the engineering and nuclear energy communities.

Imperial also saw success at the conference in the student Cluster Challenge. 15 teams (each of up to 6 students) participated in the activity, undertaking online challenges in the run-up to the conference, followed by a series of challenges undertaken at the conference itself. A large area of the conference exhibition space was set out with tables for each of the Cluster Challenge teams, who worked through challenges set by a group of companies. The winning team, “Decarb”, included team members from both Warwick University and Imperial. Congratulations to the team on winning this year’s challenge.

STEP-UP / RSLondon / Imperial Community get-together

On the 17th December, we held a community get-together at Imperial that was jointly organised by the STEP-UP project, RSLondon, and the Imperial Research Software Community. The event provided a great way to get people together before the holidays, with around 30 attendees joining us from a number of different London institutions. We had an overview of community activities over the last year followed by 4 short talks and time for everyone to chat over food and drinks. We look forward to organising further such events in the new year.

Hidden REF Awards 2024

The results of the 2024 Hidden REF competition have been announced! You can watch the video of the awards event and also see the list of awardees on the Hidden REF competition page.

Blog posts, tools & more

Some reminders…

RS Community Slack

The Imperial Research Software Community Slack workspace is a place for general community discussion as well as featuring channels for individuals interested in particular tools or topics. If you’re an OpenFOAM user, why not join the #OpenFOAM channel where regular code review sessions are announced (amongst other CFD-related discussions…). Users of the Nextflow workflow tool can find other Imperial Nextflow users in #nextflow. You can find other R developers in #r-users and there is the #DeepLearners channel for AI/ML-related questions and discussion. Take a look at the other available channels by clicking the “+” next to “Channels” in the Slack app and selecting “Browse channels”.

If you want to start your own group around a tool, programming language or topic not currently represented, feel free to create a new channel and advertise it in #general.

Research Software Engineering support

If you need support with your code, seek no more! The Central RSE Team, within the Research Computing Service, is here to help. Have a look at the variety of ways the team can work with you:

HPC documentation and tips

All the documentation, tutorials and howtos for using Imperial’s HPC are available in the Imperial RCS User Guide. See also the Research Computing Service’s Research Computing Tips series for a variety of helpful tips for using RCS resources and related tools and services.

Research Software Directory

Imperial’s Research Software Directory provides details of a range of research software and tools developed by groups and individuals at the College. If you’d like to see your software included in the directory, you can open a pull request in the GitHub repository or get in touch with the Research Software Community Committee.

Get in Touch, Get Involved!

Drop us a line at rse-committee@imperial.ac.uk with anything you’d like included in the newsletter, ideas about how it could be improved, or even an offer to guest-edit a future edition!

If you’re reading this on the web and would like to receive the next newsletter directly to your inbox then please subscribe to our Research Software Community Mailing List.


This issue of the Research Software Community Newsletter was edited by Jeremy Cohen. All previous newsletters are available in our online archive.