This is the 72nd edition of our newsletter, marking the milestone of 6 full years of publishing our monthly research software community newsletter! Things have changed quite a bit in that period but it’s good to see that the software that we included as Research Software of the Month in our first three editions continued to be updated for several years and to develop a growing following on GitHub - an excellent advert for sustainable research software. While things for many of us are now winding down before the holidays, December has been a busy month and several members of Imperial’s research software community and Research Computing Service attended the annual Computing Insight UK conference in Manchester in early December. Imperial was well represented in the conference’s student Cluster Challenge, with participants from Imperial involved in two of the teams. It was also great to see the conference’s Jacky Pallas Memorial Award awarded to Lisa Lampunio from Imperial’s Nuclear Engineering Group in Mechanical Engineering - check out the news section below for more details.
The research software community committee would like to wish everyone all the best for the holidays and new year. We hope you have an enjoyable holiday and a chance to take a break. Thanks to everyone for being part of our community - it wouldn’t exist without you!
Don’t forget Advent of Code (AoC)! The competition runs every December with a series of fun and engaging daily challenges for you to test your coding skills with. Participating in AoC is a great way to help you to learn a new programming language or just to refresh, update or enjoy using your skills in a language you’re already familiar with. It’s not too late to get started with AoC 2024.
A third series of the UNIVERSE-HPC project’s “Byte-sized RSE” sessions is underway. The second session in the series, on the “Psychology of data visualisation”, will take place on Friday 10th January 2025, 13:00-14:30. Registration is now open and you can find out more on the byte-sized RSE page. We have another session on UX Design planned for late January - the date will be confirmed soon. Keep an eye on the byte-sized RSE web page for further details. The first session in the current series covered containers with Podman and took place on Wednesday 27th November. You can check out the companion podcast episode, which is now available through the Code for Thought podcast: ByteSized RSE: Fun with Containers - Simon Li.
The Digital Research Infrastructure (DRI) Retreat 2025 will take place in Manchester, Monday 13th-Friday 17th January 2025. In-person attendance for the event is now fully booked but you can still join the retreat online. Each day of the event focuses on a specific theme with panels and discussion covering a range of topics.
FOSDEM2025 takes place in Brussels, 1st-2nd February 2025. The conference is free to attend, attracts several thousand attendees and has “developer rooms” focusing on a huge range of topics, programming languages, tools and applications.
deRSE25, the Conference for Research Software Engineering in Germany, will be held from 25th-27th February 2025 at Karlsruhe Institute of Technology, Karlsruhe, Germany. This year’s conference is co-located with the German Software Engineering conference (SE25).
The dates have been set for the 2025 edition of the Software Sustainability Institute’s Collaborations Workshop. It will take place from 13th-15th May 2025. The venue is yet to be announced but you can sign up via the event page to receive updates.
This month, in our series highlighting members of the College community helping to support research computing, we hear from Emily Lumley:
I am working within the Research Computing Service (RCS) as the Research Engagement Lead. In this position, I am engaging with the Imperial community of users and future users of our services and applications, helping people access training and establishing groups that can work together and help each other. I started this role around 6 months ago, so I am still relatively new; I have been using this time to gain a better understanding of the role of RCS within Imperial and our products and services. I have organised several training events and reinstated the RCS newsletter, which is now published monthly. Through these initiatives, I have gained a better understanding of our current community. I hope to follow up on this work in the new year and get to know more Imperial colleagues, their needs for research computing and how I can bring the community together more cohesively.
My background is in Chemistry (via a BA in Drama!), in which I did a PhD and some smaller research projects in organic chemistry on the border of biology. However, I realised (or was quite brutally told) that my skills were more suited to project management, and this is what I have since pursued. I worked for 5+ years as the project manager for a large European Centre of Excellence (15 partners across 6 countries) in Computational Biomedicine using high-performance computing (HPC). During this time, I learnt a lot of background information and got to know some of the wider community and their concerns. I then moved to Imperial College London as a programme manager for a bioengineering group for two years, before applying for and being offered my current position. I consider this my first intentional career move (having moved around a lot); previously, I have enjoyed the community and engagement activities of my roles, and I am fascinated and awed by the work that takes place within the RCS.
If you have a project you want to shout about, a community of practice you are interested in being part of or setting up, or just want more information on our services, please get in contact with me, and don’t forget to subscribe to the RCS newsletter.
This month, our Research Software of the Month is Auto-CORPus:
How do you train an accurate language model? With reliable data. However, not all domains have such data available: this is where Auto-CORPus comes in. Auto-CORPus has been in development since early 2020, initially through 7 separate MRes/MSc projects and since then through a collaboration between Imperial and the Universities of Nottingham and Leicester. It takes in documents that are messy (to a machine), such as HTML files of papers that are not available as structured XML files, and standardises and structures them into a machine-readable, BioC-compliant JSON format. The resulting documents can then be used to train natural language processing algorithms that analyse such texts.
Auto-CORPus is written in Python and is a command line executable. For each article it returns a cleaned and structured JSON file with the main text, where each paragraph is assigned an identifier from the Information Artefact Ontology based on the section of the document it came from (introduction, methods, supplementary details, etc). It is able to process a full-text article in well under a second, and its output has since been used to create text corpora for large-scale omics, to create transformer-based entity recognition systems for metabolites and enzymes, and to understand the concepts associated with cognitive ability. It is also the central processing software for the AI-assisted literature review work package of the €12M EU-Horizon and UKRI co-funded CoDiet project.
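To give a flavour of the kind of transformation described above, here is a minimal sketch using only the Python standard library. It is not Auto-CORPus code and does not use the Auto-CORPus API: the names `ArticleParser`, `label_for` and `html_to_bioc_like` are hypothetical, the output is only loosely modelled on BioC JSON (documents containing passages with `offset`, `infons` and `text`), and the section labels are plain-text placeholders rather than real Information Artefact Ontology identifiers.

```python
import json
from html.parser import HTMLParser

# Hypothetical keyword-to-label mapping. Auto-CORPus assigns real
# Information Artefact Ontology identifiers; these strings are
# illustrative placeholders only.
SECTION_LABELS = {
    "introduction": "introduction section",
    "method": "methods section",
    "result": "results section",
    "discussion": "discussion section",
}


def label_for(heading):
    """Return a placeholder section label for a heading, or a fallback."""
    lowered = heading.lower()
    for keyword, label in SECTION_LABELS.items():
        if keyword in lowered:
            return label
    return "unlabelled section"


class ArticleParser(HTMLParser):
    """Collect (heading, paragraph) pairs from very simple article HTML."""

    def __init__(self):
        super().__init__()
        self.current_heading = ""
        self.current_tag = None  # the h1/h2/h3/p element being read, if any
        self.buffer = []
        self.paragraphs = []  # list of (heading, paragraph text)

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "p"):
            self.current_tag = tag
            self.buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) is kept as well.
        if self.current_tag is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current_tag:
            return
        text = " ".join("".join(self.buffer).split())  # normalise whitespace
        if tag == "p":
            if text:
                self.paragraphs.append((self.current_heading, text))
        else:
            self.current_heading = text
        self.current_tag = None


def html_to_bioc_like(html, doc_id):
    """Convert simple HTML into a BioC-like dictionary (assumed layout)."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    passages = []
    offset = 0
    for heading, text in parser.paragraphs:
        passages.append({
            "offset": offset,
            "infons": {
                "section_title": heading,
                "section_label": label_for(heading),
            },
            "text": text,
        })
        offset += len(text) + 1  # assume one separator between passages
    return {"documents": [{"id": doc_id, "passages": passages}]}


EXAMPLE_HTML = """
<h2>1. Introduction</h2>
<p>Language models need <em>reliable</em> data.</p>
<h2>2. Methods</h2>
<p>Articles were converted to JSON.</p>
"""

if __name__ == "__main__":
    print(json.dumps(html_to_bioc_like(EXAMPLE_HTML, "example-1"), indent=2))
```

Real publisher HTML is far messier than this toy input, which is exactly the problem Auto-CORPus sets out to solve; see the project's GitHub repository for its actual usage and output format.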
With the help of the Open-Source Booster programme, the Auto-CORPus code was revamped, made pip installable (with PyPI distribution on its way), and updated to work with the latest configuration of the PubMed Central website. You can read the publication describing Auto-CORPus v1 here and see the GitHub repository for documentation. Auto-CORPus is available under a GPL-3.0 license.
As highlighted in this month’s intro, several members of our community were in attendance at CIUK 2024. We’d like to congratulate Lisa Lampunio from the Nuclear Engineering Group in the Department of Mechanical Engineering at Imperial on winning this year’s Jacky Pallas Memorial Award. In recognition of the award, winners are given a slot in the main conference programme and Lisa delivered a fascinating presentation entitled “Advanced Modelling and Simulation for the Analysis of Novel Radiation Detection Technology and Thermal Fatigue Phenomena within the Nuclear Energy Sector”, looking at her work to develop improved multiphysics modelling and simulation methods to support the engineering and nuclear energy communities.
Imperial also saw success at the conference in the student Cluster Challenge. 15 teams (each of up to 6 students) participated in the activity, undertaking online challenges in the run-up to the conference, followed by a series of challenges undertaken at the conference. A large area of the exhibition space at the conference was set out with tables for each of the cluster challenge teams, who worked through challenges set by a group of companies. The winning team, “Decarb”, included team members from both Warwick University and Imperial. Congratulations to the team on winning this year’s challenge.
STEP-UP / RSLondon / Imperial Community get-together
On the 17th December, we held a community get-together at Imperial that was jointly organised by the STEP-UP project, RSLondon, and the Imperial Research Software Community. The event provided a great way to get people together before the holidays, with around 30 attendees joining us from a number of different London institutions. We had an overview of community activities over the last year followed by 4 short talks and time for everyone to chat over food and drinks. We look forward to organising further such events in the new year.
The results of the 2024 Hidden REF competition have been announced! You can watch the video of the awards event and also see the list of awardees on the Hidden REF competition page.
Have you wondered why it’s so difficult to define an RSE career path?! Take a look at this interesting article from a speed blogging session at this year’s Software Sustainability Institute Collaborations Workshop (CW24).
Better Scientific Software has published a blog post on “Identifying the Foundational Competencies of a Research Software Engineer” written by the teachingRSE project team, which was formed out of a session run at the German RSE conference back in 2023 (deRSE23). The blog post provides a short summary of the group’s pre-print article “Foundational Competencies and Responsibilities of a Research Software Engineer”, which has recently been submitted to F1000Research and is currently undergoing peer review.
As part of the Software Sustainability Institute’s recent Research Software Camp on Digital Skills for Research Technical Staff, the team have collated a list of free resources for technical staff: Career development and research software training.
Are you a Spanish speaker? Maybe you’re learning and would like a chance to test out your skills in a technical context? Take a look at “Charlas RSE en español” and keep an eye out for details of the next talk, which will take place in mid-January.
Are you an RStudio user? Are you aware of Positron and have you thought about switching? Take a look at this blog post: “Positron vs RStudio - is it time to switch?”
Sahil Raja from Imperial’s RSE team has written a fascinating blog post “The 30-Day Map Challenge” on his experiences of participating in this annual challenge.
In early December, a “Book Dash” was held for the R Development Guide. You can read about it in the blog post “A Book Dash for the R Development Guide”.
The Imperial Research Software Community Slack workspace is a place for general community discussion as well as featuring channels for individuals interested in particular tools or topics. If you’re an OpenFOAM user, why not join the #OpenFOAM channel where regular code review sessions are announced (amongst other CFD-related discussions…). Users of the Nextflow workflow tool can find other Imperial Nextflow users in #nextflow. You can find other R developers in #r-users and there is the #DeepLearners channel for AI/ML-related questions and discussion. Take a look at the other available channels by clicking the “+” next to “Channels” in the Slack app and selecting “Browse channels”.
If you want to start your own group around a tool, programming language or topic not currently represented, feel free to create a new channel and advertise it in #general.
If you need support with your code, seek no more! The Central RSE Team, within the Research Computing Service, is here to help. Have a look at the variety of ways the team can work with you:
All the documentation, tutorials and how-tos for using Imperial’s HPC are available in the Imperial RCS User Guide. See also the Research Computing Service’s Research Computing Tips series for a variety of helpful tips for using RCS resources and related tools and services.
Imperial’s Research Software Directory provides details of a range of research software and tools developed by groups and individuals at the College. If you’d like to see your software included in the directory, you can open a pull request in the GitHub repository or get in touch with the Research Software Community Committee.
Drop us a line at rse-committee@imperial.ac.uk with anything you’d like included in the newsletter or ideas about how it could be improved, or even offer to guest-edit a future edition!
If you’re reading this on the web and would like to receive the next newsletter directly to your inbox then please subscribe to our Research Software Community Mailing List.
This issue of the Research Software Community Newsletter was edited by Jeremy Cohen. All previous newsletters are available in our online archive.