Hello everyone and welcome to this month’s edition of the Imperial College Research Software Community Newsletter. With summer underway and scorching temperatures, it’s natural to feel the desire to step away from the computer screen for a while.
So, how about connecting with fellow members of the larger London Research Software community at the upcoming RSLondonSouthEast 2023 workshop? It’s a great chance to expand your network and exchange ideas. Or, alternatively, why not explore how Machine Learning works by creating a functional system, all without the need for a computer? Well, at least if you have a number of matchboxes around…
Has your curiosity been tickled? These are just a taste of the events, news, and highlights featured in this issue. So, grab a refreshing drink, find a comfortable spot, and immerse yourself in the content that awaits you.
Last call for submitting workshop abstracts for the “Sustainable RSE ecosystems within eScience” RSEs in e-Science Workshop. The deadline is Friday 30th June 2023. The workshop is part of the IEEE eScience 2023 conference, taking place in Limassol, Cyprus from 9th-13th October 2023.
The Hidden REF Festival will take place in Bristol on 21st September 2023. General registration will open on 3rd July 2023, but you can register your interest in a free ticket now.
The deadline to register for in-person attendance at RSECon23, the 7th Annual Conference for Research Software Engineering, is Thursday 6th July. The conference will take place in Swansea on 5th-7th September 2023. The event anticipates the participation of over 300 RSEs, researchers, and research technology professionals, providing a prime opportunity for networking and collaboration.
Registration is open for the RSLondonSouthEast 2023 workshop, the annual workshop of the RSLondon regional research software community. The workshop will take place at Imperial College on Monday 17th July 2023.
On 27th September 2023, the Royal College of Physicians of Edinburgh will host the one-day Scotsman Data Conference 2023. It is free for both in-person and online attendance. More details about the event and registration are available on the website.
In this month’s edition of our Research Computing at Imperial segment, we are delighted to introduce two more members of the Research Software Champions team. As mentioned previously, these Champions are actively involved in a project dedicated to fostering a thriving research software culture. The project is also developing an updated Research Software Directory to promote the software that is developed at Imperial.
Hubert Mohr-Daurat:
I am a second-year PhD student in the Department of Computing in the Large-Scale Data and Systems Group. My interest is in designing systems to extend the scope of applications for database technologies to a more extensive range of data processing applications. I have been working on a data management system that can store and efficiently execute data imputation (e.g., for data cleaning in ML pipelines) and on CPU/GPU co-processing by the composition of existing systems and efficient data & query exchange format.
I spent much time coding in C++ when developing these systems. As a former programmer in the video game industry, I have been working in large teams on large codebases and learned how caring about code quality demands time and effort but is rewarding in the long term.
Code reliability, availability and reproducibility are essential in research but should not impede the research work too much. I believe that knowing the right tools and applying good practices, such as coding rules, code reviews, unit tests, benchmarks and static analysis, helps minimize this effort. This is why I joined the project as a Research Software Champion: we all have our own experience and resources. I want to share my knowledge about good software practices and hope to learn from others to grow our research software culture together.
Anthony Onwuli:
I am a 3rd year PhD student in the Department of Materials. I am a member of the Walsh Materials Design Group (https://wmd-group.github.io/). My research focuses on computational materials discovery by trying to develop ways in which we can find new materials through machine learning and first-principles quantum chemistry calculations (i.e., Density Functional Theory).
My experience with research software has primarily been taking over the development and maintenance of one of our group codes, SMACT (https://github.com/WMD-group/SMACT). This code enables us to generate compositions through chemical heuristics and combinatorics and more recently has been expanded to enable one to assign a crystal structure to the generated compositions based on chemical similarity with databases of known materials.
The Research Software Champions scheme has provided a chance to delve deeper into the field of research software problems.
This month we thought we’d highlight an open source research tool that is not linked to Imperial but which we found very interesting and thought the community might be interested to check out: MapReader, a free, open-source software library written in Python for analysing large map collections.
According to the project’s GitHub repository:
“MapReader was developed in the Living with Machines project to analyze large collections of historical maps but is a generalizable computer vision pipeline which can be applied to any images in a wide variety of domains.”
“The MapReader pipeline consists of a linear sequence of tasks which, together, can be used to train a computer vision (CV) classifier to recognise visual features within maps and identify patches containing these features across entire map collections.”
MapReader allows users with little computer vision expertise to
The authors provide extensive documentation, including a section on input guidance that is of paramount importance when dealing with this type of technology. They also include details about project members, maintainer, and ways to collaborate. The Living with Machines project, through which the tool has been developed, is funded by UK Research and Innovation (UKRI).
The tool is released under an MIT License.
On the occasion of Pride Month, US-RSE published four “compelling stories about LGBTQ+ people who have been involved in computing, science, engineering, and/or math, and have inspired our members through their accomplishments in their careers and their personal stories”. Among them is Sophie Wilson, who co-developed the ARM processor in Cambridge in 1985.
After 8 events, the first Byte-sized RSE series has now come to an end. Run by the UNIVERSE-HPC project, byte-sized RSE began in October 2022 and will hopefully continue with a second series of sessions later in 2023 “providing key research software skills in just 1 hour!”. If you would like to run a byte-sized RSE session in the next series, you can contact the Imperial RSE committee.
The Netherlands eScience Center recently released Job Profile and Role Description for Research Software Engineers. The documents outline the centre’s definition of an RSE and detail their skills and responsibilities. The centre is sharing them through Zenodo for the benefit of other organisations.
Ispace, a Japanese lunar exploration company, conducted a thorough review and completed the analysis of flight data from its HAKUTO-R Mission 1 lunar landing sequence. The resulting public report attributes the lander’s failure to make a soft landing to its software. In summary, the landing site was changed after the critical design review had been completed, and the verification and validation plans could not be updated accordingly.
London will host the London Data Week 2023, 3-9 July 2023, “a citywide festival about data to learn, create, discuss, and explore how to use data to shape our city for the better”. There are many events planned, including the PyData London Meetup #75 on 4th July 2023.
In terms of data visualisation aimed at a general audience, it is difficult to find something more compelling than the white cliffs of Dover illuminated with “Climate Stripes” on the 21st June 2023. A very powerful reminder of climate change. The “Climate Stripes” were created by Prof Ed Hawkins, a climate scientist at the University of Reading, in 2018. They show the progressive heating of the planet in a single image. You can generate the stripes for the whole planet or specific places on earth on the #ShowYourStripes website.
Code for Thought hosted the last (for now) episode of the byte-sized RSE companion podcast. Julian Lenz from the University in Swansea, UK talks about the importance of README files. This month there are also episodes about Agil in der Wissenschaft (in German) and Qubits and Qugates - with Oliver Brown. Finally, there are a number of interviews in the context of the Conference Report: JupyterCon 2023, Paris.
In 1961, artificial intelligence researcher Donald Michie designed MENACE, the Matchbox Educable Noughts and Crosses Engine. It was designed to play human opponents in games of noughts and crosses (tic-tac-toe) by using reinforcement learning. Lacking proper computational resources, Michie created the prototype out of 304 matchboxes. You can read about MENACE on Wikipedia and watch it recreated and run at the Museum of Science and Industry, Manchester. If you don’t have enough matchboxes at hand, you can play with a more convenient online simulation. The approach doesn’t scale well and is impractical for more complex problems or games. Still, it is quite amazing how clearly it conveys the concept of reinforcement learning.
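For readers who would rather use a keyboard than matchboxes, here is a minimal Python sketch of the idea. It is not Michie's original design: every name in it (`Menace`, `play_game`, `train`) is made up for illustration, the opponent simply plays at random, and the bead rewards (+3 for a win, +1 for a draw, -1 for a loss) and the starting count of three beads per move are assumptions chosen for the example. It also skips the board-symmetry trick, so it ends up with more "boxes" than the 304 of the prototype.

```python
import random

# The eight ways to get three in a row on a 3x3 board (cells indexed 0-8).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


class Menace:
    """One 'matchbox' of beads per board position seen so far (hypothetical class)."""

    def __init__(self, initial_beads=3, rng=None):
        self.boxes = {}          # board string -> {move: bead count}
        self.initial_beads = initial_beads
        self.rng = rng or random.Random()
        self.history = []        # (board string, move) pairs played this game

    def choose(self, board):
        key = "".join(board)
        free = [i for i, cell in enumerate(board) if cell == " "]
        box = self.boxes.setdefault(key, {m: self.initial_beads for m in free})
        if sum(box.values()) == 0:
            # The box ran out of beads: refill it so a move can still be drawn.
            for m in box:
                box[m] = self.initial_beads
        moves = list(box)
        # Draw a bead at random: moves with more beads are more likely.
        move = self.rng.choices(moves, weights=[box[m] for m in moves])[0]
        self.history.append((key, move))
        return move

    def learn(self, reward):
        """Add (or remove) beads for every move played in the last game."""
        for key, move in self.history:
            self.boxes[key][move] = max(0, self.boxes[key][move] + reward)
        self.history.clear()


def play_game(menace, rng):
    """MENACE plays 'X' and moves first; a random opponent plays 'O'."""
    board = [" "] * 9
    for turn in range(9):
        if turn % 2 == 0:
            move, mark = menace.choose(board), "X"
        else:
            free = [i for i, cell in enumerate(board) if cell == " "]
            move, mark = rng.choice(free), "O"
        board[move] = mark
        if winner(board):
            return mark
    return None  # board is full with no winner: a draw


def train(games=500, seed=0):
    """Play many games, rewarding or punishing MENACE after each one."""
    rng = random.Random(seed)
    menace = Menace(rng=rng)
    rewards = {"X": 3, None: 1, "O": -1}  # assumed reward values
    results = {"X": 0, "O": 0, None: 0}
    for _ in range(games):
        outcome = play_game(menace, rng)
        menace.learn(rewards[outcome])
        results[outcome] += 1
    return menace, results
```

Calling `train()` returns the trained player and a tally of wins, losses and draws; inspecting `menace.boxes` after training shows which moves have accumulated the most beads for each position.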
sed is one of those widely used Unix tools that can be challenging to handle at times, but it offers exceptional flexibility in processing text data. While you can discover numerous online guides, if you’re looking for a quick reference or a collection of handy sed oneliners, take a look at this script.
Are you using Bash and Bash scripts on a daily basis, on your computer or an HPC cluster? Maybe you would find it useful to know how to Use Bash Strict Mode (Unless You Love Debugging) and to learn some Control, Escape, and Meta Tricks.
If you are new to coding and you would like an introductory explanation of coding standards, you can check this Guide to Coding Standards to Improve Code Quality. They are general enough to be applicable to any programming language.
On the other hand, if you are already at an intermediate level as a research software developer, the Software Sustainability Institute has already covered the important topic of software testing with three interesting guides: An introduction to unit testing, Scaling up unit testing using parameterisation, and Automating unit testing with Continuous Integration.
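As a small taste of what parameterisation means in practice, the sketch below runs one test body over a table of inputs and expected outputs. It is not taken from the guides: it uses only Python's standard `unittest` module with `subTest`, and the function under test, `celsius_to_kelvin`, is a hypothetical example.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin (hypothetical example)."""
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_known_values(self):
        # Each (input, expected) pair is checked as its own sub-test,
        # so one failing case does not hide the others.
        cases = [(-273.15, 0.0), (0.0, 273.15), (100.0, 373.15)]
        for celsius, expected in cases:
            with self.subTest(celsius=celsius):
                self.assertAlmostEqual(celsius_to_kelvin(celsius), expected)

    def test_rejects_values_below_absolute_zero(self):
        for celsius in (-273.16, -1000.0):
            with self.subTest(celsius=celsius):
                with self.assertRaises(ValueError):
                    celsius_to_kelvin(celsius)
```

Saved in a file, the tests can be run with `python -m unittest`. Adding a new case is then a one-line change to the table rather than a new test function.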
Are you working on a new Python library? You may be interested in a blog post about designing Pythonic library APIs by Ben Hoyt, an engineering manager and software engineer at Canonical.
If you are working with MATLAB instead, and planning to release a new toolbox, MathWorks publishes MATLAB Toolbox Best Practices with guidance on how to do this properly. More generally, it describes some interesting resources about unit testing and CI/CD.
In the context of the Software Sustainability Institute’s Research Software Camp: FAIR software 2023, Eirini Zormpa, Community Manager for the AI for Multiple Long-Term Conditions Research Support Facility (AIM RSF) at the Alan Turing Institute, shared the slides of her talk about How to publish FAIR research outputs.
The Imperial Research Software Community Slack workspace is a place for general community discussion as well as featuring channels for individuals interested in particular tools or topics. If you’re an OpenFOAM user, why not join the #OpenFOAM channel where regular code review sessions are announced (amongst other CFD-related discussions…). Users of the Nextflow workflow tool can find other Imperial Nextflow users in #nextflow. You can find other R developers in #r-users and there is the #DeepLearners channel for AI/ML-related questions and discussion. Take a look at the other available channels by clicking the “+” next to “Channels” in the Slack app and selecting “Browse channels”.
If you want to start your own group around a tool, programming language or topic not currently represented, feel free to create a new channel and advertise it in #general.
If you need support with your code, seek no more! The Central RSE Team, within the Research Computing Service, is here to help. Have a look at the variety of ways the team can work with you:
All the documentation, tutorials and howtos for using Imperial’s HPC are available in the HPC Wiki pages. See also the Research Computing Service’s Research Computing Tips series for a variety of helpful tips for using RCS resources and related tools and services.
Imperial’s Research Software Directory provides details of a range of research software and tools developed by groups and individuals at the College. If you’d like to see your software included in the directory, you can open a pull request in the GitHub repository or get in touch with the Research Software Community Committee.
Drop us a line with anything you’d like included in the newsletter, ideas about how it could be improved, or even offer to guest-edit a future edition! rse-committee@imperial.ac.uk.
If you’re reading this on the web and would like to receive the next newsletter directly to your inbox then please subscribe to our Research Software Community Mailing List.
This issue of the Research Software Community Newsletter was edited by Stefano Galvan. All previous newsletters are available in our online archive.