Mini-guide to reproducible Python code

A lot of modern research requires custom software, whether to run calculations, analyse experimental data or do something else. Good quality, sustainable software is always desirable, but ticking all the boxes often described as necessary to achieve it can be a daunting task for researchers, who often have other priorities in mind.

Reproducibility is, however, not an optional feature of a piece of research, whether it involves software or not, and researchers are fully responsible for addressing it. Luckily, of the many requirements of good quality, sustainable software, only a handful are necessary for, or go a long way towards, supporting the reproducibility of results.

In this post we describe the essential steps that researchers should take to support the reproducibility of their software. The recommendations are for software developed using Python. They might not apply to all cases, and they are not foolproof, as reproducibility is a really complex business, but they are a good start and will reduce the chances of things going wrong when other people try to use the software.
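The steps themselves are not listed in this summary. As an illustration of the kind of measure that commonly supports reproducibility, the sketch below records the Python version and the installed package versions next to a set of results. The function name `environment_snapshot` is hypothetical and is not taken from the post; only the standard library is used.

```python
import json
import platform
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect interpreter, platform and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved alongside the results.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving this output with each set of results gives other people a concrete record of the environment that produced them.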

Building an R package using {fusen}

"Origami" by Andy Atzert is licensed under CC BY 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/2.0/?ref=openverse.

Writing your first full R package can feel overwhelming, but {fusen} can help at this stage. Even if you are an experienced developer, there is something for you in this blog too, so please read on! "Fusen" is a type of Japanese origami in which a flat piece of paper, when folded in a specific way and inflated, turns into a nice paper box or balloon. Similarly, the {fusen} package inflates a flat .Rmd template (filled in a specific way) and returns a nice package. In this blog post, I share my experience of exploring {fusen} for the first time.

The 30-Day Map Challenge

Map challenge overview

Every year, cartography enthusiasts, geographers, data visualisers, and mapmakers from around the world come together to participate in the 30-Day Map Challenge. Organised by Topi Tjukanov, this event celebrates the art and science of mapmaking. Whether you’re an experienced Geographical Information Systems (GIS) professional or just starting out, this challenge invites you to create and share a map every day, guided by a unique daily theme.

The 30-Day Map Challenge is an inclusive and open-ended initiative that fosters creativity and experimentation in mapmaking. Each day in November presents a different theme, ranging from “Points” and “Lines” to “Fantasy” and “Historical.” Participants are encouraged to interpret these prompts flexibly, utilising any data, tools, or artistic styles that resonate with them.

Git and GitHub for efficient project management and collaboration: a mini-tutorial

Version control is an essential part of good software development practice, especially when combined with an online repository that enables easy collaboration with other people. One of the most common tool combinations is git for version control and GitHub for online hosting.

Although version control is more common nowadays, and despite the long-term benefits it brings, much of the software developed in research environments like universities does not use it. Maybe this is because researchers do not know about it, do not know how to do it right, or see it as a burden. This mini-tutorial on Git and, especially, GitHub aims to help users with those first steps and point them in the right direction to learn more about the topic.

An example of a software design pattern for European option pricing using UML diagrams

Design Patterns were first introduced in the seminal book Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, who are collectively known as the "Gang of Four" [Gamma et al. (1994)]. This book forms the foundation of Object-Oriented design theory and practice. A Design Pattern is a general, reusable solution to a commonly occurring problem in software design. These patterns are based on the philosophy of finding standard solutions to common software engineering problems.

Just as there are standard designs for car engines, like the four-stroke engine, design patterns serve a similar purpose in software. Each pattern provides a tried-and-true solution to a specific problem. Some of these patterns focus on object creation, others on structuring systems of objects, and still others on how objects should communicate.
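To make the idea concrete, here is a minimal sketch of one behavioural pattern, Strategy, applied to European option pricing. It is an illustration written for this summary and not necessarily the pattern or design used in the post. All class names (`EuropeanOption`, `PricingStrategy`, `BlackScholesStrategy`, `BinomialTreeStrategy`, `OptionPricer`) are hypothetical, and the two pricing methods are the standard Black-Scholes formula and a Cox-Ross-Rubinstein binomial tree.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from math import comb, erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


@dataclass(frozen=True)
class EuropeanOption:
    spot: float
    strike: float
    rate: float        # continuously compounded risk-free rate
    volatility: float  # annualised
    maturity: float    # in years
    is_call: bool = True


class PricingStrategy(ABC):
    """Common interface that every pricing algorithm implements."""

    @abstractmethod
    def price(self, option: EuropeanOption) -> float: ...


class BlackScholesStrategy(PricingStrategy):
    """Closed-form Black-Scholes price."""

    def price(self, option: EuropeanOption) -> float:
        s, k, r = option.spot, option.strike, option.rate
        v, t = option.volatility, option.maturity
        d1 = (log(s / k) + (r + 0.5 * v * v) * t) / (v * sqrt(t))
        d2 = d1 - v * sqrt(t)
        if option.is_call:
            return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)
        return k * exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)


class BinomialTreeStrategy(PricingStrategy):
    """Cox-Ross-Rubinstein tree: discounted expected payoff at maturity."""

    def __init__(self, steps: int = 200) -> None:
        self.steps = steps

    def price(self, option: EuropeanOption) -> float:
        n = self.steps
        dt = option.maturity / n
        up = exp(option.volatility * sqrt(dt))
        down = 1.0 / up
        p = (exp(option.rate * dt) - down) / (up - down)  # risk-neutral probability
        sign = 1.0 if option.is_call else -1.0
        expected = 0.0
        for j in range(n + 1):  # j is the number of up moves
            terminal = option.spot * up**j * down ** (n - j)
            payoff = max(sign * (terminal - option.strike), 0.0)
            expected += comb(n, j) * p**j * (1.0 - p) ** (n - j) * payoff
        return exp(-option.rate * option.maturity) * expected


class OptionPricer:
    """Context object: delegates to whichever strategy it was given."""

    def __init__(self, strategy: PricingStrategy) -> None:
        self.strategy = strategy

    def price(self, option: EuropeanOption) -> float:
        return self.strategy.price(option)


if __name__ == "__main__":
    call = EuropeanOption(spot=100, strike=100, rate=0.05, volatility=0.2, maturity=1.0)
    for strategy in (BlackScholesStrategy(), BinomialTreeStrategy(steps=200)):
        print(type(strategy).__name__, round(OptionPricer(strategy).price(call), 4))
```

The calling code depends only on `PricingStrategy`, so a new pricing method can be added without touching `OptionPricer` or the option itself.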

Hacktoberfest 2024: Bring your own code

Hacktoberfest is a month-long annual event that encourages people to contribute to open source throughout October. Its aim is to celebrate all things open source, especially the people who make open source so special.

This year the central RSE team at Imperial planned two in-person events during Hacktoberfest, on the 1st and the 23rd of October. The events were open to anybody interested in participating in open source software, whether through code or non-code contributions. Everybody was welcome to join the events and bring their own code along for a discussion with the central RSE team members. This included low-code and non-code contributions: a big chunk of what surrounds good open source software has nothing to do with code!

RSECon24: Growing a community, building a career

It's now a little over 12 years since the term "Research Software Engineering" (RSE) was first coined at an event held in Oxford, UK. Seeing around 400 people gather at the Frederick Douglass Centre at Newcastle University for this year's RSE Conference – RSECon24 – was proof of the amazing growth that we've seen within the community and the wide range of opportunities that now exist to undertake software development work within the research domain. We know that career opportunities within the research technical professionals space are also expanding rapidly but there's still a way to go in recognising and rewarding the people who contribute vital technical skills to support and undertake research. This year's conference offered a wide range of talks, workshops and Birds of a Feather (BoF) sessions, alongside a great opportunity to network with other RSEs and researchers and catch up with friends both old and new. This growth of the community, opportunities for the future and how we can help to support and grow careers, as well as making our community more diverse and inclusive, were some of the topics covered at this year's conference. Several members of the Imperial College London Research Computing Service were in attendance at the conference, as were other RSEs and researchers from departments across the institution. In this blog post, we talk about our experiences, highlights and key takeaways from this year's conference.

Highlights from PyData London 2024

The central RSE team at Imperial recently attended PyData London 2024, the 10th anniversary edition. It was an in-person event that brought together data scientists, data engineers, and developers from around the world, and served as a platform for sharing ideas and learning from one another. In this blog post, we share our highlights from the event, showcasing why it's essential for anyone involved in data science to stay updated and connected with the global community.

With Artificial Intelligence, and Large Language Models (LLMs) in particular, being such a significant topic in the wider world, it naturally featured across the whole conference. Most talks involved some kind of data processing or Machine Learning workflow. Scratching beneath the surface, we found some additional highlights.

Adopting a more rational use of Continuous Integration with GitHub Actions

In the Imperial RSE Team we make extensive use of continuous integration (CI) with GitHub Actions. We use CI to ensure our projects build and are correct across a range of scenarios (OS, Python version, dependency version, etc.). Widely accepted wisdom is that it is best practice to catch issues early via frequent and thorough CI rather than to catch them later. This must, however, be set against the monetary and environmental cost of running unnecessary compute workloads on every push to GitHub. In particular, the pricing structure of GitHub Actions means workloads run on Windows and macOS are more costly (certainly financially and presumably environmentally). This is particularly the case for private repositories, for which Imperial has a fixed budget of minutes.
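A rough calculation shows how quickly a test matrix uses up a minutes budget. The sketch below assumes per-minute multipliers of 1 for Linux, 2 for Windows and 10 for macOS on standard GitHub-hosted runners; these values, the runner labels, the Python versions and the five-minute job length are all illustrative assumptions and should be checked against GitHub's current billing documentation.

```python
from itertools import product

# Assumed billing multipliers per runner OS (check GitHub's current docs).
MINUTE_MULTIPLIER = {
    "ubuntu-latest": 1,
    "windows-latest": 2,
    "macos-latest": 10,
}


def billed_minutes(os_list, python_versions, minutes_per_job):
    """Billed minutes for one run of an OS x Python-version matrix."""
    return sum(
        MINUTE_MULTIPLIER[os_name] * minutes_per_job
        for os_name, _version in product(os_list, python_versions)
    )


if __name__ == "__main__":
    pythons = ["3.10", "3.11", "3.12"]
    full = billed_minutes(list(MINUTE_MULTIPLIER), pythons, minutes_per_job=5)
    linux_only = billed_minutes(["ubuntu-latest"], pythons, minutes_per_job=5)
    print(f"Full matrix on every push: {full} billed minutes")
    print(f"Linux-only on every push:  {linux_only} billed minutes")
```

Under these assumptions the full matrix costs thirteen times as much per push as the Linux-only one, which is the kind of trade-off the post weighs up.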