Tom Ellis

Post-doctoral researcher interested in the intersection of quantitative genetics, data analysis and lunch.

Writing README pages for research projects that don't suck

Science is a game of show-and-tell, not hide-and-seek.

Richard McElreath

An important part of any transparent, reproducible research project is that it should be understandable to our colleagues, be they direct collaborators who need to pick up on what we’ve done, other colleagues who want to follow our results, or even ourselves some months in the future. If the work cannot be reproduced, it is little more than garbage taking up space on a hard drive.

In my view, an indispensable part of this is ensuring that a research project is documented well enough for someone else to work out what is going on. The first port of call is a clear README file in the top level of the project folder. This is the first place others (or future you!) will look to understand what the project is about and how to navigate it, or to decide whether the folder can be archived. However, research groups rarely have agreed guidelines on what information a README should include, and I have never seen a training course or online resource that covers it. I have certainly written plenty of bad README files in the past, when I wrote one at all, and this has repeatedly come back to bite me when I needed to revisit what I had done.

That said, I think I have learned a few things from trying to interpret my own and others’ research projects. In this post I try to distill some of those lessons into a template for a solid research-project README. Since every project is different it is hard to give firm rules, but I hope you can adapt the template to your own work.

What information do you need to convey?

If someone comes across your project they need to know:

What is the project about?
What did you do?
What data were collected and where are they?
How have you processed those data and created results?
What information/scripts/packages do they need to reproduce this?
Who did what?

Construct a README file with a section addressing each of these points. Let’s go over each in turn.
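If you start new projects often, it can help to generate this skeleton rather than type it each time. The Python sketch below is one way to do that, assuming a Markdown README.md; the function name `write_readme_skeleton` and the TODO prompts are hypothetical, and the section names simply follow the headings used in this post.

```python
import tempfile
from pathlib import Path

# Section headings follow this post; each TODO repeats the question the
# section should answer. Both are illustrative, not a fixed standard.
SECTIONS = {
    "Introduction": "What is the project about? Link to any repository, notebook or manuscript.",
    "Experimental setup": "What did you do? Link to the detailed write-up.",
    "Data files": "What data were collected, where are they, and how were they processed?",
    "Dependencies": "Which scripts and packages (with versions) are needed to reproduce this?",
    "Author information": "Who did what?",
    "License": "How may others use this work?",
}


def write_readme_skeleton(project_dir, title):
    """Write a README.md skeleton into project_dir without overwriting an existing one."""
    readme = Path(project_dir) / "README.md"
    if readme.exists():
        raise FileExistsError(f"{readme} already exists")
    lines = [f"# {title}", ""]
    for heading, prompt in SECTIONS.items():
        lines += [f"## {heading}", "", f"TODO: {prompt}", ""]
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so no real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_readme_skeleton(tmp, "My project").read_text(encoding="utf-8"))
```

Refusing to overwrite an existing README is a deliberate choice: the skeleton is only meant as a starting point, never as a replacement for notes you have already written.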

Sections of a project README

Introduction

Give an informative title and a short overview (one or two sentences) of what the project is about. You can also include links to, for example, a GitHub repository, electronic notebook entries, or a manuscript.

If your project folder is shared on GitHub and the README is in Markdown format (i.e. the filename ends in .md), GitHub will render it as HTML. In that case you can also include a table of contents with links to each section to aid navigation, although this is not essential. See also the example READMEs given below: GitHub renders them by default, but if you click on the README file itself in the folder overview and then on the “Raw” button, you can see the raw Markdown.
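Writing a table of contents by hand is easy for a short README. If you would rather generate it, the sketch below builds Markdown links from a list of headings. It assumes GitHub derives each section’s anchor by lowercasing the heading, dropping punctuation other than hyphens and underscores, and replacing spaces with hyphens; this is an approximation of GitHub’s behaviour, and duplicate headings (which GitHub numbers) are not handled. The function names are hypothetical.

```python
import re


def github_anchor(heading):
    """Approximate the anchor GitHub generates for a Markdown heading.

    Assumption: lowercase, drop punctuation except hyphens and underscores,
    then turn spaces into hyphens. Duplicate headings are not handled.
    """
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


def table_of_contents(headings):
    """Return a Markdown bullet list linking to each heading."""
    return "\n".join(f"- [{h}](#{github_anchor(h)})" for h in headings)


print(table_of_contents(["Introduction", "Experimental setup", "Data files"]))
```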

Experimental setup

Give a short overview of what you did. This doesn’t have to be exhaustive: you can link to where the reader can find the details, such as an electronic notebook or another write-up.

Data files

Give an overview of the data and data types you collected, and where to find them. If there is a lot, consider placing one or more separate README files in your data folder that go into more detail, and say in your main README where to find them.
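For a large data folder, a quick inventory can be a useful starting point for a data README. This Python sketch (hypothetical helper names, standard library only) counts files and total size per file extension. It says nothing about what the files actually contain, so you still need to describe that yourself.

```python
from collections import Counter
from pathlib import Path


def summarise_data_folder(data_dir):
    """Count files and total size (bytes) per file extension under data_dir."""
    counts = Counter()
    sizes = Counter()
    for path in Path(data_dir).rglob("*"):
        if path.is_file():
            # Treat extensions case-insensitively so .CSV and .csv are grouped.
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}


def data_overview_markdown(data_dir):
    """Format the summary as Markdown bullets to paste into a data README."""
    rows = summarise_data_folder(data_dir)
    return "\n".join(
        f"- `{ext}`: {n} file(s), {size / 1e6:.1f} MB" for ext, (n, size) in rows.items()
    )
```

For example, `print(data_overview_markdown("data"))` would summarise a folder called `data`, if your project has one.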

Dependencies

If someone is going to follow what you’ve done, they need to know which packages they need. At a minimum, give a list of the important ones with their versions. If you have used something like conda, include an environment.yml file in the top level of your folder. If you use renv in R, see the collaboration section of the renv documentation (https://rstudio.github.io/renv/articles/renv.html#collaboration).
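For Python projects, one way to produce the list of versions is to query the installed packages directly. This sketch uses `importlib.metadata` from the standard library (Python 3.8 or later); the helper names and the package names in the example call are placeholders, and the output supplements rather than replaces an environment.yml or renv lockfile.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def format_dependency_list(packages):
    """Format installed versions as Markdown bullets for a README."""
    lines = []
    for name, ver in package_versions(packages).items():
        lines.append(f"- {name} {ver}" if ver else f"- {name} (not installed)")
    return "\n".join(lines)


# Hypothetical list: replace with the packages your own analysis imports.
print(format_dependency_list(["numpy", "pandas", "matplotlib"]))
```

Run it in the same environment you used for the analysis; otherwise the versions reported will not be the ones that produced your results.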

Author information

This is often overlooked, but it is actually really useful to include information on who did what. This makes it much easier to work out what was done if someone comes back to it later.

It also helps to include your name and the date at the top of your scripts for the same reason.

License

If any of your work could ever be shared, it is a good idea to include a license stating how others may use it. See https://choosealicense.com/ for help choosing one. If you aren’t sure, use the MIT license.

Example README templates

Here are a few examples of projects with READMEs I have written. They differ in scale and complexity, so the READMEs do as well.

2023