Tom Ellis - Keeping track of results

A lot of the research I have done could be described as “exploratory”, where there is no specific hypothesis being tested but we expect to find something interesting (a more cynical observer might describe this as a “fishing expedition”). That means generating lots of results, but expecting most to either show nothing interesting, demand refinement, or raise additional questions. Even so, I do want to keep track of the things that ‘haven’t worked’.

My approach to this is to try to divide questions conceptually into individual, bite-sized results, and to give each one a dedicated subfolder inside my main results directory. I order results chronologically. Inside each result subdirectory I have a file called result_summary.md where I summarise what I did and what I found. The goal is that this should be really fast to parse.

Below is an example results summary file. You can find examples from real projects here or here.

# Plot the decay in LD over short distances

**Date:** 5th May 2022
**Author:** Tom Ellis

## Background

Linkage disequilibrium after one round of random mating ought
to be half that of parents.
Compare the decay of LD over short distances.

## What did you do?

- `01_get_LD.sh` calculates LD in plink.
- `02_plot_LD.R` plots the results

## Main conclusion

LD starts lower and decays faster in the progeny than in the
parents.
At distances of 100kb LD levels out at about
    r2=0.07 in the parents
    r2=0.05 in the progeny
So there is less of a difference than one would expect.

## Caveats

LD is lower than the parents even within 100bp.
I suspect something has gone wrong with the imputation.

## Follow-up

- Compare this with the non-imputed data.
- Calculate LD as the D statistic instead of r2

There are five sections:

- **Background:** What is the question, and how does this relate to previous results?
- **What did you do?** A summary of the scripts used to get the results. This can be brief, because the scripts themselves should contain more detail.
- **Main conclusion:** A high-level summary of the result.
- **Caveats:** Are there any caveats to be aware of about the data, code or results?
- **Follow-up:** Do the current results raise any additional questions?
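
Because every summary has the same five sections, an empty one can be generated rather than typed out each time. The following is a minimal sketch of that idea, not a description of the actual workflow: the function name `new_result`, its arguments, and the ISO-date folder prefix (used here so that folders sort chronologically) are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path

# The five section headings from the example summary file.
SECTIONS = [
    "Background",
    "What did you do?",
    "Main conclusion",
    "Caveats",
    "Follow-up",
]


def new_result(results_dir, slug, title, author, today=None):
    """Create <results_dir>/<date>_<slug>/result_summary.md and return its path.

    The date prefix is an assumed naming scheme: ISO dates sort
    alphabetically in chronological order.
    """
    today = today or date.today()
    folder = Path(results_dir) / f"{today.isoformat()}_{slug}"
    # exist_ok=False: refuse to overwrite a result folder that already exists.
    folder.mkdir(parents=True, exist_ok=False)

    lines = [
        f"# {title}",
        "",
        f"**Date:** {today.isoformat()}",
        f"**Author:** {author}",
        "",
    ]
    # Add each heading followed by blank space to fill in.
    for section in SECTIONS:
        lines += [f"## {section}", "", ""]

    summary = folder / "result_summary.md"
    summary.write_text("\n".join(lines), encoding="utf-8")
    return summary
```

Calling `new_result("results", "ld_decay", "Plot the decay in LD over short distances", "Tom Ellis")` would create a dated subfolder under `results/` containing a blank summary ready to fill in.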

In particular, the caveats and follow-up sections are good to write at the time you do the work, because you will most likely forget the details otherwise, and future-you will thank you.
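
A consistent layout also means the summaries can be skimmed by a script. As a hedged sketch, the code below pulls one section (by default "Follow-up") out of every `result_summary.md` under a results directory. The function names `read_section` and `collect` are hypothetical, and the code assumes each section starts with a `## ` heading as in the example file.

```python
from pathlib import Path


def read_section(summary_text, heading):
    """Return the text under '## <heading>' up to the next '## ' heading.

    Returns None if the heading is not present.
    """
    body = None
    for line in summary_text.splitlines():
        if line.startswith("## "):
            if body is not None:
                break  # reached the heading after the one we wanted
            if line[3:].strip() == heading:
                body = []  # start collecting lines for this section
        elif body is not None:
            body.append(line)
    return None if body is None else "\n".join(body).strip()


def collect(results_dir, heading="Follow-up"):
    """Map each result folder name to one section of its summary file."""
    found = {}
    # sorted() keeps the folders in name order.
    for summary in sorted(Path(results_dir).glob("*/result_summary.md")):
        text = summary.read_text(encoding="utf-8")
        section = read_section(text, heading)
        if section:  # skip summaries where the section is missing or empty
            found[summary.parent.name] = section
    return found
```

For example, `collect("results", "Main conclusion")` would give a quick overview of what each result showed, and `collect("results", "Caveats")` a list of known problems.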