Snakemake
Date: 24 - 27 March 2025
Snakemake is a popular open-source tool for creating reproducible and scalable data analyses. Workflows are described in a Python-based language that defines each step as a rule; Snakemake uses these rules to construct and execute a work plan that yields the desired output. Re-calculation of existing results is avoided where possible, so you can add or update input data and then efficiently generate an updated result. Workflows can be scaled to server, cluster, grid and cloud environments without modifying the workflow definition.
A key appeal of workflow systems like Snakemake is that workflows can be re-used, published and re-mixed as open-source code. We look at WorkflowHub.eu and other on-line resources for workflow sharing, and at how best to prepare our own workflows so that they are as re-usable as possible.
Attendees must have a working knowledge of the Linux Bash command line; our 1-day “Linux for bioinformatics” course provides suitable background.
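The idea of rules, a work plan and skipping up-to-date results, as described above, can be pictured with a toy sketch in plain Python. This is not Snakemake's implementation or syntax: the `Rule` class, the `plan` function and the file names are hypothetical, modification times are held in a dictionary rather than read from disk, and real Snakemake takes more than timestamps into account when deciding what to re-run.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A hypothetical, much-simplified stand-in for a workflow rule."""
    output: str
    inputs: list = field(default_factory=list)


def plan(target, rules, mtimes):
    """Return the outputs that must be (re)built to produce `target`, in run order.

    `rules` maps each output name to its Rule. `mtimes` maps the names of
    existing files to a modification time. Names without a rule are raw inputs.
    """
    jobs = []

    def visit(name):
        """Plan `name` and return True if it will be rebuilt."""
        rule = rules.get(name)
        if rule is None:
            if name not in mtimes:
                raise FileNotFoundError(f"missing input with no rule: {name}")
            return False
        # Visit every input first so that upstream jobs are planned earlier.
        rebuilt_inputs = [visit(i) for i in rule.inputs]
        stale = (
            name not in mtimes                      # output does not exist yet
            or any(rebuilt_inputs)                  # an input is about to change
            or any(mtimes[i] > mtimes[name]         # an input is newer than output
                   for i in rule.inputs if i in mtimes)
        )
        if stale and name not in jobs:
            jobs.append(name)
        return stale

    visit(target)
    return jobs


# Hypothetical two-step workflow: reads.fq -> trimmed.fq -> counts.txt
rules = {
    "trimmed.fq": Rule("trimmed.fq", ["reads.fq"]),
    "counts.txt": Rule("counts.txt", ["trimmed.fq"]),
}

fresh_start = plan("counts.txt", rules, {"reads.fq": 1})
up_to_date = plan("counts.txt", rules,
                  {"reads.fq": 1, "trimmed.fq": 2, "counts.txt": 3})
input_updated = plan("counts.txt", rules,
                     {"reads.fq": 5, "trimmed.fq": 2, "counts.txt": 3})
```

In this sketch, nothing is planned when every output is newer than its inputs, while updating the raw input causes both downstream steps to be scheduled again.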
Keywords: Data analysis, Reproducible, Workflow
Prerequisites:
Learning objectives:
- Best practices to make your code re-usable
- Chaining rules
- Choosing a test dataset
- Cleaning up
- Conda integration
- Configuring workflows
- Constructing a whole new workflow
- Handling awkward programs
- How Snakemake plans what jobs to run
- Optimising workflow performance
- Placeholders and wildcards
- Processing lists of inputs
- Re-using and sharing your workflows
- Running commands with Snakemake
- Source code control and versioning
- Where to find and share workflows on-line
Organizer: Edinburgh Genomics
Target audience: This course is for researchers who need to automate data analysis tasks for biological research involving next-generation sequencing data, for example RNA-seq analysis, variant calling, ChIP-seq, bacterial genome assembly, etc.
Event types:
- Workshops and courses