Content from 00-introduction



Introduction


This workshop is designed for beginners with little to no prior container experience.

By the end of this workshop you will have:

  • Created 16 containers.

  • Executed 12+ commands with containers.

  • Built 2 custom containers.

  • Built 2 websites and viewed them with containers.

  • Run the most common singularity commands (pull, build, exec).

  • Run the exec command with bind mounts to access data.

  • Encountered 3 gotchas and discussed solutions for dealing with them.

  • Gotcha 1: Repositories are managed by other users.

  • Solution 1: Keep good documentation on how your software was built.

  • Gotcha 2: Can't mount the current working directory.

  • Solution 2: Try the -B $PWD flag.

  • Gotcha 3: Can't mount the directory with the data.

  • Solution 3: Try the -B <source>:<destination> flag (see the sketch after this list).
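
A minimal sketch of the two bind-mount fixes above (my_container.sif and /path/to/data are hypothetical names):

BASH

# Gotcha 2 fix: bind the current working directory into the container
singularity exec -B $PWD my_container.sif ls $PWD

# Gotcha 3 fix: bind a host directory to a path inside the container
singularity exec -B /path/to/data:/data my_container.sif ls /data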

Content from 01-intro-to-computers



[Figure: a collection of computer parts, including disks, RAM, and CPUs (parts example)]

Computer Component Review


  • CPU: The CPU is the data processing unit; it is composed of multiple cores. For legacy reasons, software often reports the number of cores as the number of CPUs, which can be confusing.

  • RAM (a.k.a. MEMORY): RAM is fast digital storage. Most programs use RAM to hold data that is needed more than once. RAM is generally non-persistent: when the power is turned off, its contents are lost.

  • DISK: Disk is persistent digital storage that is not as fast as RAM. Disk storage can be made up of one or more drives, such as hard disk drives (HDD) and/or solid state drives (SSD). Multiple disks can be configured together for increased performance and protection against drive failure.

  • NETWORKING: Switches and network interface cards within computers allow computers to be networked together.

  • GPU: A Graphics Processing Unit (GPU) is a computer component capable of rendering graphics; GPUs are also useful for certain mathematical calculations.

Consumer Computer vs Servers vs HPC vs Sumhpc


Component        | Home/Business Computer | Server   | Typical Individual Node in HPC | Typical Total HPC System | Individual Node on Sumhpc | Total Sumhpc System
CPU (cores)      | 4 - 8                  | 12 - 128 | 32 - 128                       | 1000s                    | 70*                       | 7,000
RAM (GB)         | 8 - 16                 | 64 - 960 | 240 - 3000                     | 64,000                   | 754 - 3 TB                | 76.8 TB
DISK (TB)        | .5 - 1                 | 8 - 100  | None - 1                       | 100s (Networked)         | NA                        | 2.7 PB
Networking (Gbe) | .1 - 1                 | 1 - 10   | 40 - 100                       | 40 - 100                 | 40                        | 40+

Computer Ports


A port is a communication endpoint.
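
Later in this workshop we will reach web applications (for example, RStudio Server) through ports such as 8787. A quick way to list which TCP ports are currently listening on the VM (a sketch, assuming the ss utility from iproute2 is installed):

BASH

ss -tln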

Introduction to OS, Virtual Machines and Containers


Why can’t I use Docker? Docker images are not secure on shared systems because they allow users to gain root access to the compute nodes. Singularity effectively runs as the user running the command and does not grant elevated access. Also, Docker interacts with the Slurm job scheduler in a way that causes resource requests and usage not to match up, making it difficult to keep job queueing fair for all users. Because the clusters are multi-user systems, we want to make sure people can work without worrying that others are accessing their data or unfairly using up resources.

Important notes on how they relate to singularity


CPU: There are two common CPU architectures in modern systems, x86_64 and ARM. Singularity containers are architecture specific. The pull command exposes this via:

    --arch string   architecture to pull from library (default "amd64")
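
Before pulling, you can check the architecture of the host and, when pulling from a library, request a specific architecture explicitly. A sketch (the library URI is illustrative and assumes a library remote is configured):

BASH

# Show the host CPU architecture (e.g. x86_64 or aarch64)
uname -m

# Ask the library for a specific architecture
singularity pull --arch amd64 library://alpine:latest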

Content from 02-singularity-commands



Frequently used singularity commands

  • singularity pull

  • singularity shell

  • singularity cache

  • singularity build

  • singularity exec

These hands-on tutorials are being conducted on a base CentOS 7 VM with the following programs installed.

  • apptainer (including the alias singularity)

  • nano

  • wget

  • unzip

The following ports have been made available to the system:

  • 8080, 8787, 8789

The following folders have been set up on the remote system:

  • /projects/my-lab
  • /flashscratch
  • /workshop_data

Content from 03-singularity-pull-hello-world



The Singularity Pull command

The most basic command:
singularity pull <URL>

Can specify a new container (*.sif) name:
singularity pull <New_container_Name> <URL>

Example (don't run yet):
singularity pull docker://rocker/tidyverse:4.2.1

Example making a new name (don't run yet):
singularity pull awesome_container.sif docker://rocker/tidyverse:4.2.1

Tags

Singularity and Docker use tags; these can be used to access specific versions or distributions of the software.

Of note, the images behind a tag can change inside the repo, so for complete reproducibility you may want to retain the image file or build from source.
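
If you need a pull that cannot silently change underneath you, one option is to pin the image digest rather than the tag. A sketch with a placeholder digest (look up the real sha256 value on Docker Hub; we do not run this in the lesson):

BASH

singularity pull tidyverse_pinned.sif docker://rocker/tidyverse@sha256:<digest>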

Singularity can pull from 5 types of URIs

  • library : Pull an image from the currently configured library. See here for configuring a library.
    library://user/collection/container:tag

  • docker : Pull a Docker/OCI image from Docker Hub, or another OCI registry. OCI stands for Open Container Initiative.
    docker://user/image:tag

  • shub : Pull an image from Singularity Hub
    shub://user/image:tag

  • oras : Pull a SIF image from an OCI registry that supports ORAS. GCP Artifact Registry supports ORAS.
    oras://registry/namespace/image:tag

  • http, https : Pull an image using the http(s) protocol
    https://library.sylabs.io/v1/imagefile/library/default/alpine:latest

A note on security and safety

Only pull from trusted sources. Generally speaking, containers provided by trusted organizations or by the developer are ideal sources.

Docker Hub shows how many pulls each container has had, as well as Docker Hub-provided tags: Docker Official Image, Verified Publisher, and Sponsored OSS

Let's take a look at some Docker Hub image sources that we will be using today.

  • staphb
    https://hub.docker.com/search?q=staphb&image_filter=open_source

  • biocontainers
    https://hub.docker.com/r/biocontainers/biocontainers

Now let's look at a non-standard example: a program called BUSCO that can be used for phylogenetic analysis as well as for assessing genomic data quality (assemblies).

The home page:
https://busco.ezlab.org/

The User Guide:
https://busco.ezlab.org/busco_userguide.html

Here we see where to access the Docker image and the directions to run it with Docker (but not Singularity; can we get it to work? We will see later).
https://busco.ezlab.org/busco_userguide.html#docker-image

Hello World Example

Verify R is not on the system

Try to start R on the command line by typing R --help.

BASH

R --help

We get an error because R is not available.

OUTPUT

-bash: R: command not found

Let's pull an R container from Docker Hub

BASH

singularity pull docker://rocker/tidyverse:4.2.1

Singularity builds the image: it copies the OCI blobs/layers to the local cache and then assembles them into a SIF file.

OUTPUT

INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 7917df3ef3d8 done
Copying blob 270b4100b33a done
Copying blob 56e0351b9876 done
Copying blob 2faad9a83b09 done
Copying blob 81c9ee1c97bb done
Copying blob d518d22d5d29 done
Copying blob 27c3a6114c0b done
Copying blob 58e0d5c15b4e done
Copying blob 3ccbc1cfa6d1 done
Copying config 1bec811255 done
Writing manifest to image destination
Storing signatures
2023/07/31 02:43:50  info unpack layer: sha256:56e0351b98767487b3c411034be95479ed1710bb6be860db6df0be3a98653027
2023/07/31 02:43:50  info unpack layer: sha256:270b4100b33a95ddd4b4e0d4cce9c4a262eaf5043a4d6a33a82fc71224e7f857
2023/07/31 02:43:50  info unpack layer: sha256:2faad9a83b09e8155e7084ed53957d556333d8c78dbd66288dda084362d9a8a0
2023/07/31 02:43:56  info unpack layer: sha256:d518d22d5d29e561be6588568fd73aff10b6e658a3a3a9e8e98c0470e1b21a8a
2023/07/31 02:43:56  info unpack layer: sha256:81c9ee1c97bb79e966a4ea76644eb05ebc6b72f67dfdccb9e8f4bce3190cdd0a
2023/07/31 02:43:57  info unpack layer: sha256:7917df3ef3d8605361342bc11f7d527ebb4fea3f95704bb6b72e6a4f043faa6d
2023/07/31 02:44:11  info unpack layer: sha256:27c3a6114c0bacba4ceb4e0523ee67bfcc5bec7f7824247b6578cdcb629f4978
2023/07/31 02:44:11  info unpack layer: sha256:58e0d5c15b4e6c88ede882864475388b1479a3d81c1b4060aeb919a3a3b5f322
2023/07/31 02:44:11  info unpack layer: sha256:3ccbc1cfa6d1cbc33689c9e7c2ebcafcb0af4f895b38c84363f57417e6fbb7cb
INFO:    Creating SIF file...

Note that when pulling the image, Singularity downloads each layer, stores them in the cache, and stitches them together into the Singularity image file. The Singularity image is immutable.
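
You can also see what the cache currently holds, and how much space it uses, with the cache subcommand:

BASH

singularity cache list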

Use ls to view the new file

BASH

ls -lF 

OUTPUT

total 333280
-rw-rw-r-- 1 student student         0 Jul 30 22:43 test_scp.txt
-rwxrwxr-x 1 student student 324775936 Apr  1 01:09 tidyverse_4.2.1.sif*

Use ls -lFa to see the hidden files.

BASH

ls -lFa

We can see the hidden directory .apptainer that contains the cache.

OUTPUT

total 692628
drwxr-xr-x  4 student student              155 Jul 31 02:45 ./
drwx------  3 student student               19 Jul 31 02:43 .apptainer/
-rw-r--r--  1 student student               18 Nov 24  2021 .bash_logout
-rw-r--r--  1 student student              193 Nov 24  2021 .bash_profile
-rw-r--r--  1 student student              231 Nov 24  2021 .bashrc
drwx------  3 student student               19 Jul 31 02:43 .local/
-rw-r--r--  1 student root                  38 Jul 31 02:34 test_scp.txt
-rwxrwxr-x  1 student student        709230592 Jul 31 02:45 tidyverse_4.2.1.sif*
-rw-r--r--  1 student student              658 Apr  7  2020 .zshrc

BASH

ls -lF .apptainer/

OUTPUT

cache

BASH

ls -lF .apptainer/cache/blob/blobs/sha256/

OUTPUT

-rw-r--r-- 1 student student      7763 Jul 31 02:43 1bec8112559e0494f45c74ee43af6d28b117a6faa7ff8e3aaefea9e741aedc47
-rw-r--r-- 1 student student      1807 Jul 31 02:43 270b4100b33a95ddd4b4e0d4cce9c4a262eaf5043a4d6a33a82fc71224e7f857
-rw-r--r-- 1 student student     27244 Jul 31 02:43 27c3a6114c0bacba4ceb4e0523ee67bfcc5bec7f7824247b6578cdcb629f4978
-rw-r--r-- 1 student student 250108813 Jul 31 02:43 2faad9a83b09e8155e7084ed53957d556333d8c78dbd66288dda084362d9a8a0
-rw-r--r-- 1 student student 164652314 Jul 31 02:43 3ccbc1cfa6d1cbc33689c9e7c2ebcafcb0af4f895b38c84363f57417e6fbb7cb
-rw-r--r-- 1 student student  27506421 Jul 31 02:43 56e0351b98767487b3c411034be95479ed1710bb6be860db6df0be3a98653027
-rw-r--r-- 1 student student     54081 Jul 31 02:43 58e0d5c15b4e6c88ede882864475388b1479a3d81c1b4060aeb919a3a3b5f322
-rw-r--r-- 1 student student 243220058 Jul 31 02:43 7917df3ef3d8605361342bc11f7d527ebb4fea3f95704bb6b72e6a4f043faa6d
-rw-r--r-- 1 student student  37945014 Jul 31 02:43 81c9ee1c97bb79e966a4ea76644eb05ebc6b72f67dfdccb9e8f4bce3190cdd0a
-rw-r--r-- 1 student student      2016 Jul 31 02:43 ae0677065e80ef796cdadccfbfb18370b194bc057bec8200d2dbe6b173048935
-rw-r--r-- 1 student student     22169 Jul 31 02:43 d518d22d5d29e561be6588568fd73aff10b6e658a3a3a9e8e98c0470e1b21a8a

BASH

singularity cache clean

BASH

ls -lF .apptainer/cache/blob/blobs/sha256/

Run the R help command

BASH

singularity exec tidyverse_4.2.1.sif R --help

It works; now you can see the R program's help output.

OUTPUT

Usage: R [options] [< infile] [> outfile]
   or: R CMD command [arguments]

Start R, a system for statistical computation and graphics, with the
specified options, or invoke an R tool via the 'R CMD' interface.

Options:
  -h, --help            Print short help message and exit
  --version             Print version info and exit
  --encoding=ENC        Specify encoding to be used for stdin
  --encoding ENC
  RHOME                 Print path to R home directory and exit
  --save                Do save workspace at the end of the session
  --no-save             Don't save it
  --no-environ          Don't read the site and user environment files
  --no-site-file        Don't read the site-wide Rprofile
  --no-init-file        Don't read the user R profile
  --restore             Do restore previously saved objects at startup
  --no-restore-data     Don't restore previously saved objects
  --no-restore-history  Don't restore the R history file
  --no-restore          Don't restore anything
  --vanilla             Combine --no-save, --no-restore, --no-site-file,
                        --no-init-file and --no-environ
  --no-readline         Don't use readline for command-line editing
  --max-ppsize=N        Set max size of protect stack to N
  --min-nsize=N         Set min number of fixed size obj's ("cons cells") to N
  --min-vsize=N         Set vector heap minimum to N bytes; '4M' = 4 MegaB
  -q, --quiet           Don't print startup message
  --silent              Same as --quiet
  -s, --no-echo         Make R run as quietly as possible
  --interactive         Force an interactive session
  --verbose             Print more information about progress
  -d, --debugger=NAME   Run R through debugger NAME
  --debugger-args=ARGS  Pass ARGS as arguments to the debugger
  -g TYPE, --gui=TYPE   Use TYPE as GUI; possible values are 'X11' (default)
                        and 'Tk'.
  --arch=NAME           Specify a sub-architecture
  --args                Skip the rest of the command line
  -f FILE, --file=FILE  Take input from 'FILE'
  -e EXPR               Execute 'EXPR' and exit

FILE may contain spaces but not shell metacharacters.

Commands:
  BATCH                 Run R in batch mode
  COMPILE               Compile files for use with R
  SHLIB                 Build shared library for dynamic loading
  INSTALL               Install add-on packages
  REMOVE                Remove add-on packages
  build                 Build add-on packages
  check                 Check add-on packages
  LINK                  Front-end for creating executable programs
  Rprof                 Post-process R profiling files
  Rdconv                Convert Rd format to various other formats
  Rd2pdf                Convert Rd format to PDF
  Rd2txt                Convert Rd format to pretty text
  Stangle               Extract S/R code from Sweave documentation
  Sweave                Process Sweave documentation
  Rdiff                 Diff R output ignoring headers etc
  config                Obtain configuration information about R
  javareconf            Update the Java configuration variables
  rtags                 Create Emacs-style tag files from C, R, and Rd files

Please use 'R CMD command --help' to obtain further information about
the usage of 'command'.

Options --arch, --no-environ, --no-init-file, --no-site-file and --vanilla
can be placed between R and CMD, to apply to R processes run by 'command'

Report bugs at <https://bugs.R-project.org>.

Run the Rscript help command

BASH

singularity exec tidyverse_4.2.1.sif Rscript --help

It works; now you can see the Rscript program's help output.

OUTPUT

Usage: Rscript [options] file [args]
   or: Rscript [options] -e expr [-e expr2 ...] [args]
A binary front-end to R, for use in scripting applications.

Options:
  --help              Print usage and exit
  --version           Print version and exit
  --verbose           Print information on progress
  --default-packages=LIST  Attach these packages on startup;
                        a comma-separated LIST of package names, or 'NULL'
and options to R (in addition to --no-echo --no-restore), for example:
  --save              Do save workspace at the end of the session
  --no-environ        Don't read the site and user environment files
  --no-site-file      Don't read the site-wide Rprofile
  --no-init-file      Don't read the user R profile
  --restore           Do restore previously saved objects at startup
  --vanilla           Combine --no-save, --no-restore, --no-site-file,
                        --no-init-file and --no-environ

Expressions (one or more '-e <expr>') may be used *instead* of 'file'.
Any additional 'args' can be accessed from R via 'commandArgs(TRUE)'.
See also  ?Rscript  from within R.

Let's go inside the container and see what we can see.

First let's verify that we can see other directories.

BASH

ls /projects/my-lab

BASH

singularity shell tidyverse_4.2.1.sif 

Notice the prompt changed to Apptainer>

BASH

ls -lFa

Same output as before.

OUTPUT

total 692628
drwxr-xr-x 4 student student       155 Jul 31 02:45 ./
drwxr-xr-x 4 student student        80 Jul 31 02:58 ../
drwx------ 3 student student        19 Jul 31 02:43 .apptainer/
-rw-r--r-- 1 student student        18 Nov 24  2021 .bash_logout
-rw-r--r-- 1 student student       193 Nov 24  2021 .bash_profile
-rw-r--r-- 1 student student       231 Nov 24  2021 .bashrc
drwx------ 3 student student        19 Jul 31 02:43 .local/
-rw-r--r-- 1 student student        38 Jul 31 02:34 test_scp.txt
-rwxrwxr-x 1 student student 709230592 Jul 31 02:45 tidyverse_4.2.1.sif*
-rw-r--r-- 1 student student       658 Apr  7  2020 .zshrc

Try to view the projects directory.

BASH

ls /projects/my-lab

We get an error because the container cannot see the directory. By default the container only mounts the home directory and the current working directory. If you need to access something outside these directories, you'll need to use a bind mount (more on this later).

OUTPUT

ls: cannot access '/projects/my-lab': No such file or directory

Let's exit the container

BASH

exit

View the container OS

BASH

singularity exec tidyverse_4.2.1.sif grep -E '^(VERSION|NAME)=' /etc/os-release

Let's use the inspect command to see details about the container

BASH

singularity inspect tidyverse_4.2.1.sif 

We get several details about this container. Let's see if we can find out more about this container build by researching the source link.

OUTPUT

org.label-schema.build-arch: amd64
org.label-schema.build-date: Monday_31_July_2023_2:44:16_UTC
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.1.9-1.el7
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: rocker/tidyverse:4.2.1
org.opencontainers.image.authors: Carl Boettiger <cboettig@ropensci.org>
org.opencontainers.image.base.name: docker.io/rocker/rstudio:4.2.1
org.opencontainers.image.description: Version-stable build of R, RStudio Server, and R packages.
org.opencontainers.image.licenses: GPL-2.0-or-later
org.opencontainers.image.ref.name: ubuntu
org.opencontainers.image.revision: ef593dcd7b334e02e79188a9e17dcf6149c178b9
org.opencontainers.image.source: https://github.com/rocker-org/rocker-versioned2
org.opencontainers.image.title: rocker/tidyverse
org.opencontainers.image.vendor: Rocker Project
org.opencontainers.image.version: R-4.2.1

The Rocker GitHub repository provides a lot of detail about the builds.

Content from 04-repos-and-registries



Building options

  • Build from a repository with the build or pull command. Works fine for things that are ready to go.

  • Use a remote builder, such as the Sylabs/GCP builder.

  • Use a system with sudo, for instance a cloud VM like we are using today.

Review the remote builder directions

We will omit the Sylabs remote builder and repositories in this tutorial since they require creating an external account.

A cloud instance with a few programs (apptainer, nano, wget, unzip, nvidia-smi) is a good alternative to remote builders.

https://docs.sylabs.io/guides/3.2/user-guide/cloud_library.html

Sylabs also has several YouTube videos on using Singularity:

The 5-part Singularity Container Workflow Demo is a good place to start.

Part 1:
https://www.youtube.com/watch?v=nQTMJ9hqKNI&list=PL052H4iYGzyvdZ8VS-omTzj1FKMjdXzfB&index=1
Part 2:
https://www.youtube.com/watch?v=23KOlEouAiI&list=PL052H4iYGzyvdZ8VS-omTzj1FKMjdXzfB&index=2
Part 3: https://www.youtube.com/watch?v=I5M6er06lT0&list=PL052H4iYGzyvdZ8VS-omTzj1FKMjdXzfB&index=3
Part 4:
https://www.youtube.com/watch?v=eb8vFmYLNTg&list=PL052H4iYGzyvdZ8VS-omTzj1FKMjdXzfB&index=4
Part 5:
https://www.youtube.com/watch?v=CFxngpNl1nU&list=PL052H4iYGzyvdZ8VS-omTzj1FKMjdXzfB&index=5

Some noteworthy registries and sources of software

Let's pull some bioinformatic software from the registries and see what we can get.

BASH

cd /projects/my-lab/04-pull

Check and see what is in the directory.

BASH

ls -lF

Nothing, just an empty directory.

OUTPUT

total 0

Let's look at the BUSCO pull command from earlier (don't execute it).

docker pull ezlabgva/busco:v5.4.7_cv1

Modify it for Singularity and pull it down.
Maybe set docker, colon, forward slash, forward slash to a jingle, as we will be using it often.

docker://

BASH

singularity pull docker://ezlabgva/busco:v5.4.7_cv1

OUTPUT

INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 700f250e37eb done
Copying blob da52c665ae6a done
Copying blob 230a319b6d10 done
Copying blob f7ec5a41d630 done
Copying blob a3ed95caeb02 done
Copying blob bffdb47af6a2 done
Copying blob 519cab61f8f7 done
Copying blob 3b07ee5f9c53 done
Copying config edd7eca642 done
Writing manifest to image destination
Storing signatures
2023/07/31 03:16:00  info unpack layer: sha256:f7ec5a41d630a33a2d1db59b95d89d93de7ae5a619a3a8571b78457e48266eba
2023/07/31 03:16:01  info unpack layer: sha256:da52c665ae6a3c231308941f65380e35950ef6c10aca2d47181c8ebf4915f6f1
2023/07/31 03:16:01  info unpack layer: sha256:bffdb47af6a29dd80fefdab2010f1c359c84e20797d0e22385589287bd992ace
2023/07/31 03:16:01  info unpack layer: sha256:230a319b6d10d18fb27a90920929d39624ae864f9d1b1ce2b82f579b084dcd94
2023/07/31 03:16:01  info unpack layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
2023/07/31 03:16:01  info unpack layer: sha256:700f250e37ebd4b22828d661f4a53537cd504b8d09d843bc1cbf01d36f622d3e
2023/07/31 03:16:31  info unpack layer: sha256:519cab61f8f7703cf31e02e406e11571d9432e3c3abbcd57ed5ea01b20a68199
2023/07/31 03:16:31  info unpack layer: sha256:3b07ee5f9c539bdb444c78a97e1578d7c10b0dc1a8f9b6c89de29ff1d367bdb4
INFO:    Creating SIF file...

Let's see what happened.

BASH

ls -lFa

We see a BUSCO SIF file but nothing else. The cache is in the home directory.

OUTPUT

ls -alF
total 802912
drwxr-xr-x  2 student root           34 Jul 31 03:17 ./
drwxr-xr-x 11 student root          165 Jul 31 02:34 ../
-rwxrwxr-x  1 student student 822181888 Jul 31 03:17 busco_v5.4.7_cv1.sif*
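
If home-directory space becomes a concern, the cache location can be redirected with an environment variable. A sketch, assuming the /flashscratch area from the setup section is writable:

BASH

mkdir -p /flashscratch/$USER/apptainer-cache
export APPTAINER_CACHEDIR=/flashscratch/$USER/apptainer-cache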

Let's see if it works by pulling down a couple of Staphylococcus aureus proteomes.

BASH

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/013/425/GCF_000013425.1_ASM1342v1/GCF_000013425.1_ASM1342v1_protein.faa.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/264/815/GCF_003264815.1_ASM326481v1/GCF_003264815.1_ASM326481v1_protein.faa.gz

gunzip ./GCF_000013425.1_ASM1342v1_protein.faa.gz ./GCF_003264815.1_ASM326481v1_protein.faa.gz

Let's run the BUSCO command.

BASH

singularity exec ./busco_v5.4.7_cv1.sif busco -i GCF_000013425.1_ASM1342v1_protein.faa -l bacteria_odb10 -o  busco_out_GCF_000013425 -m protein 

Oh no, we got an error. It's OK, there is an easy fix: we can add --bind $PWD to specifically request that the current directory be mounted.

BASH

singularity exec --bind $PWD ./busco_v5.4.7_cv1.sif busco -i GCF_000013425.1_ASM1342v1_protein.faa -l bacteria_odb10 -o  busco_out_GCF_000013425 -m protein 

BASH

singularity exec --bind $PWD ./busco_v5.4.7_cv1.sif busco -i GCF_003264815.1_ASM326481v1_protein.faa -l bacteria_odb10 -o  busco_out_GCF_003264815 -m protein 

Now let's download some more tools. What was that jingle again …

BASH

singularity pull docker://staphb/bwa:0.7.17
singularity pull docker://staphb/samtools:1.17-2023-06
singularity pull docker://staphb/bedtools:2.31.0
singularity pull docker://staphb/blast:2.14.0
singularity pull docker://staphb/bcftools:1.17
singularity pull docker://staphb/ncbi-datasets
singularity pull docker://biocontainers/vcftools:v0.1.16-1-deb_cv1
singularity pull docker://biocontainers/bedops:v2.4.35dfsg-1-deb_cv1

Let's see what we pulled down.

BASH

ls -lh -1 *sif

That's a lot of software ready to go. We see a wide range of sizes for the various tools.

OUTPUT

-rwxrwxr-x 1 student student 119M Jul 31 07:53 bcftools_1.17.sif
-rwxrwxr-x 1 student student  59M Jul 31 07:53 bedops_v2.4.35dfsg-1-deb_cv1.sif
-rwxrwxr-x 1 student student  44M Jul 31 07:52 bedtools_2.31.0.sif
-rwxrwxr-x 1 student student 265M Jul 31 07:53 blast_2.14.0.sif
-rwxrwxr-x 1 student student 785M Jul 31 07:48 busco_v5.4.7_cv1.sif
-rwxrwxr-x 1 student student  76M Jul 31 07:52 bwa_0.7.17.sif
-rwxrwxr-x 1 student student  43M Jul 31 07:53 ncbi-datasets_latest.sif
-rwxrwxr-x 1 student student  44M Jul 31 07:52 samtools_1.17-2023-06.sif
-rwxrwxr-x 1 student student  61M Jul 31 07:53 vcftools_v0.1.16-1-deb_cv1.sif

Let's see what they used for the base images.

BASH

singularity exec bcftools_1.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec bedops_v2.4.35dfsg-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec bedtools_2.31.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec blast_2.14.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec busco_v5.4.7_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec bwa_0.7.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec ncbi-datasets_latest.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec samtools_1.17-2023-06.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec vcftools_v0.1.16-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release

We see several different Ubuntu releases and one Debian release.

OUTPUT

[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bcftools_1.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (75) bind mounts
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bedops_v2.4.35dfsg-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Debian GNU/Linux"
VERSION="10 (buster)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bedtools_2.31.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (69) bind mounts
NAME="Ubuntu"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec blast_2.14.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (71) bind mounts
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec busco_v5.4.7_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Debian GNU/Linux"
VERSION="10 (buster)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bwa_0.7.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec ncbi-datasets_latest.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (73) bind mounts
NAME="Ubuntu"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec samtools_1.17-2023-06.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (70) bind mounts
NAME="Ubuntu"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec vcftools_v0.1.16-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Debian GNU/Linux"
VERSION="10 (buster)"

Now let's download another bacterium with ncbi-datasets.

BASH

singularity exec -B $PWD ncbi-datasets_latest.sif  datasets download genome accession GCF_001719145.1 --include gff3,rna,cds,protein,genome,seq-report --filename GCF_001719145.1.zip

BASH

unzip GCF_001719145.1.zip

We just downloaded a GFF, a protein FASTA, a transcript FASTA, a genome FASTA, and more for this accession.

BASH

ls -lah  ncbi_dataset/data/*

Let's run some of the bioinformatic tools.

Index the genome, transcriptome, and proteome, because why not.

BASH

singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna

BASH

singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/cds_from_genomic.fna

BASH

singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx ncbi_dataset/data/GCF_001719145.1/protein.faa

We can now see the .fai files have been added.

BASH

ls -lah  ncbi_dataset/data/GCF_001719145.1

BASH

head ncbi_dataset/data/GCF_001719145.1/protein.faa.fai

OUTPUT

WP_002802878.1  86      100     80      81
WP_002804494.1  76      255     76      77
WP_002804983.1  90      418     80      81
WP_002805908.1  46      580     46      47
WP_002806026.1  208     728     80      81
WP_002806565.1  121     1002    80      81
WP_002808218.1  128     1212    80      81
WP_002808376.1  71      1409    71      72
WP_002808458.1  227     1565    80      81
WP_002808480.1  112     1870    80      81

BASH

sort -nk 2 ncbi_dataset/data/GCF_001719145.1/protein.faa.fai | tail

OUTPUT

WP_069288208.1  1504    1554134 80      81
WP_005917363.1  1669    1264083 80      81
WP_033837051.1  1746    1522853 80      81
WP_033481971.1  1871    1504684 80      81
WP_269466082.1  2264    1673627 80      81
WP_033837667.1  2375    1534966 80      81
WP_033837612.1  2416    1528386 80      81
WP_033837670.1  2756    1537448 80      81
WP_005916482.1  2982    1177517 80      81
WP_005916057.1  3397    1136201 80      81

BASH

singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/protein.faa WP_005916057.1 

BASH

singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/protein.faa WP_005916057.1 > longest_prot.fasta

BASH

singularity exec -B $PWD blast_2.14.0.sif makeblastdb -in ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna -input_type fasta -dbtype nucl

BASH

singularity exec -B $PWD blast_2.14.0.sif tblastn -query longest_prot.fasta -db ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna -outfmt 7

BASH

singularity exec -B $PWD blast_2.14.0.sif tblastn -query longest_prot.fasta -db ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna -outfmt 6 | cut -f2,9,10 > hits.bed 

BASH

cat hits.bed 

BASH

awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' hits.bed  | singularity exec bedops_v2.4.35dfsg-1-deb_cv1.sif sort-bed - > b.fixed.bed

BASH

singularity exec -B $PWD bedtools_2.31.0.sif bedtools merge -i b.fixed.bed

citations

Awk command: https://www.biostars.org/p/304852/

Content from 05-singularity-build



The build command

Containers can be built from the same sources as the pull command (library, docker, shub, oras) as well as from an existing image file as a base.

Today we will focus on one of the most common builds: building on top of a container image from Docker.

Builds that do not modify the source image can generally be done without sudo privileges; this is equivalent to a pull command.
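
For example, building directly from a Docker image without modifying it works as an unprivileged user and is effectively the same as a pull:

BASH

singularity build tidyverse_4.2.1.sif docker://rocker/tidyverse:4.2.1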

Take-away parts of the build command and definition file

  • singularity build is used to build images from a definition file.
  • The definition file contains the directions to build the image, similar to installing an OS.
  • Bootstrap - the provider of the source images
  • From - the source image/layers
  • %post - commands issued to build the container
  • %environment - variables set at runtime
  • %runscript - commands executed when the container image is run (either via singularity run or by executing the container directly as a command)

Example of build definition file:

Bootstrap: docker
From: ubuntu:16.04

%post
    apt-get -y update
    apt-get -y install fortune cowsay lolcat

%environment
    export LC_ALL=C
    export PATH=/usr/games:$PATH

%runscript
    fortune | cowsay | lolcat
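
Assuming the definition above were saved as lolcow.def (an illustrative name), it could be built and its %runscript exercised like this:

BASH

sudo singularity build lolcow.sif lolcow.def

# either invoke the runscript explicitly ...
singularity run lolcow.sif

# ... or execute the image directly
./lolcow.sif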

Example of all the options for a definition file.

Bootstrap: library
From: ubuntu:18.04

%setup
    touch /file1
    touch ${SINGULARITY_ROOTFS}/file2

%files
    /file1
    /file1 /opt

%environment
    export LISTEN_PORT=12345
    export LC_ALL=C

%post
    apt-get update && apt-get install -y netcat
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $SINGULARITY_ENVIRONMENT

%runscript
    echo "Container was created $NOW"
    echo "Arguments received: $*"
    exec echo "$@"

%startscript
    nc -lp $LISTEN_PORT

%test
    grep -q NAME=\"Ubuntu\" /etc/os-release
    if [ $? -eq 0 ]; then
        echo "Container base is Ubuntu as expected."
    else
        echo "Container base is not Ubuntu."
    fi

%labels
    Author d@sylabs.io
    Version v0.0.1

%help
    This is a demo container used to illustrate a def file that uses all
    supported sections.
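
Once an image is built from a definition file like the one above, a few subcommands exercise those sections (the image name demo.sif is illustrative):

BASH

singularity test demo.sif                # run the %test section
singularity run-help demo.sif            # print the %help text
singularity inspect --deffile demo.sif   # show the definition file stored in the image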

We will build containers later in the course.

citations

Lesson adapted from:

Content from 06-bioinformatics-qc



Let's build a container image for something not in Docker Hub

First let's check our data files

BASH

cd /projects/my-lab/06-bio-qc

View the contents of the directory:

BASH

ls -lF

OUTPUT

total 180
-rw-rw-r-- 1 student student 82010 Jul 14 05:34 SRR10233452_subset_1.fastq.gz
-rw-rw-r-- 1 student student    64 Jul 14 05:34 SRR10233452_subset_1.fastq.gz.md5
-rw-rw-r-- 1 student student    61 Jul 14 05:34 SRR10233452_subset_1.fastq.md5
-rw-rw-r-- 1 student student 80450 Jul 14 05:34 SRR10233452_subset_2.fastq.gz
-rw-rw-r-- 1 student student    64 Jul 14 05:34 SRR10233452_subset_2.fastq.gz.md5
-rw-rw-r-- 1 student student    61 Jul 14 05:34 SRR10233452_subset_2.fastq.md5

Let's make sure our files are correct with md5sum

BASH

cat SRR10233452_subset_1.fastq.gz.md5

OUTPUT

0e6b0d752ca7bd9019cc4f5994950cf4  SRR10233452_subset_1.fastq.gz

BASH

md5sum SRR10233452_subset_1.fastq.gz

The output is the same so we know the file is correct.

OUTPUT

0e6b0d752ca7bd9019cc4f5994950cf4  SRR10233452_subset_1.fastq.gz

For the second read, just use the check function of md5sum.

BASH

md5sum -c SRR10233452_subset_2.fastq.gz.md5 

md5sum prints an OK since the file is correct; otherwise it would say FAILED.

OUTPUT

SRR10233452_subset_2.fastq.gz: OK

Now we can run some containers

Let's run FastQC.

BASH

singularity pull fastqc.sif docker://staphb/fastqc:0.12.1

The command above prints a lot of text to the screen.

Use ls to view the new SIF file

BASH

ls 

OUTPUT

fastqc.sif  SRR10233452_subset_1.fastq.gz  SRR10233452_subset_1.fastq.gz.md5  SRR10233452_subset_1.fastq.md5  SRR10233452_subset_2.fastq.gz  SRR10233452_subset_2.fastq.gz.md5  SRR10233452_subset_2.fastq.md5

Now run fastqc

BASH

singularity exec fastqc.sif fastqc SRR10233452_subset_1.fastq.gz SRR10233452_subset_2.fastq.gz

OUTPUT

INFO:    underlay of /etc/localtime required more than 50 (81) bind mounts
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Skipping 'SRR10233452_subset_1.fastq.gz' which didn't exist, or couldn't be read
Skipping 'SRR10233452_subset_2.fastq.gz' which didn't exist, or couldn't be read

Oh no, we don't see the files, again.
We can fix that.

BASH

singularity exec -B $PWD fastqc.sif fastqc SRR10233452_subset_1.fastq.gz SRR10233452_subset_2.fastq.gz

Worked great.

OUTPUT

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Started analysis of SRR10233452_subset_1.fastq.gz
Approx 100% complete for SRR10233452_subset_1.fastq.gz
Analysis complete for SRR10233452_subset_1.fastq.gz
Started analysis of SRR10233452_subset_2.fastq.gz
Approx 100% complete for SRR10233452_subset_2.fastq.gz
Analysis complete for SRR10233452_subset_2.fastq.gz

Use ls with the -1 flag to see the files in the directory

BASH

ls -1

OUTPUT

fastqc.sif
SRR10233452_subset_1_fastqc.html
SRR10233452_subset_1_fastqc.zip
SRR10233452_subset_1.fastq.gz
SRR10233452_subset_1.fastq.gz.md5
SRR10233452_subset_1.fastq.md5
SRR10233452_subset_2_fastqc.html
SRR10233452_subset_2_fastqc.zip
SRR10233452_subset_2.fastq.gz
SRR10233452_subset_2.fastq.gz.md5
SRR10233452_subset_2.fastq.md5

Use unzip to unzip the results for subset 1.

BASH

unzip SRR10233452_subset_1_fastqc.zip

OUTPUT

Archive:  SRR10233452_subset_1_fastqc.zip
   creating: SRR10233452_subset_1_fastqc/
   creating: SRR10233452_subset_1_fastqc/Icons/
   creating: SRR10233452_subset_1_fastqc/Images/
  inflating: SRR10233452_subset_1_fastqc/Icons/fastqc_icon.png
  inflating: SRR10233452_subset_1_fastqc/Icons/warning.png
  inflating: SRR10233452_subset_1_fastqc/Icons/error.png
  inflating: SRR10233452_subset_1_fastqc/Icons/tick.png
  inflating: SRR10233452_subset_1_fastqc/summary.txt
  inflating: SRR10233452_subset_1_fastqc/Images/per_base_quality.png
  inflating: SRR10233452_subset_1_fastqc/Images/per_sequence_quality.png
  inflating: SRR10233452_subset_1_fastqc/Images/per_base_sequence_content.png
  inflating: SRR10233452_subset_1_fastqc/Images/per_sequence_gc_content.png
  inflating: SRR10233452_subset_1_fastqc/Images/per_base_n_content.png
  inflating: SRR10233452_subset_1_fastqc/Images/sequence_length_distribution.png
  inflating: SRR10233452_subset_1_fastqc/Images/duplication_levels.png
  inflating: SRR10233452_subset_1_fastqc/Images/adapter_content.png
  inflating: SRR10233452_subset_1_fastqc/fastqc_report.html
  inflating: SRR10233452_subset_1_fastqc/fastqc_data.txt
  inflating: SRR10233452_subset_1_fastqc/fastqc.fo  

Use cat to view the summary results

BASH

cat SRR10233452_subset_1_fastqc/summary.txt

OUTPUT

PASS	Basic Statistics	SRR10233452_subset_1.fastq.gz
PASS	Per base sequence quality	SRR10233452_subset_1.fastq.gz
PASS	Per sequence quality scores	SRR10233452_subset_1.fastq.gz
FAIL	Per base sequence content	SRR10233452_subset_1.fastq.gz
WARN	Per sequence GC content	SRR10233452_subset_1.fastq.gz
FAIL	Per base N content	SRR10233452_subset_1.fastq.gz
WARN	Sequence Length Distribution	SRR10233452_subset_1.fastq.gz
PASS	Sequence Duplication Levels	SRR10233452_subset_1.fastq.gz
PASS	Overrepresented sequences	SRR10233452_subset_1.fastq.gz
PASS	Adapter Content	SRR10233452_subset_1.fastq.gz

Prep for BWA

Now we can uncompress the gzipped files with gunzip. Note that plain gunzip removes the compressed originals; the -c option can be used to preserve them (see the sketch below).
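
A sketch of the -c form, which writes the decompressed copy to a new file and leaves the .gz in place (optional; the lesson itself runs the plain gunzip command below, which replaces the .gz files):

BASH

gunzip -c SRR10233452_subset_1.fastq.gz > SRR10233452_subset_1.fastq
gunzip -c SRR10233452_subset_2.fastq.gz > SRR10233452_subset_2.fastq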

BASH

gunzip SRR10233452_subset_1.fastq.gz SRR10233452_subset_2.fastq.gz

Nothing is printed out.

We can see the changes with ls.

BASH

ls -lF

OUTPUT

-rwxrwxr-x 1 student student 297582592 Jul 14 12:21 fastqc.sif*
-rw-rw-r-- 1 student student    366018 Jul 14 12:06 SRR10233452_subset_1.fastq
-rw-rw-r-- 1 student student    218957 Jul 14 12:22 SRR10233452_subset_1_fastqc.html
-rw-rw-r-- 1 student student    225066 Jul 14 12:22 SRR10233452_subset_1_fastqc.zip
-rw-rw-r-- 1 student student        64 Jul 14 12:06 SRR10233452_subset_1.fastq.gz.md5
-rw-rw-r-- 1 student student        61 Jul 14 12:06 SRR10233452_subset_1.fastq.md5
-rw-rw-r-- 1 student student    355874 Jul 14 12:06 SRR10233452_subset_2.fastq
-rw-rw-r-- 1 student student    230235 Jul 14 12:22 SRR10233452_subset_2_fastqc.html
-rw-rw-r-- 1 student student    245117 Jul 14 12:22 SRR10233452_subset_2_fastqc.zip
-rw-rw-r-- 1 student student        64 Jul 14 12:06 SRR10233452_subset_2.fastq.gz.md5
-rw-rw-r-- 1 student student        61 Jul 14 12:06 SRR10233452_subset_2.fastq.md5

Try downloading the zip files with scp

Content from 07-bioinformatics bwa



Let's use BWA

BASH

cd /projects/my-lab/07-bwa

View the contents of the directory:

BASH

ls -lF

OUTPUT

total 15804
-rw-rw-r-- 1 student student 16171456 Jul 14 05:34 chr20.fna.gz
-rw-rw-r-- 1 student student       47 Jul 14 05:34 chr20.fna.gz.md5
-rw-rw-r-- 1 student student       44 Jul 14 05:34 chr20.fna.md5

BASH

gunzip chr20.fna.gz

Now make an index

Pull a BWA image from Docker Hub

BASH

singularity pull bwa.sif docker://staphb/bwa:0.7.17

Make an index

BASH

singularity exec -B $PWD bwa.sif bwa index chr20.fna chr20.fna

OUTPUT

[bwa_index] Pack FASTA... 0.20 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=108871774, availableWord=19660180
[BWTIncConstructFromPacked] 10 iterations done. 32429790 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 59910030 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 84330494 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 106031422 characters processed.
[bwt_gen] Finished constructing BWT in 42 iterations.
[bwa_index] 13.58 seconds elapse.
[bwa_index] Update BWT... 0.15 sec
[bwa_index] Pack forward-only FASTA... 0.13 sec
[bwa_index] Construct SA from BWT and Occ... 9.12 sec
[main] Version: 0.7.17-r1188
[main] CMD: /bwa/bwa-0.7.17/bwa index chr20.fna chr20.fna
[main] Real time: 23.272 sec; CPU: 23.231 sec

BASH

ls -lF

OUTPUT

total 249052
-rwxrwxr-x 1 student student 104620032 Jul 14 06:48 bwa.sif*
-rw-rw-r-- 1 student student  55116442 Jul 14 05:34 chr20.fna
-rw-rw-r-- 1 student student       160 Jul 14 06:49 chr20.fna.amb
-rw-rw-r-- 1 student student       135 Jul 14 06:49 chr20.fna.ann
-rw-rw-r-- 1 student student  54435968 Jul 14 06:49 chr20.fna.bwt
-rw-rw-r-- 1 student student        47 Jul 14 05:34 chr20.fna.gz.md5
-rw-rw-r-- 1 student student        44 Jul 14 05:34 chr20.fna.md5
-rw-rw-r-- 1 student student  13608973 Jul 14 06:49 chr20.fna.pac
-rw-rw-r-- 1 student student  27217992 Jul 14 06:49 chr20.fna.sa

BASH

singularity exec -B $PWD bwa.sif bwa mem chr20.fna /projects/my-lab/06-bio-qc/SRR10233452_subset_1.fastq /projects/my-lab/06-bio-qc/SRR10233452_subset_2.fastq > out_bwa.sam 

We get an error, as the container cannot see the FASTQ files in the other directory.

OUTPUT

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file `/projects/my-lab/06-bio-qc/SRR10233452_subset_1.fastq'.

Let's add a bind mount to the command so we can see them.
Note: The mount point inside the container does not need to already exist in the container image.

BASH

singularity exec -B /projects/my-lab/06-bio-qc:/projects/my-lab/06-bio-qc -B $PWD bwa.sif bwa mem chr20.fna /projects/my-lab/06-bio-qc/SRR10233452_subset_1.fastq /projects/my-lab/06-bio-qc/SRR10233452_subset_2.fastq > out_bwa.sam 

BASH

head -n 7 out_bwa.sam

Content from 08-circos



Example for circos

Change directory into the circos directory.

BASH

cd /projects/my-lab/08-circos

Verify nothing is in the directory

BASH

ls -lF

Make a Circos SIF file by pulling the circos Docker image.

BASH

singularity pull docker://alexcoppe/circos

Download the circos tutorial.

BASH

wget http://circos.ca/distribution/circos-tutorials-current.tgz

Untar and decompress the downloaded file.

BASH

tar xzvf circos-tutorials-current.tgz

BASH

ls -1F

OUTPUT

circos_latest.sif*
circos-tutorials-0.67/
circos-tutorials-current.tgz

Run tutorial 1/1

BASH

singularity exec -B $PWD circos_latest.sif /opt/circos/bin/circos -conf circos-tutorials-0.67/tutorials/1/1/circos.conf

The base image of the human chromosomes was created

[Figure: circos example 1.1 (circos_1_1)]

Run tutorial 8/6 histograms

BASH

singularity exec -B $PWD circos_latest.sif /opt/circos/bin/circos -conf circos-tutorials-0.67/tutorials/8/6/circos.conf

An image with histograms

[Figure: circos example 8.6 (circos_8_6)]

Run tutorial 8/11 links

BASH

singularity exec -B $PWD circos_latest.sif /opt/circos/bin/circos -conf circos-tutorials-0.67/tutorials/8/11/circos.conf

An image with links

[Figure: circos example 8.11 (circos_8_11)]

citations

http://circos.ca/

Content from 09-singularity-build



Using Containers & Questions

Building on our previous sections, in this unit, we’re going to build a container for BLAST and show how to use that container to construct a BLAST database and search sequences against that database.

Containerizing BLAST

BLAST stands for Basic Local Alignment Search Tool, and is a sophisticated software package for rapid searching of protein and nucleotide databases. BLAST was developed by Stephen Altschul and colleagues in 1989, and has continually been refined, updated, and modified throughout the years to meet the increasing needs of the scientific community.

To cite BLAST, please refer to the following: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. Volume 215(3), pages 403-410. 1990. PMID: 2231712 DOI: 10.1016/S0022-2836(05)80360-2.

To start, let’s create an empty file to use as our recipe file.

BASH

cd /projects/my-lab/09-build-blast

BASH

touch blast.def

The touch command allows modification of file timestamps, or in the case of this usage, where the file does not already exist, creates an empty file.

Now we’ll use nano to build out our recipe file.

BASH

nano blast.def

This should open the basic nano text editor for you to access through your terminal.

Let’s type the following into our blast.def file:

BASH

Bootstrap:docker
From:ubuntu

%labels
        MAINTAINER Kurt Showmaker

%post
        apt update && apt upgrade -y
        apt install -y wget gzip zip unzip ncbi-blast+ locales
        LANG=C perl -e exit
        locale-gen en_US.UTF-8

%runscript
        echo "Hello from BLAST!"

Let’s hit CTRL-O to save our modifications and then CTRL-X to exit the nano editor.

Ok, now to build the container:

BASH

sudo singularity build blast.sif blast.def

Ok, let’s test that our container is built properly.

BASH

./blast.sif

OUTPUT

Hello from BLAST!

Ok, now we can go into the container’s environment to verify things using the singularity shell command.

BASH

singularity shell blast.sif

Notice the change in prompt from "$" to "Apptainer>"; this is because we are inside the container.

BASH

Apptainer> blastp -h

OUTPUT

USAGE
  blastp [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-negative_taxidlist filename] [-ipglist filename]
    [-negative_ipglist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-subject_besthit] [-window_size int_value] [-lcase_masking]
    [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
    [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-sorthits sort_hits]
    [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]
    [-use_sw_tback] [-version]

DESCRIPTION
   Protein-Protein BLAST 2.9.0+

Use '-help' to print detailed descriptions of command line arguments

The blastp command is for Protein BLASTs, where a protein sequence is searched against a protein database.

Let’s exit the container environment.

BASH

Apptainer> exit

Now let’s try the containerized command from the server’s environment.

BASH

singularity exec blast.sif blastp -h

OUTPUT

USAGE
  blastp [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-negative_taxidlist filename] [-ipglist filename]
    [-negative_ipglist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-subject_besthit] [-window_size int_value] [-lcase_masking]
    [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
    [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-sorthits sort_hits]
    [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]
    [-use_sw_tback] [-version]

DESCRIPTION
   Protein-Protein BLAST 2.9.0+

Use '-help' to print detailed descriptions of command line arguments

Same output, so now let’s put our new BLAST container to use!

Acquiring Protein Data

To start, we are going to need some data to serve as our database to search against. For this exercise, we will use the C. elegans proteome. Let's download and uncompress this file.

BASH

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/985/GCF_000002985.6_WBcel235/GCF_000002985.6_WBcel235_protein.faa.gz

This shouldn't take long, and produces the following output:

OUTPUT

Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.229, 130.14.250.13, 2607:f220:41f:250::228, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6989933 (6.7M) [application/x-gzip]
Saving to: ‘GCF_000002985.6_WBcel235_protein.faa.gz’

100%[=======================================================================>] 6,989,933   41.1MB/s   in 0.2s

2021-12-02 19:13:56 (41.1 MB/s) - ‘GCF_000002985.6_WBcel235_protein.faa.gz’ saved [6989933/6989933]

Now we will unzip the data file.

BASH

gunzip GCF_000002985.6_WBcel235_protein.faa.gz

Now we will convert the protein FASTA file we downloaded into a BLAST database to search against.

BASH

time singularity exec -B $PWD blast.sif makeblastdb -in GCF_000002985.6_WBcel235_protein.faa -dbtype prot -out c_elegans

This gives us the following output:

OUTPUT


Building a new DB, current time: 12/02/2021 19:14:39
New DB name:   /home/student/c_elegans
New DB title:  GCF_000002985.6_WBcel235_protein.faa
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 28350 sequences in 0.727009 seconds.

real    0m1.248s
user    0m0.828s
sys     0m0.421s

Now we need some sequences of interest to search against the RefSeq database we just constructed. Let’s download all the RefSeq proteins for Human Chromosome 1:

BASH

wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz

OUTPUT

--2021-12-02 19:15:08--  ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz
           => ‘human.1.protein.faa.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.10, 130.14.250.11, 2607:f220:41f:250::230, ...
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /refseq/H_sapiens/mRNA_Prot ... done.
==> SIZE human.1.protein.faa.gz ... 1836521
==> PASV ... done.    ==> RETR human.1.protein.faa.gz ... done.
Length: 1836521 (1.8M) (unauthoritative)

100%[=======================================================================>] 1,836,521   8.92MB/s   in 0.2s

2021-12-02 19:15:08 (8.92 MB/s) - ‘human.1.protein.faa.gz’ saved [1836521]

BASH

gunzip human.1.protein.faa.gz

We've downloaded a multi-line FASTA file, but for ease of use, we will now convert it into a single-line FASTA.

Convert multi-line FASTA to single-line FASTA

BASH

sed -e 's/\(^>.*$\)/#\1#/' human.1.protein.faa | tr -d "\r" | tr -d "\n" | sed -e 's/$/#/' | tr "#" "\n" | sed -e '/^$/d' > human.1.protein.faa.cleaned.fasta
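
If the sed pipeline is hard to read, here is an equivalent awk sketch that produces the same one-line-per-sequence layout (either command yields the cleaned file):

BASH

awk '/^>/ {if (seq) print seq; print; seq=""}
     !/^>/ {seq = seq $0}
     END   {if (seq) print seq}' human.1.protein.faa > human.1.protein.faa.cleaned.fasta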

Now, we’ll BLAST the proteins of Human chromosome 1 against the c_elegans blast database. Normally, you would want to run your entire query sequence against the database and interpret results, but because these cloud systems we are connected to are small, in the interest of time we are going to reduce our number of query sequences.

BASH

head -n 20 human.1.protein.faa.cleaned.fasta > search_query.fasta

This pulls the first 10 protein sequences listed in the human chromosome 1 file into a new file, search_query.fasta. So instead of searching all 8,067 sequences, we will only search 10 (0.1%) against the c_elegans database we’ve constructed.

Here we will run the blastp program using our query and the BLAST database we constructed.

BASH

time singularity exec -B $PWD blast.sif blastp -num_threads 2 -db c_elegans -query search_query.fasta -outfmt 6 -out BLASTP_Results.txt -max_target_seqs 1

This gives us the following output:

OUTPUT

Warning: [blastp] Examining 5 or more matches is recommended

real    0m5.957s
user    0m10.364s
sys     0m0.500s

Checking the output file:

BASH

head BLASTP_Results.txt

Gives the following:

OUTPUT

NP_001355814.1  NP_001255859.1  41.697  542     273     10      122     646     77      592     1.46e-58        208
NP_001355814.1  NP_001255859.1  41.199  517     259     9       125     626     154     640     2.69e-57        205
NP_001355814.1  NP_001255859.1  38.609  575     266     9       116     646     109     640     3.11e-54        196
NP_001355814.1  NP_001255859.1  39.962  533     256     10      112     618     156     650     8.53e-53        192
NP_001355814.1  NP_001255859.1  40.393  458     211     9       109     565     243     639     1.33e-49        183
NP_001355814.1  NP_001255859.1  38.647  207     105     5       108     314     455     639     2.77e-15        79.3
NP_001355815.1  NP_001255859.1  40.937  491     220     10      1       472     153     592     4.73e-56        197
NP_001355815.1  NP_001255859.1  39.038  520     232     13      2       472     80      563     1.16e-48        177
NP_001355815.1  NP_001255859.1  42.958  426     196     9       76      459     75      495     9.38e-47        171
NP_001355815.1  NP_001255859.1  40.086  464     218     11      2       444     220     644     9.89e-44        163

Any Questions?

Please ask the presenters any questions you may have!

Content from 10-singularity-build-rstudio



Disclaimer

This build of RStudio is for demonstration purposes only on these demo instances. For production use please consult your local system admin.

Containerizing RStudio

CITE ROCKER AND RICHARD

BASH

cd /projects/my-lab/10-build-rstudio

To start, create a few directories that RStudio Server looks for.

BASH

workdir=/projects/my-lab/10-build-rstudio

mkdir -p -m 700 ${workdir}/run ${workdir}/tmp ${workdir}/var/lib/rstudio-server

cat > ${workdir}/database.conf <<END
provider=sqlite
directory=/var/lib/rstudio-server
END

Let's build an RStudio Server image and install a few R/Bioconductor packages into the container. To start, create the recipe file with nano.

BASH

nano rdeseq2.def

BASH

Bootstrap: docker
From: rocker/tidyverse:4.2.1

#This def file doesn't seem to build on Sumner with Centos 7.  Richard suggests building on an Ubuntu system but we will stick with Centos 7.

%post

apt update
apt install -y libgeos-dev libglpk-dev # libcurl4-openssl-dev
apt install -y libcurl4-gnutls-dev libxml2-dev cmake  libbz2-dev

apt install -y libxslt1-dev  # install for sandpaper

R -e 'if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DESeq2")'

BASH

sudo time singularity build rdeseq2.sif rdeseq2.def

View the files inside the directory. Notice there is no R directory.

BASH

ls -lah 

Set the runtime variables and deploy.

BASH

export SINGULARITY_BIND="${workdir}/run:/run,${workdir}/tmp:/tmp,${workdir}/database.conf:/etc/rstudio/database.conf,${workdir}/var/lib/rstudio-server:/var/lib/rstudio-server"
export SINGULARITYENV_USER=$(id -un)
export SINGULARITYENV_PASSWORD=password
singularity exec --cleanenv rdeseq2.sif rserver --www-port 8787 --auth-none=0 --auth-pam-helper-path=pam-helper --auth-stay-signed-in-days=30 --auth-timeout-minutes=0 --server-user  "student"

Navigate to <your_IP>:8787. The username is student and the password is the one set above (password).

Let's load a package from the container.

R

library('DESeq2')

Using the RStudio file browser, look at the R folder that appeared; nothing is in it yet.

Let's install something into the user space.

R

BiocManager::install("EnhancedVolcano")

Now we see the local user library.

R

.libPaths()

OUTPUT

[1] "/home/student/R/x86_64-pc-linux-gnu-library/4.2"
[2] "/usr/local/lib/R/site-library"
[3] "/usr/local/lib/R/library"    

Verify the libraries using the Terminal tab:
ls /home/student/R/x86_64-pc-linux-gnu-library/4.2
ls /usr/local/lib/R

Verify via ssh in another tab:
ssh student@<IP>
ls /home/student/R/x86_64-pc-linux-gnu-library/4.2
ls /usr/local/lib/R

Let's install a few fun packages.

R

install.packages('knitr', dependencies = TRUE)

R

options(repos = c(
  carpentries = "https://carpentries.r-universe.dev/", 
  CRAN = "https://cran.rstudio.com/"
))
install.packages("sandpaper", dep = TRUE)

Note: the sandpaper command will reset the RStudio server.
Select "Don't Save" when prompted.

R

library('sandpaper')
sandpaper::create_lesson("~/r-intermediate-penguins")

After the reset, we are now in the new folder.

R

sandpaper::serve(quiet = FALSE, host = "0.0.0.0", port = "8789")

Now edit one of the episodes

R

 servr::daemon_stop()

Knit example

Make a new folder called myknit.
Make a file in that folder called myknit.Rmd.
Copy the section of code (lines 56 to 101) from the link below and paste it into the new myknit.Rmd file.

#https://github.com/sachsmc/knit-git-markr-guide/blob/master/knitr/knit.Rmd

Caution: the block below is just an example; copy the lines from the link above.

R

 ```{r setup, include=FALSE}
library(stringr)
library(knitr)
opts_chunk$set(tidy = FALSE)

knit_hooks$set(source = function(x, options){
  if (!is.null(options$verbatim) && options$verbatim){
    opts = gsub(",\\s*verbatim\\s*=\\s*TRUE\\s*", "", options$params.src)
    bef = sprintf('\n\n    ```{r %s}\n', opts, "\n")
    stringr::str_c(
      bef, 
      knitr:::indent_block(paste(x, collapse = '\n'), "    "), 
      "\n    ```\n"
    )
  } else {
    stringr::str_c("\n\n```", tolower(options$engine), "\n", 
      paste(x, collapse = '\n'), "\n```\n\n"
    )
  }
})