Estimated time: 30 minutes

Building options

  • Build from a repository with the build or pull command. This works well for images that are ready to use.

  • Use a remote builder (Sylabs/GCP Builder).

  • Use a system with sudo, for instance a cloud VM like we are using today.

Review the remote builder directions

We will omit the Sylabs remote builder and repositories in this tutorial since they require creating an external account.

A cloud instance with a few programs (apptainer, nano, wget, unzip, nvidia-smi) is a good alternative to remote builders.
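
Before starting, it can help to confirm that the programs listed above are actually on the PATH. This is a minimal sketch; the function name missing_tools is hypothetical and not part of any of these tools.

```shell
# Print the name of every listed program that is not found on the PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Programs named in this tutorial; no output means all of them were found.
missing_tools apptainer nano wget unzip nvidia-smi
```

On a VM without a GPU driver, nvidia-smi would be expected to show up in the output.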


Sylabs also has several YouTube videos on using Singularity:

The 5 part Singularity Container Workflow Demo is a good place to start.

Some noteworthy registries and sources of software

Let's pull some bioinformatics software from the registries and see what we can get.


cd /projects/my-lab/04-pull

Check and see what is in the directory.


ls -lF

Nothing, just an empty directory.


total 0

Let's look at the busco pull command from earlier. Don't execute it.

docker pull ezlabgva/busco:v5.4.7_cv1

Modify it for Singularity and pull the image down.
Maybe set "docker, colon, forward slash, forward slash" to a jingle, as we will be using it often.



singularity pull docker://ezlabgva/busco:v5.4.7_cv1


Let's see what happened.


ls -lFa

We see a busco sif file but nothing else. The cache is in the home directory.
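
The name of the pulled file follows from the image reference: the last path component of the image name, an underscore, then the tag (or "latest" when no tag is given), which matches the file names used in the rest of this section. A small sketch of that rule, with sif_name as a hypothetical helper:

```shell
# Predict the file name that `singularity pull docker://REF` writes by default.
# sif_name is a hypothetical helper, not a Singularity command.
sif_name() {
    ref=${1#docker://}      # drop the URI scheme if present
    image=${ref##*/}        # keep only the part after the last slash
    case $image in
        *:*) echo "${image%%:*}_${image#*:}.sif" ;;   # name_tag.sif
        *)   echo "${image}_latest.sif" ;;            # no tag given
    esac
}

sif_name docker://ezlabgva/busco:v5.4.7_cv1   # busco_v5.4.7_cv1.sif
sif_name staphb/ncbi-datasets                 # ncbi-datasets_latest.sif
```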


Let's see if it works by pulling down a couple of Staphylococcus aureus proteomes.


wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/013/425/GCF_000013425.1_ASM1342v1/GCF_000013425.1_ASM1342v1_protein.faa.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/264/815/GCF_003264815.1_ASM326481v1/GCF_003264815.1_ASM326481v1_protein.faa.gz

gunzip ./GCF_000013425.1_ASM1342v1_protein.faa.gz ./GCF_003264815.1_ASM326481v1_protein.faa.gz
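
A quick sanity check after downloading is to count the sequences in each FASTA file, since every record starts with a ">" header line. A minimal sketch; count_fasta is a hypothetical helper name.

```shell
# Count sequences in a FASTA file by counting header lines (those starting with ">").
count_fasta() {
    grep -c '^>' "$1"
}

# Only report files that exist, so the snippet is safe outside the tutorial directory.
for faa in GCF_000013425.1_ASM1342v1_protein.faa GCF_003264815.1_ASM326481v1_protein.faa; do
    if [ -f "$faa" ]; then
        echo "$faa: $(count_fasta "$faa") sequences"
    fi
done
```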

Let's run the busco command.


singularity exec ./busco_v5.4.7_cv1.sif busco -i GCF_000013425.1_ASM1342v1_protein.faa -l bacteria_odb10 -o  busco_out_GCF_000013425 -m protein 

Oh no, we got an error. That's OK, there is an easy fix: add --bind $PWD to explicitly request that the current directory be mounted inside the container.


singularity exec --bind $PWD ./busco_v5.4.7_cv1.sif busco -i GCF_000013425.1_ASM1342v1_protein.faa -l bacteria_odb10 -o  busco_out_GCF_000013425 -m protein 


singularity exec --bind $PWD ./busco_v5.4.7_cv1.sif busco -i GCF_003264815.1_ASM326481v1_protein.faa -l bacteria_odb10 -o  busco_out_GCF_003264815 -m protein 
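
The two runs above differ only in the input file and the output name, so they could also be written as a loop. This is a sketch under the assumption that the output name is always the accession up to the first dot; busco_out_name is a hypothetical helper.

```shell
# Derive the BUSCO output name used above from a proteome file name,
# e.g. GCF_000013425.1_ASM1342v1_protein.faa -> busco_out_GCF_000013425.
busco_out_name() {
    base=${1##*/}                  # strip any leading directory
    echo "busco_out_${base%%.*}"   # keep the accession up to the first dot
}

# Run BUSCO on every matching proteome in the current directory.
for faa in GCF_*_protein.faa; do
    [ -f "$faa" ] || continue      # nothing matched the pattern
    singularity exec --bind "$PWD" ./busco_v5.4.7_cv1.sif \
        busco -i "$faa" -l bacteria_odb10 -o "$(busco_out_name "$faa")" -m protein
done
```

Quoting "$PWD" keeps the bind working even if the directory path contains spaces.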

Now let's download some more tools. What was that jingle again …


singularity pull docker://staphb/bwa:0.7.17
singularity pull docker://staphb/samtools:1.17-2023-06
singularity pull docker://staphb/bedtools:2.31.0
singularity pull docker://staphb/blast:2.14.0
singularity pull docker://staphb/bcftools:1.17
singularity pull docker://staphb/ncbi-datasets
singularity pull docker://biocontainers/vcftools:v0.1.16-1-deb_cv1
singularity pull docker://biocontainers/bedops:v2.4.35dfsg-1-deb_cv1

Let's see what we pulled down.


ls -lh -1 *sif

That's a lot of software ready to go. The images vary widely in size.


Let's see what they used for the base images.


singularity exec bcftools_1.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec bedops_v2.4.35dfsg-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec bedtools_2.31.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec blast_2.14.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec busco_v5.4.7_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec bwa_0.7.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec ncbi-datasets_latest.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec samtools_1.17-2023-06.sif grep -E '^(VERSION|NAME)=' /etc/os-release
singularity exec vcftools_v0.1.16-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
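
The commands above differ only in the image name, so the same check can be written as a loop over every .sif file. A minimal sketch; base_images is a hypothetical function name.

```shell
# Print the base OS name and version recorded in each image's /etc/os-release.
base_images() {
    for sif in "$@"; do
        [ -f "$sif" ] || continue   # skip names that are not existing files
        echo "== $sif"
        singularity exec "$sif" grep -E '^(VERSION|NAME)=' /etc/os-release
    done
}

# Check every image in the current directory.
base_images *.sif
```

The "== file" header line makes it easier to tell which output belongs to which image.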

We see several different Ubuntu releases and one Debian release.


[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bcftools_1.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (75) bind mounts
VERSION="20.04.5 LTS (Focal Fossa)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bedops_v2.4.35dfsg-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Debian GNU/Linux"
VERSION="10 (buster)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bedtools_2.31.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (69) bind mounts
VERSION="22.04.2 LTS (Jammy Jellyfish)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec blast_2.14.0.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (71) bind mounts
VERSION="20.04.6 LTS (Focal Fossa)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec busco_v5.4.7_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Debian GNU/Linux"
VERSION="10 (buster)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec bwa_0.7.17.sif grep -E '^(VERSION|NAME)=' /etc/os-release
VERSION="16.04.7 LTS (Xenial Xerus)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec ncbi-datasets_latest.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (73) bind mounts
VERSION="22.04.2 LTS (Jammy Jellyfish)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec samtools_1.17-2023-06.sif grep -E '^(VERSION|NAME)=' /etc/os-release
INFO:    underlay of /etc/localtime required more than 50 (70) bind mounts
VERSION="22.04.2 LTS (Jammy Jellyfish)"
[student@edu-vm-bdafb86b-1 04-pull]$ singularity exec vcftools_v0.1.16-1-deb_cv1.sif grep -E '^(VERSION|NAME)=' /etc/os-release
NAME="Debian GNU/Linux"
VERSION="10 (buster)"

Now let's download another bacterium with ncbi-datasets.


singularity exec -B $PWD ncbi-datasets_latest.sif  datasets download genome accession GCF_001719145.1 --include gff3,rna,cds,protein,genome,seq-report --filename GCF_001719145.1.zip


unzip GCF_001719145.1.zip

We just downloaded a GFF, protein FASTA, transcript FASTA, genome FASTA, and more for this accession.


ls -lah  ncbi_dataset/data/*

Let's run some of the bioinformatics tools.

Index the genome, transcriptome, and proteome, because why not.


singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna


singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/cds_from_genomic.fna


singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx ncbi_dataset/data/GCF_001719145.1/protein.faa

We can now see the .fai files have been added.


ls -lah  ncbi_dataset/data/GCF_001719145.1


head ncbi_dataset/data/GCF_001719145.1/protein.faa.fai


singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/protein.faa WP_005916057.1 


singularity exec -B $PWD samtools_1.17-2023-06.sif samtools faidx  ncbi_dataset/data/GCF_001719145.1/protein.faa WP_005916057.1 > longest_prot.fasta
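
The output file name suggests the accession was picked as the longest protein, though the tutorial does not show how it was chosen. One way to find the longest entry is from the .fai index, where column 1 is the sequence name and column 2 its length. A sketch, with longest_seq as a hypothetical helper:

```shell
# Print the name of the longest sequence listed in a samtools .fai index.
# Column 1 is the sequence name, column 2 the sequence length.
longest_seq() {
    awk -F'\t' '$2 > max { max = $2; name = $1 } END { print name }' "$1"
}

# Only run when the index from the previous step is present.
fai=ncbi_dataset/data/GCF_001719145.1/protein.faa.fai
if [ -f "$fai" ]; then
    longest_seq "$fai"
fi
```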


singularity exec -B $PWD blast_2.14.0.sif makeblastdb -in ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna -input_type fasta -dbtype nucl


singularity exec -B $PWD blast_2.14.0.sif tblastn -query longest_prot.fasta -db ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna -outfmt 7


singularity exec -B $PWD blast_2.14.0.sif tblastn -query longest_prot.fasta -db ncbi_dataset/data/GCF_001719145.1/GCF_001719145.1_ASM171914v1_genomic.fna -outfmt 6 | cut -f2,9,10 > hits.bed 


cat hits.bed 
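
To see why cut -f2,9,10 gives BED-like columns, it helps to look at the default column order of BLAST tabular output (-outfmt 6). The hit line below is made up for illustration and is not real output from this search.

```shell
# One made-up line in BLAST tabular format (-outfmt 6). The 12 default columns are:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
hit=$(printf 'queryP\tcontig1\t100.000\t300\t0\t0\t1\t300\t5000\t4101\t0.0\t600')

# Columns 2, 9 and 10 are the subject sequence, hit start and hit end.
printf '%s\n' "$hit" | cut -f2,9,10
```

In this made-up hit the start is larger than the end, which is how BLAST reports a match on the reverse strand.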


awk 'BEGIN { FS = OFS = "\t" } { if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' hits.bed | singularity exec bedops_v2.4.35dfsg-1-deb_cv1.sif sort-bed - > b.fixed.bed
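
The awk step swaps start and end when a hit is reported in reverse order, and widens zero-length intervals by one, so that every line has start < end as BED tools expect. The sketch below applies the same logic to made-up intervals, with tab set explicitly as the field separator so that rewritten lines stay tab-delimited; normalize_bed is a hypothetical wrapper name.

```shell
# Reorder BED-like intervals read from stdin so that start < end on every line.
normalize_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         { if ($2 > $3) { t = $2; $2 = $3; $3 = t } else if ($2 == $3) { $3 += 1 } print }'
}

# Made-up intervals: reversed, already ordered, and zero-length.
printf 'contig1\t5000\t4101\ncontig1\t10\t50\ncontig1\t7\t7\n' | normalize_bed
```

The three lines come out as 4101 to 5000, 10 to 50 (unchanged), and 7 to 8.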


singularity exec -B $PWD bedtools_2.31.0.sif bedtools merge -i b.fixed.bed


Awk command: https://www.biostars.org/p/304852/