10x Genomics Bioinformatic Workshop
Jackson Labs 2018
To get to this notebook please navigate your browser to 10x-tutorial/Jax_Workshop.nb.html on the web UI
The purpose of this tutorial is to walk new users through some of the steps necessary to explore Whole Genome (WGS) and Whole Exome (WES) sequencing data generated from the 10x Genomics Chromium platform and the Longranger pipeline. We will investigate the Linked-Read data using a variety of tools, all of which are freely available either from 10x Genomics or their home repos (GitHub, SourceForge, etc.).
Things to know about this workshop
/opt/data/10x-data/
Figure 1. Chromium Genome Workflow
ssh ubuntu@ip_address
username: ubuntu
password: jgm2018
Instructors will take care of this
Once you’ve logged in, your home directory should look like this
cd /home/ubuntu
sequser@ip-172-31-63-156:/home/ubuntu$ ls -l
total 36
drwxrwxr-x 2 ubuntu ubuntu 4096 Apr 2 17:44 10x-bam-files
drwxrwxr-x 2 ubuntu ubuntu 4096 Apr 10 21:34 10x-loupe-files
drwxrwxr-x 2 ubuntu ubuntu 4096 Apr 11 06:21 10x-tutorial
drwxrwxr-x 2 ubuntu ubuntu 4096 Apr 11 16:35 10x-vcf-files
drwxrwxr-x 2 ubuntu ubuntu 4096 Apr 11 16:20 figures
drwxrwxr-x 3 ubuntu ubuntu 4096 Mar 30 21:02 igv
drwxrwxr-x 15 ubuntu ubuntu 4096 Apr 2 18:33 miniconda2
drwxrwxr-x 2 ubuntu ubuntu 4096 Mar 30 17:38 misc
drwxr-xr-x 3 ubuntu ubuntu 4096 Apr 11 14:23 R
On Windows you’ll probably need an application like PuTTY
Okay, you’ve done your Chromium prep, sent it through your favorite Illumina (or BGI) sequencer and now you’ve got a BCL file. What’s next?
The instructions to convert BCL to FASTQ can be found here
An example bcl to do some testing on can be found in /opt/data/10x-data/tiny-bcl-2.0.0
Let’s do the conversion
nohup longranger mkfastq --id=tiny-bcl \
--run=/opt/data/10x-data/tiny-bcl-2.0.0 \
--csv=/opt/data/10x-data/tiny-bcl-simple-2.1.0.csv \
--localmem=4 >& mkftq.out &
You should see something like this as the output. Note: All 10x pipelines run under the Martian framework. In the 10x universe, a pipeline is the software you run (e.g., Longranger) and a pipestance is the directory that the pipeline creates and uses while it’s running.
longranger mkfastq (2.2.2)
Copyright (c) 2018 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - '2.2.2-2.3.2'
Serving UI at http://bespin1.fuzzplex.com:35831
Running preflight checks (please wait)...
Checking run folder...
Checking RunInfo.xml...
Checking system environment...
Checking barcode whitelist...
Checking read specification...
Checking samplesheet specs...
Checking for dual index flowcell...
2018-04-11 13:09:16 [runtime] (ready) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2018-04-11 13:09:19 [runtime] (split_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2018-04-11 13:09:19 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET.fork0.chnk0.main
2018-04-11 13:09:25 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2018-04-11 13:09:28 [runtime] (join_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2018-04-11 13:09:31 [runtime] (ready) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-04-11 13:09:31 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.split
2018-04-11 13:09:34 [runtime] (split_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-04-11 13:09:34 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.chnk0.main
2018-04-11 13:10:07 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-04-11 13:10:07 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.join
2018-04-11 13:10:10 [runtime] (join_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2018-04-11 13:10:13 [runtime] (ready) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY
2018-04-11 13:10:13 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.split
2018-04-11 13:10:19 [runtime] (split_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY
2018-04-11 13:10:19 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.chnk0.main
2018-04-11 13:10:19 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.chnk1.main
2018-04-11 13:10:19 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.chnk2.main
2018-04-11 13:10:19 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.chnk3.main
2018-04-11 13:11:49 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY
2018-04-11 13:11:49 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_QC_SUMMARY.fork0.join
Outputs:
- Run QC metrics: /mnt/home/stephen/yard/Jax_workshop/Longranger/tiny-bcl/outs/qc_summary.json
- FASTQ output folder: /mnt/home/stephen/yard/Jax_workshop/Longranger/tiny-bcl/outs/fastq_path
- Interop output folder: /mnt/home/stephen/yard/Jax_workshop/Longranger/tiny-bcl/outs/interop_path
- Input samplesheet: /mnt/home/stephen/yard/Jax_workshop/Longranger/tiny-bcl/outs/input_samplesheet.csv
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
Saving pipestance info to tiny-bcl/tiny-bcl.mri.tgz
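Because the run above was started with nohup and its output redirected to mkftq.out, one way to check on it later is to look for the completion message in that file. A minimal sketch, assuming the log contains the "Pipestance completed successfully!" line shown above; the function name pipestance_status is made up for illustration and is not part of Longranger:

```shell
# Report whether a Longranger log file records a successful pipestance.
# Usage: pipestance_status LOGFILE   (for example: pipestance_status mkftq.out)
pipestance_status() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo "missing"
        return 2
    fi
    # -q: print nothing, -F: treat the message as a fixed string
    if grep -qF "Pipestance completed successfully!" "$log"; then
        echo "completed"
    else
        echo "not completed"
        return 1
    fi
}
```

A "not completed" result only means the message has not been written yet; the run may still be in progress, so check the UI or the end of the log before assuming a failure.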
Your fastqs will be in the flowcell directory H77WWBBXX. All undetermined reads are stored in the ‘Undetermined’ fastqs. These can be automatically deleted by running longranger mkfastq with the --delete-undetermined flag.
You’ve got your fastqs made. Let’s do a small alignment. Even though 10x currently recommends using GATK for variant calling in Longranger, freebayes is already baked in, so we’ll use that for our example.
Note: The reasons for using GATK over freebayes are many but it boils down to better variant calling and Indel sensitivity/specificity which, among other things, leads to better phasing. This can have downstream effects on SV calling as well. A detailed report of the latest version of Longranger and the advantages that GATK provides over freebayes can be found on bioRxiv.
nohup longranger wgs --id=Sample_example \
--fastqs=/mnt/home/stephen/yard/Jax_workshop/Longranger/tiny-bcl/outs/fastq_path/H77WWBBXX/Sample \
--reference=/mnt/opt/refdata_new/hg19-2.2.0/ \
--localmem=4 \
--vcmode=freebayes >& wgs.out &
Your output should look something like this as it runs. Make sure to check out the UI where your pipestance is running. This is an interactive workflow that has lots of information contained within it.
longranger wgs (2.2.2)
Copyright (c) 2018 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - '2.2.2-2.3.2'
Serving UI at http://bespin1.fuzzplex.com:41335?auth=tRqwdXhvQ8vlyWHd6os8ccZ2GXgzYuDKfdYQaxnkZn0
Running preflight checks (please wait)...
2018-04-11 13:31:41 [runtime] (ready) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._COMBINED_SV_CALLER._TERMINAL_CNV_CALLER.PREPARE_TERMINAL_CNV_GT_AND_BINSIZE
2018-04-11 13:31:41 [runtime] (ready) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH
2018-04-11 13:31:44 [runtime] (ready) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-04-11 13:31:44 [runtime] (split_complete) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH
2018-04-11 13:31:44 [runtime] (run:local) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH.fork0.chnk0.main
2018-04-11 13:31:44 [runtime] (split_complete) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._COMBINED_SV_CALLER._TERMINAL_CNV_CALLER.PREPARE_TERMINAL_CNV_GT_AND_BINSIZE
2018-04-11 13:31:44 [runtime] (run:local) ID.Sample_example.PHASER_SVCALLER_CS.PHASER_SVCALLER._COMBINED_SV_CALLER._TERMINAL_CNV_CALLER.PREPARE_TERMINAL_CNV_GT_AND_BINSIZE.fork0.chnk0.main
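The [runtime] lines in these logs follow a regular layout (date, time, [runtime], state, stage path), so they are easy to summarise from the command line. The sketch below lists the stages that have logged join_complete, which in the mkfastq log above is the last state each stage reports. The function name completed_stages is hypothetical, and the field layout is assumed from the output shown here rather than from any Martian specification:

```shell
# List stages that have logged (join_complete) in a Martian runtime log.
# Reads the log from the files given as arguments, or from stdin if none.
completed_stages() {
    awk '$3 == "[runtime]" && $4 == "(join_complete)" {
        # Field 5 looks like ID.<run id>.<pipeline>...<stage>;
        # the stage name is the last dot-separated component.
        n = split($5, parts, ".")
        print parts[n]
    }' "$@"
}
```

For example, completed_stages wgs.out would print one stage name per line while the wgs run is in progress.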
Figure 2. Example WGS pipestance. Each bubble is an analysis stage and all lines represent passage of data from one state to another. Whenever possible stages are broken into chunks to allow for parallel processing of data.
The 10x/Linked-Read .bam file contains much of the same information that a typical short-read .bam would but, like the VCF, has some extra information. Documents covering the 10x .bam spec can be found here
If we take a look we can see some interesting features:
cd /home/ubuntu/10x-bam-files
samtools view -h NA12878_chr21_phased_possorted_exome_bam.bam | less
@PG ID:lariat PN:longranger.lariat CL:lariat -reads=/mnt/analysis/marsoc/pipestances/HGKNJBBXX/PHASER_SVCALLER_EXOME_PD/49255/1016.1.1-0/PHASER_SVCALLER_EXOME_PD/PHASER_SVCALLER_EXOME/_LINKED_READS_ALIGNER/_FASTQ_PREP_NEW/SORT_FASTQS/fork0/join-u29c07c9de1/fi
les/chunk-0.fasth.gz -read_groups=49255:MissingLibrary:1:unknown_fc:0 -genome=/mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa -sample_id=49255 -threads=4 -centromeres=/mnt/opt/refdata_new/hg19-2.0.0/regions/centromeres.tsv -trim_length=7 -first_chunk=True -output=/mnt/ana
lysis/marsoc/pipestances/HGKNJBBXX/PHASER_SVCALLER_EXOME_PD/49255/1016.1.1-0/PHASER_SVCALLER_EXOME_PD/PHASER_SVCALLER_EXOME/_LINKED_READS_ALIGNER/BARCODE_AWARE_ALIGNER/fork0/chnk000-u29c07c9e62/files VN:'576387f'
@PG ID:attach_phasing PN:longranger.attach_phasing PP:lariat VN:1016.1.1
@PG ID:longranger PN:longranger PP:attach_phasing VN:1016.1.1
@CO 10x_bam_to_fastq:R1(RX:QX,TR:TQ,SEQ:QUAL)
@CO 10x_bam_to_fastq:R2(SEQ:QUAL)
@CO 10x_bam_to_fastq:I1(BC:QT)
ST-K00126:334:HGKNJBBXX:4:2118:26920:14519 163 chr21 9412940 39 92M8S = 9412953 90 GGAGTTGTATTGGTGCAGGAAGGGGAGTTTGATTTAATGAAACAATGCATTAAAAATTTGTATTCACTTTGTGATTCAATGATAGTCAATGTTAACATAA AAA<FAJFFJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
JJJJJJJJFAFF<FJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ RX:Z:GACACATCAGCTGTTA QX:Z:AAFFFJJJJJJJJJJJ BC:Z:TCTCGGGC QT:Z:AAFFFJJJ XS:i:-13 AS:i:-9 XM:A:0 AM:A:0 XT:i:0 BX:Z:GACACATCAGCTGTTA-1 RG:Z:49255:MissingLibrary:1:unknown_fc:0 OM:i:39
ST-K00126:334:HGKNJBBXX:4:2118:26920:14519 83 chr21 9412953 39 77M = 9412940 -90 TGCAGGAAGGGGAGTTTGATTTAATGAAACAATGCATTAAAAATTTGTATTCACTTTGTGATTCAATGATAGTCAAT FJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJ
RX:Z:GACACATCAGCTGTTA QX:Z:AAFFFJJJJJJJJJJJ TR:Z:TGTTAAC TQ:Z:JJJJJJJ BC:Z:TCTCGGGC QT:Z:AAFFFJJJ XS:i:-13 AS:i:-9 XM:A:0 AM:A:0 XT:i:0 BX:Z:GACACATCAGCTGTTA-1 RG:Z:49255:MissingLibrary:1:unknown_fc:0 OM:i:39
ST-K00126:334:HGKNJBBXX:4:1114:32644:45010 99 chr21 9413248 39 77M = 9413263 115 TGAATATTTTCTCAGCAACTGTGGTGTTATGATATATATTGGTTTTCATCCCCAGTTCCTGGCTTATAACTCCCCTA FF<J<FJJJ-J<JFAJFJJAJ-A-<7<A--FJ-AJJJFFFFJF-<FFF-F--7A<FF-<AF<JA-A-JJ-<<7FFF<
RX:Z:NAGGGTGAGGCATGGT QX:Z:#<<AAFFJJFJJA<J< TR:Z:TTCCGCA TQ:Z:<FJJJFA BC:Z:TCTCGGGC QT:Z:AAAFFJJJ XS:i:-12 AS:i:-8 XM:A:0 AM:A:0 XT:i:0 BX:Z:AAGGGTGAGGCATGGT-1 RG:Z:49255:MissingLibrary:1:unknown_fc:0 OM:i:39
Things to look for:
BX:Z:GACACATCAGCTGTTA-1
Note: Definitions of all the .bam tags can be found here
Informatics Tip: If you’d like to search for all the reads associated with a list of barcodes, this is a fast way to do it (it needs ripgrep, which is installed on your AWS instance)
samtools view -@ 5 possorted_exome_bam.bam | rg -j 5 --no-line-number -F -f BX_list.txt > BC_reads.sam
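To build a barcode list like BX_list.txt in the first place, it helps to know which barcodes carry the most reads. A minimal sketch that counts reads per BX tag from SAM text on stdin; it assumes standard tab-separated SAM records with optional tags starting at column 12, and the function name reads_per_barcode is made up for illustration:

```shell
# Count reads per BX barcode from SAM records on stdin.
# Output: "<read count><TAB><barcode>", most frequent barcode first.
reads_per_barcode() {
    awk -F'\t' '{
        # Optional tags start at column 12 of a SAM record.
        for (i = 12; i <= NF; i++) {
            if (substr($i, 1, 5) == "BX:Z:") {
                count[substr($i, 6)]++
                break
            }
        }
    }
    END {
        for (bc in count) print count[bc] "\t" bc
    }' | LC_ALL=C sort -k1,1nr -k2,2
}
```

A possible use would be samtools view NA12878_chr21_phased_possorted_exome_bam.bam | reads_per_barcode | head. Reads without a BX tag are simply skipped.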
Some 10x users have also asked about local de novo assembly of a locus of interest. An example of that (completely unsupported) workflow can be found here.
There are a few things that make the 10x VCF unique. Overall, 10x abides by the VCF 4.x standard. However, there is some additional information that takes advantage of the 10x-specific technology. Documents covering the 10x VCF spec can be found here. A comprehensive list of the available outputs of Longranger can be found here.
If we navigate to a 10x VCF and have a look
cd /home/ubuntu/10x-vcf-files
ubuntu@ip-172-31-63-156:~/10x-vcf-files$ ls
total 1014M
-rw-rw-r-- 1 ubuntu ubuntu 43M Apr 11 16:34 NA12878_wes_varcalls.vcf.gz
-rw-rw-r-- 1 ubuntu ubuntu 882K Apr 11 16:34 NA12878_wes_varcalls.vcf.gz.tbi
-rw-rw-r-- 1 ubuntu ubuntu 968M Apr 2 18:46 NA12878_wgs_varcalls.vcf.gz
-rw-rw-r-- 1 ubuntu ubuntu 1.9M Apr 2 18:42 NA12878_wgs_varcalls.vcf.gz.tbi
zcat NA12878_wes_varcalls.vcf.gz | less
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 49255
chr1 12198 . G C 1355.77 PASS AC=2;AF=1.0;AN=2;BaseQRankSum=0.132;ClippingRankSum=0.0;DP=42;ExcessHet=3.0103;FS=0.0;MLEAC=2;MLEAF=1.0;MQ=53.23;MQRankSum=1.593;QD=32.28;ReadPosRankSum=1.653;SOR=0.286;MUMAP_REF=3.73913;MUMAP_ALT=24.775;AO=34;RO=0
;MMD=0.885642;RESCUED=1;NOT_RESCUED=102;HAPLOCALLED=0 GT:AD:DP:GQ:PL:BX:PS 1|1:1,41:42:92:1384,92,0:,AACTCCCTCTTGGTAG-1_74;TTGCGTCAGACGCCCT-1_74;GTACATGAGGTGCGTA-1_65;TAAGGCTCAAGGGTGT-1_74;CACAGTACACATTGGT-1_74_74;GCAAACTGTAAACGGC-1_70;CGGATTATCTGAACTG-1_70;ATGCA
ACTCTCCCAAC-1_60;CATCGGGTCATAACCG-1_74;ACACAACAGAGAGGCG-1_74;GTAGTCAAGGCCGAAT-1_74;TGCATCCTCGTCTGAA-1_74;ATCATGGTCATCGAGT-1_70;CAGCTAACAAGAGTCG-1_60_70;GGGCCATTCTACAAGC-1_55;GTGACCGTCACAGCCG-1_70;GTTTCATGTACGAGAC-1_74;TTAGTCTAGCTAAGTA-1_74_55_74;GTCGACGCACTAAGGG-1_74;GC
GCTGAAGCGATTAA-1_45;TGCAACAGTTCTGTTT-1_70;GGCAACCAGAGACTAT-1_70;CGCACTTTCCATCGTC-1_74_74;GGCGACTAGCGTGTGA-1_74;TAGGCGCCAGGTCAAG-1_74;TGCTAAGAGGTTTCGT-1_74_74;GTCCTCATCTTCCTTC-1_74;ACTGAGTGTCCATTGA-1_74:12198
chr1 12305 . C T 18.82 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=1.097;ClippingRankSum=0.0;DP=12;ExcessHet=3.0103;FS=0.0;MLEAC=1;MLEAF=0.5;MQ=51.41;MQRankSum=-0.23;QD=1.71;ReadPosRankSum=-0.253;SOR=0.086;MUMAP_REF=13.6977;MUMAP_ALT=15.6667;AO=2;RO=9;MMD=0.901709;RESCUED=0;NOT_RESCUED=49;HAPLOCALLED=0 GT:AD:DP:GQ:PL:BX:PS:PQ:JQ 0/1:9,2:11:47:47,0,315:TTAGTCTAGCTAAGTA-1_70;GTCGACGCACTAAGGG-1_74;ACCACGGCATACAGAA-1_55;ACACAACAGAGAGGCG-1_70;TGCATCCTCGTCTGAA-1_65;CGCACTTTCCATCGTC-1_74;ACTGGGCGTCCATCCT-1_74;GTGCTCTCAAGTACAA-1_70;GGCGACTAGCGTGTGA-1_70,GTAGTCAAGGCCGAAT-1_74;ATCATGGTCATCGAGT-1_74:1:10:255
chr1 12383 . G A 172.74 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-0.674;ClippingRankSum=0.0;DP=9;ExcessHet=3.0103;FS=6.021;MLEAC=1;MLEAF=0.5;MQ=49.26;MQRankSum=0.887;QD=21.59;ReadPosRankSum=-0.489;SOR=2.526;MUMAP_REF=7.0;MUMAP_ALT=26.4706;AO=8;RO=0;MMD=0.898793;RESCUED=0;NOT_RESCUED=21;HAPLOCALLED=0 GT:AD:DP:GQ:PL:BX:PS:PQ:JQ 1|0:1,7:8:6:200,0,6:,GAAATGAAGTTTCTTC-1_60;ACCACGGCATACAGAA-1_74;TTGCGTCAGACGCCCT-1_74;AGGAGACCATGTATGC-1_55;TGCATCCTCGTCTGAA-1_74;GCAAACTGTAAACGGC-1_70;GACGCGTGTCCTCGGA-1_45;TCCATCGAGGGCGAAG-1_70:1:44:36
Not only do you see some of the typical things
You can also see some of the extra 10x “stuff”, mostly in the FORMAT field. In the BX entry, ;-separated barcodes cover the first allele, followed by a , which separates them from the barcodes associated with reads covering the second allele.
This extra information can be very useful for working out whether variants are in cis or in trans, which is especially helpful if you have compound heterozygous variants. All the alleles on one side of the separator (|) with the same PS are from the same haplotype (and thus the same parent).
Note: For GT, | represents a phased variant and / represents an unphased variant. Definitions of all the VCF tags are available here
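The BX layout described above can be unpacked with a few lines of awk. This is a sketch for single-sample VCF records like the ones shown here, not an official 10x tool: the function name bx_by_allele is hypothetical, allele 0 is taken to be the first comma-separated group (which, in the records above, lines up with the reference-first order of AD), and the trailing _NN values on each barcode are dropped rather than interpreted.

```shell
# Print one line per supporting barcode: CHROM, POS, allele index, barcode.
# Reads uncompressed VCF text on stdin (e.g. from zcat).
bx_by_allele() {
    awk -F'\t' '
    /^#/ { next }                                # skip header lines
    {
        nfmt = split($9, fmt, ":")               # FORMAT keys, e.g. GT:AD:DP:GQ:PL:BX:PS
        split($10, val, ":")                     # matching values for the sample
        for (k = 1; k <= nfmt; k++) {
            if (fmt[k] != "BX") continue
            nal = split(val[k], allele, ",")     # one group per allele
            for (a = 1; a <= nal; a++) {
                nbc = split(allele[a], bc, ";")  # barcodes within this group
                for (b = 1; b <= nbc; b++) {
                    if (bc[b] == "") continue
                    sub(/_.*/, "", bc[b])        # drop the trailing _NN values
                    print $1 "\t" $2 "\t" (a - 1) "\t" bc[b]
                }
            }
        }
    }'
}
```

For example, zcat NA12878_wes_varcalls.vcf.gz | bx_by_allele | head lists which barcodes support which allele at the first few sites. An empty group, as in the chr1 12198 record above, simply produces no lines for that allele.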
Informatics Tip: If you have a variant of interest and you’d like to look at all the SNPs within the same phase block (for example chr16 2042353 in NA12878_wes_varcalls.vcf.gz), you can simply run
zcat NA12878_wes_varcalls.vcf.gz | rg '1308074'
This will output all called SNPs, both filtered and unfiltered, that land in the phase block of interest. SNPs with the same PS number are in cis with each other. This is very informative when you have a potential compound variant.
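A plain text search matches the number anywhere in a record, so a stricter alternative is to compare the PS value itself. The sketch below does that in two steps for a single-sample VCF: look up the PS of a given site, then print every record on that chromosome with the same PS. Both function names are made up for illustration, and the example assumes the number searched for above is the PS value of the variant of interest.

```shell
# Print the PS value of the record at CHROM POS.
# Usage: phase_set_of CHROM POS < file.vcf
phase_set_of() {
    awk -F'\t' -v chrom="$1" -v pos="$2" '
    $1 == chrom && $2 == pos {
        n = split($9, fmt, ":"); split($10, val, ":")
        for (k = 1; k <= n; k++) if (fmt[k] == "PS") print val[k]
    }'
}

# Print CHROM, POS and GT for records on CHROM whose PS equals the given value.
# Usage: same_phase_set CHROM PS < file.vcf
same_phase_set() {
    awk -F'\t' -v chrom="$1" -v ps="$2" '
    /^#/ || $1 != chrom { next }          # skip headers and other chromosomes
    {
        n = split($9, fmt, ":"); split($10, val, ":")
        gt = "."; this_ps = ""
        for (k = 1; k <= n; k++) {
            if (fmt[k] == "GT") gt = val[k]
            if (fmt[k] == "PS") this_ps = val[k]
        }
        if (this_ps == ps) print $1 "\t" $2 "\t" gt
    }'
}
```

For example, zcat NA12878_wes_varcalls.vcf.gz | phase_set_of chr16 2042353 prints the phase set, which can then be passed to same_phase_set chr16 on a second pass. Restricting the match to one chromosome avoids picking up an identical PS number elsewhere in the genome.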
The 10x VCF is fully compatible with downstream analysis tools such as Variant Effect Predictor, SnpEff, and GEMINI, to name a few.
IGV is one of the most common tools used in the field of genomics to view a variety of different data types. If you do not have IGV (on your laptop, not your terminal), or don’t have the latest version (2.4), please download and install it with either Homebrew (brew install igv) or conda (conda install igv). If you don’t use conda or Homebrew, IGV can be installed manually from here: https://software.broadinstitute.org/software/igv/download.
Note: Typing which igv will show you where IGV has been installed. You can navigate to that directory and click on igv.jar.
First open IGV and load the 10x data. There are two .bam files that can be explored. Here’s a snapshot of the 10x-bam-files directory
ubuntu@ip-172-31-63-156:~/10x-bam-files$ ls /home/ubuntu/10x-bam-files
total 1.3G
-rw-rw-r-- 1 ubuntu ubuntu 64M Apr 2 17:37 NA12878_chr21_phased_possorted_exome_bam.bam
-rw-rw-r-- 1 ubuntu ubuntu 83K Apr 2 17:37 NA12878_chr21_phased_possorted_exome_bam.bam.bai
-rw-rw-r-- 1 ubuntu ubuntu 1.2G Apr 2 17:41 NA12878_chr21_phased_possorted_WGS_bam.bam
-rw-rw-r-- 1 ubuntu ubuntu 116K Apr 2 17:37 NA12878_chr21_phased_possorted_WGS_bam.bam.bai
-rw-rw-r-- 1 ubuntu ubuntu 891 Apr 2 17:44 README.md
To load one of the .bam files, follow these steps.
/10x-bam-files/ directory on the ftp
Depending on what view you are in, you might see reads paired in a variety of different ways. To show some of the special features of the 10x data:
To view the benefit of phasing navigate to chr21:44,485,330-44,485,370
This will order the reads by barcode and, if possible, show phasing of the region that you are investigating. Groups of reads will be “linked” to each other by the individual barcodes associated with the single molecule that the reads originated from. The reads and barcodes will also be separated into phased haplotypes 1 (red) and 2 (blue). Those reads that could not be phased are represented by grey lines. These unphased reads are still useful and are utilized in most steps of Longranger.
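The same haplotype split can be tallied from the command line. The sketch below counts reads per haplotype in SAM text on stdin, assuming that phased reads carry an HP:i tag with value 1 or 2 as described in the 10x .bam spec; the sample reads shown earlier in this tutorial do not include it, so treat the tag name as an assumption to verify against your own file. The function name reads_per_haplotype is made up for illustration.

```shell
# Count reads per haplotype from SAM records on stdin.
# Reads without an HP:i tag are reported as "unphased".
reads_per_haplotype() {
    awk -F'\t' '{
        hp = "unphased"
        # Optional tags start at column 12 of a SAM record.
        for (i = 12; i <= NF; i++) {
            if (substr($i, 1, 5) == "HP:i:") {
                hp = substr($i, 6)
                break
            }
        }
        count[hp]++
    }
    END {
        for (h in count) print h "\t" count[h]
    }' | LC_ALL=C sort
}
```

For the region above, one way to run it would be samtools view NA12878_chr21_phased_possorted_WGS_bam.bam chr21:44485330-44485370 | reads_per_haplotype, which relies on the .bai index sitting next to the .bam file.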
Some things to keep in mind when thinking about 10x data
The Loupe Browser is a 10x specific genome browser that more fully captures some of the enhanced information that Linked-Reads will get you in your WGS or WES experiments. Loupe is fully integrated into the Longranger pipeline and .loupe files are automatically generated by default.
If you look at the 10x-loupe-files directory you can see three loupe files to explore.
ubuntu@ip-172-31-63-156:~/10x-loupe-files$ ls /home/ubuntu/10x-loupe-files
total 422M
-rw-rw-r-- 1 ubuntu ubuntu 40M Dec 8 09:02 LungTumorT.loupe
-rw-rw-r-- 1 ubuntu ubuntu 383M Apr 10 20:05 NA12878_exome.loupe
First let’s go to the Loupe browser and click on NA12878_exome.loupe. This will bring us to the main page of Loupe which looks like this.
Figure 3. The Loupe main page. Loupe is a 10x Genomics genome browser specifically designed to visualize 10x Linked-Read data.
As you can see we have some nice statistics about the performance of our sequencing experiment including:
Let’s navigate to chr17:41,074,530-41,399,282
Figure 4. Haplotypes view in the Loupe genome browser.
There are a lot of “clickable” features to see here:
Figure 5. Variant types represented in Loupe genome browser.
Here you can see a similar output to that of IGV but it is a bit more digestible for viewing linked reads.
Figure 6. Linked-Read view in the Loupe genome browser. Haplotype 1 is shown in yellow, haplotype 2 is shown in purple. Grey reads/barcodes could not be assigned to a haplotype
Once again, we can see reads clearly phased by haplotype and reads that do not get phased in grey.
If we open up the NA12878_wgs.loupe file, navigate to chr2:34,595,838-34,795,838, and click on Linked-Reads, we can very clearly see a hemizygous deletion in the Linked-Read View
Figure 7. Hemizygous deletion example. View of a deletion on chr2 showing phased reads absent in haplotype 1 and present in haplotype 2.
There is a whole other world of investigating SVs in 10x data. If we look at the Structural Variant View in Loupe, we can see some nice example SVs.
2 34,695,830 AC073218... | DEL | 93 | 2 34,736,552 AC073218... | 40.7 kb
All 10x specific software and information about 10x specific file formats can be found here
A work by Stephen R. Williams, PhD