Cohort RNA-seq data
HBA-DEALS is designed for the analysis of RNA-seq cohort data. The cohorts can be defined arbitrarily, but they are commonly labeled as cases and controls or as group 1 vs. group 2.
There are many ways to obtain such data, including generating RNA-seq data yourself or downloading it from a public repository. For this tutorial, we download a dataset from the NCBI Sequence Read Archive (SRA), which is the primary archive of NGS datasets.
Downloading from SRA
We will use the SRA Toolkit.
We will use 8 RNA-seq samples from the dataset SRP149366, which investigated the estrogen-responsive transcriptome of estrogen receptor-positive normal human breast cells in 3D cultures.
| control | estradiol |
|---|---|
| SRR7236472 SRR7236473 SRR7236474 SRR7236475 | SRR7236480 SRR7236481 SRR7236482 SRR7236483 |
First, install fasterq-dump (part of the SRA Toolkit, which also provides prefetch) on your system according to the SRA Toolkit instructions. The download page may be helpful.
Then execute the following command from the shell.
for srr in SRR7236472 SRR7236473 SRR7236474 SRR7236475 SRR7236480 SRR7236481 SRR7236482 SRR7236483; do
  prefetch "$srr"
  fasterq-dump -t tmp/ --split-files --threads 8 --outdir tutorial/ "$srr"
done
If necessary, change the --threads argument according to the resources of your system.
The downloaded *.fastq files will be written to the tutorial directory.
The per-run directories created by prefetch (e.g., SRR7236472, which contains the file SRR7236472.sra) can be deleted once the FASTQ files have been written.
This step will typically take a few hours.
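Because the download runs for hours, it can be worth confirming that every run produced both mate files before moving on. The following is a minimal sketch, not part of HBA-DEALS or the SRA Toolkit; the helper name check_fastq_pairs is hypothetical, and it assumes the `_1.fastq`/`_2.fastq` naming that the fastp step below relies on.

```shell
# Report expected paired-end FASTQ files that are missing or empty.
# Usage: check_fastq_pairs [DIR]   (DIR defaults to "tutorial")
# The body runs in a subshell, so its variables do not leak into your session.
check_fastq_pairs() (
  dir=${1:-tutorial}
  missing=0
  for srr in SRR7236472 SRR7236473 SRR7236474 SRR7236475 \
             SRR7236480 SRR7236481 SRR7236482 SRR7236483; do
    for mate in 1 2; do
      f="$dir/${srr}_${mate}.fastq"
      # -s is true only if the file exists and has a size greater than zero
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f"
        missing=$((missing + 1))
      fi
    done
  done
  echo "$missing of 16 expected files missing or empty"
  # Exit status is 0 only when nothing is missing
  [ "$missing" -eq 0 ]
)
```

Running `check_fastq_pairs tutorial` after the loop has finished lists any file that needs to be downloaded again.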
Cleaning the reads
fastp performs quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data.
fastp can be downloaded from its GitHub site, which also has installation instructions for various platforms. On a Debian or Ubuntu system, the easiest way to install it is:
sudo apt install fastp
After you have installed fastp, run the following command in the shell.
for srr in SRR7236472 SRR7236473 SRR7236474 SRR7236475 SRR7236480 SRR7236481 SRR7236482 SRR7236483; do \
fastp -i tutorial/${srr}_1.fastq -I tutorial/${srr}_2.fastq -o tutorial/${srr}_trimmed_1.fastq -O tutorial/${srr}_trimmed_2.fastq; \
done
For each downloaded .fastq file, this creates a file of the same name with “_trimmed” inserted before the mate number (e.g., SRR7236472_1.fastq becomes SRR7236472_trimmed_1.fastq).
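By default, fastp writes its reports to fastp.json and fastp.html in the current directory, so each iteration of the loop overwrites the report of the previous sample. If you want to keep one report per sample, something along the following lines may help. This is only a sketch: the helper name trim_sample and the `_fastp.json`/`_fastp.html` file names are illustrative assumptions, while `--json` and `--html` are fastp's options for naming the report files.

```shell
# Trim one paired-end sample and keep its fastp reports next to the reads.
# Usage: trim_sample SRR_ACCESSION [DIR]   (DIR defaults to "tutorial")
# The body runs in a subshell, so its variables do not leak into your session.
trim_sample() (
  srr=$1
  dir=${2:-tutorial}
  fastp \
    -i "$dir/${srr}_1.fastq" -I "$dir/${srr}_2.fastq" \
    -o "$dir/${srr}_trimmed_1.fastq" -O "$dir/${srr}_trimmed_2.fastq" \
    --json "$dir/${srr}_fastp.json" --html "$dir/${srr}_fastp.html"
)
```

It can be called once per accession, for example `trim_sample SRR7236472`, or inside the same `for srr in ...; do ...; done` loop used above.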
At this point, you can delete the original *.fastq files if desired.
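If you do delete the originals, it is safer to remove them only for samples whose trimmed output is complete. The sketch below is one cautious way to do this; the helper name remove_untrimmed is hypothetical, and it assumes the file names produced by the commands above.

```shell
# Delete the untrimmed FASTQ pair of each sample, but only when both
# trimmed files exist and are non-empty.
# Usage: remove_untrimmed [DIR]   (DIR defaults to "tutorial")
# The body runs in a subshell, so its variables do not leak into your session.
remove_untrimmed() (
  dir=${1:-tutorial}
  for srr in SRR7236472 SRR7236473 SRR7236474 SRR7236475 \
             SRR7236480 SRR7236481 SRR7236482 SRR7236483; do
    t1="$dir/${srr}_trimmed_1.fastq"
    t2="$dir/${srr}_trimmed_2.fastq"
    if [ -s "$t1" ] && [ -s "$t2" ]; then
      rm -f "$dir/${srr}_1.fastq" "$dir/${srr}_2.fastq"
    else
      # Leave the originals in place so the sample can be trimmed again
      echo "keeping originals for $srr: trimmed output incomplete" >&2
    fi
  done
)
```

Samples that are reported as incomplete keep their original files, so fastp can simply be rerun for them.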