# Read Quantification Learning objectives: * Learn mapping and differential gene expression analysis of rna-seq data * Interpret rna-seq analysis results ## Make a new working directory and link the trimmed reads and assembly We will be using the same trimmed fastq data as before ([Tulin et al., 2013](https://evodevojournal.biomedcentral.com/articles/10.1186/2041-9139-4-16)). The following commands will create a new folder `rnaseq` and link the data in: ``` cd .. mkdir -p rnaseq cd rnaseq ln -s ../trim/0Hour*.qc.fq.gz . ln -s ../trim/6Hour*.qc.fq.gz . ln -s /opt/rnaseq/assembly/nema_trinity/Trinity.fasta . ``` Note that this is a previously-assembly `Trinity.fasta` file that the whole class will use. Why is this important? ## Index the assembly: ``` salmon index --index nema --type quasi --transcripts Trinity.fasta ``` ## Run salmon on all the samples: * What are the flags used in the salmon command? * Read up on [libtype, here](https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype). ``` for R1 in *R1*.qc.fq.gz do sample=$(basename $R1 extract.qc.fq.gz) echo sample is $sample, R1 is $R1 R2=${R1/R1/R2} echo R2 is $R2 salmon quant -i nema -p 2 -l IU -1 <(gunzip -c $R1) -2 <(gunzip -c $R2) -o ${sample}.quant done ``` ## Take a look at quant output * [Examples of exploring the output](https://github.com/ngs-docs/2015-nov-adv-rna/blob/master/salmon.rst) ``` head 0Hour_ATCACG_L002_R1_001.qc.fq.gz.quant/quant.sf ``` ## Look at all the mapping rates: ``` find . -name \salmon_quant.log -exec grep -H "Mapping rate" {} \; ``` ## More reading * Schurch et al. 2016. ["How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?"](http://rnajournal.cshlp.org/content/22/6/839). * Patro et al. 2017 ["Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5600148/). * [Salmon documentation](https://salmon.readthedocs.io/en/latest/salmon.html)