19cc0bdaf3cc
Update
author | Steve Losh <steve@stevelosh.com> |
---|---|
date | Sun, 23 Feb 2020 21:50:45 -0500 |
parents | dfec45c7e500 |
children | 4b8e88f0d250 |
branches/tags | (none) |
files | README.markdown |
Changes
--- a/README.markdown Sun Feb 23 18:59:29 2020 -0500
+++ b/README.markdown Sun Feb 23 21:50:45 2020 -0500
@@ -960,7 +960,6 @@
 set title "READ COUNTS OF INDIVIDUAL FASTQ FILES"
 set xlabel "READS (MILLIONS)"
- # major x tics every 2 million, with 2 minor divisions per major (i.e. minor tics are every 1 million)
 set xtics 2
 set mxtics 2
@@ -981,3 +980,11 @@
 ![plot](https://i.imgur.com/YAPXHaQ.png)
 Neat!
+
+Hacked together some Awk to remove overrepresented sequences. **But** I don't
+think a simple `grep -v` approach works, because the two FASTQ files are
+expected to have the paired reads at the same positions in the file. So if we
+remove a read from one file but not the other, now all the reads are going to be
+offset. So we need to remove these reads a bit more carefully (really, we need
+tools that process the paired-end reads together). Need to think about this
+a little bit more.
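
The note added in this change says the two FASTQ files have to stay in step. A rough sketch of what "processing the paired-end reads together" could look like is below. It is not the Awk from the commit. It assumes plain 4-line FASTQ records, and it assumes an overrepresented sequence matches a read's sequence line exactly (real data may need prefix or substring matching instead). The names `read_records`, `filter_pairs`, and `bad_sequences` are made up for illustration.

```python
import io
from itertools import islice, zip_longest


def read_records(handle):
    """Yield FASTQ records as 4-tuples: (header, sequence, plus, quality)."""
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def filter_pairs(r1_in, r2_in, r1_out, r2_out, bad_sequences):
    """Drop a pair from BOTH outputs if either mate is in bad_sequences.

    Returns (kept, dropped) counts of read pairs.
    """
    kept = dropped = 0
    for rec1, rec2 in zip_longest(read_records(r1_in), read_records(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("FASTQ files have different numbers of reads")
        # Assumption: exact match against the whole sequence line.
        if rec1[1] in bad_sequences or rec2[1] in bad_sequences:
            dropped += 1
            continue
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, dropped


# Tiny in-memory example with two read pairs; the first R1 read is "bad".
r1 = io.StringIO("@a/1\nAAAA\n+\nIIII\n@b/1\nCCCC\n+\nIIII\n")
r2 = io.StringIO("@a/2\nGGGG\n+\nIIII\n@b/2\nTTTT\n+\nIIII\n")
out1, out2 = io.StringIO(), io.StringIO()
counts = filter_pairs(r1, r2, out1, out2, bad_sequences={"AAAA"})
```

Because each pair is either written to both outputs or to neither, the surviving reads stay at matching positions in the two files, which is exactly what a per-file `grep -v` can't guarantee.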