19cc0bdaf3cc

Update
author Steve Losh <steve@stevelosh.com>
date Sun, 23 Feb 2020 21:50:45 -0500
parents dfec45c7e500
children 4b8e88f0d250
branches/tags (none)
files README.markdown

Changes

--- a/README.markdown	Sun Feb 23 18:59:29 2020 -0500
+++ b/README.markdown	Sun Feb 23 21:50:45 2020 -0500
@@ -960,7 +960,6 @@
     set title "READ COUNTS OF INDIVIDUAL FASTQ FILES"
     set xlabel "READS (MILLIONS)"
 
-
     # major x tics every 2 million, with 2 minor divisions per major (i.e. minor tics are every 1 million)
     set xtics 2
     set mxtics 2
@@ -981,3 +980,11 @@
 ![plot](https://i.imgur.com/YAPXHaQ.png)
 
 Neat!
+
+Hacked together some Awk to remove overrepresented sequences.  **But** I don't
+think a simple `grep -v` approach works, because the two FASTQ files are
+expected to have the paired reads at the same positions in each file.  So if we
+remove a read from one file but not the other, every read after it will be
+offset and the pairs won't line up.  So we need to remove these reads a bit more
+carefully (really, we need tools that process the paired-end reads together).
+Need to think about this a little bit more.
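
A rough sketch of what "processing the paired-end reads together" could look like, written in Python rather than the Awk mentioned above. It is not part of this commit. It assumes plain four-line FASTQ records (no wrapped sequences) and that a read counts as overrepresented only when its whole sequence exactly matches an entry in the set; both are assumptions, and the names `fastq_records` and `filter_pairs` are made up for illustration.

```python
import itertools


def fastq_records(lines):
    """Yield FASTQ records as (header, sequence, plus, quality) tuples.

    Assumes each record is exactly four lines long.
    """
    it = iter(lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in record)


def filter_pairs(r1_lines, r2_lines, overrepresented):
    """Yield (rec1, rec2) pairs, dropping a pair if EITHER mate matches.

    Both files are walked in lockstep, so a read is never removed from
    one file without its mate being removed from the other.
    """
    pairs = itertools.zip_longest(fastq_records(r1_lines), fastq_records(r2_lines))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("FASTQ files have different numbers of reads")
        # Exact whole-sequence match (an assumption of this sketch).
        if rec1[1] in overrepresented or rec2[1] in overrepresented:
            continue
        yield rec1, rec2
```

Any iterable of lines works as input, including two open file handles, and writing each kept pair back out to two new files keeps the mates at matching positions.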