31f98ce9303a
Update
author | Steve Losh <steve@stevelosh.com> |
---|---|
date | Tue, 25 Feb 2020 11:26:40 -0500 |
parents | 0f1de4b551b0 |
children | 21840941f626 |
branches/tags | (none) |
files | README.markdown |
Changes
--- a/README.markdown Mon Feb 24 23:56:15 2020 -0500
+++ b/README.markdown Tue Feb 25 11:26:40 2020 -0500
@@ -1013,3 +1013,23 @@
 out of sync because we grepped out the overrepresented sequences naively. Tried
 restarting the alignment on the original data and it's still going, so that's
 probably it. Need to figure out how to filter those bad reads properly tomorrow.
+
+## 2020-02-25
+
+Class. Chatting about QC and such.
+
+Professor says he asked around and people haven't seen the first 5 bases being
+lower quality before, but that the explanation from Illumina makes sense, and
+that we should *not* trim those bases unless they seem to be causing issues.
+Also talked about how to filter out the overrepresented sequences — he thought
+fastx had something for this, but I can't seem to find anything in the
+documentation. I may need to write some code.
+
+Professor installed `pigz` on the server, so I can remove my hacky `xargs`
+workaround.
+
+Need to find a way to filter bad sequences from BOTH files at once.
+
+Need to align to the NCBI reference GTF instead of the other weird one.
+
+
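
The entry's last open problem, filtering bad sequences from BOTH files at once, could be sketched roughly as below. This is not code from the repository, just a minimal illustration under some assumptions: the two files are paired-end FASTQ with strict 4-line records, mates appear in the same order in both files, and a read counts as "bad" if its sequence contains one of the overrepresented sequences as a substring. The names `read_fastq` and `filter_pairs` are hypothetical.

```python
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples, assuming 4 lines per record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def filter_pairs(r1_in, r2_in, r1_out, r2_out, bad_seqs):
    """Write only pairs where NEITHER mate contains a bad sequence.

    Dropping the whole pair keeps the two output files in sync.
    Returns (kept, dropped) pair counts.
    """
    kept = dropped = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 have different numbers of records")
        # Assumption: substring match against the overrepresented sequences.
        if any(bad in rec1[1] or bad in rec2[1] for bad in bad_seqs):
            dropped += 1
            continue
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, dropped
```

The handles can be any text-mode file objects, so gzipped inputs could be opened with `gzip.open(path, "rt")` from the standard library. Walking both files in lockstep and dropping a pair whenever either mate matches is what avoids the out-of-sync problem that comes from grepping each file separately.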