# HG changeset patch
# User Steve Losh
# Date 1579841958 18000
# Node ID 008a451ebd8e48bc9057338da8d9e43c53f06eeb
# Parent 2e4a81e16a6844c8e7fa457ea5983f7188a1f319
Update

diff -r 2e4a81e16a68 -r 008a451ebd8e README.markdown
--- a/README.markdown	Thu Jan 23 00:14:55 2020 -0500
+++ b/README.markdown	Thu Jan 23 23:59:18 2020 -0500
@@ -156,7 +156,7 @@
 do an ugly hack in `float-string` to make the output consistently match
 Rosalind's style everywhere.
 
-## 2020-01-23
+## 2020-01-22
 
 Started doing Rosalind problems in shell/Awk, to join in with a group of folks
 at work doing them for the first time. The first few problems are simple, but
@@ -167,3 +167,19 @@
 Worked. It's so nice to have a site generator I designed myself that's not
 going to change out from under me when I just want to fix something simple and
 move on with my life.
+
+## 2020-01-23
+
+Lesk book exercises, chapter 1:
+
+1. Average density would be `(/ 3r4 3r9) = 1/100000` or 1 gene every ~100kb.
+2. Parts:
+    A. Two humans would have roughly 250k differences.
+    B. A human and a chimpanzee would have roughly 3m differences.
+3. Parts:
+    A. Average density would be 2 SNPs per 1000 base pairs.
+    B. Is this a trick question? I think 1.1% of the differences would be in protein-coding regions.
+4. Parts:
+    A. I am confused. The glossary defines a "haplotype" as a group of closely-linked genes that are typically inherited together. So… 1 haplotype? But the *text* talks about a haplotype as being a combination of SNPs in a recombination-poor region. So if "haplotype" means the combination of SNPs, not a set of genes, then I think this would be `4^10 = 1048576`.
+    B. On a diploid chromosome, you'd have two separate sets of SNPs to combine (and order doesn't matter), so I think it's `4^10 * 4^10 / 2`.
+    C. `(1 SNP / 5 kb) * (100 kb) = 20 SNPs = 2^20 sequences = 1048576 sequences`