8de6c03950e7

Move month
author Steve Losh <steve@stevelosh.com>
date Sun, 22 Oct 2023 20:15:48 -0400 (15 months ago)
parents f76b2070bb71
children 61377bfda672
branches/tags (none)
files 2023.markdown README.markdown

Changes

--- a/2023.markdown	Sun Oct 22 20:15:21 2023 -0400
+++ b/2023.markdown	Sun Oct 22 20:15:48 2023 -0400
@@ -795,3 +795,1092 @@
 The number of recombination events (crossovers) per chromosome is random, but is
 usually relatively low (3-5 per chromosome).
 
+# September 2023
+
+## 2023-09-01
+
+HG545.  Looked over the slides last night and was a little worried, but felt
+okay after the lecture for the most part.  Still a few things I need to look up
+and I do still need to get my fleeting notes into this, but I feel okay.
+
+Continuing the Snakemake tutorial.
+
+Threads can be specified for a given job with `threads: 8`, and you need to
+propagate that to the command yourself with `{threads}`.  Will be scaled down if
+run with fewer cores than threads, otherwise will wait until that many are
+available.
+
+Snakemake has some support for noticing log files, but it seems like you have to
+manually create them yourself?  This seems… tedious?
+
+    rule bwa_map:
+        input:
+            "data/genome.fa",
+            lambda wc: SAMPLES[wc.sample]
+        output:
+            "mapped_reads/{sample}.bam"
+        threads: 8
+        params:
+            rg=r"@RG\tID:{sample}\tSM:{sample}"
+        log: "logs/bwa_map/{sample}.log"
+        shell:
+            "("
+            "bwa mem -R '{params.rg}' -t {threads} {input}"
+            " | samtools view -Sb - > {output}"
+            ") >{log} 2>&1"
+
+Do I really have to wrap everything in `(…) >{log} 2>&1` by hand myself?
+
+You can get a summary of file provenance with `snakemake --summary`.  The output
+is a TSV, so I went down a rathole of pretty-printing TSVs and eventually found
+that `| column -s $'\t' -t` works (mnemonic: `s$tt`).  I love how every UNIX
+program gets to invent its own bespoke command line interface for specifying
+special characters. Really great.
+
+Can mark outputs as `temp()` and `protected()`, which is nice.
+
+Need to install singularity *inside* my VM:
+
+    # Ensure repositories are up-to-date
+    sudo apt-get update
+
+    # Install debian packages for dependencies
+    sudo apt-get install -y \
+        wget \
+        build-essential \
+        libseccomp-dev \
+        libglib2.0-dev \
+        pkg-config \
+        squashfs-tools \
+        cryptsetup \
+        runc
+
+    # Install Golang
+    export VERSION=1.21.0 OS=linux ARCH=amd64 && \
+        wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
+        sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
+        rm go$VERSION.$OS-$ARCH.tar.gz
+
+    echo 'export PATH=/usr/local/go/bin:$PATH' >> ~/.bashrc && \
+        source ~/.bashrc
+
+    # Install Singularity
+    export VERSION=3.11.4 && \
+        wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-ce-${VERSION}.tar.gz && \
+        tar -xzf singularity-ce-${VERSION}.tar.gz && \
+        cd singularity-ce-${VERSION}
+
+    ./mconfig && \
+        make -C builddir && \
+        sudo make -C builddir install
+
+## 2023-09-02
+
+It is time to shave the LaTeX yak again. Installed it with `texlive-latex-base`
+to start, we'll see if I need to add some more crud in later. Going to go
+through some guides for now.
+
+Going to note some things to remember.  Skeleton of document:
+
+    \documentclass{article}
+    \begin{document}
+
+    Basic text.
+
+    \end{document}
+
+Math:
+
+    Inline math $y = 3 \sin x$ example.
+
+    Block equation:
+    \[
+            y = 3 \sin x
+    \]
+
+    With reference:
+    \begin{equation}\label{equa}
+        y' = 3 \cos x
+    \end{equation}
+    refer to it by label, e.g. equation (\ref{equa}).
+
+    More complicated: $x^2$ and $x^{2+\alpha}$ and $y_{n+1}$.
+
+Verbatim:
+
+    Verbatim text: \verb"$x^{2+\alpha}$".  Delimiter can be anything ala sed,
+    \verb_%%&_ or \verb+$$+.
+
+    Must escape special characters \&, \$, \%, \_, \{, \}, and \#.
+
+    \begin{verbatim}
+    A whole verbatim region.
+
+    (defun square (x)
+    (* x x))
+    \end{verbatim}
+
+Comments:
+
+    Comments exist.  % This is a comment.
+
+Type styles:
+
+    Shapes:
+    \textup{Upright}
+    \textit{Italic}
+    \textsl{Slanted}
+    \textsc{Small Caps}
+
+    Series (weight):
+    \textmd{Medium}
+    \textbf{Boldface}
+
+    Families:
+    \textrm{Roman}
+    \textsf{Sans}
+    \texttt{Typewriter}
+
+Emphasis:
+
+    \emph{Never} do Foo!
+
+"Environments" are sections that are treated differently, made with `\begin{…}`
+and `\end{…}`.
+
+Lists:
+
+    Unordered list:
+    \begin{itemize}
+        \item Foo
+        \item Bar
+        \item Baz
+    \end{itemize}
+
+    Ordered list:
+    \begin{enumerate}
+        \item One
+        \item Two
+        \item Three
+    \end{enumerate}
+
+    Customizable labels:
+    \begin{description}
+        \item[Rule 1.] Foo
+        \item[Rule 2.] Bar
+        \item[Rule 3.] Baz
+    \end{description}
+
+Sizes (note the brace comes BEFORE the command!):
+
+    {\Huge Huge}
+    {\huge huge}
+    {\LARGE LARGE}
+    {\Large Large}
+    {\large large}
+    {\normalsize normalsize}
+    {\small small}
+    {\footnotesize footnotesize}
+    {\scriptsize scriptsize}
+    {\tiny tiny}
+
+Centering:
+
+    \begin{center}
+        {\large\textbf{Assignment 1}}\\% The \\ linebreaks.
+        Steve Losh\\
+        BS521
+    \end{center}
+
+Example table.
+
+    \begin{tabular}{l|rc} % lrc = cols should be left, right, centered, pipe for vertical line
+        Name & Mark & Grade \\
+        \hline\hline
+        Foo & 99 & A+ \\
+        Bar & 51 & C  \\
+        Baz &  5 & F
+    \end{tabular}
+
+Colspan with multicolumn command.
+
+    \begin{tabular}{|l||r|r|}
+        \hline
+            & \multicolumn{2}{c|}{Grades} \\
+                \cline{2-3}
+        Name & Class 1 & Class 2 \\
+        \hline\hline
+        Foo & 99 & 88 \\
+        Bar & 51 & 65  \\
+        Baz &  5 & 58 \\
+        \hline
+    \end{tabular}
+
+Full example, with referencing and caption, e.g. `Table~\ref{tab:a} on page~\pageref{tab:a}`.
+
+    % b = try to put at Bottom.  Also t top, h here, p separate page.
+    % Can do multiple in order of preference.
+    % [!t] ! = try harder
+    \begin{table}[b]
+        \begin{center}
+            \caption{An Example Table}
+            \label{tab:a}
+
+            \begin{tabular}{lr}
+                Name & Value \\
+                \hline
+                Foo &  1.0 \\
+                Bar & 15.9 \\
+                Baz &  6.2
+            \end{tabular}
+            % \caption{Caption at the end works too.}
+        \end{center}
+    \end{table}
+
+Sections:
+
+    \section{Some section} % includes numbering
+    \subsection{Some subsection}
+
+    \section*{Some section} % no numbering
+    \subsection*{Some subsection}
+
+Quotation marks (hilarious):
+
+    `Single quoted'
+    ``Double quoted''
+
+Change overall text size (simple):
+
+    \documentclass[11pt]{article} % only valid sizes are 10/11/12.
+
+Palatino instead of Computer Modern:
+
+    \usepackage{mathpazo}
+    \linespread{1.05}         % needs more leading (space between lines)
+    \usepackage[T1]{fontenc}
+
+Vimtex stuff:
+
+* Close thing with `]]` in insert mode.
+
+Got about that far, which was enough to start my BIOSTAT-521 homework.  Will dig
+in again later, but it's nice to be able to use it for something real to
+practice.
+
+Puttered around a bit looking at other fonts, but didn't find anything new or
+interesting.
+
+## 2023-09-03
+
+Spent most of today getting the reading room in my apartment ready.  Went to
+IKEA, looked around a lot and got some ideas, picked up a chair and bookshelf
+for my still-boxed unread books.  Not the most productive Sunday, but sitting in
+my chair and reading at night felt good.
+
+## 2023-09-04
+
+Continuing the EdX genetics course over breakfast.
+
+## 2023-09-05
+
+BIOSTAT-521 and BIOINF-500 classes this morning.
+
+Going to spend my non-class time today looking into Unicycler, with Bandage to
+visualize the results (and taking care of paperwork if anything that needs my
+attention crops up).
+
+Grabbed the container from StaPH-B and tried to figure out where the executable
+is.  I think it's `/unicycler/unicycler-runner.py`.  Grabbed the sample data
+from the Unicycler repo and got something running:
+
+    singularity exec containers/unicycler.sif \
+        /unicycler/unicycler-runner.py \
+        --short1 sample_data/short_reads_1.fastq.gz \
+        --short2 sample_data/short_reads_2.fastq.gz \
+        --out assembly/
+
+Pretty straightforward so far.  The result directory contains a log and a bunch
+of GFA files.  GFA apparently stands for [Graphical Fragment
+Assembly](https://gfa-spec.github.io/GFA-spec/GFA1.html), but I think it's
+"graphical" in the "directed/undirected graph" sense and not in the "pixels"
+sense.  Text-based format which seems pretty straightforward to parse (maybe
+I should have a go at parsing it for fun).
+
+Installed Bandage and viewed the results.  Not sure what exactly I'm looking at,
+but it works and looks pretty enough I guess.  [This
+example](https://github.com/rrwick/Bandage/wiki/Simple-example) was helpful to
+get a sense of what it's trying to show me.
+
+Unicycler has some configuration knobs to tweak:
+
+> Unicycler can be run in three modes: conservative, normal (the default) and
+> bold, set with the --mode option. Conservative mode is least likely to produce
+> a complete assembly but has a very low risk of misassembly. Bold mode is most
+> likely to produce a complete assembly but carries greater risk of misassembly.
+> Normal mode is intermediate regarding both completeness and misassembly risk.
+
+Reran with both conservative and bold modes and looked at the difference in the
+results for the sample data.  They're not the same, but I can't visually detect
+any major obvious differences.  Maybe it's not a big deal on this sample.
+
+Once I got that running in a shell, I got it ported into Snakemake, shaving
+a bunch of yaks along the way.
+
+I noticed that Snakemake can take a JSON config file instead of YAML. Fantastic,
+switched over to that right away:
+
+    {
+        "containers": {
+            "fastqc":    "docker://staphb/fastqc:0.12.1",
+            "unicycler": "docker://staphb/unicycler:0.5.0"
+        },
+        "samples": {
+            "short_read_example": [
+                "sample_data/short_reads_1.fastq.gz",
+                "sample_data/short_reads_2.fastq.gz"
+            ]
+        }
+    }
+
+Got the containers downloading via snakemake as well, so it's snakes all the way
+down:
+
+    rule containers:
+        input:
+            expand("containers/{name}.sif", name=config["containers"].keys()),
+
+    rule container:
+        output:
+            "containers/{name}.sif",
+        params:
+            source=lambda wc: config["containers"][wc.name],
+        shell:
+            "singularity pull {output} {params.source}"
+
+Got `snakefmt` working with Neoformat so I can `F6` in Vim to reformat.  Had to
+fuck around with the config because it was just emptying out the file — I think
+the key was that:
+
+* `snakefmt` edits in-place by default (gross).
+* Need to use `replace`: `1` in the Neoformat config to deal with this.
+
+And finally we can assemble:
+
+    def get_input_fastqs(wildcards):
+        return config["samples"][wildcards.sample]
+
+    rule unicycler_assemble:
+        log:
+            "logs/unicycler_assemble/{sample}.log",
+        container:
+            "containers/unicycler.sif"
+        threads: 8
+        input:
+            "containers/unicycler.sif",
+            get_input_fastqs,
+        output:
+            "assemblies/{sample}/assembly.gfa",
+            "assemblies/{sample}/assembly.fasta",
+        shell:
+            logged(
+                "/unicycler/unicycler-runner.py"
+                " --threads {threads}"
+                # input[0] is the container image, so the FASTQs are input[1] and input[2].
+                " --short1 {input[1]}"
+                " --short2 {input[2]}"
+                " --out assemblies/{wildcards.sample}/"
+            )
+
+Puttered around changing my StumpWM and terminal colors/borders/etc a bit to
+make them a little easier on my eyes.  We'll see if it sticks.
+
+## 2023-09-06
+
+HG545 in the morning.  Mostly understood things.
+
+Asked the professor after about a question I had while reading the paper.  One
+of the things the paper did to confirm the region of interest was to use a PAC
+(P1-derived artificial chromosome) to "rescue" the golden embryos.  The
+resulting fish showed mosaic rescue, confirming that the wild-type gene was
+likely on that PAC, i.e. in the region of interest.
+
+What I didn't understand is how injecting the plasmid into the embryos resulted
+in the expression of the genes farther down the developmental line, e.g. does it
+somehow get incorporated into the cells' genomes?  It turns out to be messy.
+
+First: you don't inject "a PAC" into the embryos, you inject "a shitload of
+copies of the PAC" into the embryo.  So all of the embryo's cells will have
+copies of the PAC floating around inside.  As the cells divide, those will get
+diluted in daughter cells over time.  Some of these copies will, by chance, make
+it into the nucleus of their cells.  And some of those (rarely) will get
+randomly incorporated into the cell's genome, and from then on mitosis takes
+over and the gene gets propagated normally.  So the mosaic region from that
+point forward will have the gene (and if the region happens to contain some
+melanophores, also the rescued wild-type phenotype).
+
+Got my Armis access at some point during class, so it's time to figure out how
+to log into the various HPC clusters today.
+
+Double-checked the exam schedule to make sure nothing conflicts.  I think it's
+fine.
+
+Changed my school password after the network clusterfuck last week.  Sigh.
+
+Wanted to print something in the lab, realized I never installed any printing
+support on this laptop, lol.  `apt install cups` will hopefully Just Work.  CUPS
+interface is at `http://localhost:631/`.  It did not Just Work.  Surely 2024
+will be The Year of Linux on the Desktop.  Printer wouldn't configure itself,
+driver didn't appear in the list when I tried to manually configure it through
+CUPS.  `apt install printer-driver-all` got me more drivers but not this one.
+Tracked it down on the Brother site and downloaded some `.deb` packages but
+they're 32-bit instead of 64.  Gave up at this point, what a janky shitshow of
+an OS.  If only everything else didn't suck in even worse ways.
+
+Tried getting the VPN running.  Installed with the script into `/opt/cisco`
+(good).  Got mysterious errors when trying to connect.  Tried the GUI connection
+manager, which does run, and gave a more informative error message I could search
+for.  Looks like I need to install `libwebkit2gtk-4.0-37`.  Installed that, now
+I get the login screen, but I can't 2FA because I only have Yubikeys set up but
+that requires a real browser, not this jank webkitgtk thing, so now I need to
+set up *another* 2FA method just for this.  Good god.  Tried to add a new 2FA
+device via Duo, but that requires 2FAing *again*, but *this* time it's through
+the Duo site which doesn't actually fucking work on Linux, so I can't add it
+here, I'll need to use the Windows shitbox at home.  God, I hate two-factor
+authentication so much.  It's *always* miserable.
+
+Tried to do the homework for BIOINF-500 (creating a PubMed search alert).  To do
+this you need an NCBI account.  Tried to log in via my UM account and managed to
+500 the NCBI site.  Incredible.  Poked around and eventually got it working (I
+think)?  Created the alert, took a screenshot.  Uploaded the PNG into Canvas for
+the assignment.  Canvas shows an error trying to retrieve it.  Uploaded it to
+the Canvas "My Files" and used *that* to submit the assignment, instead of
+uploading the file directly, and that worked.  *Incredible*.  Why does *nothing*
+ever work correctly?
+
+Came home, tried to add the 2FA with the Windows box, but it failed in the same
+way (hanging on the popup after successfully touching the Yubikey).  But
+I finally figured out a workaround (in retrospect, I vaguely recall having to do
+this when I added the extra Yubikeys originally): log out of everything, then go
+to log back into something (e.g. Wolverine Access), but **before** you 2FA in
+*that* login process there will be a link to add a new device on the left side
+of the screen.  That will then require you to 2FA, but doing it *here* does
+actually work.  It seems absolutely wild to me that you need to *not* be logged
+in if you want to manage 2FA, but here we are.
+
+Now that I have the Duo app thing on my phone and connected, logging into the
+VPN seems to work great.  Yak shaved successfully.
+
+## 2023-09-07
+
+BS521 and its lab this morning.
+
+Got my dotfiles synced to GL.  One tricky thing: my remote `.bash_profile`
+sources `/etc/profile` if it exists, but that causes problems on the cluster
+because there's some read-only variable set in there that it doesn't like.
+Commented out that line and everything looks okay.  `ControlMaster` does work,
+so I won't have to auth a billion times a day (thank god).
+
+BS521 lab.  Of course the AC isn't working so it's a billion degrees, lovely.
+
+Installing tidyverse failed with inscrutable errors.  After some googling it
+[looks like](https://blog.zenggyu.com/posts/en/2018-01-29-installing-r-r-packages-e-g-tidyverse-and-rstudio-on-ubuntu-linux/index.html)
+there are *extra* dependencies you need to install on Linux (C programming is
+wonderful):
+
+    sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev \
+        libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev \
+        libpng-dev libtiff5-dev libjpeg-dev
+
+Still fucked, so I manually trawled through the tons of log output, googled
+around more, and found that I need to configure an env variable:
+
+    Sys.setenv(PKG_CONFIG_PATH="/usr/lib/x86_64-linux-gnu/pkgconfig")
+
+And finally it works.  Started going through the lab stuff but then time was up
+— will come back to it later.
+
+Moving on to rotation lab work.  Working from home today, so I wanted to get my
+laptop working with my external monitor and keyboard and such.  Had to swap out
+the USB-C cable I was using previously because apparently that one doesn't work
+for video ("universal" serial bus my ass) and then adjust my StumpWM `xrandr`
+commands, but I did get it working, so now I can use the laptop at my desk,
+which is nice.
+
+Note to self: I can probably use the text-mode VPN UI now that I've got the 2FA
+sorted out, and maybe remove the webkit crap I installed for it originally.
+
+Finished BS521 lab 0.  Wasn't too bad once I shaved the tidyverse yak.  Getting
+it written up with LaTeX required more LaTeX derusting, this time for code
+listings and images (pulled some of this from the MS thesis).  First, preamble
+stuff:
+
+    % Listing package for code listings.
+    \RequirePackage{listings}
+    \lstdefinestyle{default}{
+        basicstyle=\footnotesize\ttfamily,
+        showtabs=true,
+        frame=lines,
+        aboveskip=10pt,
+    }
+    \lstset{
+        language=,
+        style=default,
+    }
+
+    % Used to embed plots.
+    \usepackage{graphicx}
+
+    % No paragraph indentation for homework, just looks awkward.
+    \setlength{\parindent}{0pt}
+
+    % Inline code.
+    \def\code#1{\small\texttt{#1}}
+
+I really need to split these up into actual files I can include instead of
+copypasting them a million times, but maybe I'll wait for one more practice
+round first.
+
+Usage:
+
+    % Code listings.
+    \begin{lstlisting}
+    prop.table(table(DATA$Race))
+
+            Black MexicanAmerican           Other   OtherHispanic           White
+        0.22921790      0.23954747      0.04230202      0.03885883      0.45007378
+    \end{lstlisting}
+
+
+    % Graphics.
+    \includegraphics[]{figures/bmi-hist}
+
+    \begin{center}
+        \includegraphics[width=0.45\textwidth]{figures/hist-age}
+        \includegraphics[width=0.45\textwidth]{figures/hist-log-age}
+    \end{center}
+
+To actually save PDF plots with R:
+
+    pdf("figures/foo.pdf", height=6, width=6)
+    hist(DATA$Foo, main = "Distribution of Foo", xlab="Foo")
+    dev.off()
+
+Went to the poster session.  Lots of stuff I don't understand, and a tiny bit
+that I do.
+
+## 2023-09-08
+
+HG545 this morning.
+
+Papers never say what *could* have gone wrong with what they did — you have to
+just read between the lines and actively think about that (and what it would
+have meant, and what you would have done if it did).
+
+Learned about nonsense-mediated decay: a mechanism where mRNA with premature
+stop codons is degraded, instead of expressing a (probably truncated) protein.
+Without this, if you have a mutation that creates a stop codon in the middle of
+the gene, you would see truncated protein expressed.  But because of NMD, the
+mRNA is degraded and doesn't express the broken protein (as much).  This is good
+not only to reduce wasted translation, but because the truncated proteins can be
+actively bad.
+
+One important control that was left out of the study where they wanted to find
+where in the organism the target gene is being expressed: inject a probe with
+GFP that intentionally shouldn't match *anything*, and expect it to show up
+vaguely all over (or not at all).
+
+Another example covered during class: if you suspected a phenotype was caused by
+a mutation in a promoter (instead of in an exon), how would you test this?
+There were a couple of things folks came up with:
+
+* Could sequence the region in the mutant and wild-type population, compare to
+  see if the mutation segregates the two reliably.
+* Old school: "reporter genes".  I'm a bit fuzzy on this, but I think you insert
+  the promoter into a vector with some easily observable gene (e.g. luciferase,
+  a bioluminescent protein).  Then you see if that product is expressed more or
+  less with the different variants of the promoter.  This is a bit janky because
+  just yanking the promoter completely out of context can be problematic (e.g.
+  loses the chromatin structure around it, nearby enhancers/repressors, etc).
+* Could use RNA-seq to see if the mutants with the variants are producing more of
+  the RNA for that gene.
+* Could use ChIP-seq, if you know the transcription factors that bind to that
+  promoter.  Fix, fragment, attach antibodies to the TFs, precipitate them out,
+  unfix, extract the DNA (all the remaining is whatever was bound to the
+  transcription factors), and then do the sequencing.  You would expect to see
+  a larger signal if the mutation in the promoter is causing transcription
+  factors to be more likely to bind.
+
+
+Got back and tried VPN'ing with the command-line client.  It seemed to hang
+after entering my password, but then I realized it had just silently tried to
+2FA with my phone and I didn't notice.  Trying again and being ready with Duo
+let me log in, so I think I can probably ditch the webkit crap I installed for
+the graphical thing.
+
+Desktop machine wouldn't take input from my USB hub all of a sudden.  Found some
+bullshit in the logs, probably not worth debugging Yet More Linux Jank if I'm
+just going to wipe this machine and install Debian on it soon anyway.  Tried to
+reboot and systemd hung at the end, so I just powercycled the damn thing.  If
+I could just have one single day where no computer broke for me, that would be
+so nice.
+
+Flu shots are available, need to get one so PI doesn't get pinged all the time.
+
+Read for BS521 class.  All still pretty basic.  Cleaned up and turned in lab 0.
+Finished homework 2 as well, just to get it out of the way.  Or at least
+I thought I did, except there are apparently Surprise Questions™ not in the book
+to do with R.  I'll do that this weekend.
+
+## 2023-09-09
+
+Actually finished BS521 homework 2.  Realized my LaTeX `\code` shortcut was
+broken:
+
+    % Broken, doesn't scope the \small so later text is changed.
+    \def\code#1{\small\texttt{#1}}
+
+    % Fixed     v                 v
+    \def\code#1{{\small\texttt{#1}}}
+
+Did a first draft of the HG545 assignment 1.  This one is a lot harder than the
+stats homework.  Need to polish it up and submit it tomorrow.
+
+## 2023-09-10
+
+Polished and submitted the HG545 homework.  We'll see how it goes, I guess.
+
+## 2023-09-11
+
+HG545 discussion.  This paper was pretty straightforward.
+
+Met with PIBS peer mentor.
+
+HG545 second paper was posted, need to do an initial read of that tonight.
+
+## 2023-09-12
+
+BS521 again.  Mostly basic linear regression stuff, but got a few interesting
+tidbits out of it, mostly about the coefficient of determination, also called
+`R²` or `r²`.  This is the square of the correlation coefficient `r`, and it is
+said to mean "the fraction of the variability in the data that is explained by
+the linear model".  So an `r²` of `0.7` would mean "70% of the variation in the
+data is explained by the model".
+
+*Intuitively* what this means would be to look at the total variability in the
+data, i.e.:
+
+    (- (reduce #'max y) (reduce #'min y))
+
+Then convert the data to residuals by subtracting out the model:
+
+    (mapcar #'- (mapcar #'model x) y)
+
+and look at how much variability remains:
+
+    (- (reduce #'max residuals) (reduce #'min residuals))
+
+Compare the two to see the fraction that remains after accounting for the model.
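+
+For the record, the textbook definition measures variability with sums of
+squares rather than the max-minus-min range used in the intuition above.  A
+sketch, where `\hat{y}_i` is the model's prediction for point `i` and `\bar{y}`
+is the mean of the observed `y` (the symbol names are assumptions, not
+necessarily the ones used in class):
+
+```latex
+R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
+\qquad
+SS_{\mathrm{res}} = \sum_i \left(y_i - \hat{y}_i\right)^2,
+\qquad
+SS_{\mathrm{tot}} = \sum_i \left(y_i - \bar{y}\right)^2
+```
+
+`SS_res / SS_tot` is the fraction of variability left over in the residuals, so
+an `r²` of `0.7` corresponds to 30% remaining.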
+₂
+Looked into some "R for actual programmers" resources so maybe I can feel like
+I'm flailing less:
+
+* <https://arrgh.tim-smith.us/>
+* <https://r4ds.hadley.nz/>
+* <https://adv-r.hadley.nz/>
+* <https://www.burns-stat.com/documents/books/the-r-inferno/>
+* <https://www.burns-stat.com/documents/tutorials/impatient-r/>
+* <https://www.burns-stat.com/documents/books/tao-te-programming/>
+
+Lunch at a place called Maizie's.  Was actually pretty good!
+
+Doing Yet Another Round of Paperwork for the VA.  So much red tape.  Did what
+I could here, but there's a bunch I can't do until I get home after class today.
+
+So far I'm loving the look of the stumpwm config changes I made the other day.
+Shouldn't have waited this long to clean things up.  TODO: use
+`select-from-menu` to implement a better screen-switching shortcut in stump.
+
+Figured out how to print.  Use <https://mprint.umich.edu/maps?sites> to find
+a reasonable printer nearby, then Print Here to use it.  You upload a PDF or
+whatever through the web UI.  Good enough, it works.  One color paper cost
+$3.22 of my (apparent) $24 print budget.  Welp.
+
+PIBS800.  Getting… another lecture about how to use the library?  Didn't we
+already do this in the other class?
+
+Spent a bit more time tracking down my white whale font from that 1979 Science
+issue.  Identifont came to the rescue and I think I finally have an answer, or
+at least something very close: "Rotation" by Arthur Ritzel from 1971.
+Unfortunately a 50-year-old font still has ghastly licensing options, so I'll
+probably never be able to *use* it, but at least I have peace of mind, I guess.
+
+## 2023-09-13
+
+HG545.  This module is focusing on how to create physical maps of chromosomes,
+especially the bizarre human Y chromosome.
+
+There's a difference between a genetic map and a physical map.  A genetic map
+can be created with e.g. linkage analysis, and can tell you relative distances
+but not necessarily the exact locations of things.  A physical map shows the
+actual locations.  Note that physically linked genes might not necessarily be
+genetically linked if they're far enough apart that the recombination chance is
+50%.
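+
+A sketch of why the ceiling is 50%, assuming Haldane's mapping function (which
+ignores crossover interference and is an assumption here, not something from
+the lecture): for a map distance `d` in Morgans, the recombination fraction
+`r` is
+
+```latex
+r = \tfrac{1}{2}\left(1 - e^{-2d}\right),
+\qquad
+r \approx d \ \text{for small } d,
+\qquad
+r \to \tfrac{1}{2} \ \text{as } d \to \infty
+```
+
+So two loci far enough apart on the same chromosome look unlinked (`r = 0.5`),
+even though they are physically linked.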
+
+We can't use genetic mapping for most of the Y chromosome because it doesn't
+recombine with another chromosome (outside the small pseudoautosomal regions).
+
+In the paper they used hierarchical shotgun sequencing to sequence the
+Y chromosome, which goes roughly like this:
+
+1. Fragment the human genome into ~200kb fragments.
+2. Clone those into BACs.
+3. You want to retrieve *only* the fragments from the Y chromosome, not from the others.
+4. Start with a known gene on Y (e.g. a well-known gene like Sry, the
+   sex-determining gene) and you PCR that to amplify the fragment(s) that
+   contain it.
+5. Sequence those fragments (split into 20kb and shotgun sequence).
+6. Design more PCR primers that *start* at the ends of *those* fragments, use
+   those to amplify things next to it.
+7. Repeat to get overlapping tiles.
+
+You end up with overlapping tiles:
+
+    ---------Sry----------                          ----------Zry--------------
+                       >>>                          <<<
+                       -----------------
+                                     >>>
+                                     -------------------
+
+Nowadays we can take advantage of long read tech to eliminate a lot of the grunt
+work in the process, e.g.:
+
+* Oxford Nanopore: 50-500kb, 90% accuracy.
+* PacBio: 20kb, >99% accuracy.
+
+Oxford is still pretty bad accuracy, but is useful to resolve things when PacBio
+still runs into trouble with some of the crazy-long repeats.
+
+Also learned about some kind of "bionano" thing that was glossed over very
+quickly.  Looks like it's a company?  Need to ask someone about this.
+
+Next talked about content of the human genome:
+
+    Human Genome
+        Unique DNA (1/3)
+        Repetitive DNA (2/3)
+            Dispersed Repeats
+                Transposable Elements (e.g. LINEs, Alu)
+                Retrogenes (e.g. CDY)
+                Transposed Genes (e.g. DAZ)
+                tDNA
+            Local Repeats
+                Segmental Duplication (e.g. palindromes)
+                Satellite Duplication
+                rDNA
+
+Repeats are challenging to assemble, e.g. if you have:
+
+    Unique A | LINE1 | Unique B | LINE1 | Unique C
+
+You might get reads like:
+
+    A1
+    1B1
+    1C
+
+It's hard to tell which direction the `1B1` should go, or whether `A` should go
+directly to `C`.  `LINE1` specifically can be resolved with PacBio because it's
+only ~6.5kb, far less than the 20kb you get from PacBio, but other segments
+still cause problems.
+
+One example of the problematic cases is the large palindromes from the paper:
+
+             1.45Mb arm
+    <------------------------ Unique -------------------------->
+               arms have ~99.97% nucleotide identity
+
+Even if there are a few SNPs on the arms, if the segments right around the
+unique part happen to be identical it's hard to tell which arm goes where.
+
+Looked into the PACCAR thing from yesterday, but the application form is
+extremely long and I already have enough red tape to deal with through the VA,
+so I'm not going to add more paperwork for myself.  Oh well.
+
+Met with John Prensner about possibly rotating in his lab.  Next steps for
+rotations are pretty clearly to set up some chats with his students and some
+from Boyle/Parker labs to make a choice for the next 1-2 slots.  I'll try to do
+that for next week I think.  Also want to talk to Shavit again — I really liked
+chatting with him, and I think if I wanted to rotate there I would need him to
+join the department as an affiliate of some kind, so I'd need to see if he's
+okay with doing that.
+
+## 2023-09-14
+
+I am going to `mark` every time I have to log in and/or 2FA for school for at
+least a week, so I can graph it and be sad.  Adjusted my `marks` thing to go to
+my Syncthing dir.
+
+Sped up my shell prompt by wrapping the Mercurial prompt in a basic `.hg`
+existence check.  Had to relearn how to write a fish function.
+
+BS521. Chatted with the professor at office hours a bit to ask a couple of
+things.
+
+Made a TODO list with all the homework/exam/lab stuff for school.  Hopefully
+this will make it easier to see what's coming up since Canvas is barely usable.
+
+Started reaching out to set up chats with folks in a few labs I might be
+interested in for my next rotation.
+
+## 2023-09-15
+
+HG545 this morning.
+
+## 2023-09-16
+
+BS521 reading.
+
+Z-score means "number of standard deviations above the mean" (negative when the
+value is below the mean).
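+
+A minimal Python sketch of that definition (not from the course; the sample
+data and the `z_score` helper name are made up for illustration):
+
```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical sample, purely for illustration.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
mu = statistics.mean(heights)
sigma = statistics.pstdev(heights)  # population SD; use stdev() for a sample SD

z = z_score(180.0, mu, sigma)
print(round(z, 3))
```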
+
+Successes-based distributions:
+
+* Geometric: number of trials needed to observe the first success.
+* Binomial: number of successes in a fixed number of trials.
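+
+A small Python sketch of both probability mass functions.  This assumes the
+convention where the geometric count includes the successful trial (R's
+`dgeom` instead counts the failures before it); the function names and the
+coin-flip numbers are illustrative only:
+
```python
from math import comb

def geometric_pmf(n, p):
    """P(first success happens on trial n), counting the successful trial itself."""
    return (1 - p) ** (n - 1) * p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.5  # hypothetical success probability (a fair coin)
first_on_third = geometric_pmf(3, p)  # two failures, then a success
two_of_four = binomial_pmf(2, 4, p)   # exactly 2 heads in 4 flips
print(first_on_third, two_of_four)
```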
+
+The chapters of this book are getting sloppier as they go on — I'm noticing
+a lot more typos now than in the first couple of chapters.
+
+Went back to John D Cook's R for Programmers post when the `pnorm` function was
+mentioned.  R has several of these functions with very confusing names:
+
+    <func><dist>
+
+    <func>: d: PDF ("density")
+            p: CDF ("probability")
+            q: Quantile, i.e. CDF⁻¹
+            r: Random sample
+
+    <dist>: norm: Normal aka Gaussian
+            unif: Uniform
+
+So `pnorm` is "the CDF of a normal distribution".
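+
+For comparison, a sketch of the same four operations using Python's standard
+library `statistics.NormalDist`.  The mapping to the R names in the comments is
+my own reading, not something from the post:
+
```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

density = std_normal.pdf(0.0)           # like dnorm(0)
prob = std_normal.cdf(1.96)             # like pnorm(1.96)
quantile = std_normal.inv_cdf(0.975)    # like qnorm(0.975)
draws = std_normal.samples(5, seed=1)   # like rnorm(5), with a different RNG

print(density, prob, quantile, draws)
```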
+
+Found a way to view which fonts a PDF file embeds and/or references: `pdffonts`.
+Nice.
+
+Tired of CACL crashing on my laptop because I don't have CCL, so I'll just
+install CCL.
+
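+    # Grab the sources plus the prebuilt binaries needed to bootstrap the build.
+    git clone https://github.com/Clozure/ccl.git ccl
+    curl -L -O https://github.com/Clozure/ccl/releases/download/v1.12.2/linuxx86.tar.gz
+    cd ccl
+    tar xf ../linuxx86.tar.gz
+    ./lx86cl64
+    ; At the CCL REPL: rebuild everything from source.
+    (rebuild-ccl :full t)
+
+    # Back in the shell: put it on the PATH.
+    sudo ln -s /home/sjl/src/ccl/lx86cl64 /usr/local/bin/ccl64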
+
+Finally discovered the reason my bash prompt gets mangled sometimes:
+non-printing characters in `PS1` have to be wrapped in `\[…\]`.  So I need to do
+something ugly like this:
+
+    export PS1='\n\[${PINK}\]\u \[${D}\]at \[${HOST_COLOR}\]\h \[${D}\]in \[${GREEN}\]\w\[${D}\] $(last_return_value)$ '
+
+But at least it works properly now and won't drive me crazy.
+
+Did a bit more font hunting.  Looking for something to use for figures that
+looks plotter-esque, but isn't something with a couple of scattered glyphs and
+no weights like the plotter fonts I've found.  Licensing is a minefield, but
+Google Fonts has a bunch of stuff that's under the Open Font License, and
+I think I found a couple that might work: Quicksand and Nunito.  Of the two,
+Nunito seems a little nicer to me.  Will need to try it in some graphs and see
+how it works.
+
+## 2023-09-17
+
+Trying to get ahead of classwork for the next couple of weeks, since I've got so
+many other things going on.
+
+Did the reading for BS521 for the next two weeks.
+
+Finished BS521 homework 3.
+
+## 2023-09-18
+
+HG545.  Feeling better about this module than the last, which is surprising
+because I enjoyed this paper less.
+
+Chatted with someone about one of the labs I'm thinking about rotating in.
+
+Cleaned up HW2 for HG545 a bit.  Still not done, but at least I'm getting it
+into shape.
+
+## 2023-09-19
+
+BS521.
+
+Meeting with two more grad students to chat about their labs.
+
+BI500.
+
+DCMB has full time IT staff: `DCMB-IT-Services@umich.edu`.  Might email them
+about Ethernet connection?
+
+Chat widget on <https://michmed.service-now.com/sp> is a decent way to get help.
+Also walk-in help in THSL 4020. ARC support: <arc-support@umich.edu>.
+
+Slurm tutorial.  Learned a couple of interesting things:
+
+* `sq` is an alias for `squeue --me`.  Nice.
+* `my_job_header` can help debug weird Slurm shit, handy.
+* Emails will include core/mem high-water marks.  Need to figure out if I can
+  get this programmatically, might be more accurate than the Snakemake benchmarks
+  (or at least worth comparing).
+
+Chatted about Boyle lab with a current grad student.
+
+PIBS 800.
+
+Finished HG545 homework 2.
+
+## 2023-09-20
+
+HG545 discussion.  Talked a lot about the Y chromosome paper.
+
+## 2023-09-21
+
+BS521.  Went over the binomial distribution.  Seeing this yet another time gave
+me an actual intuitive understanding this time, which is nice.
+
+BISTRO seminar.
+
+## 2023-09-22
+
+Retreat.
+
+Lightning talks.
+
+Breakout panel with current grad students.  Lots of stuff, probably not going to
+write it all down here.
+
+## 2023-09-23
+
+Finished HW 4 for BS521.
+
+Got some random Latex shit to remember for next time. Aligned equations:
+
+    \begin{eqnarray*}
+        foo &=& bar \\
+        meow &=& wow
+    \end{eqnarray*}
+
+References to figures:
+
+    (See figure~\ref{fig:g-a})
+
+    …
+
+    \begin{figure}[H] % [H] needs \usepackage{float}
+        \centering
+        \includegraphics[width=0.65\textwidth]{figures/g-a}
+        \caption{Graph for exercise 4.1 part a.}
+        \label{fig:g-a}
+    \end{figure}
+
+Units:
+
+    \usepackage{units}
+
+    Drink \unit[500]{ml} of water at lunch.
+
+And some random R shit to remember for next time:
+
+    dbinom = Binomial PMF (R calls it the "density")
+    pbinom = Binomial CDF
+
+Came up with some absolutely cursed code to make shaded normal graphs.
+Surprised that's not already a thing.
+
+## 2023-09-25
+
+HG545.  Need to retype all my notes for this module here when I get some time so
+I don't lose them.
+
+Today started with a description of RNAseq.  Something vaguely familiar was
+a nice change for this class.  Then reviewed STARR-seq which I think I mostly
+understand now.
+
+Talked about the similarity between enhancers and promoters.  Polymerase can
+sometimes actually sit down at enhancers and produce small RNAs, but
+transcription doesn't ever elongate.  But this might be an example of how genes
+could evolve.
+
+Then talked about heat shock proteins and heat shock factor as an example of how
+rapid transcription can happen.
+
+* HSE: "Heat Shock Element", an enhancer sequence located upstream of a gene,
+  e.g. hsp90.
+* hsp90: "Heat Shock Protein 90", a protein that's used in cells to help other
+  proteins fold in the presence of heat that might otherwise prevent it.  The 90
+  is from its weight in kilodaltons (lol).
+* HSF1: "Heat Shock Factor 1", a transcription factor that trimerizes, binds to
+  HSE, and recruits another thing to activate the transcription of hsp90.
+
+There's a self-regulation loop here where, when things are cold, hsp90 binds to
+HSF1 outside the nucleus and prevents it from enhancing transcription of hsp90
+(i.e. of itself).  But when heat is applied, other proteins unfold and hsp90
+starts chaperoning them more, which leaves HSF1 free to enter the nucleus and
+enhance transcription of hsp90.
+
+Remembering how to create a local Postgres DB for testing:
+
+    sudo -u postgres psql
+
+    CREATE DATABASE example;
+    CREATE USER testuser WITH PASSWORD 'pass';
+    GRANT ALL PRIVILEGES ON DATABASE example TO testuser;
+
+    \c example
+    GRANT ALL ON SCHEMA public TO testuser;
+
+    \q
+
+    psql postgresql://testuser:pass@localhost:5432/example
+
+## 2023-09-26
+
+BS521.  Exam is on Thursday.  Today is about sampling distributions and
+statistical inference.
+
+BIOINF500.  Fire alarm for the first half of class, nice.  Rest of the class
+will be recorded, need to remember to watch it later.
+
+## 2023-09-27
+
+HG545 this morning. Did an initial pass on the homework, then met up with some
+other grad students later to chat about it and now I'm even less confident, lol.
+Welp.
+
+## 2023-09-28
+
+BS521 exam.  Did okay, though I really should have had a couple more things
+on my note sheet than I did.  Next time I need to go through the slides too, not
+just the book — there were things on the test from class only, not in the book.
+I think I did alright though.
+
+Finished HG545 homework.  I think I did alright, but my brain is now fried.
+
+## 2023-09-29
+
+HG545 discussion this morning.
+
+Sent a few emails to try to nail down my next three rotations.  I think at this
+point I have a pretty good idea of where I want to try, so if I can just get
+them all nailed down now it'll be less stuff to deal with later.
+
+Signed up for the 503 discussion sections.  What a painful process to get
+registered.  I should have waited til I was home on my large monitor because
+trying to flip back and forth between the 90%-whitespace-filled list of sessions
+and my calendar/TODO list was extremely tedious.  I think I've got it all mapped
+out now though.
+
--- a/README.markdown	Sun Oct 22 20:15:21 2023 -0400
+++ b/README.markdown	Sun Oct 22 20:15:48 2023 -0400
@@ -5,1095 +5,6 @@
 
 [TOC]
 
-# September 2023
-
-## 2023-09-01
-
-HG545.  Looked over the slides last night and was a little worried, but felt
-okay after the lecture for the most part.  Still a few things I need to look up
-and I do still need to get my fleeting notes into this, but I feel okay.
-
-Continuing the Snakemake tutorial.
-
-Threads can be specified for a given job with `threads: 8`, and you need to
-propagate that to the command yourself with `{threads}`.  Will be scaled down if
-run with fewer cores than threads, otherwise will wait until that many are
-available.
-
-Snakemake has some support for noticing log files, but it seems like you have to
-manually create them yourself?  This seems… tedious?
-
-    rule bwa_map:
-        input:
-            "data/genome.fa",
-            lambda wc: SAMPLES[wc.sample]
-        output:
-            "mapped_reads/{sample}.bam"
-        threads: 8
-        params:
-            rg=r"@RG\tID:{sample}\tSM:{sample}"
-        log: "logs/bwa_map/{sample}.log"
-        shell:
-            "("
-            "bwa mem -R '{params.rg}' -t {threads} {input}"
-            " | samtools view -Sb - > {output}"
-            ") >{log} 2>&1"
-
-Do I really have to wrap everything in `(…) >{log} 2>&1` by hand myself?
-
-You can get a summary of file provenance with `snakemake --summary`.  The output
-is a TSV, so I went down a rathole of pretty-printing TSVs and eventually found
-that `| column -s $'\t' -t` works (mnemonic: `s$tt`).  I love how every UNIX
-program gets to invent its own bespoke command line interface for specifying
-special characters. Really great.
-
-Can mark outputs as `temp()` and `protected()`, which is nice.
-
-Need to install singularity *inside* my VM:
-
-    # Ensure repositories are up-to-date
-    sudo apt-get update
-
-    # Install debian packages for dependencies
-    sudo apt-get install -y \
-        wget \
-        build-essential \
-        libseccomp-dev \
-        libglib2.0-dev \
-        pkg-config \
-        squashfs-tools \
-        cryptsetup \
-        runc
-
-    # Install Golang
-    export VERSION=1.21.0 OS=linux ARCH=amd64 && \
-        wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
-        sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
-        rm go$VERSION.$OS-$ARCH.tar.gz
-
-    echo 'export PATH=/usr/local/go/bin:$PATH' >> ~/.bashrc && \
-        source ~/.bashrc
-
-    # Install Singularity
-    export VERSION=3.11.4 && \
-        wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-ce-${VERSION}.tar.gz && \
-        tar -xzf singularity-ce-${VERSION}.tar.gz && \
-        cd singularity-ce-${VERSION}
-
-    ./mconfig && \
-        make -C builddir && \
-        sudo make -C builddir install
-
-## 2023-09-02
-
-It is time to shave the LaTeX yak again. Installed it with `texlive-latex-base`
-to start, we'll see if I need to add some more crud in later. Going to go
-through some guides for now.
-
-Going to note some things to remember.  Skeleton of document:
-
-    \documentclass{article}
-    \begin{document}
-
-    Basic text.
-
-    \end{document}
-
-Math:
-
-    Inline math $y = 3 \sin x$ example.
-
-    Block equation:
-    \[
-            y = 3 \sin x
-    \]
-
-    With reference:
-    \begin{equation}\label{equa}
-        y' = 3 \cos x
-    \end{equation}
-    refer to it by label, e.g. equation (\ref{equa}).
-
-    More complicated: $x^2$ and $x^{2+\alpha}$ and $y_{n+1}$.
-
-Verbatim:
-
-    Verbatim text: \verb"$x^{2+\alpha}$".  Delimiter can be anything ala sed,
-    \verb_%%&_ or \verb+$$+.
-
-    Must escape special characters \&, \$, \%, \_, \{, \}, and \#.
-
-    \begin{verbatim}
-    A whole verbatim region.
-
-    (defun square (x)
-    (* x x))
-    \end{verbatim}
-
-Comments:
-
-    Comments exist.  % This is a comment.
-
-Type styles:
-
-    Shapes:
-    \textup{Upright}
-    \textit{Italic}
-    \textsl{Slanted}
-    \textsc{Small Caps}
-
-    Series (weight):
-    \textmd{Medium}
-    \textbf{Boldface}
-
-    Families:
-    \textrm{Roman}
-    \textsf{Sans}
-    \texttt{Typewriter}
-
-Emphasis:
-
-    \emph{Never} do Foo!
-
-"Environments" are sections that are treated differently, made with `\begin{…}`
-and `\end{…}`.
-
-Lists:
-
-    Unordered list:
-    \begin{itemize}
-        \item Foo
-        \item Bar
-        \item Baz
-    \end{itemize}
-
-    Ordered list:
-    \begin{enumerate}
-        \item One
-        \item Two
-        \item Three
-    \end{enumerate}
-
-    Customizable labels:
-    \begin{description}
-        \item[Rule 1.] Foo
-        \item[Rule 2.] Bar
-        \item[Rule 3.] Baz
-    \end{description}
-
-Sizes (note the brace comes BEFORE the command!):
-
-    {\Huge Huge}
-    {\huge huge}
-    {\LARGE LARGE}
-    {\Large Large}
-    {\large large}
-    {\normalsize normalsize}
-    {\small small}
-    {\footnotesize footnotesize}
-    {\scriptsize scriptsize}
-    {\tiny tiny}
-
-Centering:
-
-    \begin{center}
-        {\large\textbf{Assignment 1}}\\% The \\ linebreaks.
-        Steve Losh\\
-        BS521
-    \end{center}
-
-Example table.
-
-    \begin{tabular}{l|rc} % lrc = cols should be left, right, centered, pipe for vertical line
-        Name & Mark & Grade \\
-        \hline\hline
-        Foo & 99 & A+ \\
-        Bar & 51 & C  \\
-        Baz &  5 & F
-    \end{tabular}
-
-Colspan with multicolumn command.
-
-    \begin{tabular}{|l||r|r|}
-        \hline
-            & \multicolumn{2}{c|}{Grades} \\
-                \cline{2-3}
-        Name & Class 1 & Class 2 \\
-        \hline\hline
-        Foo & 99 & 88 \\
-        Bar & 51 & 65  \\
-        Baz &  5 & 58 \\
-        \hline
-    \end{tabular}
-
-Full example, with referencing and caption, e.g. `Table~\ref{tab:a} on page~\pageref{tab:a}`.
-
-    % b = try to put at Bottom.  Also t top, h here, p separate page.
-    % Can do multiple in order of preference.
-    % [!t] ! = try harder
-    \begin{table}[b]
-        \begin{center}
-            \caption{An Example Table}
-            \label{tab:a}
-
-            \begin{tabular}{lr}
-                Name & Value \\
-                \hline
-                Foo &  1.0 \\
-                Bar & 15.9 \\
-                Baz &  6.2
-            \end{tabular}
-            % \caption{Caption at the end works too.}
-        \end{center}
-    \end{table}
-
-Sections:
-
-    \section{Some section} % includes numbering
-    \subsection{Some subsection}
-
-    \section*{Some section} % no numbering
-    \subsection*{Some subsection}
-
-Quotation marks (hilarious):
-
-    `Single quoted'
-    ``Double quoted''
-
-Change overall text size (simple):
-
-    \documentclass[11pt]{article} % only valid sizes are 10/11/12.
-
-Palatino instead of Computer Modern:
-
-    \usepackage{mathpazo}
-    \linespread{1.05}         % needs more leading (space between lines)
-    \usepackage[T1]{fontenc}
-
-Vimtex stuff:
-
-* Close thing with `]]` in insert mode.
-
-Got about that far, which was enough to start my BIOSTAT-521 homework.  Will dig
-in again later, but it's nice to be able to use it for something real to
-practice.
-
-Puttered around a bit looking at other fonts, but didn't find anything new or
-interesting.
-
-## 2023-09-03
-
-Spent most of today getting the reading room in my apartment ready.  Went to
-IKEA, looked around a lot and got some ideas, picked up a chair and bookshelf
-for my still-boxed unread books.  Not the most productive Sunday, but sitting in
-my chair and reading at night felt good.
-
-## 2023-09-04
-
-Continuing the EdX genetics course over breakfast.
-
-## 2023-09-05
-
-BIOSTAT-521 and BIOINF-500 classes this morning.
-
-Going to spend my non-class time today looking into Unicycler, plus Bandage to
-visualize the results (and taking care of paperwork if anything that needs my
-attention crops up).
-
-Grabbed the container from StaPH-B and tried to figure out where the executable
-is.  I think it's `/unicycler/unicycler-runner.py`.  Grabbed the sample data
-from the Unicycler repo and got something running:
-
-    singularity exec containers/unicycler.sif \
-        /unicycler/unicycler-runner.py \
-        --short1 sample_data/short_reads_1.fastq.gz \
-        --short2 sample_data/short_reads_2.fastq.gz \
-        --out assembly/
-
-Pretty straightforward so far.  The result directory contains a log and a bunch
-of GFA files.  GFA apparently stands for [Graphical Fragment
-Assembly](https://gfa-spec.github.io/GFA-spec/GFA1.html), but I think it's
-"graphical" in the "directed/undirected graph" sense and not in the "pixels"
-sense.  Text-based format which seems pretty straightforward to parse (maybe
-I should have a go at parsing it for fun).
-
-Installed Bandage and viewed the results.  Not sure what exactly I'm looking at,
-but it works and looks pretty enough I guess.  [This
-example](https://github.com/rrwick/Bandage/wiki/Simple-example) was helpful to
-get a sense of what it's trying to show me.
-
-Unicycler has some configuration knobs to tweak:
-
-> Unicycler can be run in three modes: conservative, normal (the default) and
-> bold, set with the --mode option. Conservative mode is least likely to produce
-> a complete assembly but has a very low risk of misassembly. Bold mode is most
-> likely to produce a complete assembly but carries greater risk of misassembly.
-> Normal mode is intermediate regarding both completeness and misassembly risk.
-
-Reran with both conservative and bold modes and looked at the difference in the
-results for the sample data.  They're not the same, but I can't visually detect
-any major obvious differences.  Maybe it's not a big deal on this sample.
-
-Once I got that running in a shell, I got it ported into Snakemake, shaving
-a bunch of yaks along the way.
-
-I noticed that Snakemake can take a JSON config file instead of YAML. Fantastic,
-switched over to that right away:
-
-    {
-        "containers": {
-            "fastqc":    "docker://staphb/fastqc:0.12.1",
-            "unicycler": "docker://staphb/unicycler:0.5.0"
-        },
-        "samples": {
-            "short_read_example": [
-                "sample_data/short_reads_1.fastq.gz",
-                "sample_data/short_reads_2.fastq.gz"
-            ]
-        }
-    }
-
-Got the containers downloading via snakemake as well, so it's snakes all the way
-down:
-
-    rule containers:
-        input:
-            expand("containers/{name}.sif", name=config["containers"].keys()),
-
-    rule container:
-        output:
-            "containers/{name}.sif",
-        params:
-            source=lambda wc: config["containers"][wc.name],
-        shell:
-            "singularity pull {output} {params.source}"
-
-Got `snakefmt` working with Neoformat so I can `F6` in Vim to reformat.  Had to
-fuck around with the config because it was just emptying out the file — I think
-the key was that:
-
-* `snakefmt` edits in-place by default (gross).
-* Need to use `replace`: `1` in the Neoformat config to deal with this.
-
-And finally we can assemble:
-
-    def get_input_fastqs(wildcards):
-        return config["samples"][wildcards.sample]
-
-    rule unicycler_assemble:
-        log:
-            "logs/unicycler_assemble/{sample}.log",
-        container:
-            "containers/unicycler.sif"
-        threads: 8
-        input:
-            "containers/unicycler.sif",
-            get_input_fastqs,
-        output:
-            "assemblies/{sample}/assembly.gfa",
-            "assemblies/{sample}/assembly.fasta",
-        shell:
-            logged(
-                "/unicycler/unicycler-runner.py"
-                " --threads {threads}"
-                " --short1 {input[0]}"
-                " --short2 {input[1]}"
-                " --out assemblies/{wildcards.sample}/"
-            )
-
-Puttered around changing my StumpWM and terminal colors/borders/etc a bit to
-make them a little easier on my eyes.  We'll see if it sticks.
-
-## 2023-09-06
-
-HG545 in the morning.  Mostly understood things.
-
-Asked the professor afterward about a question I had while reading the paper.  One
-of the things the paper did to confirm the region of interest was to use a PAC
-(P1-derived artificial chromosome) to "rescue" the golden embryos.  The
-resulting fish showed mosaic rescue, confirming that the wild-type gene was
-likely on that PAC, i.e. in the region of interest.
-
-What I didn't understand is how injecting the plasmid into the embryos resulted
-in the expression of the genes farther down the developmental line, e.g. does it
-somehow get incorporated into the cells' genomes?  It turns out to be messy.
-
-First: you don't inject "a PAC" into the embryos, you inject "a shitload of
-copies of the PAC" into the embryo.  So all of the embryo's cells will have
-copies of the PAC floating around inside.  As the cell divides, those will get
-diluted in daughter cells over time.  Some of these copies will, by chance, make
-it into the nucleus of their cells.  And some of those (rarely) will get
-randomly incorporated into the cell's genome, and from then on mitosis takes
-over and the gene gets propagated normally.  So the mosaic region from that
-point forward will have the gene (and if the region happens to contain some
-melanophores, also the rescued wild-type phenotype).
-
-Got my Armis access at some point during class, so it's time to figure out how
-to log into the various HPC clusters today.
-
-Double-checked the exam schedule to make sure nothing conflicts.  I think it's
-fine.
-
-Changed my school password after the network clusterfuck last week.  Sigh.
-
-Wanted to print something in the lab, realized I never installed any printing
-support on this laptop, lol.  `apt install cups` will hopefully Just Work.  CUPS
-interface is at `http://localhost:631/`.  It did not Just Work.  Surely 2024
-will be The Year of Linux on the Desktop.  Printer wouldn't configure itself,
-driver didn't appear in the list when I tried to manually configure it through
-CUPS.  `apt install printer-driver-all` got me more drivers but not this one.
-Tracked it down on the Brother site and downloaded some `.deb` packages but
-they're 32-bit instead of 64.  Gave up at this point, what a janky shitshow of
-an OS.  If only everything else didn't suck in even worse ways.
-
-Tried getting the VPN running.  Installed with the script into `/opt/cisco`
-(good).  Got mysterious errors when trying to connect.  Tried the GUI connection
-manager, which runs and gave a more informative error message I could search
-for.  Looks like I need to install `libwebkit2gtk-4.0-37`.  Installed that, now
-I get the login screen, but I can't 2FA because I only have Yubikeys set up but
-that requires a real browser, not this jank webkitgtk thing, so now I need to
-set up *another* 2FA method just for this.  Good god.  Tried to add a new 2FA
-device via Duo, but that requires 2FAing *again*, but *this* time it's through
-the Duo site which doesn't actually fucking work on Linux, so I can't add it
-here, I'll need to use the Windows shitbox at home.  God, I hate two-factor
-authentication so much.  It's *always* miserable.
-
-Tried to do the homework for BIOINF-500 (creating a pubmed search alert).  To do
-this you need an NCBI account.  Tried to log in via my UM account and managed to
-500 the NCBI site.  Incredible.  Poked around and eventually got it working (I
-think)?  Created the alert, took a screenshot.  Uploaded the PNG into Canvas for
-the assignment.  Canvas shows an error trying to retrieve it.  Uploaded it to
-the Canvas "My Files" and used *that* to submit the assignment, instead of
-uploading the file directly, and that worked.  *Incredible*.  Why does *nothing*
-ever work correctly?
-
-Came home, tried to add the 2FA with the Windows box, but it failed in the same
-way (hanging on the popup after successfully touching the yubikey).  But
-I finally figured out a workaround (in retrospect, I vaguely recall having to do
-this when I added the extra yubikeys originally): log out of everything, then go
-to log back into something (e.g. Wolverine Access), but **before** you 2FA in
-*that* login process there will be a link to add a new device on the left side
-of the screen.  That will then require you to 2FA, but doing it *here* does
-actually work.  It seems absolutely wild to me that you need to *not* be logged
-in if you want to manage 2FA, but here we are.
-
-Now that I have the Duo app thing on my phone and connected, logging into the
-VPN seems to work great.  Yak shaved successfully.
-
-## 2023-09-07
-
-BS521 and its lab this morning.
-
-Got my dotfiles synced to GL.  One tricky thing: my remote `.bash_profile`
-sources `/etc/profile` if it exists, but that causes problems on the cluster
-because there's some read-only variable set in there that it doesn't like.
-Commented out that line and everything looks okay.  `ControlMaster` does work,
-so I won't have to auth a billion times a day (thank god).
-
-BS521 lab.  Of course the AC isn't working so it's a billion degrees, lovely.
-
-Installing tidyverse failed with inscrutable errors.  After some googling it
-[looks like](https://blog.zenggyu.com/posts/en/2018-01-29-installing-r-r-packages-e-g-tidyverse-and-rstudio-on-ubuntu-linux/index.html)
-there are *extra* dependencies you need to install on Linux (C programming is
-wonderful):
-
-    sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev \
-        libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev \
-        libpng-dev libtiff5-dev libjpeg-dev
-
-Still fucked, so I manually trawled through the tons of log output, googled
-around more, and found that I need to configure an env variable:
-
-    Sys.setenv(PKG_CONFIG_PATH="/usr/lib/x86_64-linux-gnu/pkgconfig")
-
-And finally it works.  Started going through the lab stuff but then time was up
-— will come back to it later.
-
-Moving on to rotation lab work.  Working from home today, so I wanted to get my
-laptop working with my external monitor and keyboard and such.  Had to swap out
-the USB-C cable I was using previously because apparently that one doesn't work
-for video ("universal" serial bus my ass) and then adjust my StumpWM `xrandr`
-commands, but I did get it working, so now I can use the laptop at my desk,
-which is nice.
-
-Note to self: I can probably use the text-mode VPN UI now that I've got the 2FA
-sorted out, and maybe remove the webkit crap I installed for it originally.
-
-Finished BS521 lab 0.  Wasn't too bad once I shaved the tidyverse yak.  Getting
-it written up with Latex required more Latex derusting, this time for code
-listings and images (pulled some of this from the MS thesis).  First, preamble
-stuff:
-
-    % Listing package for code listings.
-    \RequirePackage{listings}
-    \lstdefinestyle{default}{
-        basicstyle=\footnotesize\ttfamily,
-        showtabs=true,
-        frame=lines,
-        aboveskip=10pt,
-    }
-    \lstset{
-        language=,
-        style=default,
-    }
-
-    % Used to embed plots.
-    \usepackage{graphicx}
-
-    % No paragraph indentation for homework, just looks awkward.
-    \setlength{\parindent}{0pt}
-
-    % Inline code.
-    \def\code#1{\small\texttt{#1}}
-
-I really need to split these up into actual files I can include instead of
-copypasting them a million times, but maybe I'll wait for one more practice
-round first.
-
-Usage:
-
-    % Code listings.
-    \begin{lstlisting}
-    prop.table(table(DATA$Race))
-
-            Black MexicanAmerican           Other   OtherHispanic           White
-        0.22921790      0.23954747      0.04230202      0.03885883      0.45007378
-    \end{lstlisting}
-
-
-    % Graphics.
-    \includegraphics[]{figures/bmi-hist}
-
-    \begin{center}
-        \includegraphics[width=0.45\textwidth]{figures/hist-age}
-        \includegraphics[width=0.45\textwidth]{figures/hist-log-age}
-    \end{center}
-
-To actually save PDF plots with R:
-
-    pdf("figures/foo.pdf", height=6, width=6)
-    hist(DATA$Foo, main = "Distribution of Foo", xlab="Foo")
-    dev.off()
-
-Went to the poster session.  Lots of stuff I don't understand, and a tiny bit
-that I do.
-
-## 2023-09-08
-
-HG545 this morning.
-
-Papers never say what *could* have gone wrong with what they did — you have to
-just read between the lines and actively think about that (and what it would
-have meant, and what you would have done if it did).
-
-Learned about nonsense-mediated decay: a mechanism where mRNA with premature
-stop codons is degraded, instead of expressing a (probably truncated) protein.
-Without this, if you have a mutation that creates a stop codon in the middle of
-the gene, you would see truncated protein expressed.  But because of NMD, the
-mRNA is degraded and doesn't express the broken protein (as much).  This is good
-not only to reduce wasted translation, but because the truncated proteins can be
-actively bad.
-
-One important control that was left out of the study where they wanted to find
-where in the organism the target gene is being expressed: inject a probe with
-GFP that intentionally shouldn't match *anything*, and expect it to show up
-vaguely all over (or not at all).
-
-Another example covered during class: if you suspected a phenotype was caused by
-a mutation in a promoter (instead of in an exon), how would you test this?
-There were a couple of things folks came up with:
-
-* Could sequence the region in the mutant and wild-type population, compare to
-  see if the mutation segregates the two reliably.
-* Old school: "reporter genes".  I'm a bit fuzzy on this, but I think you insert
-  the promoter into a vector with some easily observable gene (e.g. luciferase,
-  a bioluminescent protein).  Then you see if that product is expressed more or
-  less with the different variants of the promoter.  This is a bit janky because
-  just yanking the promoter completely out of context can be problematic (e.g.
-  loses the chromatin structure around it, nearby enhancers/repressors, etc).
-* Could use RNAseq to see if the mutants with the variants are producing more of
-  the RNA for that gene.
-* Could use ChIP-seq, if you know the transcription factors that bind to that
-  promoter.  Fix, fragment, attach antibodies to the TFs, precipitate them out,
-  unfix, extract the DNA (all the remaining is whatever was bound to the
-  transcription factors), and then do the sequencing.  You would expect to see
-  a larger signal if the mutation in the promoter is causing transcription
-  factors to be more likely to bind.
-
-
-Got back and tried VPN'ing with the command-line client.  It seemed to hang
-after entering my password, but then I realized it had just silently tried to
-2FA with my phone and I didn't notice.  Trying again and being ready with Duo
-let me log in, so I think I can probably ditch the webkit crap I installed for
-the graphical thing.
-
-Desktop machine wouldn't take input from my USB hub all of a sudden.  Found some
-bullshit in the logs, probably not worth debugging Yet More Linux Jank if I'm
-just going to wipe this machine and install Debian on it soon anyway.  Tried to
-reboot and systemd hung at the end, so I just powercycled the damn thing.  If
-I could just have one single day where no computer broke for me, that would be
-so nice.
-
-Flu shots are available, need to get one so PI doesn't get pinged all the time.
-
-Read for BS521 class.  All still pretty basic.  Cleaned up and turned in lab 0.
-Finished homework 2 as well, just to get it out of the way.  Or at least
-I thought I did, except there are apparently Surprise Questions™ not in the book
-to do with R.  I'll do that this weekend.
-
-## 2023-09-09
-
-Actually finished BS521 homework 2.  Realized my LaTeX `\code` shortcut was
-broken:
-
-    % Broken, doesn't scope the \small so later text is changed.
-    \def\code#1{\small\texttt{#1}}
-
-    % Fixed     v                 v
-    \def\code#1{{\small\texttt{#1}}}
-
-Did a first draft of the HG545 assignment 1.  This one is a lot harder than the
-stats homework.  Need to polish it up and submit it tomorrow.
-
-## 2023-09-10
-
-Polished and submitted the HG545 homework.  We'll see how it goes, I guess.
-
-## 2023-09-11
-
-HG545 discussion.  This paper was pretty straightforward.
-
-Met with PIBS peer mentor.
-
-HG545 second paper was posted, need to do an initial read of that tonight.
-
-## 2023-09-12
-
-BS521 again.  Mostly basic linear regression stuff, but got a few interesting
-tidbits out of it, mostly about the coefficient of determination, also called
-`R²` or `r²`.  This is the square of the correlation coefficient `r`, and it is
-said to mean "the fraction of the variability in the data that is explained by
-the linear model".  So an `r²` of `0.7` would mean "70% of the variation in the
-data is explained by the model".
-
-*Intuitively* what this means would be to look at the total variability in the
-data, i.e.:
-
-    (- (reduce #'max y) (reduce #'min y))
-
-Then convert the data to residuals by subtracting out the model:
-
-    (mapcar #'- y (mapcar #'model x))
-
-and look at how much variability remains:
-
-    (- (reduce #'max residuals) (reduce #'min residuals))
-
-Compare the two to see the fraction that remains after accounting for the model.
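-
-The max-min range is only a stand-in for intuition.  The usual definition (not
-from the lecture notes above, just the standard textbook form) measures
-variability with sums of squares, where `ŷᵢ` is the model's prediction for `xᵢ`
-and `ȳ` is the mean of `y`:
-
-    R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
-        = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
-
-So `r²` is one minus the fraction of variability that remains in the residuals.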
-₂
-Looked into some "R for actual programmers" resources so maybe I can feel like
-I'm flailing less:
-
-* <https://arrgh.tim-smith.us/>
-* <https://r4ds.hadley.nz/>
-* <https://adv-r.hadley.nz/>
-* <https://www.burns-stat.com/documents/books/the-r-inferno/>
-* <https://www.burns-stat.com/documents/tutorials/impatient-r/>
-* <https://www.burns-stat.com/documents/books/tao-te-programming/>
-
-Lunch at a place called Maizie's.  Was actually pretty good!
-
-Doing Yet Another Round of Paperwork for the VA.  So much red tape.  Did what
-I could here, but there's a bunch I can't do until I get home after class today.
-
-So far I'm loving the look of the stumpwm config changes I made the other day.
-Shouldn't have waited this long to clean things up.  TODO: use
-`select-from-menu` to implement a better screen-switching shortcut in stump.
-
-Figured out how to print.  Use <https://mprint.umich.edu/maps?sites> to find
-a reasonable printer nearby, then Print Here to use it.  You upload a PDF or
-whatever through the web UI.  Good enough, it works.  One color paper cost
-$3.22 of my (apparent) $24 print budget.  Welp.
-
-PIBS800.  Getting… another lecture about how to use the library?  Didn't we
-already do this in the other class?
-
-Spent a bit more time tracking down my white whale font from that 1979 Science
-issue.  Identifont came to the rescue and I think I finally have an answer, or
-at least something very close: "Rotation" by Arthur Ritzel from 1971.
-Unfortunately a 50-year-old font still has ghastly licensing options, so I'll
-probably never be able to *use* it, but at least I have peace of mind, I guess.
-
-## 2023-09-13
-
-HG545.  This module is focusing on how to create physical maps of chromosomes,
-especially the bizarre human Y chromosome.
-
-There's a difference between a genetic map and a physical map.  A genetic map
-can be created with e.g. linkage analysis, and can tell you relative distances
-but not necessarily the exact locations of things.  A physical map shows the
-actual locations.  Note that physically linked genes might not necessarily be
-genetically linked if they're far enough apart that the recombination chance is
-50%.
-
-We can't use genetic mapping for the Y chromosome because there's no
-recombination with another chromosome.
-
-In the paper they used hierarchical shotgun sequencing to sequence the
-Y chromosome, which goes roughly like this:
-
-1. Fragment the human genome into ~200kb fragments.
-2. Clone those into BACs.
-3. You want to retrieve *only* the fragments from the Y chromosome, not from the others.
-4. Start with a known gene on Y (e.g. a well-known gene like Sry, the
-   sex-determining gene) and you PCR that to amplify the fragment(s) that
-   contain it.
-5. Sequence those fragments (split into 20kb and shotgun sequence).
-6. Design more PCR primers that *start* at the ends of *those* fragments, use
-   those to amplify things next to it.
-7. Repeat to get overlapping tiles.
-
-You end up with overlapping tiles:
-
-    ---------Sry----------                          ----------Zry--------------
-                       >>>                          <<<
-                       -----------------
-                                     >>>
-                                     -------------------
-
-Nowadays we can take advantage of long read tech to eliminate a lot of the grunt
-work in the process, e.g.:
-
-* Oxford Nanopore: 50-500kb, 90% accuracy.
-* PacBio: 20kb, >99% accuracy.
-
-Oxford is still pretty bad accuracy, but is useful to resolve things when PacBio
-still runs into trouble with some of the crazy-long repeats.
-
-Also learned about some kind of "bionano" thing that was glossed over very
-quickly.  Looks like it's a company?  Need to ask someone about this.
-
-Next talked about content of the human genome:
-
-    Human Genome
-        Unique DNA (1/3)
-        Repetitive DNA (2/3)
-            Dispersed Repeats
-                Transposable Elements (e.g. LINEs, Alu)
-                Retrogenes (e.g. CDY)
-                Transposed Genes (e.g. DAZ)
-                tDNA
-            Local Repeats
-                Segmental Duplication (e.g. palindromes)
-                Satellite Duplication
-                rDNA
-
-Repeats are challenging to assemble, e.g. if you have:
-
-    Unique A | LINE1 | Unique B | LINE1 | Unique C
-
-You might get reads like:
-
-    A1
-    1B1
-    1C
-
-It's hard to tell which direction the `1B1` should go, or whether `A` should go
-directly to `C`.  `LINE1` specifically can be resolved with PacBio because it's
-only ~6.5kb, far less than the 20kb you get from PacBio, but other segments
-still cause problems.
-
-Examples of problematic things are the large palindromes from the paper:
-
-             1.45mb arm
-    <------------------------ Unique -------------------------->
-               arms have ~99.97% nucleotide identity
-
-Even if there are a few SNPs on the arms, if the segments right around the
-unique part happen to be identical it's hard to tell which arm goes where.
-
-Looked into the PACCAR thing from yesterday, but the application form is
-extremely long and I already have enough red tape to deal with through the VA,
-so I'm not going to add more paperwork for myself.  Oh well.
-
-Met with John Prensner about possibly rotating in his lab.  Next steps for
-rotations are pretty clearly to set up some chats with his students and some
-from Boyle/Parker labs to make a choice for the next 1-2 slots.  I'll try to do
-that for next week I think.  Also want to talk to Shavit again — I really liked
-chatting with him, and I think if I wanted to rotate there I would need him to
-join the department as an affiliate of some kind, so I'd need to see if he's
-okay with doing that.
-
-## 2023-09-14
-
-I am going to `mark` every time I have to log in and/or 2FA for school for at
-least a week, so I can graph it and be sad.  Adjusted my `marks` thing to go to
-my Syncthing dir.
-
-Sped up my shell prompt by wrapping the Mercurial prompt in a basic `.hg`
-existence check.  Had to relearn how to write a fish function.
-
-BS521. Chatted with the professor at office hours a bit to ask a couple of
-things.
-
-Made a TODO list with all the homework/exam/lab stuff for school.  Hopefully
-this will make it easier to see what's coming up since Canvas is barely usable.
-
-Started reaching out to set up chats with folks in a few labs I might be
-interested in for my next rotation.
-
-## 2023-09-15
-
-HG545 this morning.
-
-## 2023-09-16
-
-BS521 reading.
-
-Z-score means "number of standard deviations above the mean".
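-
-As a formula, with `μ` the mean and `σ` the standard deviation (a negative `z`
-is below the mean):
-
-    z = \frac{x - \mu}{\sigma}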
-
-Successes-based distributions:
-
-* Geometric: number of trials before observing a success.
-* Binomial: number of successes in a fixed number of trials.
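-
-The PMFs, assuming success probability `p` and (for the geometric) counting
-trials up to and including the first success.  R's `dgeom` counts the failures
-*before* the first success instead, so it's shifted by one:
-
-    P(X = k) = (1 - p)^{k - 1} p                 % geometric, k = 1, 2, ...
-    P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}  % binomial, k = 0, ..., n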
-
-The chapters of this book are getting sloppier as they go on — I'm noticing
-a lot more typos now than in the first couple of chapters.
-
-Went back to John D Cook's R for Programmers post when the `pnorm` function was
-mentioned.  R has several of these functions with very confusing names:
-
-    <func><dist>
-
-    <func>: d: PDF ("density")
-            p: CDF ("probability")
-            q: Quantile, i.e. CDF⁻¹
-            r: Random sample
-
-    <dist>: norm: Normal aka gaussian
-            unif: Uniform
-
-So `pnorm` is "the CDF of a normal distribution".
-
-Found a way to view which fonts a PDF file embeds and/or references: `pdffonts`.
-Nice.
-
-Tired of CACL crashing on my laptop because I don't have CCL, so I'll just
-install CCL.
-
-    git clone https://github.com/Clozure/ccl.git ccl
-    curl -L -O https://github.com/Clozure/ccl/releases/download/v1.12.2/linuxx86.tar.gz
-    cd ccl
-    tar xf ../linuxx86.tar.gz
-    ./lx86cl64
-    (rebuild-ccl :full t)
-
-    sudo ln -s /home/sjl/src/ccl/lx86cl64 /usr/local/bin/ccl64
-
-Finally discovered the reason my bash prompt gets mangled sometimes:
-non-printing characters in `PS1` have to be wrapped in `\[…\]`.  So I need to do
-something ugly like this:
-
-    export PS1='\n\[${PINK}\]\u \[${D}\]at \[${HOST_COLOR}\]\h \[${D}\]in \[${GREEN}\]\w\[${D}\] $(last_return_value)$ '
-
-But at least it works properly now and won't drive me crazy.
-
-Did a bit more font hunting.  Looking for something to use for figures that
-looks plotter-esque, but isn't something with a couple of scattered glyphs and
-no weights like the plotter fonts I've found.  Licensing is a minefield, but
-Google Fonts has a bunch of stuff that's under the open font license, and
-I think I found a couple that might work: Quicksand and Nunito.  Of the two,
-Nunito seems a little nicer to me.  Will need to try it in some graphs and see
-how it works.
-
-## 2023-09-17
-
-Trying to get ahead of classwork for the next couple of weeks, since I've got so
-many other things going on.
-
-Did the reading for BS521 for the next two weeks.
-
-Finished BS521 homework 3.
-
-## 2023-09-18
-
-HG545.  Feeling better about this module than the last, which is surprising
-because I enjoyed this paper less.
-
-Chatted with someone about one of the labs I'm thinking about rotating in.
-
-Cleaned up HW2 for HG545 a bit.  Still not done, but at least I'm getting it
-into shape.
-
-## 2023-09-19
-
-BS521.
-
-Meeting with two more grad students to chat about their labs.
-
-BI500.
-
-DCMB has full time IT staff: `DCMB-IT-Services@umich.edu`.  Might email them
-about Ethernet connection?
-
-Chat widget on <https://michmed.service-now.com/sp> is a decent way to get help.
-Also walk-in help in THSL 4020. ARC support: <arc-support@umich.edu>.
-
-Slurm tutorial.  Learned a couple of interesting things:
-
-* `sq` is an alias for `squeue --me`.  Nice.
-* `my_job_header` can help debug weird Slurm shit, handy.
-* Emails will include core/mem high-water marks.  Need to figure out if I can
-  get this programmatically, might be more accurate than the Snakemake benchmarks
-  (or at least worth comparing).
-
-Chatted about Boyle lab with a current grad student.
-
-PIBS 800.
-
-Finished HG545 homework 2.
-
-## 2023-09-20
-
-HG545 discussion.  Talked a lot about the Y chromosome paper.
-
-## 2023-09-21
-
-BS521.  Went over the binomial distribution.  Seeing this yet another time gave
-me an actual intuitive understanding this time, which is nice.
-
-BISTRO seminar.
-
-## 2023-09-22
-
-Retreat.
-
-Lightning talks.
-
-Breakout panel with current grad students.  Lots of stuff, probably not going to
-write it all down here.
-
-## 2023-09-23
-
-Finished HW 4 for BS521.
-
-Got some random LaTeX shit to remember for next time. Aligned equations:
-
-    % Needs \usepackage{amsmath}; align* spaces better than eqnarray*.
-    \begin{align*}
-        foo &= bar \\
-        meow &= wow
-    \end{align*}
-
-References to figures:
-
-    (See figure~\ref{fig:g-a})
-
-    …
-
-    \begin{figure}[H]
-        \centering
-        \includegraphics[width=0.65\textwidth]{figures/g-a}
-        \caption{Graph for exercise 4.1 part a.}
-        \label{fig:g-a}
-    \end{figure}
-
-Units:
-
-    \usepackage{units}
-
-    Drink 500 \unit{ml} of water at lunch.
-
-And some random R shit to remember for next time:
-
-    dbinom = Binomial PMF (R's "density")
-    pbinom = Binomial CDF
-
-Came up with some absolutely cursed code to make shaded normal graphs.
-Surprised that's not already a thing.
-
-## 2023-09-25
-
-HG545.  Need to retype all my notes for this module here when I get some time so
-I don't lose them.
-
-Today started with a description of RNAseq.  Something vaguely familiar was
-a nice change for this class.  Then reviewed STARR-seq which I think I mostly
-understand now.
-
-Talked about the similarity between enhancers and promoters.  Polymerase can
-sometimes actually sit down at enhancers and produce small RNAs, but
-transcription doesn't ever elongate.  But this might be an example of how genes
-could evolve.
-
-Then talked about heat shock proteins and heat shock factor as an example of how
-rapid transcription can happen.
-
-* HSE: "Heat Shock Element", an enhancer sequence located upstream of a gene,
-  e.g. hsp90.
-* hsp90: "Heat Shock Protein 90", a protein that's used in cells to help other
-  proteins fold in the presence of heat that might otherwise prevent it.  The 90
-  is from its weight in kilodaltons (lol).
-* HSF1: "Heat Shock Factor 1", a transcription factor that trimerizes, binds to
-  HSE, and recruits another thing to activate the transcription of hsp90.
-
-There's a self-regulation loop here where, when things are cold, hsp90 binds to
-HSF1 outside the nucleus and prevents it from enhancing transcription of hsp90
-(i.e. of itself).  But when heat is applied, other proteins unfold and hsp90
-starts chaperoning them more, which leaves HSF1 free to enter the nucleus and
-enhance transcription of hsp90.
-
-Remembering how to create a local Postgres DB for testing:
-
-    sudo -u postgres psql
-
-    CREATE DATABASE example;
-    CREATE USER testuser WITH PASSWORD 'pass';
-    GRANT ALL PRIVILEGES ON DATABASE example TO testuser;
-
-    \c example
-    GRANT ALL ON SCHEMA public TO testuser;
-
-    \q
-
-    psql postgresql://testuser:pass@localhost:5432/example
-
-## 2023-09-26
-
-BS521.  Exam is on Thursday.  Today is about sampling distributions and
-statistical inference.
-
-BIOINF500.  Fire alarm for the first half of class, nice.  Rest of the class
-will be recorded, need to remember to watch it later.
-
-## 2023-09-27
-
-HG545 this morning. Did an initial pass on the homework, then met up with some
-other grad students later to chat about it and now I'm even less confident, lol.
-Welp.
-
-## 2023-09-28
-
-BS521 exam.  Did okay, though I really should have had a couple of more things
-on my note sheet than I did.  Next time I need to go through the slides too, not
-just the book — there were things on the test from class only, not in the book.
-I think I did alright though.
-
-Finished HG545 homework.  I think I did alright, but my brain is now fried.
-
-## 2023-09-29
-
-HG545 discussion this morning.
-
-Sent a few emails to try to nail down my next three rotations.  I think at this
-point I have a pretty good idea of where I want to try, so if I can just get
-them all nailed down now it'll be less stuff to deal with later.
-
-Signed up for the 503 discussion sections.  What a painful process to get
-registered.  I should have waited til I was home on my large monitor because
-trying to flip back and forth between the 90%-whitespace-filled list of sessions
-and my calendar/TODO list was extremely tedious.  I think I've got it all mapped
-out now though.
-
 # October 2023
 
 ## 2023-10-01