2024.markdown @ 8b42d86ce8fd

Update
author Steve Losh <steve@stevelosh.com>
date Mon, 22 Jul 2024 10:04:36 -0400
parents 8f1af69d38e3
children (none)
[TOC]

# January 2024

## 2024-01-08

Back from break.  Started a new rotation (classes don't start for another
couple of days).  Feels good to be back.

## 2024-01-09

Another day of rotation before classes start.  Getting my laptop set up to do
actual work again.

Looked around a little bit in Canvas.  One of my classes doesn't seem to have
posted anything yet, but I'm definitely registered so I think I should be good.

Did a couple of tasks to get ready for classes tomorrow (though they don't
*really* start til Thursday).

Need to remember to bring things into the lab tomorrow:

* Something to put my laptop on to raise it.
* Headphones.

## 2024-01-10

Cleaned up my schedule and sketched out my TODO list, found all the rooms for my
classes.  Got one new building to find/learn: USB, the "Undergraduate Science
Building".  I think I've seen it before, never been inside though.

First day of journal club.  Main things I'll need to do are:

* Ask two questions during the semester.
* Two review sessions for particular papers.
* Read over the rest of the papers.

## 2024-01-11

First two actual classes.

BS521.  Fairly straightforward.  8:30 AM sucks.  Tried to change lab sections to
something that works better with my schedule, but the class registration system
is awful and won't let me swap things out for some reason, despite there being
open seats.

BI529.  Need to watch videos (and do the embedded quizzes) before classes.
Really awful high-pitched whine from something in the ceiling in the spot I sat
in the room — need to move around and find a spot that isn't going to destroy my
ears. Got both Great Lakes and my VM set up with Jupyter (I think) and
accessibly by SSH port forwarding (e.g. `ssh -NAL 8888:localhost:8888
slosh@gl1234 -J gl`) so we'll see how this goes during the next class I guess.

Poked around at running `make docs` in a Lisp project with my new laptop and
something in `d`'s Python dependency shitpile broke since I last built it, so
I had to screw around with the guts of `d` to get it rendering again.  I really
need to rewrite `d` in a sane language.

Attempted to set up printing in the lab, but of course it doesn't work on Linux.

Downloaded the first 3 papers for journal club.  Need to print them and start
reading soon, at least for next week's.

After going back and forth with someone about the BS522 lab section stuff, they
realized they needed to grant me permission for that section.  Once they did
that I was able to swap it, so that's all set now.

Read through chapter 1 of the BS522 textbook.  It's… meh.

## 2024-01-12

Reinstalled Figlet and the `figlet-fonts` repo yet again on this new computer.
Finally got around to writing a script to do the stupid install so I don't have
to manually do it every time.

Reinstalled ABCL on the new machine.  Changed my scripts so they invoke `java
-jar /usr/local/bin/abcl.jar` directly instead of requiring
a `/usr/local/bin/abcl` wrapper for no reason, so now all I have to do is
download the binary distribution and move the two jars to `/usr/local/bin/`.

Reinstalled ECL on the new machine.  Trivial thanks to my notes, except the git
tagging convention switched from `ECL-1.2.3` to just a bare `1.2.3`.  My kingdom
for consistency.

Downloaded the textbook and marked down important dates for BI545 today so I'll
be prepared for class.

Running some of my project tests on the new machine resulted in `make: time: No
such file or directory` because… apparently Debian doesn't include
a `/usr/bin/time` by default?  Wow.  `apt install time` to get it.

Read chapter 2 of the BS522 book.  Mostly meh again.

One thing I'd like to do some weekend that's been on my mind for a while is
a small proof of concept web app with Lisp and HTMX.  Would be good to poke
around at it and see if I can make something reasonably easy to get up and
running, in case I want to do something like that in the future.  Stuff I'm
probably going to need to choose:

* Web server (almost certainly just boring old Hunchentoot)
* Router
* DB interface (probably Postmodern)
* DB migrations (will write my own)
* Template library (Djula is the obvious choice but has a shitload of deps)
* Monocss style

Also really need to write a simple FASTQ parser and bio seq library in Lisp, so
I don't have to do everything with janky UNIX tools.  Name idea: `skein`,
because it's for strings that are a tangled/knotted mess.  Should probably try
to separate interface and implementation pretty well here so I can potentially
plug and play different implementations in the future.

Found my way to the BI545 classroom.  These MSC/MSRB buildings are laid out like
someone's first Dwarf Fortress build.

First BI545 class.  Mostly reading the syllabus and going through Molecular
Biology 101.  Feels quite basic after HG545 last semester, hopefully things will
pick up as the course gets going.

## 2024-01-15

Tried to modify Pubmed search subscriptions but it looks like logging in via UM
Oauth is broken, sigh.

Finally got around to setting up keyboard shortcuts for a couple of my most
commonly used `pass -c` commands.  I should have done this months ago.

Need to make a last-minute slide for myself for PIBS 800 when I get home.

## 2024-01-16

BS522 early.  Mostly review of stuff from BS521 today.

BI529.  Most of the time spent getting Jupyter working and the data all
downloaded.  Need to find a better workflow here.

BI545:

* Slurm account: `bioinf545w24_class`
* Scratch: `/scratch/bioinf545_w24_class_root/bioinf545w24_class/`
* Turbo: `/nfs/turbo/dcmb-class/bioinf545`
  * `gsi/`: GSI's stuff
  * `sec001`: directories for each student
  * `shared`: read-only, shared data and stuff.

Now basic UNIX/bash scripting stuff.

Slurm `.sbat` script tidbit: use `my_job_header` for debugging info.

Method for raw-dogging batch mode execution instead of writing an sbat: `sbatch
-c4 --mem=32G foo.sh`. Not sure I'll use that much, but good to know it's an
option.

Ate lunch during my short break today.  Tuesdays are going to suck.

Looked around again for how to integrate Jupyter and Neovim and found nothing
good.  Maybe I'll just suffer through using Jupyter for a class and not try to
shave this yak.

BS522 lab.  Pretty simple, but gave lots of advice for the homework which will
come in handy when I get to that.  Sketched out a LaTeX file for it, but
I really really need to get around to modularizing my LaTeX situation at some
point — the copy/paste is getting untenable.

## 2024-01-17

Finished BS529 exercise.  Looked at the solutions to figure out some of the
fiddly bits, like the 1-based indexing from the GFF (I was right) and the GFF
entry type (`CDS`, I was also right).  The problem was I was dumb and forgot to
reverse in the reverse complement.  Welp.

Went through the videos for BI529 tomorrow.

BI602.  Didn't get a chance to ask questions — this is going to be a very
annoying part of the class.

## 2024-01-18

BS522.  Started talking about simple linear regression.  Mostly still review at
this point.

BI529.  Talking about transcription factor binding sites, and fuzzy-matching
motifs using Positional Weight Matrices (PWMs).

Went to a lecture by my grandPI about the human skin microbiome, especially
mapping the distributions of various populations across the topology of the
human skin.

Wrapped up BI529 after getting the clarification email.  Today was more
straightforward than the last class, probably in part by not having to mess
around with Jupyter setup.

Did a first pass at a read of the journal club paper for next week.  It's an
older ML paper whose results are… meh?

## 2024-01-19

BI545. The next few classes will be mostly RNA-seq data, especially focused on
QC.  Need to watch 2 videos in the slides before next time.  Mostly a review for
me at the beginning.  Need to look into:

* Ribodepletion as an RNA selection step.
  * This just means using probes with magnetic beads to select and pull out ribosomal RNA so it doesn't get sequenced.
* Strand-specific library prep for RNA seq.

## 2024-01-22

Tried to watch the BI529 videos but they kept cutting out.  Will retry later.

Started the Biostat 522 homework.  So far it's… meh.  Still trying to find
a Latex font setup I like.  Going to try XCharter this time, it seems… slightly
better than Stix?  Really with I could use Rotation, but foundries gonna founder
I guess.

BI529 videos are still cutting out.  Ugh.

Watched the BI545 videos.  Those were on Youtube, so that worked reliably.

## 2024-01-23

The roads and sidewalks are one giant sheet of ice today.  This sucks.

BS522.  More linear regression stuff.  He really rushed through the last few
topics far too fast in the last couple of minutes of class — I think I need to
review those slides myself to make sure I got what he was trying to cover.

BI529.  Gibbs sampling.  Thursday will also be this topic, so if we finish today
we can skip that day if we want.

BI545.  QC and such for RNAseq.  Lots of various things to consider, but felt
like a bit of a grab bag of topics.

Read through the journal club paper again for tomorrow.  Some ideas on
questions:

* How does the performance of their tool compare to a human doctor?
* How robust is their data?  E.g. what if someone has a heart attack outside of
  the hospital, could that be counted as a control?

PIBS800.

BS522 lab.

## 2024-01-24

Finally learned about `View > Panes > Console on Right` in R.  Should have found
this months ago.

Reread the paper for BI602.  Being forced to ask synthetic questions sucks
a lot, but oh well.

BI602.

## 2024-01-25

BS522.  Many slides with walls of text and notation.

BI529.  Mostly done already, so just relaxing for a change.

Started reading the paper for BI602 review session tomorrow.

Poked around a little at implementing some of the 521 stuff in Common Lisp.  It
was trivial to port the first class, but I've now hit two stumbling blocks.

Naive parsing of bioinformatics files like FASTAs is slow when you have a stream
sandwich of a flexi stream wrapping a gzip stream wrapping a Lisp stream.  Can
likely live with this for now, and writing a real parser for some of the common
formats (which would probably work directly with the binary and avoid the slow
flexi stream layer) has been on my list for a while, so this isn't an
insurmountable obstacle.

The other snag I hit is the lack of Numpy.  There are a bunch of linear algebra
libraries in CL and they're all lacking.  Oddly enough the one that seems to
have the most promise for me is April, even though it would have the steepest
learning curve at the beginning.  I've poked around with it a bit before but
maybe this is a good time to actually give it a real try.  Need to go through
some/all of:

* <https://xpqz.github.io/learnapl/intro.html>
* <http://archive.vector.org.uk/art10011550>
* <https://microapl.com/APL/tutorial_contents.html>

At least I shaved the input yak the last time I looked at it, so I can dive
right in (as if I don't have enough other shit to do, but at least this is
computery so it's actually fun to learn).

## 2024-01-26

BI545.  More RNA seq lectures, talking mostly about how to normalize and account
for non-biological variability when doing differential gene expression.

BI602 review session.  Was pretty fun and chill and cool.  By far the best part
of the class so far.

Continued one of the APL tutorials.  Pretty basic stuff, pretty much just
getting my fingers used to the keyboard shortcuts to type basic stuff.

## 2024-01-27

Found a series of videos about using APL to build neural networks, which might
be a fun thing to go through:
<https://www.youtube.com/playlist?list=PLgTqamKi1MS3p-O0QAgjv5vt4NY5OgpiM>

Watched the BS521 videos for Tuesday.  Starting assembly with De Bruijn graphs
which is going to be fun.  Need to find those papers I read a while back that
had a good summary of the process.

## 2024-01-28

BI545 homework.  Long.  The first two sections (biology/computer stuff) were
easy, the last (experimental design) was rough.

Started the BS522 homework.  Going to finish it tomorrow I think.

Went through some more of those APL videos.  I'm actually liking it (and April)
quite a lot.  I feel like I might be actually motivated to learn this time
because I can see things I would use it for now, as opposed to years ago when it
was just a fun novelty.

## 2024-01-29

Finished BS522 homework.  That was *tedious*.  The inconsistent notation in this
class is… not fun.

## 2024-01-30

BS522.  Started multiple linear regression, mostly review for now.

BI529.  Today's assignment was pretty trivial after reviewing De Bruijn graphs
last night, presumably it'll get more difficult when we actually use it on
Thursday.  We also figured out what the high-pitched whine in the room was
coming from (one particular bank of lights), thank god.

Spent some of the spare time to create a bug report for April and the April guy
responded pretty quickly that I had indeed ran into multiple compiler bugs.  If
I could ever just use a computer and have it reliably work I would be so happy.

BI545 on Zoom, mostly talking about how to measure the false discovery rate in
a variety of different ways.

PIBS800, about posters.

Watched the BI529 video.  It suggested we read the Velvet paper, which is funny
because I already printed it out the other day intending to reread it.

Did one or two more of the APL videos, now that I've confirmed I'm not crazy and
it was a bug in APL.  Got the forward pass working, though I need to learn to be
even more precise than usual in APL I guess.  When all your variables are like
`ws, bs, w, b` then `1↓ ws` and `1↓ w` is an easy mistake to make, and I guess
because everything is an array and every operator works on any shape of array
that could even be conceivable for convenience you won't get any type errors to
help you realize you did something stupid.

## 2024-01-31

Read the Velvet paper for BI529 tomorrow.  Seems like what I remember, though
I think I actually understand it better now.  There are still a few bits I'm not
100% clear on, but I think I get the general idea.

BI602.  This is the paper I reviewed the presentation for, and I asked something
last week, so I think this week I really have nothing to do.

# February 2024

## 2024-02-01

Tried to commit my dotfiles this morning and had to commit to a subrepo
I modified yesterday.  Wanted to push that to the git mirror but the hg-git
mapping was fucked again, so I needed to run that janky `remap-hggit-mapfile`
thing I wrote ages ago.  But I needed to port *that* to Python 3 to get it to
run on this laptop.  Did that and finally unwound the stack of yaks.  Ugh.
Final process to run the remapping is something like:

    cd src/foo
    cd .hg
    git clone https://github.com/sjl/foo gitrepo
    cd ..
    remap-hggit-mapfile . .hg/gitrepo
    remap-hggit-mapfile . .hg/gitrepo > .hg/git-mapfile

BS522.

BI529.  Going to release the first homework assignment today.  The assignment
was tricky today — I think there's a bug in the doctest and/or their
implementation of the Eulerian walk.

Continued watching a couple of APL videos.

## 2024-02-02

Started reading the journal club paper for next week, about prediction of CRISPR
off-target effects.  It's not very well-written, unfortunately — I'm not even
a page in and there are multiple typos.

BI545.  Lecture plus interactive lab on Great Lakes.  The lab was okay, but
quite rushed because the lecture took a long time.  I'm pretty decent at CLI
stuff so it went smoothly for me, but I still didn't have enough time to finish
in class.

## 2024-02-03

Watched some BI529 videos for next week.  Still need to finish up the homework.
Next week is BWT, which always breaks my brain, so I think I should probably
have a poke around tomorrow just to get my head around it again.  I wonder how
it would be to implement in APL.

## 2024-02-04

Finished the BS529 homework.  Wrote a bunch of stuff about the confusing parts
in the summary document only to realize there's a one page limit.  Oh well, it
was cathartic for me at least.

Started the BS522 homework, but I just don't have the willpower to deal with the
tedium of this today.  I'll get it done tomorrow.

Looked at some BWT transform videos.  I think I understand it a little better
now, though not sure if I'll retain it.  I need to write it down at some point
if I really want to remember it (is it worth it?).

## 2024-02-05

Read part of the chapter on BWT in the Bioinformatics Algorithms textbook.
It's… really not great.  They have a really weird way of presenting algorithms
that's long and rambly and does a lot of unnecessary work to actually explain
things.  Sometimes they go on for a page about something trivial, and other time
gloss over major points.

I think I need to find a better source to review for tomorrow.  Found
<https://james.fabpedigree.com/bwt.htm> which is actually pretty nice because it
shows how to *efficiently* do this, albeit with godawful terse C code.  After
going through it and deciphering the stupid C programmer shortcuts… I think
I finally get it.  At least the efficient compression and decompression side of
things.  Took me a long time, but it feels good.

BWT fried my brain, so I'm procrastinating the tedious BS522 homework til
tomorrow.  Sorry, future self.

## 2024-02-06

Slept late, missed BS522.  Need to watch the VOD tomorrow.

BI529 about BWT.  Easy given that I figured it out yesterday, though there were
some new data structures that we'll need for string matching (which I still
haven't figured out — only the decompression).

Had a stroke of luck with respect to the BS522 homework — they pushed the due
date back to Sunday (four more days) so I've got a few more days to deal with
it.  That's a relief.

BI545, about novel transcript identifications.

## 2024-02-07

Spent a ton of time trying to decipher what the BI529 homework was trying to ask
for.  Not fun.

BI602.  Paper was about using machine learning to predict CRISPR off target
scores for various sgRNA configurations.

## 2024-02-08

BS522.  Feels like we're moving incredibly slowly now, but the homeworks are
still painful.  Very strange.

BI529, BWT matching.  Going to try to make sure I understand what's actually
happening here.

BI545, hodgepodge of various topics (RNASeq contamination, GEO databases?).
Trying to install the R packages they told us to install at the beginning of
class using the cluster was an absolute mess, because 50 people all hitting the
login nodes and doing a ton of file IO slows everything to a crawl.  I ended up
doing things locally, which was much faster.

Started reading the paper for journal club next week, about prediction
transcriptional response to genetic perturbation.

## 2024-02-10

Watched BI529 videos.

Finally did the absolutely mind-numbing homework 3 for BS522.  Truly, truly
miserable.  Result was eleven pages, much of which was manual plug n' chug
algebra to verify that R can, in fact, do math correctly.  Very unpleasant.

## 2024-02-11

Did a chunk of the BI545 homework.  Otherwise not super productive today,
unfortunately.

## 2024-02-12

Finished BI545 homework.

## 2024-02-13

BS522.  This class is too early.  Spent all class talking about the special case
of binary interaction variables, which felt like it didn't need a whole entire
class just for that.

Forgot to charge my laptop last night, so I'm at 50% in BI529.  Welp.  Managed
to finish all but the last function, though the documentation was not
particularly clear and I think they didn't expect us to handle certain edge
cases they gave us.  Finished that last bit of 529 right before 545 started.

BI545.  Another grab bag of stuff, none gone into a lot of depth, lots of magic
incantations for how to perform analyses in R without explaining what everything
is.  Oh well.  Checked my work with some folks after class and feel a bit better
now.

PIBS800, training grant fair.

BS522 lab.

## 2024-02-14

Reading the paper for BI545 on Friday.  They're trying to figure out the cause
of undiagnosed rare muscular diseases using RNA seq to look for alternative
splicing events.  Noticed that figure S6 is confusing because they say in the
paper the median number of potentially pathogenic splice events per sample for
the "all genes" category is 190, but the S6 figure caption says 105, and the
figure itself looks like the median in the box plot is more like 250.  So… which
is it?  Finished the paper up to the methods section, will take a look at that
tomorrow I think.

## 2024-02-15

BI529 in the morning.  Managed to finish it in class at the end with some hints
from Alan.  Wasn't too bad once I understood what they actually wanted.

## 2024-02-16

BI545.  This is the class where we had to read the paper about using RNA seq to
find novel splice isoforms in patients with undiagnosed muscular diseases.  Also
getting some information about the group project.

Figured out how to get R working with Vim.  I should have done this months ago,
now that I've got it set up it's no longer nearly as painful to work in R.  The
trick was realizing that base R running in a terminal, *on its own*, can open up
a plot window itself.  So I don't have to do any horrible workarounds to get
plots visible, they just appear in a window that I can put wherever I want.
Maybe BS522 homework won't feel so painful now.

## 2024-02-17

Did the BS522 homework.  This one wasn't as mind-numbing as the last one,
thankfully.  Only a tiny bit of R involved, but I can already tell I'm loving
the Vim setup with it.  I can edit my code and LaTeX in the same editor, imagine
that.

## 2024-02-18

Read some of the paper(s) for journal club this week.

## 2024-02-19

A bunch of homeworks got rescheduled, and now everything is due on Friday.  Ugh.

Puttered around making a `reductions` function in the evening.  Turned out to be
tricky if you want it to work like CL's actual `reduce`, i.e. supporting
`start`, `end`, `key`, `from-end`, and `initial-value` properly.  I think I've
got something working, though it would also be nice to add a `result-type`
argument so you could ask for a vector directly though.

## 2024-02-20

BS522.  Another day on multiple regression, this time they mentioned continuous
vs continuous but otherwise it still just feels like treading water.  I feel
like we got everything thrown at us in the first week and have been stalled on
the same stuff ever since.

BI529.  This week was much easier, maybe because I slowed down and read the
description more carefully first to try to

* Try to figure out the biology behind what we're actually implementing.  Way
  too easy to treat the problems as a black box, but I'm actually getting to the
  point now where the biological context *helps* me, rather than overwhelming
  me.  Progress?
* Identify which things they're going to handwave over in the implementation
  (e.g. today we just compare *all* genes with *all* SNPs, not the ones near
  them like you'd have to do in reality).

Anyway, reading carefully at the beginning saved me going down a couple of
rabbit holes along the way.  Need to remember to do the same next time.

BI545.  ChIP-seq.

PIBS800, about funding after the PIBS year.  Need to remember to register for
604 as soon as I'm able.

* Figure out who the graduate coordinator is, check in with them once per term
  about funding/appointment changes etc.

Funding mechanisms:

* Fellowship, etc.  Paid through financial aid system.
* Training grants.  Paid through financial aid system.
* Graduate Student Research Assistant (GSRA), paid through HR.
* Graduate Student Instructor (GSI), paid through HR.

EdM stuff:

* Need to be in PhD lab and program as of July 1.
* Make sure rotation 4 is done in EdM (I think we did this long ago).

## 2024-02-21

Found <https://www.ejmastnak.com/tutorials/vim-latex/intro/> and might sharpen
the LaTeX saw a little bit this morning before I dive back into stuff.  I think
I'm pretty invested in LaTeX at this point, and aside from the snippets thing
I use almost everything there already, so it might be a good effort/reward
ratio.

Went through the first part, about snippets.  After some ridiculous yak shaving
(apparently the obvious, reasonable directory name `snippets` is special-cased
to be parsed as some *other* snippet plugin's syntax and so if you dare to name
the directory the obviously correct name it'll fail with a completely
inscrutable error) I got it working.  Going to have a go and see if it's worth
adding snippets back into my life, at least for LaTeX — I haven't really used
them since the Textmate era.

Journal club class.  Paper was about deep learning for EEG data.

## 2024-02-22

BS522 in the morning.

BI529, about Markov chains.  I've done these before so today was pretty basic.
Played around with some Python syntax to see if I could golf things down a bit.

Finished up the BS522 and BI545 homeworks, no idea why it felt like pulling
teeth this time — they (mostly) weren't as bad as the previous ones.

Also finished BI529 code (still need to do the writeup).  Ended up being pretty
easy overall, a lot more straightforward than the previous homework, which was
nice.

Finally got irssi set up on my new laptop.  It's been months, I should have done
this before now.  Oh well, at least it's done.

## 2024-02-23

Cleaned up and submitted BI545.  Going to review later today with folks and
maybe resubmit, but at least there's something up there now in case I forget.

BI545, this time about ATAC-seq.  I feel like it was a good introduction to the
assay, but am cautious about how in-depth the labs/etc will go.

Journal club review.  Last one for me, so I'm done with the reviews for the rest
of the semester, which is nice.

Office hours to try to understand what the 545 homework wanted after submitting.
I still have no idea if I was correct.  Oh well.

Finished BI529 so I don't have to worry about it over the break.