Sometimes you just want to make a quick FASTQ file for testing.
Usage:
- Write your FASTQ spec in a
foo.lispfile (see below for the syntax). quick-fastq foo.lisp(orcat foo.lisp | quick-fastqif you prefer) to dump a random FASTQ on stdout.
Syntax
quick-fastq will read two Common Lisp forms (using the standard reader for
now, so don't run it on untrusted data). The format of the input is:
bindings
expr
expr is an expression describing how to generate a random read.
- A literal string like
"ATCG"generates those bases with random quality scores. - An integer like
123generates that many random bases with random quality scores. - A vector like
#(expr1 expr2 …)evaluates each expression and concatenates the results. - A symbol like
xlooks up the value in the bindings (see below). - A list performs some operation on the form inside, depending on the symbol at the head of the list:
(qN expr)whereNis 0-90 evaluatesexprand sets its quality scores toN, e.g.(q12 500)will generate 500 random bases with a qscore of12.(rev expr)reversesexpr(you can also use(r expr)as a shortcut).(comp expr)complementsexpr(you can also use(c expr)as a shortcut).(revcomp expr)is equivalent to(rev (comp expr))(you can also use(rc expr)as a shortcut)(first n expr)takes the firstnbases ofexpr(you can also use(f n expr)as a shortcut).(last n expr)takes the lastnbases ofexpr(you can also use(l n expr)as a shortcut).(rep n expr)concatenatesncopies ofexpr(you can also use(tr n expr)as a shortcut).(snp freq expr)modifiesexprto add SNPs at a rate offreq(freqmust be between 0 and 1).(ins freq expr)modifiesexprto insert bases at a rate offreq(freqmust be between 0 and 1).(del freq expr)modifiesexprto delete bases at a rate offreq(freqmust be between 0 and 1).(err freq expr)is equivalent to(ins freq (del freq (snp freq expr)))(freqmust be between 0 and 1).
Bindings must be a (possibly empty) list of bindings, each of the form (symbol
expr). expr will be evaluated and bound to symbol. Bindings are performed
in order as if by let*. Several keyword symbols have special meanings:
- Binding
:entriesto an integernwill generate that many FASTQ entries instead of just a single one. - Binding
:seedto an integer will seed the RNG with a specific seed, to make runs reproducible.
Examples
Generate a random 1000bp read:
()
1000
Generate a read with the same 100bp beginning and end, with 500bp of random bases in the middle:
((x 100))
#(x 500 x)
Generate a gapped foldback chimeric read, with the second half having a lower quality than the first:
((x (q40 1000))
(f (q20 (revcomp x))))
#(x 25 f)
Generate a read with a tandem repeat in the middle:
()
#(1000 (rep 200 "ATTT") 1000)
Generate a foldback chimeric read with a double tandem duplication in the foldback strand, with simulated sequencing error, and small chunks of low-quality bases to make the transitions between sections as a hack:
((x 1000)
(lq (q1 10))
(a (first 800 x))
(b (last 200 x))
(dup (last 150 a))
(f (revcomp #(lq a lq dup lq (rc dup) lq dup lq b))))
(err 0.01 #(x f))