Run work flow
#!/bin/bash
shopt -s expand_aliases
alias ~~~=":<<'~~~bash'"
:<<'~~~bash'
Usage
Run test data
param1=value1 param2=value2 ... ./runWorkFlow.md [options]
Common
`options` are passed to the underlying `make` call. `makeTarget` is the file you want to generate. The underlying `make` engine uses file extensions to determine which step to run, so the file extension matters. Depending on `makeTarget`, you may need to provide additional parameters and input files.
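As a rough illustration of extension-based dispatch, the sketch below maps a target file name to the script this document associates with that extension. The helper `step_for_target` is hypothetical and is not taken from `workFlow.mak`, which expresses these relationships as make rules. Mapping `.alg` to `rearr` is an assumption inferred from the alignment description.

```bash
# Hypothetical helper: report which step would produce a given makeTarget.
# This only mirrors the extensions named in this document; the real logic
# lives in the rules of workFlow.mak.
step_for_target() {
    local target=$1
    case "$target" in
        *.target.fa|*.pair.fa) echo "sxExtractSpliter.md" ;;
        *.noDup)               echo "removeDuplicates.md" ;;
        *.demultiplex)         echo "demultiplex.md" ;;
        *.post)                echo "sxCutR2AdapterFilterCumulate.md" ;;
        *.alg)                 echo "rearr" ;;   # assumption: alignment is done by rearr
        *.ref)                 echo "getSxCsvFileRef.md" ;;
        *)                     echo "unknown"; return 1 ;;
    esac
}

step_for_target "data/sample.noDup"
step_for_target "data/sample.alg"
```

Renaming a target to an unlisted extension would leave the sketch (and, by the same reasoning, `make`) without a rule to apply, which is why the extension matters.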
Remove duplicates
To remove duplicates from paired (or multiply paired) `fastq` files, run
makeTarget=<dataDir>/<name>.noDup \
fastqFiles=<fq1>,<fq2>,... \
./runWorkFlow.md
For more details, see `removeDuplicates.md`.
Demultiplex
If you have a file `<dataDir>/<name>.noDup` output by `removeDuplicates.md`, then run
makeTarget=<dataDir>/<name>.demultiplex \
spliterIndices=<path1>,<path2>,... \
minScores=<score1>,<score2>,... \
./runWorkFlow.md
For more details, see `demultiplex.md`.
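The two comma-separated lists above appear to be matched by position. The sketch below shows one way to split and pair them in bash. It assumes that the i-th entry of `spliterIndices` goes with the i-th entry of `minScores`, which this document suggests but does not state. The function `pair_lists` and the example paths are hypothetical.

```bash
# Hypothetical helper: pair two comma-separated lists by position.
# Assumption: the i-th spliter index is paired with the i-th minimum score.
pair_lists() {
    local -a indices scores
    local i
    IFS=, read -r -a indices <<< "$1"
    IFS=, read -r -a scores  <<< "$2"
    if [ "${#indices[@]}" -ne "${#scores[@]}" ]; then
        echo "lists differ in length" >&2
        return 1
    fi
    for i in "${!indices[@]}"; do
        echo "${indices[$i]} ${scores[$i]}"
    done
}

pair_lists "lib.csv.target.fa,lib.csv.pair.fa" "30,100"
```

A length check like this catches a forgotten score before any long-running step starts.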
Align
If you have a file `<dataDir>/<name>.post` ready to align, then run
makeTarget=<dataDir>/<name>.alg \
refFile=<pathToRefFile> \
correctFile=<pathToCorrectFile> \
./runWorkFlow.md
This generates the alignment file `<dataDir>/<name>.alg`. Note that the file to align must have the extension `.post`. The following parameters, shown with their defaults, control the chimeric alignment.
s0=-6
s1=4
s2=2
u=-3
v=-9
ru=0
rv=0
qu=0
qv=-5
Note that the defaults in this script override those in `rearr`. If you do not have a `correctFile` and your `refFile=<path>.ref`, then you may just set `correctFile=<path>.correct`. `runWorkFlow.md` will then generate `correctFile` with all corrections set to `up`. For more details, see `workFlow.mak`.
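The naming convention for `correctFile` can be expressed with bash suffix removal. The sketch below only derives the path and does not create the file, whose format is defined elsewhere. The function name `default_correct_file` is hypothetical.

```bash
# Hypothetical helper: derive <path>.correct from <path>.ref.
default_correct_file() {
    local refFile=$1
    case "$refFile" in
        *.ref) echo "${refFile%.ref}.correct" ;;   # strip ".ref", append ".correct"
        *)     echo "refFile must end in .ref" >&2; return 1 ;;
    esac
}

refFile="test/test_work_flow/final_hgsgrna_libb_all_0811_NGG_scaffold_nor_G1.csv.ref"
correctFile=$(default_correct_file "$refFile")
echo "$correctFile"
```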
Post-process by sx module
The output of `demultiplex.md` does not fit the input of `rearr`. The transformation between them is highly customized and changes from time to time. For Shi Xing's data, this is done by `sxCutR2AdapterFilterCumulate.md`. If you have a file `<dataDir>/<name>.demultiplex` output by `demultiplex.md`, then just run
makeTarget=<dataDir>/<name>.post \
./runWorkFlow.md
Although the default `minToMapShear=30` works well for Shi Xing's data, you may modify it as you like.
makeTarget=<dataDir>/<name>.post \
minToMapShear=31 \
./runWorkFlow.md
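Overrides like the one above work because the script reads each parameter with the `${name:-default}` expansion, as its source shows. The minimal sketch below reproduces that idiom outside the script. The plain assignment to `minToMapShear` stands in for the value that a `minToMapShear=31 ./runWorkFlow.md` invocation would place in the environment.

```bash
# Start from a clean state so the defaults apply.
unset minToMapShear s0

minToMapShear=${minToMapShear:-30}   # unset, so the default 30 is used
s0=${s0:--6}                         # ":-" followed by the default value "-6"
echo "minToMapShear=$minToMapShear s0=$s0"

minToMapShear=31                     # stands in for the caller's override
minToMapShear=${minToMapShear:-30}   # already set, so 31 is kept
echo "minToMapShear=$minToMapShear"
```

The doubled dash in `${s0:--6}` is not a typo: the first dash belongs to the `:-` operator and the second is the sign of the default.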
Extract spliter by sx module
If you have a CSV file `<fullPathToCsvFile>` in the same format as Shi Xing's, then you can extract spliters with `sxExtractSpliter.md`.
makeTarget=<fullPathToCsvFile>.target.fa \
./runWorkFlow.md
Besides `<fullPathToCsvFile>.target.fa`, another file `<fullPathToCsvFile>.pair.fa` is generated as well, because `sxExtractSpliter.md` always generates both `spliterIndices` simultaneously.
Get reference by sx module
If you have a CSV file `<fullPathToCsvFile>` in the same format as Shi Xing's, then you can get the reference with `getSxCsvFileRef.md`.
makeTarget=<fullPathToCsvFile>.ref \
genome=<pathToGenome> \
bowtie2index=<pathToGenomeIndex> \
./runWorkFlow.md
The default settings for `genome` and `bowtie2index` are `hg19.fa`.
genome=test/genome/genome.fa
bowtie2index=test/genome/genome
So just run
makeTarget=<fullPathToCsvFile>.ref \
./runWorkFlow.md
Full workflow
Assume that your design is the same as Shi Xing's.
Then just run
```bash
makeTarget=<dataDir>/<name>.alg \
fastqFiles=<pathToR2>,<pathToR1> \
spliterIndices=<pathToCsvFile>.target.fa,<pathToCsvFile>.pair.fa \
genome=<pathToGenome> \
bowtie2index=<pathToGenomeIndex> \
refFile=<pathToCsvFile>.ref \
correctFile=<pathToCsvFile>.correct \
./runWorkFlow.md
```
`runWorkFlow.md` will run all the steps above for you to generate `<dataDir>/<name>.alg`. You may also modify the following parameters, shown with their defaults, for better results.
minScores=30,100
s0=-6
s1=4
s2=2
u=-3
v=-9
ru=0
rv=0
qu=0
qv=-5
minToMapShear=30
Introduction
This script integrates all steps (removing duplicates, demultiplexing, alignment, and so on). Why not just run each step with its own script? If you run the test data a second time, `runWorkFlow.md` will not do anything, because the underlying `make` engine skips updating outputs when no change is detected in the inputs. Thus, one reason to use `runWorkFlow.md` is that it avoids repeated computation. To actually run the test again, delete the previous results first.
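The skipping behaviour comes down to a timestamp comparison. The sketch below imitates that rule for a single input and output using the shell's `-nt` (newer than) test. It is a simplified illustration, not how `make` is implemented, since `make` applies the rule across the whole dependency graph. The function `needs_rebuild` and the file names are hypothetical.

```bash
# Hypothetical helper: succeed when the output must be (re)built, i.e. when
# it is missing or the input is newer than it.
needs_rebuild() {
    # $1: input file, $2: output file
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

workdir=$(mktemp -d)
input="$workdir/reads.post"
output="$workdir/reads.alg"

touch "$input"
if needs_rebuild "$input" "$output"; then echo "first run: build"; fi

touch "$output"    # stands in for the real alignment step
if needs_rebuild "$input" "$output"; then
    echo "second run: build"
else
    echo "second run: skip"
fi

rm -f "$output"    # deleting the previous result forces a rebuild
if needs_rebuild "$input" "$output"; then echo "after delete: build"; fi

rm -rf "$workdir"
```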
Another reason to use `runWorkFlow.md` is that it hides two cumbersome steps from the user. First, if the `spliterIndices` used in `demultiplex.md` are not indexed by `bowtie2`, then `runWorkFlow.md` indexes them silently. Second, if `correctFile` has the same path and name as `refFile` (see `rearr`) but with the file extension `.correct` instead of `.ref`, and `runWorkFlow.md` cannot find `correctFile` on the file system, then it generates a default `correctFile` for you with all fields filled with `up`.
Source
# The following parameters should be replaced.
makeTarget=${makeTarget:-test/test_work_flow/rearr.alg}
fastqFiles=${fastqFiles:-test/test_work_flow/A2-g1n-3.R2.fq.gz,test/test_work_flow/A2-g1n-3.fq.gz}
spliterIndices=${spliterIndices:-test/test_work_flow/final_hgsgrna_libb_all_0811_NGG_scaffold_nor_G1.csv.target.fa,test/test_work_flow/final_hgsgrna_libb_all_0811_NGG_scaffold_nor_G1.csv.pair.fa}
minScores=${minScores:-30,100}
minToMapShear=${minToMapShear:-30}
refFile=${refFile:-test/test_work_flow/final_hgsgrna_libb_all_0811_NGG_scaffold_nor_G1.csv.ref}
correctFile=${correctFile:-test/test_work_flow/final_hgsgrna_libb_all_0811_NGG_scaffold_nor_G1.csv.correct}
ext1up=${ext1up:-50}
ext1down=${ext1down:-0}
ext2up=${ext2up:-10}
ext2down=${ext2down:-100}
# The following parameters are default in most cases.
genome=${genome:-test/genome/genome.fa}
bowtie2index=${bowtie2index:-test/genome/genome}
s0=${s0:--6}
s1=${s1:-4}
s2=${s2:-2}
u=${u:--3}
v=${v:--9}
ru=${ru:-0}
rv=${rv:-0}
qu=${qu:-0}
qv=${qv:--5}
make "$@" -f workFlow.mak "$makeTarget" "fastqFiles=$fastqFiles" "spliterIndices=$spliterIndices" "minScores=$minScores" "genome=$genome" "bowtie2index=$bowtie2index" "refFile=$refFile" "correctFile=$correctFile" "s0=$s0" "s1=$s1" "s2=$s2" "u=$u" "v=$v" "ru=$ru" "rv=$rv" "qu=$qu" "qv=$qv" "minToMapShear=$minToMapShear"
alias ~~~=":" # This suppresses a warning and is not part of source.