rearrangement

rearrangement is the core chimeric alignment engine of rearr.

$ rearrangement -h
### Basic Usage
rearrangement <input_file 3<reference_file

### Parameters
-h, -help, --help: Display help.
# Aligning Parameters
-s0: Mismatching score. (default: -6)
-s1: Matching score for non-extension reference part. (default: 4)
-s2: Matching score for extension reference part. (default: 2)
-u: Gap-extending penalty. (default: -3)
-v: Gap-opening penalty. (default: -9)
-ru: Gap-extending penalty for unaligned reference ends. (default: 0)
-rv: Gap-opening penalty for unaligned reference ends. (default: 0)
-qu: Gap-extending penalty for unaligned query parts. (default: 0)
-qv: Gap-opening penalty for unaligned query parts. (default: -5)
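The gap parameters above can be compared with a little arithmetic. The sketch below assumes the common affine convention in which a gap of length n costs the opening penalty plus n times the extending penalty; the help text does not say which convention rearrangement uses, so treat the numbers as illustrative. `gap_score` is a hypothetical helper, not part of rearr.

```shell
# Hypothetical helper: affine gap score under the ASSUMED convention
# score = opening + length * extending (not confirmed by the help text).
gap_score() {
    # $1 = gap length, $2 = gap-opening penalty, $3 = gap-extending penalty
    echo $(( $2 + $1 * $3 ))
}

internal_gap=$(gap_score 4 -9 -3)   # defaults -v -9, -u -3
query_gap=$(gap_score 4 -5 0)       # defaults -qv -5, -qu 0
ref_end_gap=$(gap_score 4 0 0)      # defaults -rv 0, -ru 0

echo "internal=$internal_gap query=$query_gap ref_end=$ref_end_gap"
```

Under that assumption, a four-base gap inside an aligned block costs far more than leaving four query bases unaligned, and unaligned reference ends cost nothing with the defaults.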
---
title: rearrangement
---
flowchart TD
    QUERY[(
        input_file
        query # id
        ... ... ...
    )] --> REARR[rearrangement]
    REF[(
        reference_file
        start1 ref1 end1 start2 ref2 end2
        ... ... ... ... ... ...
    )] --> REARR
    REARR --> ALG[(
        stdout
        idx # score id
        ref1 ref2
        query
    )]

input_file

reference_file

stdout

Every three lines of the standard output represent a single alignment.

1       1       157     9300
---aGTTGGCTAGTCAATACCTGAAGAGAGATTGGCCTGGAGTAAAAGC-TGAtaAAAGCTGATGATCGGAATGATTACAGGTAAATTAGTAGTTTTTGCCTATTTTCTTTAGAAACGGTTTTACTTAAAGCTATGTTACATATAGATAATGTAACACTCTAGt-------
CTG----------------------------TTGGCCTGGAGTAAAAGCATGAT----------GATCGGAATGATTACAGGTAAA------------------------------------------------------------------------------CAAAAAA
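The three-line layout can be consumed with a small awk filter. The sketch below is one possible way to do it, run on a shortened, made-up sample rather than real output. It takes the field order `idx # score id` from the diagram above, and it assumes that the reference line and the query line of one alignment have the same length. `summarize_alignments` is a hypothetical name.

```shell
# Made-up, shortened sample: header line, reference line, query line per record.
sample='1 1 20 7
ACGT--ACGT
ACGTTTACGT
2 3 12 9
AC--GT
ACTTGT'

# Hypothetical helper: print idx, score, alignment length and a length check.
summarize_alignments() {
    awk '
        NR % 3 == 1 { idx = $1; score = $3 }   # header: idx # score id
        NR % 3 == 2 { ref = $0 }               # aligned reference line
        NR % 3 == 0 {                          # aligned query line closes the record
            status = (length(ref) == length($0)) ? "ok" : "length_mismatch"
            print idx, score, length($0), status
        }
    '
}

summary=$(printf '%s\n' "$sample" | summarize_alignments)
printf '%s\n' "$summary"
```

With real data, the same function would read the output of rearrangement through a pipe.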

correct_micro_homology.awk

Microhomology is common in CRISPR editing output. When microhomology occurs, rearrangement cannot uniquely determine how to align the query to ref1 and ref2, as shown in the following video.

correct_micro_homology.awk allows one to specify which end of the double strand break should be corrected toward the cleavage site up to the microhomology equivalence.
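The ambiguity can be illustrated with a toy calculation. The sketch below is not the algorithm used by correct_micro_homology.awk; it only counts how far a junction can slide without changing the joined sequence. The helper `junction_shift`, its 0-based coordinates, and the sequences are all made up for illustration: the query is assumed to be the first `end1` bases of ref1 followed by ref2 from offset `start2` onward.

```shell
# Hypothetical helper: junction_shift ref1 end1 ref2 start2
# Prints "<up> <down>": how many bases the junction can slide upstream and
# downstream while ref1[0:end1] + ref2[start2:] stays the same sequence.
junction_shift() {
    awk -v r1="$1" -v e1="$2" -v r2="$3" -v s2="$4" 'BEGIN {
        up = 0      # bases just before the junction that agree in both references
        while (e1 - up > 0 && s2 - up > 0 &&
               substr(r1, e1 - up, 1) == substr(r2, s2 - up, 1)) up++
        down = 0    # bases just after the junction that agree in both references
        while (e1 + down < length(r1) && s2 + down < length(r2) &&
               substr(r1, e1 + down + 1, 1) == substr(r2, s2 + down + 1, 1)) down++
        print up, down
    }'
}

# AAA|CGTTT joined to GG|CGTCCC: the shared CGT can belong to either side.
shift_a=$(junction_shift AAACGTTT 3 GGCGTCCC 2)
# The same joined sequence with the junction placed after the shared CGT.
shift_b=$(junction_shift AAACGTTT 6 GGCGTCCC 5)
echo "$shift_a / $shift_b"
```

Both calls describe the same query, `AAACGTCCC`, which is why a direction has to be chosen to make the reported junction unique.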

$ gawk -f correct_micro_homology.awk -- \
    reference_file \
    direction_file \
    < rearrangement_file
---
title: correct_micro_homology.awk
---
flowchart TD
    REF[(
        reference_file
        start1 ref1 end1 start2 ref2 end2
        ... ... ... ... ... ...
    )] --> CMH[correct_micro_homology.awk]
    DIRECTION[(
        direction_file
        up/down
        ...
    )] --> CMH
    ALG[(
        rearrangement_file
        idx # score id
        ref1 ref2
        query
    )] --> CMH
    CMH --> CORRECTED[(
        stdout
        idx # score id udangle rstart1 qstart1 rend1 qend1 random rstart2 qstart2 rend2 qend2 ddangle cut1 ref1+cut2
        ref1 ref2
        query
    )]

Core

rearrangement and correct_micro_homology.awk form the core of rearr. They are generally piped together.

$ rearrangement \
    < input_file \
    3< reference_file |
  gawk -f correct_micro_homology.awk -- \
    reference_file \
    direction_file
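The pipeline relies on plain shell redirection: queries arrive on standard input and references on file descriptor 3. The sketch below shows only that plumbing, using a stand-in function in place of the real binary. `fake_rearrangement` and the file contents are hypothetical; the stand-in merely counts the lines it receives on each descriptor.

```shell
# Stand-in for rearrangement (hypothetical): counts lines on stdin and on fd 3.
fake_rearrangement() {
    queries=$(wc -l | tr -d ' ')        # standard input carries the queries
    refs=$(wc -l <&3 | tr -d ' ')       # file descriptor 3 carries the references
    printf 'queries=%s refs=%s\n' "$queries" "$refs"
}

workdir=$(mktemp -d)
# Made-up rows following the layouts in the diagrams above.
printf 'ACGT 1 0\nTTGA 2 1\n' > "$workdir/input_file"
printf '0 ACGTACGT 4 4 TTGATTGA 8\n' > "$workdir/reference_file"

plumbing=$(fake_rearrangement < "$workdir/input_file" 3< "$workdir/reference_file")
printf '%s\n' "$plumbing"
rm -rf "$workdir"
```

Because descriptor 3 is separate from standard input and standard output, the output can still be piped into the next command without interfering with the reference stream.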

More than two blocks

rearrangement and correct_micro_homology.awk support chimeric alignments with more than two blocks. The core part of rearr for multiple blocks is as follows.

---
title: core
---
flowchart TD
    QUERY[(
        input_file
        query # id
        ... ... ...
    )] --> REARR[rearrangement]
    REF[(
        reference_file
        start1 ref1 end1 start2 ref2 end2 ... startN refN endN
        ... ... ... ... ... ... ... ... ... ...
    )] --> REARR
    REARR --> ALG[(
        rearrangement_file
        idx # score id
        ref1 ref2 ...
        query
    )]
    REF --> CMH[correct_micro_homology.awk]
    DIRECTION[(
        direction_file
        up/down:1:2 up/down:2:3 ... up/down:N-1:N
        ... ... ... ...
    )] --> CMH
    ALG --> CMH
    CMH --> CORRECTED[(
        stdout
        idx # score id dangle0 rstart1 qstart1 rend1 qend1 dangle1 rstart2 qstart2 rend2 qend2 dangle2 ... rstartN qstartN rendN qendN dangleN
        ref1 ref2 ...
        query
    )]

Each row of direction_file has one field per junction between adjacent references. The extended header in the stdout of correct_micro_homology.awk contains information for all alignment blocks and all unaligned parts of the query.
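The extended header can be split into per-block records by position. The sketch below assumes the header is tab-separated and follows the multi-block layout in the diagram above exactly: four leading fields, `dangle0`, then five fields per block. The sample values and the helper `list_blocks` are made up, and the dangle fields are treated as opaque strings.

```shell
# Made-up extended header for N = 2 blocks (tab separation is an assumption).
header=$(printf '1\t1\t157\t9300\tCTG\t3\t3\t43\t43\tAT\t53\t45\t80\t72\tCAAA')

# Hypothetical helper: print rstart qstart rend qend and the trailing dangle per block.
list_blocks() {
    awk -F '\t' '{
        n = (NF - 5) / 5              # idx, #, score, id, dangle0, then 5 fields per block
        for (b = 1; b <= n; b++) {
            base = 5 * b              # field just before rstart of block b
            print "block" b, $(base + 1), $(base + 2), $(base + 3), $(base + 4), "dangle=" $(base + 5)
        }
    }'
}

blocks=$(printf '%s\n' "$header" | list_blocks)
printf '%s\n' "$blocks"
```

With real output, only the header line of each three-line record would be passed to such a filter.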