#!/bin/bash

shopt -s expand_aliases

alias ~~~=":<<'~~~bash'"

:<<'~~~bash'

Usage

getSxCsvFileRef.md csvfile genome bowtie2index [ext1up ext1down ext2up ext2down]

The optional extensions default to 50, 0, 10 and 100 bases, respectively.

Introduction

This is an in-house script that extracts reference sequences from the csvfiles of sx and lcy. The input csvfile has the following composition:

adapter(20bp) + sgRNA(20bp) + scaffold(83/93bp) + target(44bp) + 3bp + RCbarcode(18bp) + RCprimer(21bp)
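Because every segment before the target has a fixed length, the target can in principle be sliced out of a full construct by offset. The helper below is a hypothetical sketch, not how `getSxCsvFileTarget.pl` is implemented. It assumes the construct is available as a single uppercase string that starts at the adapter, and it takes the scaffold length (83 or 93) as an argument.

```bash
#!/bin/bash

# Hypothetical helper: print the 44bp target from a construct laid out as
# adapter(20) + sgRNA(20) + scaffold(83/93) + target(44) + 3bp + RCbarcode(18) + RCprimer(21).
extract_target() {
    local seq=$1 scaffold_len=$2
    # The target starts right after the adapter, the sgRNA and the scaffold.
    local offset=$((20 + 20 + scaffold_len))
    printf '%s\n' "${seq:offset:44}"
}
```

For a 93bp scaffold, call `extract_target "$seq" 93`. The trailing 3bp, RCbarcode and RCprimer are simply left out of the slice.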

For an NGG csvfile, the 44bp target maps perfectly to the genome. For an NAA csvfile, bases 17 and 18 of the target are TT. These must be replaced with CC before the target can be mapped to the genome. After mapping, the actual cut site is inferred. ref1 consists of the ext1up bases upstream of the cut site and the ext1down bases downstream of it. ref2 is built the same way from ext2up and ext2down. For an NAA csvfile, the retrieved reference must then have its GG changed back to AA. The target and the reference always lie on opposite strands, so the substituted CC appears as GG in the reference.
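The two string operations described above can be sketched in plain bash. Both functions are hypothetical and do not reproduce what `sxTargetSam2Bed.awk` or `getSxRefFile.pl` actually do.

- `naa_target_to_ngg` assumes the target is an uppercase string with the TT at 1-based positions 17 and 18.
- `ref_window` assumes the cut site is given as a 0-based genomic boundary `cut`, meaning the break lies between bases `cut-1` and `cut`. It emits a BED6 line, since `bedtools getfasta -s` reads the strand from column 6.

```bash
#!/bin/bash

# Hypothetical: convert an NAA target into its mappable NGG form by replacing
# the TT at 1-based positions 17-18 with CC.
naa_target_to_ngg() {
    local t=$1
    if [[ ${t:16:2} != TT ]]; then
        echo "expected TT at positions 17-18 of: $t" >&2
        return 1
    fi
    printf '%s\n' "${t:0:16}CC${t:18}"
}

# Hypothetical: print a BED6 interval (0-based, half-open) covering `up` bases
# upstream and `down` bases downstream of a break at genomic boundary `cut`
# (the break lies between bases cut-1 and cut). On the minus strand, upstream
# means higher genomic coordinates, so the two extensions swap sides.
ref_window() {
    local chrom=$1 cut=$2 strand=$3 up=$4 down=$5
    local start end
    if [[ $strand == + ]]; then
        start=$((cut - up)); end=$((cut + down))
    else
        start=$((cut - down)); end=$((cut + up))
    fi
    printf '%s\t%d\t%d\t.\t0\t%s\n' "$chrom" "$start" "$end" "$strand"
}
```

With the default extensions, `ref_window chr1 1000 + 50 0` gives a ref1 interval of `chr1 950 1000`. On the minus strand, `ref_window chr1 1000 - 10 100` gives a ref2 interval of `chr1 900 1010`.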

Source

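~~~bash
csvfile=$1
genome=$2
bowtie2index=$3
ext1up=${4:-50}
ext1down=${5:-0}
ext2up=${6:-10}
ext2down=${7:-100}

# Extract the targets, map them with bowtie2, turn the alignments into BED
# intervals around the inferred cut sites, fetch the strand-aware sequences and
# hand them to getSxRefFile.pl. ${csvfile: -6:1} is the 6th character from the
# end of the file name, presumably the G or A of names ending in NGG.csv or NAA.csv.
getSxCsvFileTarget.pl "${csvfile}" |
    bowtie2 --quiet --mm -x "${bowtie2index}" -r -U - 2> /dev/null |
    samtools view - |
    gawk -f sxTargetSam2Bed.awk -- "${ext1up}" "${ext1down}" "${ext2up}" "${ext2down}" |
    bedtools getfasta -s -fi "${genome}" -bed - |
    sed '1~2d' |   # drop the FASTA header lines, keeping only the sequences
    getSxRefFile.pl "${ext1up}" "${ext2up}" "${csvfile: -6:1}"
alias ~~~=":" # This suppresses a warning and is not part of source.
~~~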