
...

Your reads may be long enough to overlap the forward and reverse reads and create a longer sequence. This process is time consuming, but it gains phylogenetic resolution and can be useful for many applications. We use SHE-RA, which was designed to calculate the quality of each overlapped base from the quality of the two overlapping bases and whether or not they match. Other software exists (and is faster), but it does several things at once, including trimming the sequences for quality, and does not provide as good an estimate of the quality of the overlapped bases. If you use another program, you may need a different way to de-multiplex samples afterwards. With SHE-RA, we overlap the paired-end sequences, then regenerate the fastq files to use with QIIME split_libraries_fastq.py.

First, divide your samples into files of about 1 million reads each, keeping forward and reverse reads separate (SHE-RA has code for parallelization, but I couldn't get it to work).

General form:

perl split_fastq_qiime_1.8.pl <read> <number needed> <output prefix>

Example:

perl ~/bin/split_fastq_qiime_1.8.pl 131001Alm_D13-4961_1_sequence.fastq 100 131001Alm_D13-4961_1_sequence.split
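To pick the number of files for a given run, the read count can be divided by the target of about 1 million reads per file. The helper below is only a sketch: `chunks_needed` is a hypothetical name, not part of SHE-RA or QIIME, and it assumes a standard fastq with exactly four lines per read.

```shell
# Estimate how many split files give roughly the target number of reads each.
# Assumes 4 lines per fastq record (no wrapped sequence or quality lines).
chunks_needed() {
    local fastq="$1"
    local reads_per_file="${2:-1000000}"   # default target: 1 million reads
    local lines reads
    lines=$(wc -l < "$fastq")
    reads=$(( lines / 4 ))
    # Round up so a final partial chunk still gets its own file.
    echo $(( (reads + reads_per_file - 1) / reads_per_file ))
}
```

For example, `chunks_needed 131001Alm_D13-4961_1_sequence.fastq` prints a value that could be passed as the number argument to the split script. Use the same number for the forward and reverse files so the pairs stay aligned.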

...

Then overlap each of the 100 file pairs with SHE-RA, where ${PBS_ARRAYID} is the array index used for parallel processing. Before running, change the lib path in concatReads.pl so the script can run from any folder: open it in a text editor such as emacs, change the second line to the directory where the .pm files are, and save.

General form:

perl concatReads_1.8.pl <fastq_1> <fastq_2> --qualityScaling illuminasanger

Example of an actual command:

perl concatReads_1.8.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fastq 131001Alm_D13-4961_2_sequence.split.${PBS_ARRAYID}.fastq --qualityScaling illuminasanger
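The command above relies on the scheduler to set ${PBS_ARRAYID}. One way to submit it is as a Torque-style array job, sketched below. The `-t 1-100` range and the `split_pair` helper are assumptions for illustration; match the range to the suffixes the split script actually produced, and adjust the directive if your cluster uses a different PBS flavor.

```shell
#!/bin/bash
#PBS -t 1-100
# Torque-style array job: one task per pair of split files.
# The 1-100 range is an assumption; check the real file suffixes first.

# Print the forward and reverse split file names for one array index.
split_pair() {
    local prefix="$1" id="$2"
    printf '%s_1_sequence.split.%s.fastq\n' "$prefix" "$id"
    printf '%s_2_sequence.split.%s.fastq\n' "$prefix" "$id"
}

# Only run inside an array task, where the scheduler sets PBS_ARRAYID.
if [ -n "${PBS_ARRAYID:-}" ]; then
    # Start from the directory the job was submitted from.
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    fwd=$(split_pair 131001Alm_D13-4961 "$PBS_ARRAYID" | sed -n 1p)
    rev=$(split_pair 131001Alm_D13-4961 "$PBS_ARRAYID" | sed -n 2p)
    # Fail early if either half of the pair is missing.
    if [ ! -f "$fwd" ] || [ ! -f "$rev" ]; then
        echo "missing input for task $PBS_ARRAYID" >&2
        exit 1
    fi
    perl concatReads_1.8.pl "$fwd" "$rev" --qualityScaling illuminasanger
fi
```

Checking that both files exist before calling concatReads_1.8.pl keeps a failed split from producing a confusing error in only some of the tasks.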

Filter out the bad overlaps from the fa and quala files generated by SHE-RA:

...

split_libraries_fastq.py -i 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq -m mapping_file.txt -b 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.index.filter_0.8.fastq --barcode_type 8 --rev_comp_barcode --min_per_read_length .8 -q 10 --max_bad_run_length 0 -o unique_output_${PBS_ARRAYID} --phred_offset 33

...
