...
All other samples that do not match are discarded.
Program:
perl parse_Illumina_multiplex2.pl <Solexa File1> <Solexa File2> <mapping> <output_prefix>
...
First, split your sequence files into chunks of about 1 million reads each. A chunk of this size can typically be processed on our computers in about 10 hours.
perl ~/bin/split_fastq_qiime_1.8.pl 131001Alm_D13-4961_1_sequence.fastq 100 131001Alm_D13-4961_1_sequence.split
...
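As an illustration of what the splitting step does, here is a minimal Python sketch. It is not the code of split_fastq_qiime_1.8.pl: the function name split_fastq, the round-robin distribution, and the 1-based <prefix>.<k>.fastq naming are all assumptions made for the example. The point it shows is that if both read files list their reads in the same order and are split the same way, mates end up in the same numbered chunk.

```python
from contextlib import ExitStack
from itertools import islice


def split_fastq(in_path, n_files, out_prefix):
    """Deal 4-line FASTQ records round-robin into n_files output files.

    Outputs are named <out_prefix>.<k>.fastq for k = 1..n_files
    (1-based numbering is an assumption, not taken from the Perl script).
    Returns the number of records written.
    """
    count = 0
    with ExitStack() as stack:
        src = stack.enter_context(open(in_path))
        outs = [
            stack.enter_context(open(f"{out_prefix}.{k}.fastq", "w"))
            for k in range(1, n_files + 1)
        ]
        while True:
            # One FASTQ record = header, sequence, '+' line, quality line.
            record = list(islice(src, 4))
            if not record:
                break
            if len(record) != 4:
                raise ValueError("truncated FASTQ record at end of file")
            outs[count % n_files].writelines(record)
            count += 1
    return count
```

Calling split_fastq once per read file with the same n_files keeps read k of file 1 and read k of file 2 in chunks with the same number, which is what the per-chunk overlap step relies on.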
Then, overlap the paired reads in each of the 100 files with SHERA, where ${PBS_ARRAYID} is the array job number used for parallel processing:
perl ~/bin/SHERA_code/concatReads_1.8.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fastq 131001Alm_D13-4961_2_sequence.split.${PBS_ARRAYID}.fastq --qualityScaling illumina
Filter out the bad overlaps from the .fa and .quala files generated by SHERA:
perl /mit/spacocha/bin/SHERA_code/filterReads.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fa 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.quala 0.8
Use mothur to re-generate the fastq files:
...
Now you need an index file that matches the filtered reads. If the index read is in a separate file, fix the index file so that it contains only the reads in your filtered file:
perl ~/bin/fix_index.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq 131001Alm_D13-4961_3_sequence.fastq > 131001Alm_D13-4961_3_${PBS_ARRAYID}.filter_0.8.fastq
Or, if the index is already present in the read header, generate the index file from the header:
perl /mit/spacocha/bin/fastq2Qiime_barcode2.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq > 131001Alm_D13-4961_1_sequence.split.index.filter_0.8.fastq
This script works for a specific header configuration, where the fastq files look like the example below. It pulls out the longest string of base letters (ATGCKMRYSWBVHDNX) after the #, which in this case would be TGGGACCT, and creates a false quality for each base by writing the barcode letter in lower case:
...
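The header rule described above can be sketched in Python as follows. This is only an illustration of the described behaviour, not the code of fastq2Qiime_barcode2.pl. The function name barcode_record, the example header, the choice to search after the first #, and the choice to reuse the full read header in the output record are all assumptions.

```python
import re

# Letters accepted as part of a barcode, as listed in the text above.
BARCODE_LETTERS = "ATGCKMRYSWBVHDNX"


def barcode_record(header):
    """Build a 4-line FASTQ barcode record from a read header line.

    The barcode is the longest run of barcode letters after the first '#'
    (the first such run wins on ties). The quality line is a placeholder:
    the barcode itself in lower case.
    """
    if "#" not in header:
        raise ValueError("header has no '#': " + header)
    tail = header.split("#", 1)[1]
    runs = re.findall(f"[{BARCODE_LETTERS}]+", tail)
    if not runs:
        raise ValueError("no barcode letters after '#': " + header)
    barcode = max(runs, key=len)
    quality = barcode.lower()
    return f"{header}\n{barcode}\n+\n{quality}\n"


# Hypothetical header, made up for this example.
print(barcode_record("@HWI-ST1234:1:1101:1200:2000#TGGGACCT/1"), end="")
```

For the hypothetical header above, the barcode line is TGGGACCT and the placeholder quality line is tgggacct. The lower-case quality characters carry no real base-call information, so downstream steps should not filter on barcode quality.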