...

All other samples that do not match are discarded.

Program:

perl parse_Illumina_multiplex2.pl <Solexa File1> <Solexa File2> <mapping> <output_prefix>

...

First, split your reads into files of about 1 million reads each. A file of this size can typically be processed on our computers in about 10 hours.

perl ~/bin/split_fastq_qiime_1.8.pl 131001Alm_D13-4961_1_sequence.fastq 100 131001Alm_D13-4961_1_sequence.split
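
As a rough illustration of what the splitting step does, here is a minimal Python sketch. It is not the split_fastq_qiime_1.8.pl script itself. It assumes standard 4-line FASTQ records, deals the records round-robin into the requested number of files, and names the outputs <prefix>.<n>.fastq with n starting at 1. The function name split_fastq, the round-robin strategy, and the 1-based numbering are all assumptions made for illustration.

```python
import itertools


def split_fastq(in_path, n_chunks, out_prefix):
    """Distribute 4-line FASTQ records round-robin over n_chunks files.

    Hypothetical helper: output files are named <out_prefix>.<n>.fastq,
    with n running from 1 to n_chunks (an assumed convention).
    """
    paths = [f"{out_prefix}.{i}.fastq" for i in range(1, n_chunks + 1)]
    outs = [open(p, "w") for p in paths]
    try:
        with open(in_path) as fh:
            for rec_no in itertools.count():
                # Read the next record: header, sequence, '+', quality.
                record = list(itertools.islice(fh, 4))
                if not record:
                    break
                if len(record) != 4:
                    raise ValueError("truncated FASTQ record at end of file")
                outs[rec_no % n_chunks].writelines(record)
    finally:
        for out in outs:
            out.close()
    return paths
```

If both read files list their reads in the same order, splitting each with the same chunk count keeps mates in chunks with the same number, which the overlap step depends on.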

...

Then, overlap each of the 100 files with SHERA, where ${PBS_ARRAYID} is the process number for parallel processing:

perl ~/bin/SHERA_code/concatReads_1.8.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fastq 131001Alm_D13-4961_2_sequence.split.${PBS_ARRAYID}.fastq --qualityScaling illumina

Filter out the bad overlaps from the .fa and .quala files generated by SHERA:

perl /mit/spacocha/bin/SHERA_code/filterReads.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fa 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.quala 0.8

Use mothur to re-generate the fastq files:

...

Now, you will either have to fix the index file to contain only the reads in your file (if the index read is a separate file):

perl ~/bin/fix_index.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq 131001Alm_D13-4961_3_sequence.fastq > 131001Alm_D13-4961_3_${PBS_ARRAYID}.filter_0.8.fastq

Or, you will have to generate the index file from the header (if the index is already present in the header):

perl /mit/spacocha/bin/fastq2Qiime_barcode2.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq > 131001Alm_D13-4961_1_sequence.split.index.filter_0.8.fastq

This script works for a specific header configuration, where the fastq files look like the example below. It pulls out the longest string of base letters (ATGCKMRYSWBVHDNX) after the #, which in this case would be TGGGACCT, and creates a false quality for each base as the lower case of each barcode letter:

...
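
The header rule described above can be sketched in Python as follows. This is only an illustration of the stated rule, not the code of fastq2Qiime_barcode2.pl. The function name barcode_record, the choice to reuse the read header unchanged in the output record, and the header used in the usage line are hypothetical. Only the letter set and the TGGGACCT barcode come from the text above.

```python
import re

# Letters treated as bases, as listed in the text above.
BASES = "ATGCKMRYSWBVHDNX"


def barcode_record(header):
    """Build a 4-line FASTQ index record from one read header.

    The barcode is the longest run of base letters after the first '#'.
    Its "quality" string is the same barcode in lower case.
    """
    _, sep, tail = header.partition("#")
    if not sep:
        raise ValueError(f"no '#' in header: {header!r}")
    runs = re.findall(f"[{BASES}]+", tail)
    if not runs:
        raise ValueError(f"no barcode after '#' in header: {header!r}")
    barcode = max(runs, key=len)  # on a tie, the first run is kept
    return f"{header}\n{barcode}\n+\n{barcode.lower()}\n"


# Hypothetical header, for illustration only.
print(barcode_record("@READ1#TGGGACCT/1"), end="")
```

The lower-case letters are only placeholders, so that the barcode file is a syntactically valid fastq. They carry no real base-call quality information.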