...

Sequencing our construct on MiSeq is slightly different from standard Illumina MiSeq sequencing. Load the provided sample sheet (which arbitrarily specifies a 250 paired-end run with an 8 nt barcode read) and spike 15 µL of the anti-reverse BMC index primer at 100 µM (5' AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG 3') into tube 13 of the cartridge. This should provide three reads (forward, index and reverse) of 250, 8 and 250 bp, respectively.

De-multiplexing

You can demultiplex at various stages. If there are multiple, unrelated projects in the same run, I first pull out the reads that map to the barcodes for only one project, so that I don't have to process extra data. You also have the option of removing unwanted barcodes at the QIIME split_libraries_fastq.py step by providing a mapping file containing only the barcodes you want, but you may waste time overlapping the unwanted reads if there are a lot of them. Do not use the following step if you will eventually work with all of the data, even in sets. Only use it if you will never need to work with the other data in the lane, since in that case it doesn't make sense to process that data at all.

...
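To make the idea of pulling out one project's reads concrete, here is a minimal Python sketch. It is not the lab's script: the function names and file layout are hypothetical, and it assumes that the index reads sit in their own FASTQ file in the same record order as the read file and that barcodes are matched exactly. Depending on how the index read was generated, the barcodes may need to be reverse-complemented first.

```python
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def filter_by_barcode(read_path, index_path, out_path, wanted):
    """Copy reads whose index read exactly matches a wanted barcode.

    Assumes both files list the same clusters in the same order.
    Returns the number of reads kept.
    """
    kept = 0
    with open(read_path) as reads, open(index_path) as index, \
            open(out_path, "w") as out:
        for read_rec, index_rec in zip(read_fastq(reads), read_fastq(index)):
            if index_rec[1].strip() in wanted:
                out.writelines(read_rec)
                kept += 1
    return kept
```

Run it once for the forward reads and once for the reverse reads with the same barcode set, so that the two filtered files stay paired.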

You may have sufficient length to overlap the forward and reverse reads to create a longer sequence. This process is time consuming, but it gains phylogenetic resolution and can be useful for many applications. We use SHE-RA, which was created to provide a sophisticated calculation of the quality of an overlapped base, given the quality of each overlapped base and whether or not they match. Other software exists (and is faster), but it does multiple things at once, including trimming the sequences for quality, and does not provide as good an estimate of the quality of the overlapped bases. If other programs are used, it might be necessary to find other ways to de-multiplex samples afterwards. With SHE-RA, we overlap paired-end sequences, then re-generate the fastq files to use with QIIME split_libraries_fastq.py.
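To illustrate what "quality of an overlapped base" can mean, the sketch below combines two Phred scores with a generic Bayesian model. This is an assumption made for illustration, not SHE-RA's actual calculation: it assumes the two reads err independently, that all four bases are equally likely a priori, and that an error is equally likely to produce any of the three wrong bases. The function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def merge_base(base1, q1, base2, q2):
    """Return (consensus base, error probability) for one overlapped position."""
    # Put the higher-quality call first; ties keep the first read's base.
    if q1 >= q2:
        best, pa, pb = base1, phred_to_error(q1), phred_to_error(q2)
    else:
        best, pa, pb = base2, phred_to_error(q2), phred_to_error(q1)
    if base1 == base2:
        right = (1 - pa) * (1 - pb)      # both reads are correct
        wrong = pa * pb / 3              # both made the same error
    else:
        right = (1 - pa) * pb / 3        # chosen base correct, other read wrong
        wrong = pa * (1 - pb) / 3 + 2 * pa * pb / 9  # other read right, or both wrong
    return best, wrong / (right + wrong)
```

Under these assumptions, two matching bases come out with a lower error probability than either read alone, while a mismatch between two equally good calls comes out close to a coin flip. That is the behaviour one wants from an overlap-aware quality estimate.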

First, divide up your samples into files of about 1 million reads each, forward and reverse reads separately. A file of this size can typically be processed on our computers in about 10 hours.

perl ~/bin/split_fastq_qiime_1.8.pl <read> <number needed> <output prefix>

Example:

perl ~/bin/split_fastq_qiime_1.8.pl 131001Alm_D13-4961_1_sequence.fastq 100 131001Alm_D13-4961_1_sequence.split
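For readers without access to split_fastq_qiime_1.8.pl, the following Python sketch shows one way to do a comparable split. It is not a reimplementation of that script, whose internals are not shown here. It assumes records are dealt out round-robin and that outputs are named `<output prefix>.<n>.fastq` with n starting at 1; the naming pattern is inferred from the file names used in the overlap step, and the numbering start is a guess.

```python
from contextlib import ExitStack
from itertools import islice


def split_fastq(path, n_files, prefix):
    """Distribute FASTQ records round-robin into n_files output files.

    Returns the list of output paths, named <prefix>.<n>.fastq (n from 1).
    """
    out_paths = [f"{prefix}.{i}.fastq" for i in range(1, n_files + 1)]
    with ExitStack() as stack:
        source = stack.enter_context(open(path))
        outputs = [stack.enter_context(open(p, "w")) for p in out_paths]
        record_number = 0
        while True:
            record = list(islice(source, 4))  # one FASTQ record is four lines
            if not record:
                break
            if len(record) != 4:
                raise ValueError("truncated FASTQ record")
            outputs[record_number % n_files].writelines(record)
            record_number += 1
    return out_paths
```

Splitting the forward and reverse files with the same `n_files` keeps mates in files with the same number, provided both inputs list the reads in the same order.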

...

Then, overlap each of the 100 files with SHE-RA, where ${PBS_ARRAYID} is the process number for parallel processing:

perl concatReads_1.8.pl <fastq_1> <fastq_2> --qualityScaling illumina

perl concatReads_1.8.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fastq 131001Alm_D13-4961_2_sequence.split.${PBS_ARRAYID}.fastq --qualityScaling illumina
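If no scheduler is available to set ${PBS_ARRAYID}, the same command can be issued once per chunk from a small driver. The Python sketch below only builds and runs the command line shown above. The helper names are hypothetical, and it assumes that chunks are numbered 1 to 100 and that concatReads_1.8.pl is in the working directory.

```python
import subprocess


def shera_command(array_id, forward_prefix, reverse_prefix,
                  script="concatReads_1.8.pl"):
    """Build the SHE-RA command line for one pair of split files."""
    return [
        "perl", script,
        f"{forward_prefix}.{array_id}.fastq",
        f"{reverse_prefix}.{array_id}.fastq",
        "--qualityScaling", "illumina",
    ]


def run_all_serially(n_files, forward_prefix, reverse_prefix):
    """Run every chunk one after another; stops at the first failure."""
    for array_id in range(1, n_files + 1):  # assumed numbering: 1..n_files
        subprocess.run(
            shera_command(array_id, forward_prefix, reverse_prefix),
            check=True,
        )
```

For the example run, the call would be `run_all_serially(100, "131001Alm_D13-4961_1_sequence.split", "131001Alm_D13-4961_2_sequence.split")`. At about 10 hours per chunk, running serially is only practical for a handful of files.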

...