...

Sequencing our construct on the MiSeq differs slightly from standard Illumina MiSeq sequencing. Load the provided sample sheet (which arbitrarily specifies a 250 bp paired-end run with an 8 nt barcode read) and spike 15 uL of the anti-reverse BMC index primer at 100 uM (5' AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG 3') into tube 13 of the cartridge. This should provide three reads (forward, index and reverse) of 250, 8 and 250 bp, respectively.

(The following is courtesy of the Shapiro lab):

To generate an index file, you have to change the setup of MiSeq Reporter (MSR), because by default MSR doesn't generate a barcodes_reads.fastq.
To change that:
First, stop the MSR service: open Task Manager, go to the Services tab, right-click MiSeq Reporter and click Stop.
Then use Notepad to edit the MiSeqReporter.exe.config file, which can be found in C:\Illumina\MiSeq Reporter

The following needs to be included in the top portion of the file (the <appSettings> section):

<add key="CreateFastqForIndexReads" value="1" />

Save then close.
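If you would rather script the edit than make it by hand in Notepad, the following is a minimal Python sketch of the same change. It assumes the config file is a standard .NET configuration file whose root element has <appSettings> as a direct child; the function name enable_index_fastq is hypothetical. Stop the service before running it, as described above.

```python
import shutil
import xml.etree.ElementTree as ET


def enable_index_fastq(config_path):
    """Add (or update) the CreateFastqForIndexReads key in <appSettings>.

    Assumption: <appSettings> is a direct child of the root element.
    Note: ElementTree drops XML comments when it rewrites the file, so a
    backup copy (<config_path>.bak) is made first.
    """
    shutil.copy2(config_path, config_path + ".bak")
    tree = ET.parse(config_path)
    app_settings = tree.getroot().find("appSettings")
    if app_settings is None:
        raise ValueError("no <appSettings> section found in " + config_path)
    for entry in app_settings.findall("add"):
        if entry.get("key") == "CreateFastqForIndexReads":
            entry.set("value", "1")  # key already present: just set it
            break
    else:
        ET.SubElement(app_settings, "add",
                      {"key": "CreateFastqForIndexReads", "value": "1"})
    tree.write(config_path, encoding="utf-8", xml_declaration=True)
```

A call would look like enable_index_fastq(r"C:\Illumina\MiSeq Reporter\MiSeqReporter.exe.config"). Check the rewritten file against the backup before restarting the service.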

You will then want to restart the service. Right-click the Windows taskbar, select "Start Task Manager", open the "Services" tab, find MiSeq Reporter in the list, and start the service (stop it first if it is still running).

You can then re-queue your run using the sample sheet WITH the index information for the sample. In our case, we used a very simple sample sheet with a single index such as ATATATAT.

De-multiplexing

You can demultiplex at various stages. If there are multiple unrelated projects in the same run, I pull out the reads that map to the barcodes of a single project, so that I don't have to process extra data. You also have the option of removing unwanted barcodes at the QIIME split_libraries_fastq.py step by providing a mapping file containing only the barcodes you want, but if there are a lot of unwanted reads you may waste time overlapping them. Use the following step only if you will never need to work with the other data, since in that case it doesn't make sense to process it at all. Do not use it if you will eventually work with all of the data.

...

These can be used as the fastq files in downstream processes.

You can also use just the mapping file that would be the input to QIIME (not in the example above, but the default for QIIME), together with the index read generated as described earlier, if you just want to limit the data to the set of barcodes in your mapping file. In that case, run the following command:

perl parse_Illumina_multiplex_from_map_index.pl <Solexa File1> <Solexa File2> <mapping> <output_prefix> <index read>

The output fastq files will contain only the reads whose barcodes are found in your mapping file, and can be used in downstream analysis.
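To make the idea behind this filtering step concrete, here is a minimal Python sketch. It is not the code of parse_Illumina_multiplex_from_map_index.pl, only an illustration of the concept. It assumes a tab-delimited QIIME mapping file with the barcode in the second column (BarcodeSequence), plain 4-line fastq records, and that all three fastq files list the reads in the same order. The function names and the output file suffixes are hypothetical. The rev_comp switch is there because the index read may be the reverse complement of the barcodes in the mapping file (compare the --rev_comp_barcode option used later with split_libraries_fastq.py).

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def read_barcodes(mapping_path):
    """Collect barcodes from column 2 of a QIIME-style mapping file."""
    barcodes = set()
    with open(mapping_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment lines and blank lines
            barcodes.add(line.rstrip("\n").split("\t")[1].upper())
    return barcodes


def fastq_records(path):
    """Yield fastq records as lists of 4 lines (assumes 4-line records)."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                return
            yield record


def filter_by_barcode(fwd_path, rev_path, index_path, mapping_path,
                      prefix, rev_comp=False):
    """Keep only read pairs whose index read matches a mapping barcode."""
    wanted = read_barcodes(mapping_path)
    if rev_comp:
        wanted = {reverse_complement(b) for b in wanted}
    kept = 0
    with open(prefix + "_1.fastq", "w") as out_fwd, \
            open(prefix + "_2.fastq", "w") as out_rev, \
            open(prefix + "_index.fastq", "w") as out_idx:
        # zip stops at the shortest file; the inputs are assumed to be
        # the same length and in the same read order.
        for fwd, rev, idx in zip(fastq_records(fwd_path),
                                 fastq_records(rev_path),
                                 fastq_records(index_path)):
            if idx[1].strip().upper() in wanted:
                out_fwd.writelines(fwd)
                out_rev.writelines(rev)
                out_idx.writelines(idx)
                kept += 1
    return kept
```

The sketch requires exact barcode matches. Whether the Perl script tolerates mismatches is not stated here, so treat this only as a way to see what the step does.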

Overlapping the reads

You may have sufficient read length to overlap the forward and reverse reads to create a longer sequence. This process is time-consuming, but it gains phylogenetic resolution and can be useful for many applications. We use SHE-RA, which was designed to calculate a sophisticated quality score for each overlapped base, given the quality of the two overlapping bases and whether or not they match. Other software exists (and is faster), but it does multiple things at once, including trimming the sequences for quality, and does not provide as good an estimate of the quality of the overlapped bases. If other programs are used, it might be necessary to de-multiplex the samples in a different way afterwards. With SHE-RA, we overlap the paired-end sequences, then regenerate the fastq files to use with QIIME split_libraries_fastq.py.

First, split your reads into files of about 1 million reads each, keeping forward and reverse reads in separate files (SHE-RA has code for parallelization, but I couldn't get it to work).

general form-

perl ~/bin/split_fastq_qiime_1.8.pl <read> <number needed> <output prefix>

Example:-

perl ~/bin/split_fastq_qiime_1.8.pl 131001Alm_D13-4961_1_sequence.fastq 100 131001Alm_D13-4961_1_sequence.split
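For readers without the Perl script at hand, the splitting step can be sketched in Python as follows. This is not the implementation of split_fastq_qiime_1.8.pl. In particular, this sketch takes a number of reads per file rather than the <number needed> argument of the Perl script, and the 1-based numbering of the output files is an assumption. It also assumes plain 4-line fastq records. Running it with the same reads_per_file on the forward and reverse files keeps the chunks paired, provided both files list the reads in the same order.

```python
def split_fastq(path, reads_per_file, prefix):
    """Split a fastq file into consecutive chunks of reads_per_file reads.

    Chunks are written to <prefix>.1.fastq, <prefix>.2.fastq, ...
    (hypothetical naming) and the number of files written is returned.
    Assumes every record is exactly 4 lines.
    """
    lines_per_file = 4 * reads_per_file
    out = None
    n_files = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % lines_per_file == 0:
                # start a new chunk file
                if out is not None:
                    out.close()
                n_files += 1
                out = open("%s.%d.fastq" % (prefix, n_files), "w")
            out.write(line)
    if out is not None:
        out.close()
    return n_files
```

For example, split_fastq("131001Alm_D13-4961_1_sequence.fastq", 1000000, "131001Alm_D13-4961_1_sequence.split") would write chunks of one million reads each.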

...

Then overlap each of the 100 files with SHE-RA, where ${PBS_ARRAYID} is the process number for parallel processing. Remember to change the lib path in the code of concatReads.pl so that it can be run from any folder: open it in a text editor such as emacs, change the second line to the directory where the .pm files are, and save.

general form-

perl concatReads_1.8.pl <fastq_1> <fastq_2> --qualityScaling illuminasanger

example of actual command-

perl concatReads_1.8.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fastq 131001Alm_D13-4961_2_sequence.split.${PBS_ARRAYID}.fastq --qualityScaling illuminasanger

Filter out the bad overlaps from the fa and quala files generated with SHE-RA:

...

perl fix_index.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq 131001Alm_D13-4961_3_sequence.fastq > 131001Alm_D13-4961_3_${PBS_ARRAYID}.filter_0.8.fastq
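The command above takes a filtered read file and the full index read file, and writes an index file for the filtered reads. How fix_index.pl does this is not shown here. The Python sketch below is one plausible way to produce such a file, not the script's actual implementation. It assumes 4-line fastq records, that the read ID is the first whitespace-delimited token of the header (with an optional trailing /1, /2 or /3 removed), and that the filtered reads appear in the same relative order as in the index file. The function names are hypothetical.

```python
def fastq_records(path):
    """Yield fastq records as lists of 4 lines (assumes 4-line records)."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                return
            yield record


def read_id(header):
    """Read ID: first token of the header, without '@' or a '/N' suffix."""
    name = header[1:].split()[0]
    if len(name) > 2 and name[-2] == "/" and name[-1].isdigit():
        name = name[:-2]
    return name


def sync_index(filtered_path, index_path, out_path):
    """Write the index reads that belong to the reads in filtered_path.

    Both files are streamed once, so the output keeps the order of the
    filtered file. Raises ValueError if a filtered read has no index read
    (for example, if the two files are not in the same order).
    """
    index_iter = fastq_records(index_path)
    written = 0
    with open(out_path, "w") as out:
        for record in fastq_records(filtered_path):
            wanted = read_id(record[0])
            for index_record in index_iter:
                if read_id(index_record[0]) == wanted:
                    out.writelines(index_record)
                    written += 1
                    break
            else:
                raise ValueError("no index read found for " + wanted)
    return written
```

Keeping the output in the same order as the filtered reads matters because split_libraries_fastq.py expects the sequence file and the barcode file to list the reads in the same order.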

Or, if you have to generate it from the header (if the index is already present in the header):

...

split_libraries_fastq.py -i 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq -m mapping_file.txt -b 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.index.filter_0.8.fastq --barcode_type 8 --rev_comp_barcode --min_per_read_length .8 -q 10 --max_bad_run_length 0 -o unique_output_${PBS_ARRAYID} --phred_offset 33

...