...
Sequencing our construct on the MiSeq is slightly different from standard Illumina MiSeq sequencing. Load the provided sample sheet (which arbitrarily specifies a 250 bp paired-end run with an 8 nt barcode read) and spike 15 µL of the anti-reverse BMC index primer at 100 µM (5' AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG 3') into tube 13 of the cartridge. This should provide three reads: forward (250 bp), index (8 bp) and reverse (250 bp).
De-multiplexing
(The following is courtesy of the Shapiro lab):
To generate an indexing file, you have to change the setup of MiSeq Reporter (MSR), because by default MSR doesn't generate a barcodes_reads.fastq.
To change that:
First, turn off the MSR service: Task Manager, Services tab, right-click on MiSeq Reporter and click Stop.
Use Notepad to edit the MiSeqReporter.exe.config file, which can be found in C:\Illumina\MiSeq Reporter
The following needs to be included in the top portion of the file (the <appSettings> section):
<add key="CreateFastqForIndexReads" value="1" />
Save, then close.
You will then want to restart the service. This can be accomplished by right-clicking on the taskbar in Windows, selecting "Start Task Manager", selecting the "Services" tab, finding MiSeq Reporter in the list, and then stopping and starting the service.
You can re-queue your run using the sample sheet WITH the index information on the sample. In our case, we used a very simple sample_sheet with one index like ATATATAT.
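Before re-queuing, it can be worth confirming that the config edit above was actually saved. The following is a minimal sketch, assuming the config file is well-formed XML with the key stored as an <add> element directly inside <appSettings>; the function name index_fastq_enabled is made up for illustration and is not part of MiSeq Reporter.

```python
import xml.etree.ElementTree as ET

KEY = "CreateFastqForIndexReads"


def index_fastq_enabled(config_xml):
    """Return True if <appSettings> contains <add key=KEY value="1" />.

    config_xml is the text of the config file (assumed to be well-formed XML).
    """
    root = ET.fromstring(config_xml)
    # root.iter() also yields the root itself if it is the <appSettings> element.
    for section in root.iter("appSettings"):
        for add in section.findall("add"):
            if add.get("key") == KEY:
                return add.get("value") == "1"
    return False
```

Pass it the text of MiSeqReporter.exe.config; a False result means the key is missing, is outside <appSettings>, or has a value other than "1".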
De-multiplexing
Depending on whether your run contains data from other projects that you do not want to process, you can demultiplex at various stages. If there are multiple, unrelated projects in the same run, I will first pull out all of the reads that map to the barcodes for only one project, so that I don't have to process extra data. You also have the option of removing unwanted barcodes at the QIIME split_libraries_fastq.py step by providing a mapping file containing only the barcodes you want. This makes sense if you will eventually work with all of the data, but in sets, although you may waste time overlapping the unwanted reads if there are a lot of them. Only use the following step if you will never need to work with the other data in the lane, since in that case it doesn't make sense to process it at all.
...
These can be used as the fastq files in downstream processes.
You can also use just the mapping files that would be the input to QIIME (not in the example above, but default for QIIME), and the index read generated as previously stated, if you just want to limit the data to a set of barcodes in your mapping file. In that case run the following command:
perl parse_Illumina_multiplex_from_map_index.pl <Solexa File1> <Solexa File2> <mapping> <output_prefix> <index read>
The output fastq files will contain only the reads whose barcodes are found in your mapping file, and can be used in downstream analysis.
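The idea behind this filtering can be sketched in Python. This is not the code of parse_Illumina_multiplex_from_map_index.pl; it is a minimal illustration, assuming 4-line fastq records, forward, reverse and index files in the same read order, and a tab-delimited QIIME-style mapping file with the barcode in the second column. All function names are hypothetical, and whether the index read needs reverse-complementing depends on your setup, so it is left as an option.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from 4-line fastq records."""
    handle = iter(handle)  # also lets a plain list of lines be used
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def load_barcodes(mapping_lines):
    """Collect barcodes from column 2 of a tab-delimited mapping file (assumed layout)."""
    barcodes = set()
    for line in mapping_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            barcodes.add(fields[1].upper())
    return barcodes


def reverse_complement(seq):
    """Reverse-complement a DNA sequence made of A, C, G, T and N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def filter_by_barcode(fwd, rev, idx, barcodes, rev_comp=False):
    """Yield (forward, reverse, index) record triples whose index read is a wanted barcode."""
    for f, r, i in zip(read_fastq(fwd), read_fastq(rev), read_fastq(idx)):
        barcode = reverse_complement(i[1]) if rev_comp else i[1].upper()
        if barcode in barcodes:
            yield f, r, i
```

Each of fwd, rev and idx can be an open file handle; the yielded triples can be written back out as three fastq files that stay in the same order.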
Overlapping the reads
You may have sufficient length to overlap the forward and reverse reads to create a longer sequence. This process is time-consuming, but it gains phylogenetic resolution and can be useful for many applications. We use SHE-RA, which was created to provide a sophisticated calculation of the quality of an overlapped base, given the quality of each overlapping base and whether or not they match. Other software exists (and is faster), but it will do multiple things at once, including trimming the sequences for quality, and will not provide as good an estimate of the quality of the overlapped bases. If other programs are used, it might be necessary to use other ways to de-multiplex samples afterwards. With SHE-RA, we overlap paired-end sequences, then re-generate the fastq files to use with QIIME split_libraries_fastq.py.
First, divide up your samples into files of about 1 million reads each, forward and reverse reads separately. A file of this size can typically be processed on our computers in about 10 hours (SHERA has code for parallelization, but I couldn't get it to work).
general form-
perl split_fastq_qiime_1.8.pl <read> <number needed> <output prefix>
Example-
perl ~/bin/split_fastq_qiime_1.8.pl 131001Alm_D13-4961_1_sequence.fastq 100 131001Alm_D13-4961_1_sequence.split
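For readers without the Perl script, the splitting step can be sketched in Python. This is an illustration only, not the code of split_fastq_qiime_1.8.pl: it assumes 4-line fastq records, splits by a fixed number of reads per file rather than by a requested number of files, and numbers the outputs from 1 as <output prefix>.<n>.fastq. Those conventions and all function names are assumptions.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from 4-line fastq records."""
    handle = iter(handle)  # also lets a plain list of lines be used
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def chunk_records(records, reads_per_file):
    """Yield successive lists of at most reads_per_file records."""
    records = iter(records)
    while True:
        chunk = list(islice(records, reads_per_file))
        if not chunk:
            return
        yield chunk


def split_fastq(in_path, out_prefix, reads_per_file=1_000_000):
    """Write chunks to <out_prefix>.<n>.fastq (n starts at 1) and return the paths."""
    paths = []
    with open(in_path) as handle:
        chunks = chunk_records(read_fastq(handle), reads_per_file)
        for n, chunk in enumerate(chunks, start=1):
            path = f"{out_prefix}.{n}.fastq"
            with open(path, "w") as out:
                for record in chunk:
                    out.write("\n".join(record) + "\n")
            paths.append(path)
    return paths
```

Running split_fastq with the same reads_per_file on the forward and the reverse file keeps chunk n of one file paired with chunk n of the other, provided both inputs list the reads in the same order.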
...
Then overlap each of the 100 files with SHERA, where ${PBS_ARRAYID} is the process number for parallel processing. (Remember to change the lib path in concatReads.pl so that the code can run from any folder: in a text editor like emacs, change the second line to the directory where the .pm files are, then save.)
general form-
perl concatReads.pl fastq_1 fastq_2 --qualityScaling sanger
example of actual command-
perl concatReads_1.8.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.fastq 131001Alm_D13-4961_2_sequence.split.${PBS_ARRAYID}.fastq --qualityScaling sanger
Filter out the bad overlaps from the fa and quala generated with SHERA:
...
perl fix_index.pl 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq 131001Alm_D13-4961_3_sequence.fastq > 131001Alm_D13-4961_3_${PBS_ARRAYID}.filter_0.8.fastq
Or, if you have to generate it from the header (if the index is already present in the header):
...
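Whichever route is used, the index reads passed to split_libraries_fastq.py have to correspond one-to-one, in the same order, with the filtered reads. A minimal sketch of pulling the matching index reads out of the full index file follows. It is not the code of fix_index.pl, whose internals are not shown here; it assumes 4-line fastq records, that the filtered reads keep their original relative order, and that read IDs match once any text after whitespace and a trailing /<digit> suffix are removed. The function names are hypothetical.

```python
import re
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from 4-line fastq records."""
    handle = iter(handle)  # also lets a plain list of lines be used
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def read_id(header):
    """Drop the leading '@', anything after whitespace, and a trailing '/<digit>'."""
    name = header[1:].split()[0]
    return re.sub(r"/\d$", "", name)


def matching_index_records(filtered_reads, index_reads):
    """Yield the index records whose read ID occurs among the filtered reads.

    Output follows the order of index_reads, which equals the order of the
    filtered reads only if filtering preserved the original read order.
    """
    wanted = {read_id(record[0]) for record in filtered_reads}
    for record in index_reads:
        if read_id(record[0]) in wanted:
            yield record
```

If the overlapping step rewrites the read headers, read_id would need adjusting so that both files reduce to the same identifier.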
split_libraries_fastq.py -i 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq -m mapping_file.txt -b 131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.index.filter_0.8.fastq --barcode_type 8 --rev_comp_barcode --min_per_read_length .8 -q 10 --max_bad_run_length 0 -o unique_output_${PBS_ARRAYID} --phred_offset 33
...