Overview:

The first step in processing raw data from 16S Illumina libraries is to split the multiplexed libraries, trim the sequences to the same length, and filter for quality. I've found that trimming to the same length works best, because the same sequence can cluster differently depending on its length. I've also included the parameters that I've found most useful.
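To make the trimming step concrete, here is a minimal sketch in plain shell and awk. It is not taken from RunQiime.csh; the file names, the two made-up reads, and the length of 8 are assumptions chosen only for illustration. It cuts every read to the same length and drops reads that are too short.

```shell
# Build a tiny two-record fastq file to demonstrate on (made-up reads).
printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n@read2\nACGTAC\n+\nIIIIII\n' > example.fastq

# Hypothetical target length; pick one that suits your run.
LEN=8

# A fastq record is 4 lines: header, sequence, separator, quality.
# Keep reads with at least LEN bases and cut sequence and quality to LEN.
awk -v len="$LEN" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
        if (length(seq) >= len) {
            print header
            print substr(seq, 1, len)
            print "+"
            print substr($0, 1, len)
        }
    }
' example.fastq > trimmed.fastq

cat trimmed.fastq
```

Because every surviving read has the same length, identical amplicons cannot end up in different clusters merely because one copy was sequenced further than another.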

Process:

The easiest way to process fastq files from 16S Illumina data is to use Qiime. Follow the link for a tutorial:

http://qiime.org/tutorials/tutorial.html

This is installed on the darwin cluster (beagle) through the command:

module add qiime-default

Only the dependencies required for the default Qiime pipeline are loaded. Other tools might not be available.

To process the data, I've written a script that automates these steps. Before running it, edit the script to include the path to your solexa file.

This can be submitted to the cluster with the following command (the RunQiime.csh file should be attached):

qsub -cwd RunQiime.csh

Below I've elaborated on what each of the variables means:

#define your file paths
#where is the fastq file you want to process
SOLFILEF=

This is the path to your solexa data in fastq format.

#Where is the associated mapping file
MAPFILE=

The mapping file format is explained in the Qiime tutorial linked above.

#the oligo file
OLIGO=

This is a file required by mothur to remove the primer and diversity region from the library construct. Mothur looks for the exact sequence and only keeps reads in which the primer sequence is found. Note that if you are using Phusion or another proof-reading polymerase, mismatches in the primer site within 4 bases of the 3' end are reverted to the template sequence, so those reads would fail the exact primer match and be missed. I've removed the last 4
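One way to tolerate such 3' mismatches, offered here as an assumption rather than a description of what the script does, is to give mothur a primer with its last 4 bases removed. The sketch below uses a made-up primer sequence and a hypothetical file name; the `forward` keyword is what I understand mothur's oligo format to use, so check it against the mothur version you have installed.

```shell
# Made-up primer sequence, for illustration only.
PRIMER=AAACCCGGGTTTACGT

# Drop the 4 bases at the 3' end: each ? in the pattern matches one character.
TRIMMED_PRIMER=${PRIMER%????}

# Write a one-line oligo file (keyword, tab, sequence).
# "forward" is assumed to be the keyword your mothur version expects.
printf 'forward\t%s\n' "$TRIMMED_PRIMER" > example.oligos

cat example.oligos
```

Matching on the shortened primer means a read is kept as long as the first part of the primer site is exact, regardless of what the polymerase did to the final 4 bases.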

#where the programs are called from
BIN=/data/spacocha/bin
#where you want to put the output of the analysis
UNIQUE=
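As a rough illustration of a filled-in variable block, the sketch below uses invented paths (none of them come from the real script) and adds a small hypothetical helper, `check_inputs`, that reports missing or empty input files before anything is submitted to the cluster. It is written in Bourne-style shell to match the `NAME=value` assignments shown above.

```shell
# Hypothetical values; substitute your own paths.
SOLFILEF=./example_data/run1.fastq
MAPFILE=./example_data/run1_map.txt
OLIGO=./example_data/run1.oligos
UNIQUE=./example_output/run1

# Hypothetical helper: report every input that is missing or empty,
# and return non-zero if at least one is.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return $status
}

# With the placeholder paths above this prints the reminder.
check_inputs "$SOLFILEF" "$MAPFILE" "$OLIGO" \
    || echo "fix the paths above before running qsub"
```

Catching a bad path at this point is cheaper than finding it after the job has waited in the queue.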
