Overview:

The first step in processing raw data from 16S Illumina libraries is to split the multiplexed libraries, trim the sequences to the same length, and filter for quality. I've found that trimming to the same length is best, because the same sequence might cluster differently depending on its length. I've also included the parameters that I've found most useful. The script also makes a first-pass OTU classification at 97% to a reference database (closed-reference OTU picking), which is good for taking a first look at your data.
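To illustrate why a common length matters, here is a minimal sketch of fixed-length trimming. It is not taken from RunQiime.csh: the function name trim_fasta, the read names, the one-sequence-line-per-record assumption, and the choice to drop reads shorter than the target are all mine.

```shell
# Sketch only: trim single-line FASTA records to a fixed length.
# Dropping reads shorter than the target is an assumption,
# not necessarily what RunQiime.csh does.
trim_fasta() {
    # $1 = target length; FASTA on stdin, trimmed FASTA on stdout
    awk -v len="$1" '
        /^>/ { header = $0; next }       # remember the header line
        length($0) >= len {              # keep only reads that are long enough
            print header
            print substr($0, 1, len)     # cut the read to the target length
        }
    '
}

# read1 and read2 are the same sequence read to different lengths;
# after trimming to 6 bases they are identical. read3 is too short.
printf '>read1\nACGTACGT\n>read2\nACGTAC\n>read3\nACG\n' | trim_fasta 6
```

Once both reads have the same length they are the same string, so they cannot end up in different clusters just because one was sequenced further than the other.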

Process:

The easiest way to process fastq files from 16S Illumina data is to use Qiime. Follow the link for a tutorial:

http://qiime.org/tutorials/tutorial.html

This is installed on the darwin cluster (beagle) through the command:

module add qiime-default

Only the dependencies required for the default Qiime pipeline are loaded. Other tools might not be available.

To process the data, I've written a script that automates the process. The script needs to be edited to include the path to the solexa file and the other variables described below. It can then be submitted to the cluster with the following command (the RunQiime.csh file should be attached):

qsub -cwd RunQiime.csh

Below I've elaborated on what each of the variables means:

#define your file paths
#where is the fastq file you want to process
SOLFILEF=

This is the path to your solexa file as a fastq file.

#Where is the associated mapping file
MAPFILE=

The mapping file is explained in the Qiime tutorial linked above.

#the oligo file
OLIGO=

This is a file required by mothur to remove the primer and diversity region from the library construct. Mothur looks for the exact sequence, and only keeps sequences when the primer sequence is found. As a note, if you are using Phusion or another proof-reading polymerase, mismatches in the primer site within 4 bases of the 3' end are reverted to the template sequence (actually, NEB says that it will change anything within 10 bps of the 3' end). Reads carrying such changes no longer match the primer exactly, so they would be discarded at the trim.seqs step. I typically change the last 4 bases at the 3' end of the primer to N's to be safe. Filtering sequences that don't match the primer sequence has been shown to improve the quality of the data (Zhou_2011.pdf).
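The masking of the last 4 bases can be sketched as a small helper. This is only an illustration: the function name mask_primer_3prime and the example primer are hypothetical, and you still have to paste the masked sequence into your own oligo file by hand.

```shell
# Sketch only: replace the last n bases (default 4) at the 3' end of a
# primer with N's. The primer below is a made-up example sequence.
mask_primer_3prime() {
    # $1 = primer written 5'->3', $2 = number of 3' bases to mask (optional)
    printf '%s\n' "$1" | awk -v n="${2:-4}" '{
        keep = length($0) - n            # bases left untouched at the 5 prime end
        if (keep < 0) keep = 0           # primer shorter than n: mask everything
        masked = substr($0, 1, keep)
        for (i = keep; i < length($0); i++) masked = masked "N"
        print masked
    }'
}

mask_primer_3prime ACGTACGTACGT        # prints ACGTACGTNNNN
mask_primer_3prime ACGTACGTACGT 10     # prints ACNNNNNNNNNN (the more cautious NEB figure)
```

Masking more bases makes the primer match more permissive, so 4 versus 10 is a trade-off between keeping reads and filtering on the primer.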

#where the programs are called from
BIN=/data/spacocha/bin

This is a folder where you should put the following files:

fastq2Qiime_barcode.pl

revert_names_mothur.pl

#where you want to put the output of the analysis
UNIQUE=

This is a unique name for this analysis. It can include folders (which must already exist), and the final component is used as a prefix for all output files.
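Since the folders in UNIQUE must already exist, a quick check before submitting can save a failed job. The following is a sketch under my own assumptions: the function name check_prefix_dir and the example value ./run1 are hypothetical and not part of RunQiime.csh.

```shell
# Sketch only: confirm that the folder part of a UNIQUE value exists.
check_prefix_dir() {
    # $1 = the value you plan to assign to UNIQUE
    dir=$(dirname "$1")      # folder part, which must already exist
    prefix=$(basename "$1")  # final component, used as the file prefix
    if [ -d "$dir" ]; then
        echo "ok: folder $dir exists, output prefix is $prefix"
    else
        echo "missing: create $dir first (for example with mkdir -p)" >&2
        return 1
    fi
}

# Hypothetical value: the current folder always exists, so this prints "ok".
check_prefix_dir ./run1
```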

#reference fasta file (latest greengenes OTUs)
REFERENCEFA=

These can be downloaded from the top-right corner of the Qiime blog homepage.

#reference taxonomies

REFERENCETAX=

This will be in the same download as above, only in the taxonomy folder.
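Because every variable above has to be filled in by hand, it can help to verify the paths before calling qsub. This is a sketch, not something RunQiime.csh does as far as I know: the helper name check_file is hypothetical, and it is written in POSIX shell, so adapt it if your copy of the script uses csh syntax.

```shell
# Sketch only: fail early if a variable is empty or points at an unreadable file.
check_file() {
    # $1 = variable name (used in the message), $2 = the path it holds
    if [ -z "$2" ]; then
        echo "$1 is empty" >&2
        return 1
    fi
    if [ ! -r "$2" ]; then
        echo "$1: cannot read $2" >&2
        return 1
    fi
    return 0
}

# Example wiring, using the variable names from the script:
# check_file SOLFILEF "$SOLFILEF" && check_file MAPFILE "$MAPFILE" &&
# check_file OLIGO "$OLIGO" && check_file REFERENCEFA "$REFERENCEFA" &&
# check_file REFERENCETAX "$REFERENCETAX"
```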

Output:

The following output files are created (UNIQUE is replaced by the value of the variable defined above):

UNIQUE_output/ucrC/seqs_otus.mat
-This is a matrix of your OTUs, mapped to the reference.

UNIQUE_output/seqs.trim.names.fasta
-These are the trimmed, cleaned fasta sequences that were clustered.

UNIQUE_output/split_library_log.txt
-This contains stats about how your libraries are processed.
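
After the job finishes, the three files listed above can be checked in one go. This is a sketch: the function name list_outputs and the example value run1 are hypothetical, while the file names are the ones given in this section.

```shell
# Sketch only: report which of the expected output files exist.
list_outputs() {
    # $1 = the value that was assigned to UNIQUE
    for f in ucrC/seqs_otus.mat seqs.trim.names.fasta split_library_log.txt; do
        path="${1}_output/$f"
        if [ -e "$path" ]; then
            echo "found   $path"
        else
            echo "missing $path"
        fi
    done
}

# Hypothetical run name; replace run1 with your own UNIQUE value.
list_outputs run1
```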

Additional Notes:

Please let me know if there is anything that you don't understand about this process. I am happy to help. I think everything that you need is there, but I might have missed something.