Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Alm

...

lab

...

Protocol

...

for

...

processing

...

overlapping

...

16S

...

rRNA

...

reads

...

from

...

a

...

MiSeq

...

run

...

Experimental

...

design

...

The

...

specifics

...

of

...

the

...

sequencing

...

set-up

...

and

...

molecular

...

construct

...

will

...

determine

...

exactly

...

how

...

the

...

data

...

needs

...

to

...

be

...

sequenced

...

and

...

processed.

...

There

...

are

...

a

...

few

...

different

...

designs

...

that

...

the

...

Alm

...

lab

...

has

...

set

...

up:

...

1.)

...

(Standard)

...

Multiplexing

...

different

...

samples

...

together

...

to

...

be

...

sequenced

...

in

...

one

...

lane

...

of

...

Illumina,

...

marking

...

each

...

unique

...

sample

...

with

...

a

...

unique

...

barcode

...

on

...

the

...

reverse

...

primer

...

of

...

step

...

2

...

that

...

is

...

read

...

during

...

the

...

indexing

...

read.

...

This

...

is

...

common

...

for

...

up

...

to

...

96

...

samples

...

(or

...

105

...

including

...

additional

...

barcodes

...

that

...

are

...

not

...

in

...

the

...

96-well

...

format).

...

The

...

sequencing

...

should

...

be

...

done,

...

not

...

with

...

the

...

standard

...

Illumina

...

indexing

...

primer,

...

but

...

with

...

the

...

reverse

...

complement

...

of

...

the

...

2nd

...

read

...

sequencing

...

primer.

...

This

...

is

...

a

...

custom

...

barcode

...

that

...

should

...

be

...

included

...

in

...

the

...

sequencing

...

set-up.

...

See

...

sequencing

...

section

...

below

...

for

...

protocol.

...

2.)

...

Multiplexing

...

multiple

...

different

...

plates

...

of

...

samples

...

together

...

using

...

a

...

barcode

...

located

...

5'

...

to

...

the

...

primer

...

used

...

in

...

genome

...

amplification

...

(typically

...

U515)

...

and

...

a

...

reverse

...

barcode

...

that

...

is

...

read

...

during

...

the

...

indexing

...

read.

...

This

...

is

...

not

...

typical

...

for

...

the

...

Alm

...

lab

...

MiSeq

...

protocol,

...

since

...

getting

...

only

...

about

...

12-25

...

million

...

reads

...

from

...

MiSeq

...

is

...

sufficient

...

for

...

about

...

100

...

samples,

...

not

...

more.

...

However,

...

it

...

is

...

a

...

possible

...

scenario

...

for

...

samples

...

which

...

do

...

not

...

require

...

high

...

coverage.

...

3.)

...

Mixing

...

both

...

genome

...

sequencing

...

and

...

16S

...

rRNA

...

amplicon

...

sequencing

...

together

...

in

...

one

...

lane.

...

Adding

...

genome

...

library

...

preps

...

to

...

16S

...

amplicon

...

lanes

...

improves

...

the

...

quality

...

of

...

the

...

base

...

calling

...

by

...

adding

...

diversity

...

without

...

loosing

...

much

...

to

...

phiX

...

sequencing.

...

The

...

genome

...

library

...

constructs

...

typically

...

contain

...

barcodes

...

in

...

the

...

forward

...

and

...

reverse

...

read

...

sequences,

...

and

...

do

...

not

...

typcially

...

have

...

an

...

indexing

...

read

...

associated

...

with

...

them.

...

However,

...

adding

...

them

...

to

...

a

...

lane

...

which

...

does

...

have

...

a

...

index

...

read

...

is

...

ok.

...

4.)

...

An

...

experimental

...

set-up

...

using

...

both

...

forward

...

and

...

reverse

...

orientation

...

of

...

the

...

16S

...

rRNA

...

among

...

different

...

samples,

...

and

...

staggering

...

the

...

diversity

...

region

...

5'

...

of

...

both

...

primers

...

used

...

in

...

genome

...

amplification

...

allows

...

for

...

sufficient

...

base

...

diversity

...

to

...

run

...

16S

...

rRNA

...

libraries

...

without

...

wasting

...

phiX

...

data.

...

In

...

this

...

case,

...

half

...

of

...

the

...

samples

...

begin

...

by

...

sequencing

...

from

...

the

...

U515

...

primer

...

in

...

the

...

forward

...

read,

...

and

...

half

...

begin

...

by

...

sequencing

...

from

...

the

...

U786

...

from

...

the

...

forward

...

read.

...

Additionally,

...

the

...

number

...

of

...

bases

...

before

...

the

...

primer

...

sequence

...

varies

...

from

...

4-9

...

bp.

...

Sequencing

Sequencing our construct on MiSeq is slightly different than standard Illumina MiSeq sequencing. Load the providedsample sheet, (which arbitrarily specifies a 250 paired-end run with an 8nt barcode read) and spike in 15uL of the anti-reverse BMC index primer @ 100uM (5' AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG 3') into tube 13 of the cartridge. This should provide three reads (forward, reverse and index) at 250, 8, 250 bp each.

De-multiplexing

Depending on whether your samples contains data from other projects that you don not want to process, you can demultiplex at various stages. Typically if there are multiple un-related projects in the same run, I will pull out all of the reads that map to the specific barcodes I am interested in first, so that I don't have to process extra data. You also have the option of removing unwanted barcodes at the qiime: split_libraries_fastq.py

...

step

...

by

...

providing

...

a

...

mapping

...

file

...

containing

...

only

...

the

...

barcodes

...

you

...

want.

...

This

...

makes

...

sense

...

if

...

you

...

if

...

you

...

will

...

eventually

...

work

...

with

...

all

...

of

...

the

...

data,

...

but

...

in

...

sets.

...

Otherwise,

...

if

...

you

...

will

...

never

...

need

...

to

...

work

...

with

...

the

...

other

...

data

...

in

...

the

...

lane,

...

it

...

doesn't

...

make

...

sense

...

to

...

process

...

it

...

at

...

all.

...

Run

...

this

...

program

...

to

...

parse

...

out

...

sequences

...

from

...

the

...

raw

...

data

...

according

...

to

...

the

...

following

...

order:

...

1.)

...

Single

...

barcodes

...

that

...

are

...

unique

...

in

...

the

...

sample.

...

These

...

are

...

possibly

...

control

...

samples

...

or

...

extra

...

barcodes

...

that

...

were

...

done

...

uniquely

...

and

...

are

...

the

...

only

...

piece

...

of

...

information

...

indicating

...

which

...

samples

...

that

...

sequence

...

came

...

from

...

2.)

...

Next,

...

it

...

looks

...

for

...

the

...

presence

...

of

...

the

...

forward

...

and

...

reverse

...

barcodes

...

in

...

the

...

forward

...

and

...

reverse

...

read.

...

These

...

are

...

typically

...

from

...

genome

...

sequences.

...

3.)

...

Finally,

...

it

...

looks

...

for

...

paired

...

data,

...

pulling

...

out

...

reads

...

that

...

have

...

both

...

the

...

forward

...

barcode

...

and

...

the

...

indexing

...

barcodes

...

that

...

match

...

input

...

samples.

...

All

...

other

...

samples

...

that

...

do

...

not

...

match

...

are

...

discarded.

...

Program:

...

parse_Illumina_multiplex2.pl

...

<Solexa

...

File1>

...

<Solexa

...

File2>

...

<mapping>

...

<output_prefix>

...

<mapping>

...

input

...

file

...

should

...

have

...

the

...

following

...

fields

...

(tab

...

delimited,

...

here's

...

an

...

example

...

file

...

):

...

Barcode

...

construction,

...

output

...

name,

...

forward

...

barcode

...

name,

...

forward

...

barcode

...

seq,

...

forward

...

barcode

...

orientation,

...

index

...

barcode

...

name,

...

index

...

barcode

...

seq,

...

index

...

barcode

...

orientation,

...

reverse

...

barcode

...

name,

...

reverse

...

barcode

...

seq,

...

reverse

...

barcode

...

orientation

...

Samples

...

with

...

the

...

same

...

output

...

name

...

will

...

be

...

in

...

the

...

same

...

file.

...

Barcode

...

construction

...

must

...

be

...

one

...

of

...

the

...

following

...

exact

...

fields:

...

single,

...

double

...

or

...

forbar.

...

Use

...

single

...

for

...

option

...

1

...

above

...

(single

...

barcodes

...

identify

...

the

...

samples),

...

the

...

2

...

The

...

output

...

should

...

be

...

forward

...

and

...

reverse

...

files

...

labeled

...

output_prefix.output_name.1

...

and

...

output.2

...

respectively.

...

These

...

can

...

be

...

used

...

as

...

the

...

fastq

...

files

...

in

...

downstream

...

processes.

...

Overlapping

...

the

...

reads

...

You

...

may

...

have

...

sufficient

...

length

...

to

...

overlap

...

the

...

forward

...

and

...

reverse

...

reads

...

to

...

create

...

a

...

longer

...

sequence.

...

This

...

process

...

will

...

be

...

time

...

consuming,

...

but

...

it

...

gains

...

phylogenetic

...

resolution

...

and

...

can

...

be

...

useful

...

for

...

many

...

applications.

...

We

...

use

...

SHE-RA,

...

which

...

was

...

created

...

to

...

have

...

a

...

sophisticated

...

calculation

...

of

...

quality

...

for

...

overlapped

...

bases,

...

given

...

the

...

quality

...

of

...

the

...

overlapped

...

bases

...

and

...

whether

...

or

...

not

...

they

...

match.

...

Other

...

software

...

exists

...

(and

...

is

...

faster),

...

but

...

will

...

do

...

multiple

...

things

...

at

...

once,

...

including

...

trimming

...

the

...

sequences

...

for

...

quality

...

and

...

will

...

not

...

provide

...

as

...

good

...

an

...

estimate

...

of

...

the

...

quality

...

of

...

the

...

overlapped

...

bases.

...

If

...

other

...

programs

...

are

...

used,

...

it

...

might

...

be

...

necessary

...

to

...

use

...

other

...

programs

...

to

...

de-multiplex

...

samples

...

after

...

using.

...

With

...

SHE-RA,

...

we

...

overlap,

...

then

...

re-generate

...

the

...

fastq

...

files

...

to

...

use

...

with

...

QIIME

...

split_libraries_fastq.py.

...

First,

...

divide

...

up

...

your

...

samples

...

into

...

about

...

1

...

million

...

reads

...

per

...

file.

...

This

...

can

...

typically

...

be

...

processed

...

on

...

our

...

computers

...

in

...

about

...

10

...

hours.

...

perl

...

~/bin/split_fastq_qiime_1.8.pl

...

131001Alm_D13-4961_1_sequence.fastq

...

100

...

131001Alm_D13-4961_1_sequence.split

...

perl

...

~/bin/split_fastq_qiime_1.8.pl

...

131001Alm_D13-4961_2_sequence.fastq

...

100

...

131001Alm_D13-4961_2_sequence.split

...

Then,

...

overlap

...

each

...

of

...

the

...

100

...

files

...

with

...

SHERA

...

where

...

$

...

{PBS_ARRAYID}

...

is

...

the

...

process

...

number

...

for

...

parallel

...

processing

...

perl

...

~/bin/SHERA_code/concatReads_1.8.pl

...

131001Alm_D13-4961_1_sequence.split.$

...

{PBS_ARRAYID}.fastq

...

131001Alm_D13-4961_2_sequence.split.$

...

{PBS_ARRAYID}.fastq

...

--qualityScaling

...

illumina

...

Filter

...

out

...

the

...

bad

...

overlaps

...

from

...

the

...

fa

...

and

...

quala

...

generated

...

with

...

SHERA:

...

perl

...

/mit/spacocha/bin/SHERA_code/filterReads.pl

...

131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}

...

.fa

...

131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.quala

...

0.8

...


Use

...

mothur

...

to

...

re-generate

...

the

...

fastq

...

files:

...

mothur

...

"#make.fastq(fasta=131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fa,

...

qfile=131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.quala)"

...

Now,

...

you

...

will

...

either

...

have

...

to

...

fix

...

the

...

index

...

file

...

to

...

contain

...

only

...

the

...

reads

...

in

...

your

...

file

...

(if

...

the

...

index

...

read

...

is

...

a

...

separate

...

file):

...

perl

...

~/bin/fix_index.pl

...

131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.

...

fastq 131001Alm_D13-4961_3_sequence.fastq

...

>

...

131001Alm_D13-4961_3_${PBS_ARRAY_ID}.filter_0.8.

...

fastqOr,

...

if

...

you

...

have

...

to

...

generate

...

it

...

from

...

the

...

header

...

(if

...

the

...

index

...

is

...

already

...

present

...

in

...

the

...

header):

...

perl

...

/mit/spacocha/bin/fastq2Qiime_barcode2.pl

...

131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq

...

>

...

131001Alm_D13-4961_1_sequence.split.index.filter_0.8.fastq

...


This

...

file

...

can

...

be

...

used

...

for

...

specific

...

header

...

configuration

...

where

...

the

...

fastq

...

files

...

look

...

like

...

this,

...

where

...

it

...

pulls

...

out

...

the

...

longest

...

string

...

of

...

base

...

letters

...

(ATGCKMRYSWBVHDNX)

...

after

...

the

...

#,

...

in

...

this

...

case

...

it

...

would

...

be

...

TGGGACCT

...

and

...

creates

...

a

...

false

...

quality

...

for

...

each

...

base

...

as

...

the

...

lower

...

case

...

of

...

each

...

barcode

...

letter:

...

@MISEQ:1:2106:21797:11095#TGGGACCT_204bp_199.2_0.90

...

TGTAGTGCCAGCCGCCGCGGTAATACGTAGGTGGCGAGCGTTGTTCGGATTTATTGGGCGTAAAGGGTCCGCAGGGGGTT

...


CGCTAAGTCTGATGTGAAATCCCGGAGCTCAACTCCGGAACTGCATTGGAGACTGGTGGACTAGAGTATCGGAGAGGTAA

...


GCGGAATTCCAGGTGTAGCGGTGGAATGCGTAGATATCTGGAAGAACACCGAAAGCGAAGGCAGCTTACTGGACGGTAAC

...


TGACCCTCAGGGACGAAAGCGTGGGGATCAAACAGGATTAGAAACCCCTGTAGTCC

...

Result

...

is:

...

@MISEQ:1:2106:21797:11095#TGGGACCT_204bp_199.2_0.90

...

TGGGACCT

...

+

...

tgggacct

De-multiplex

...

Now

...

the

...

re-created

...

fasta

...

and

...

index

...

read

...

can

...

be

...

used

...

as

...

normal

...

with

...

QIIME

...

or

...

other

...

software

...

of

...

your

...

choice:

...

split_libraries_fastq.

...

py  -i

...

131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.filter_0.8.fastq

...

-m

...

mapping_file.

...

txt  -b

...

131001Alm_D13-4961_1_sequence.split.${PBS_ARRAYID}.index.filter_0.8.fastq

...

--barcode_

...

type 8 --rev_comp_barcode

...

--min_per_read_length

...

.8

...

-q

...

10

...

--max_bad_run_length

...

0

...

-o

...

 unique_output_${PBS_ARRAYID}

...

--phred_offset

...

33

...

This

...

will

...

create

...

a

...

seqs.fna

...

file

...

which

...

can

...

be

...

used

...

in

...

down

...

stream

...

analysis.