This information is specific to the 16S Illumina libraries. Multiplexed genome libraries should follow the information for the genome barcodes.
Outline:
To multiplex more than 96 samples in a lane, a forward barcode is required. The reverse barcodes are expensive to make, so it is more economical to reuse the same reverse barcodes with different forward barcodes. The forward barcode is a 5 bp sequence placed before the U515 F primer sequence. The forward primer must also include homology to the second-step forward primer sequence. The entire construct is depicted in Fig S1.pdf.
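As a rough illustration of how forward barcodes extend the capacity of a lane, the sketch below pairs every forward barcode with every reverse barcode. The forward barcode sequences and the reverse barcode IDs here are made-up placeholders, not the lab's actual barcodes (those are in the attached spreadsheets).

```python
from itertools import product

# Hypothetical placeholders for illustration only; the real sequences are in
# the attached spreadsheets.
forward_barcodes = ["ACGTA", "CGTAC", "GTACG", "TACGT"]    # 5 bp each
reverse_barcode_ids = [f"R{i:03d}" for i in range(1, 97)]  # one 96-well plate

# Each sample is identified by a (forward, reverse) barcode pair.
sample_keys = list(product(forward_barcodes, reverse_barcode_ids))

print(len(reverse_barcode_ids))  # samples per lane with reverse barcodes only
print(len(sample_keys))          # samples per lane with 4 forward barcodes
```

With four forward barcodes, the same plate of 96 reverse barcodes covers 4 x 96 = 384 samples.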
Forward Primer Barcode Sequences:
The forward barcode sequences that we currently have are here: Manually_copied_forward_barcodes.xls
- From Ilana Brito from Alm lab (posted by Sarah Preheim)
Protocol for whole genome library construction
- 1. Shear DNA by sonication. Make sure your sample is in 50μl of solution. Start with 2-20μg of DNA. Fill the BioRupter with water (up to 0.5 inches from the line) and then with ice up to the line. Do 6 cycles, then replace the ice. Repeat for a total of 18-20 cycles of 30 seconds on/off on the “H” setting. The average fragment size should be 200-400 base pairs.
- 2. End-repair
- Blunt and 5’-phosphorylate the DNA from step 1 using the Quick Blunting kit.
- Mix:
sheared DNA (2μg) 45.5μl
10x Blunting Buffer 6μl
1mM dNTP Mix 6μl
Blunt enzyme mix 2.5μl
TOTAL 60μl
- Incubate at RT for 30 minutes
- Purify using Qiagen MinElute column (these are kept in the fridge.) Elute in 12μl.
- 3. Ligate Solexa adaptors
- Solexa adapters must be hybridized before use. Heat to 95˚C for 5 minutes, then cool slowly to room temperature.
- Ligate adaptors, using a 10x molar excess of each one, and as much DNA as possible.
- Mix:
End-repaired DNA 10μl (12.5 pmol)
100μM IGA adapter A# 1.25μl (125 pmol)
100μM IGA adapter B#-PE 1.25μl (125 pmol)
2X Quick Ligation Reaction Buffer (NEB) 15μl
Quick T4 Ligase (NEB) 2.5μl
TOTAL 30μl
- Incubate at RT for 15 minutes.
- 4. Size selection and purification using SPRI beads.
- Mix DNA and beads to appropriate ratio: 0.65X SPRI beads: Add 19.5 μl of SPRI beads to 30μl reaction from step 3.
- Incubate at RT for 20 minutes.
- Place tubes on magnet for 6 minutes.
- Transfer all supernatant (SN) to a new tube. Discard beads.
- Mix DNA and beads to appropriate ratio, 1X SPRI beads: Add 10.5 μl SPRI beads to the 49.5μl reaction (30 μl of beads in total, which is 1X relative to the original 30μl reaction).
- Vortex, spin.
- Incubate at RT for 7-20 minutes.
- Place tubes on magnet for 6 minutes.
- Remove all SN, keep beads.
- Wash with 500μl 70% EtOH, incubate for 30 seconds, remove all SN.
- Repeat: Wash with 500μl 70% EtOH, incubate for 30 seconds, remove all SN.
- Let dry completely for 15 minutes. Remove from magnet.
- Elute in 30μl EM.
- Vortex.
- Incubate at RT for 2 minutes.
- Put on magnet for 2 minutes
- Transfer SN to new tube.
- 5. Nick translation
- Bst polymerase can be used for nick translation. It works at elevated temperatures, which helps melt secondary structures, and it lacks both 3’-5’ and 5’-3’ exonuclease activity.
- Mix:
Purified DNA 14 μl
10X Buffer (NEB) 2μl
10mM dNTPs 0.4μl
1mg/ml BSA 2μl
Water 0.6μl
Bst polymerase (Enzymatics) 1μl
TOTAL 20μl
- Incubate at 65˚C for 25 minutes.
- 6. Library Enrichment by PCR.
- Perform two 25μl reactions (100μM primer):
- Mix:
H2O 19.125μl
5X Pfu Turbo buffer 5μl
dNTPs 10mM 0.5μl
40μl Solexa PCR-A-PE 0.25μl
40μl Solexa PCR-B-PE 0.25μl
SybrGreenI 0.125μl
Nick-translated DNA 2μl
Pfu Turbo 0.25μl
TOTAL 25μl
- Program:
- 95˚C 120sec
- 95˚C 30sec
- 60˚C 30sec
- 72˚C 60sec
- 95˚C 120sec
- Go to step 2 34 more times.
- 72˚C 5 min
- 4˚C forever
- These 2 reactions are only for determining the cycle number. Look at the melting curves and use the mid-log point to pick the final cycle number.
- Prep PCR as above, but in two 100μl reactions using 8μl of sample in each, and run for the cycle number determined from the test reactions.
- Mix:
H2O 77μl
5X Pfu Turbo buffer 20μl
dNTPs 10mM 2μl
40μl Solexa PCR-A-PE 1μl
40μl Solexa PCR-B-PE 1μl
Nick-translated DNA 8μl
Pfu Turbo 1μl
TOTAL 100μl
- Run on a QIAElute column. Elute in 50μl. (You could also do a single SPRI cleanup; check the ratio of beads to reaction volume.)
- Analyze using Bioanalyzer.
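The adapter and SPRI bead volumes in steps 3 and 4 can be recomputed if your input amounts differ. This is a minimal sketch, not part of the original protocol: the function names are made up, and it assumes both SPRI ratios are expressed relative to the original reaction volume, which is how the 19.5μl and 10.5μl additions above work out.

```python
def adapter_volume_ul(insert_pmol, molar_excess, adapter_stock_uM):
    """Volume of adapter stock (in ul) that gives the requested molar excess.

    Relies on the unit identity 1 uM = 1 pmol/ul.
    """
    adapter_pmol = insert_pmol * molar_excess
    return adapter_pmol / adapter_stock_uM


def spri_bead_additions_ul(reaction_ul, first_ratio, second_ratio):
    """Bead volumes for a two-step (double-sided) SPRI size selection.

    Assumption: both ratios are relative to the original reaction volume, so
    the second addition only tops up the beads already added in the first cut.
    """
    first = reaction_ul * first_ratio
    second = reaction_ul * second_ratio - first
    return first, second


# Values from steps 3 and 4 of the protocol above.
print(round(adapter_volume_ul(12.5, 10, 100), 2))  # ul of each 100 uM adapter
first, second = spri_bead_additions_ul(30, 0.65, 1.0)
print(round(first, 2), round(second, 2))           # ul of beads per addition
```

With the protocol's numbers this reproduces 1.25μl of each adapter and bead additions of 19.5μl and 10.5μl.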
Overview
We have designed barcodes to multiplex samples together in a single Illumina lane. Currently, only three reads are supported by Illumina: a forward read, a reverse read and a barcode read. However, we have incorporated an additional barcode into the first read as well. The current design is outlined in Fig S1.pdf.
Designing Illumina amplicon libraries
Any PCR amplicon (16S, TCR-beta, etc.) can be used with this scheme, since it was designed to be modular. The first step primers must contain the following:
1.) The genomic DNA primer binding sites (to attach and extend the PCR product)
2.) The forward primer must contain some site diversity. This diversity is important for cluster identification; having the first read begin with conserved primer sequence will severely reduce the quality of the data. In Fig_S1.pdf, this diversity region is a string of YRYR (N's cannot be used, with IDT anyway, unless you specify an equal ratio of the four bases, and that might be costly). You can also add another set of barcodes by ordering different step-one primers with the forward barcodes attached. The only caveat with this method is that you need at least four different forward barcodes in one lane to get enough diversity. The barcoded samples should be pooled evenly, in a 1:1:1:1 ratio of each barcode. Using more than four barcodes in the forward read should increase the quality of the base calls.
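One way to sanity-check a set of forward barcodes before ordering is to count how many different bases appear at each position across the set. This is only a sketch: the barcode sequences and the `min_bases` threshold are hypothetical, and it assumes the barcodes are pooled evenly as described above.

```python
from collections import Counter


def position_base_counts(barcodes):
    """Return one Counter of bases per position across all barcodes."""
    if len({len(b) for b in barcodes}) != 1:
        raise ValueError("barcodes must all be the same length")
    return [Counter(column) for column in zip(*barcodes)]


def low_diversity_positions(barcodes, min_bases=4):
    """0-based positions where fewer than min_bases distinct bases occur."""
    counts = position_base_counts(barcodes)
    return [i for i, c in enumerate(counts) if len(c) < min_bases]


# Hypothetical barcode sets, for illustration only.
balanced = ["ACGTA", "CGTAC", "GTACG", "TACGT"]
unbalanced = ["AAGTA", "AAGTC", "AAGTG", "AAGTT"]

print(low_diversity_positions(balanced))    # no flagged positions
print(low_diversity_positions(unbalanced))  # positions 0-3 are flagged
```

The first set has all four bases at every position, so nothing is flagged; the second set only varies at the last base, so the first four positions would look conserved to the sequencer.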
Specs
Here are specs for the most recent reverse barcodes:
Uri Laserson_6957574_6123588.XLS
In addition, there are 9 additional barcodes outside of the 96 in the plate above: 097-105. These can be used for multiplexing mock or control samples into your lane separately.
Name | Sequence
PE-IV-PCR-097 | CATTTCGCT
PE-IV-PCR-098 | TTGCTCGTG
PE-IV-PCR-099 | TCCGCTCAC
PE-IV-PCR-100 | CCCAACAAA
PE-IV-PCR-101 | GCAGACCAA
PE-IV-PCR-102 | TGGCGATAT
PE-IV-PCR-103 | TGGTTCTGC
PE-IV-PCR-104 | GGTACGAGT
PE-IV-PCR-105 | ACCCGTTCG
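If you mix these extra barcodes with others in a lane, it can be useful to check how different they are from each other. The sketch below computes pairwise Hamming distances for the nine barcodes listed above. It is just a convenience check, not part of how the barcodes were designed.

```python
from itertools import combinations

# The nine additional reverse barcodes listed above.
extra_barcodes = {
    "PE-IV-PCR-097": "CATTTCGCT",
    "PE-IV-PCR-098": "TTGCTCGTG",
    "PE-IV-PCR-099": "TCCGCTCAC",
    "PE-IV-PCR-100": "CCCAACAAA",
    "PE-IV-PCR-101": "GCAGACCAA",
    "PE-IV-PCR-102": "TGGCGATAT",
    "PE-IV-PCR-103": "TGGTTCTGC",
    "PE-IV-PCR-104": "GGTACGAGT",
    "PE-IV-PCR-105": "ACCCGTTCG",
}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


# Find the most similar pair of barcodes (smallest Hamming distance).
name_a, name_b = min(
    combinations(extra_barcodes, 2),
    key=lambda pair: hamming(extra_barcodes[pair[0]], extra_barcodes[pair[1]]),
)
closest = hamming(extra_barcodes[name_a], extra_barcodes[name_b])
print(name_a, name_b, closest)
```

The larger the smallest distance, the more sequencing errors a barcode read can carry before it could be confused with another barcode.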
Overview
In order to make this site more useful to all, here are a few tips on how to use this site and how to make a Wiki blog post so others will be able to find the information they need. This should make the system more user friendly for all.
How to use this site to find information
Labels heatmap:
In order to find information that you want, look at the bottom of the home page for the specific labels containing the words that you are interested in. For example, if you want to learn about how to process raw fastq data from an Illumina library, you might start by clicking the Illumina label at the bottom of the home page.
Theme pages:
Alternatively, there are some theme pages which list all posts that have a particular theme. For example, the Bioinformatics link on the left-hand side of the home page takes you to a page listing all of the posts with the Bioinformatics label, which you can then look through.
Search box:
You can also search the whole Wiki from the search box at the top right.
How to add information to this site
Blog posts:
The easiest way to have the pages be self-organizing is to input your information as blog posts and use the appropriate labels. When choosing labels, treat each word as a meaningful label. If you want a multi-word term like 16S library, join it with an underscore (16S_library), since library by itself might not be the most useful term. Try to use labels that others have already chosen if possible, but add new labels when it seems appropriate.
Theme pages:
In addition to the list of all labels on the front page, it might be nice to make pages that have similar themes. Some of these major themes are already there, but feel free to add a theme when it is appropriate (for example, if there are any sampling protocols, we might want to make a field work page or something).
Additional Information:
For those of you who have interest, explore all of the options available on the Wiki, which includes calendars and the like. If you have a need, please use these tools to increase the utility of this site.
How to make a blog post
It's easy to make a blog post. Go to the top right-hand corner of any page where it says "Add" and choose "Blog Post". You can add attachments, links and images to the post with the insert button. There are also many macros to choose from, so if you have something specific in mind you might be able to find one. Most importantly, add labels at the bottom under "Labels:". This step is what makes the site self-organizing and the information readily accessible to others. You can add pages as well, but this might take some organization. Pages and blog posts seem to be created in the same way, so the same things apply if you want to make a page.
Thanks for sharing your expertise with the group!
Overview:
The first step in processing raw data from 16S Illumina libraries is to split the multiplexed libraries, trim the sequences to the same length and filter for quality. I've found that trimming to the same length is best, because the same sequence might cluster differently depending on its length. I've also included the parameters that I've found most useful. This also makes a first-pass OTU classification at 97% to a reference database (closed-reference OTU picking), which is good for taking a first look at your data.
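To make the trimming and filtering idea concrete, here is a small sketch in plain Python. It is not what Qiime or the RunQiime.csh script does internally. The trim length, the mean-quality cutoff and the Phred+33 encoding are all assumptions chosen for illustration.

```python
def trim_and_filter(records, length, min_mean_quality=20, phred_offset=33):
    """Trim reads to a fixed length and drop short or low-quality reads.

    records: iterable of (header, sequence, quality_string) tuples.
    Assumes Phred+33 quality encoding unless phred_offset says otherwise.
    """
    for header, seq, qual in records:
        if len(seq) < length:
            continue  # too short to trim to the common length
        seq, qual = seq[:length], qual[:length]
        mean_q = sum(ord(c) - phred_offset for c in qual) / length
        if mean_q >= min_mean_quality:
            yield header, seq, qual


# Toy reads: read2 is too short, read3 has very low quality scores.
reads = [
    ("read1", "ACGTACGTAC", "IIIIIIIIII"),
    ("read2", "ACGT", "IIII"),
    ("read3", "ACGTACGTACGT", "!!!!!!!!!!!!"),
]
kept = list(trim_and_filter(reads, length=8))
print(kept)
```

Only read1 survives, trimmed to 8 bases, so every sequence that goes on to clustering has the same length.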
Process:
The easiest way to process fastq files from 16S Illumina data is to use Qiime. Follow the link for a tutorial:
http://qiime.org/tutorials/tutorial.html
This is installed on the darwin cluster (beagle) through the command:
module add qiime-default
Only the dependencies required for the default Qiime pipeline are loaded. Other tools might not be available.
To process the data, I've written a script that can automate the process. The script needs to be edited to include the path to the solexa file etc.
This can be submitted to the cluster with the following command (the RunQiime.csh file should be attached):
qsub -cwd RunQiime.csh
Below I've elaborated on what each of the variables means:
#define your file paths
#where is the fastq file you want to process
SOLFILEF=
This is the path to your solexa file as a fastq file.
#Where is the associated mapping file
MAPFILE=
The mapping file is expanded on in the Qiime webpage tutorial above.
#the oligo file
OLIGO=
This is a file required by mothur to remove the primer and diversity region from the library construct. Mothur looks for the exact sequence and only keeps reads in which the primer sequence is found. Note that if you are using Phusion or another proof-reading polymerase, mismatches in the primer site within 4 bases of the 3' end are reverted to the template sequence (NEB actually says it will change anything within 10 bp of the 3' end). Those reads no longer match the primer exactly and would be discarded at the trim.seqs step. To be safe, I typically change the last 4 bases at the 3' end of the primer to N's. Filtering out sequences that don't match the primer sequence has been shown to improve the quality of the data (Zhou_2011.pdf).
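Masking the 3' end of the primer before putting it in the oligo file can be done by hand, or with something like the sketch below. The helper name is made up, and the primer shown is the commonly cited 515F sequence, used here only as an example; check it against the primer you actually ordered.

```python
def mask_primer_3prime(primer, n_bases=4):
    """Replace the last n_bases at the 3' end of a primer with N's."""
    if n_bases < 0 or n_bases > len(primer):
        raise ValueError("n_bases must be between 0 and the primer length")
    return primer[: len(primer) - n_bases] + "N" * n_bases


# Example primer only (commonly cited 515F); use your own primer sequence.
example_primer = "GTGCCAGCMGCCGCGGTAA"
print(mask_primer_3prime(example_primer))  # GTGCCAGCMGCCGCGNNNN
```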
#where the programs are called from
BIN=/data/spacocha/bin
This is a folder where you should put the following files:
#where you want to put the output of the analysis
UNIQUE=
This is a unique name for this analysis. It can include folders (which must exist), but the final part is a prefix for all files.
#reference fasta file (latest greengenes OTUs)
REFERENCEFA=
These can be downloaded from the top-right corner of the Qiime blog homepage
#reference taxonomies
REFERENCETAX=
This will be in the same download as above, only in the taxonomy folder.
Output:
The output files that are created are the following (where the UNIQUE would be replaced by the variable defined above):
UNIQUE_output/ucrC/seqs_otus.mat
-This is a matrix of your OTUs, mapped to the reference.
UNIQUE_output/seqs.trim.names.fasta
-These are the trimmed, cleaned fasta sequences that were clustered.
UNIQUE_output/split_library_log.txt
-This contains stats about how your libraries were processed.
Additional Notes:
Please let me know if there is anything that you don't understand about this process. I am happy to help. I think everything that you need is there, but I might have missed something.