Another great FiB podcast produced by the TWiT Network and Marc Pelletier from Yale School of Medicine. He assembled a panel of scientists and engineers to discuss the future of biotechnology: Ed DeLong from MIT represented the field of metagenomics, Drew Endy (MIT) the field of engineering or synthetic genomics, John Bergeron (McGill University) the field of proteomics and Lee Hood (Institute for Systems Biology in Seattle) the field of systems biology.
The panel discusses how metagenomics, synthetic genomics and proteomics are all converging toward a better system-level understanding of the environment or the human body.
Another great podcast that puts everything into perspective. Science as it is practiced today is truly multidisciplinary, and a real convergence of all these disciplines will bring major advances toward a better understanding of the planet or the human body as global ecosystems.
7.02.2007
5.25.2007
NIH Roadmap - The Human Microbiome Project approved!
Toronto - 5/23/07: Tonight I attended the ASM satellite meeting on the Human Microbiome Project, where Jane Peterson from NHGRI told us that on May 18, 2007 the Institutes and Centers' directors met to review the Roadmap proposals and that the Human Microbiome Project (HMP) was approved for immediate implementation as a five-year program. An announcement is on the NIH website.
She obviously did not mention anything about the level of funding, but said that RFAs will be out in late summer or early fall, for funding in summer 2008. This is really good news. Jane continued with a report on the NIH Roadmap workshop on the Human Microbiome Project that took place in April. One thing that was somewhat disappointing, though again these are just recommendations, is the way the "reference microbiome" is proposed to be established: 100 subjects (which she called a significant number) and 30-100 cavities, by "light coverage" 16S rRNA sequencing. I feel that this approach might not be appropriate, and that a better way to start this project may be to use rapid and inexpensive methods such as T-RFLP or the PhyloChip to type the microbial communities of a very large number of subjects (>1000). Such a large number of subjects should allow community types to be clustered, which would help in selecting representative subjects from each cluster for further analysis (deep 16S rRNA gene sequencing and analysis). The selected subjects will represent larger groups of individuals, and the findings will be applicable to those groups as well. It is important to understand the distribution of human-associated community types in the human population prior to selecting people for metagenomic analysis. Her report included recommendations from the workshop attendees on technology development: sequencing technologies and "site-specific detection and dynamic measurement of microbes at the levels of DNA, RNA and protein, metabolites, cells and their three dimensional complexes" (in situ imaging of single cells, improved culturing and sample preparation methods, as well as proteomics and other -omics technologies). She also stressed the recommendations on bioinformatics needs, especially the development of metadata standards and data release policies. The study of disease states was also recommended.
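As a rough illustration of the selection strategy described above (type many subjects cheaply, cluster the community profiles, then pick one representative per cluster for deep sequencing), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the subject names, the toy four-peak abundance profiles, the choice of Bray-Curtis dissimilarity, the greedy clustering rule and the 0.2 threshold are not taken from the workshop or from any specific T-RFLP or PhyloChip pipeline.

```python
from typing import Dict, List


def bray_curtis(a: List[float], b: List[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def cluster_profiles(profiles: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Greedy 'leader' clustering.

    A subject joins the first cluster whose founding member lies within
    `threshold`; otherwise it founds a new cluster.
    """
    clusters: List[List[str]] = []
    for subject, profile in profiles.items():
        for members in clusters:
            if bray_curtis(profile, profiles[members[0]]) <= threshold:
                members.append(subject)
                break
        else:
            clusters.append([subject])
    return clusters


def representative(members: List[str], profiles: Dict[str, List[float]]) -> str:
    """Return the medoid: the member with the smallest total dissimilarity to the rest."""
    return min(
        members,
        key=lambda s: sum(bray_curtis(profiles[s], profiles[t]) for t in members),
    )


# Hypothetical relative-abundance profiles (e.g. four T-RFLP peaks per subject).
profiles = {
    "subject_A": [0.70, 0.20, 0.10, 0.00],
    "subject_B": [0.65, 0.25, 0.10, 0.00],
    "subject_C": [0.05, 0.10, 0.15, 0.70],
    "subject_D": [0.00, 0.10, 0.20, 0.70],
    "subject_E": [0.60, 0.30, 0.10, 0.00],
}

clusters = cluster_profiles(profiles, threshold=0.2)
representatives = [representative(members, profiles) for members in clusters]

for members, chosen in zip(clusters, representatives):
    print(f"community type with {len(members)} subjects -> deep-sequence {chosen}")
```

With more than 1000 real subjects one would more likely use an established clustering method and check how stable the clusters are, but the principle is the same: the cheap typing step decides who gets the expensive sequencing.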
One issue that people do not think much about is the ethics of such projects. Questions of privacy and ownership are very important and will need to be addressed. Jane stressed that this project should include the participation of, and coordination with, the international community to reduce redundancy and share information, challenges and solutions. This is going to be a very exciting project.
The panel of speakers included Jo Handelsman, David Relman, George Weinstock and Jane Peterson. Jo Handelsman reported on the NRC report on metagenomics, which is now available online, David Relman discussed his experiences with studying the human microbiome, and George Weinstock gave a very rosy picture of the use of 454 sequencing and other next-generation sequencing technologies (Solexa and AB SOLiD) for de novo microbial genome sequencing. I do not share his enthusiasm on this, especially when it comes to applying these technologies to metagenomics. Overall, it was a great session. Too bad it took place on Thursday night: attendance was somewhat low, as lots of people had already left Toronto.
5.10.2007
The Human Microbiome Project at ASM
The ASM General Meeting will take place in Toronto, May 21-25. George Weinstock from Baylor College of Medicine in Texas is organizing a Satellite Symposium on the Human Microbiome Project. David Relman and Jo Handelsman will talk. I think that the most interesting part will be the Q&A session that will follow their presentations.
Jane Peterson from NHGRI will present a report from the NIH Roadmap workshop on the Human Microbiome project. I think it should be a very exciting evening. The symposium will be held at the Toronto Convention Center, on May 24 from 6:30 to 9:00PM. I'll be there!
4.24.2007
Podcast features Metagenomics
I recently came across a podcast entitled "Future in Biotechnology" from the TWiT.tv podcast network (TWiT stands for This Week in Tech). Marc Pelletier, a Canadian from Montreal who is now working at Yale School of Medicine, interviews scientists who are making a major impact in science and technology. There are two episodes I would most recommend listening to. The first episode is with Drew Endy from MIT, a founder of the private synthetic biology company Codon Devices and also a founder of the MIT-based BioBricks Foundation. The foundation promotes an open-source registry of modular DNA parts that can be used like Legos for the creation of designer organisms. Already more than 300 such parts are registered on the Web site. The BioBricks Foundation also advocates for the responsible and ethical use of synthetic biology. Drew gives amazing talks and his interview is as entertaining as his talks, with great science.
The second episode is an interview with Ed DeLong, also from MIT, talking about environmental metagenomics. The episode nicely follows Drew Endy's interview, and Ed DeLong does a great job of linking the two. This interview could be used as an educational tool as part of a lecture on metagenomics and its implications for studying the environment. Ed DeLong is one of the pioneers of environmental metagenomics. Definitely worth listening to.
Also, if you are like me and enjoy everything about the Macintosh, check out the other TWiT podcasts. Leo Laporte knows a lot about technology and he is very entertaining. I would recommend listening to MacBreak Weekly for Mac news and views.
I attended the NIH-sponsored workshop on the Human Microbiome Project, and I am preparing a report. Stay tuned. In the meantime, you can read the views of Jonathan Eisen or Steven Salzberg.
3.27.2007
Affymetrix Microbiology Symposia Series
A little bit of self-publicity! I will be giving a webinar on Thursday, March 29th, 2007 on the work we have done with Affymetrix tiling arrays. This webinar series is sponsored by Affymetrix and highlights creative uses of their technology.
We have been working with Affymetrix tiling arrays for about 3 years now, with great success. We have designed two tiling arrays for Bacillus anthracis, the causative agent of anthrax. One of the major problems in working with such custom-designed arrays is that Affymetrix does not provide any software support for data display and analysis. I'll be describing the array design and a series of algorithms for data analysis, in particular TadPol (Tiling Array Discovery of Polymorphism). TadPol identifies polymorphic regions (SNPs and INDELs) from comparative genome hybridization data. The data is used for genotyping.
If you are interested, here is the web link to register for the webinar. It will be my first webinar, so I'm not sure how it will turn out, but it should be an interesting experience!
Here is the announcement:
SNPs, Chips and Transcriptomics in Bacillus anthracis
Thursday, March 29, 2007, 9:00 am (PDT)
Participate in a conference call seminar and Q & A featuring Jacques Ravel, Ph.D. from The Institute for Genomic Research
Dr. Ravel will discuss his research using a custom-designed Affymetrix GeneChip® array that tiles the entire 5.5 Mb genome of Bacillus anthracis Ames Ancestor with 2.1 million overlapping 25-mer oligonucleotides on a single array. Examples of genotyping, polymorphism discovery (SNPs and INDELs), transcript mapping and expression studies will also be discussed. REGISTER FOR SYMPOSIUM
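The numbers in the announcement give a feel for how dense this tiling is. The short Python sketch below does the back-of-the-envelope arithmetic. It assumes, purely for illustration, that the probes are evenly spaced and that all of them sit on a single strand; the announcement does not say how the probes are split between strands, so the `strands` parameter is a hypothetical knob, not a description of the actual array design.

```python
from typing import Tuple


def tiling_stats(
    genome_length: int, n_probes: int, probe_length: int, strands: int = 1
) -> Tuple[float, float, float]:
    """Return (step, overlap, depth) for probes spaced evenly along a genome.

    step    -- average distance in bases between the starts of adjacent probes
    overlap -- average number of bases shared by two adjacent probes
    depth   -- average number of probes covering each base on one strand
    """
    probes_per_strand = n_probes / strands
    step = genome_length / probes_per_strand
    overlap = probe_length - step
    depth = probe_length / step
    return step, overlap, depth


# Figures quoted in the announcement: 5.5 Mb genome, 2.1 million 25-mers.
step, overlap, depth = tiling_stats(5_500_000, 2_100_000, 25)
print(f"step {step:.2f} bases, overlap {overlap:.2f} bases, depth {depth:.2f} probes per base")
```

Under the single-strand assumption, a new probe starts roughly every 2.6 bases, so each base is interrogated by about 9 or 10 probes. If the probes were instead divided equally between both strands (`strands=2`), the spacing on each strand would double to about 5.2 bases.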
3.12.2007
The Shortest Genome Sequence Paper!
The JGI Los Alamos National Lab has just published one of the shortest (if not the shortest) genome sequence papers. The paper is published online ahead of print in the Journal of Bacteriology and reports the genome sequence of Bacillus thuringiensis Al Hakam, a strain isolated in Iraq by the United Nations Special Commission. The author list is about one and a half pages and the text not much longer. This paper was published with the sole intention of announcing the release of the sequence in GenBank. There is no science besides the fact that the cry, cyt and vip genes (toxin genes) were not found. Did we need the genome to get to that conclusion?
Is this where the future of genome sequence papers is heading? Announcements of GenBank releases? I'm not sure if I like this.
We are now sequencing several strains of the same species, and I think it can become somewhat of a problem because some of the sequences turn out not to be very informative. But are we selecting the strains to sequence rationally? A lot of work goes into sequencing, assembling, closing, annotating and analyzing a genome. If nothing novel or interesting comes out of this, that is a lot of time and effort wasted, with no return in terms of papers for the people involved. I think this work should still be published. I had suggested a while back to convince a journal (PLoS?) to have one issue (or part of one), or one section in each issue, dedicated to genome sequences, just like NAR does for databases. I think these types of papers would fit better in that context. A standalone paper in the middle of a new issue of the Journal of Bacteriology might not be the best use of journal pages. What about a community page, just like in PLoS Biology? Wouldn't it be a better format? I think it is important to recognize the work that has gone into sequencing and analyzing a genome, but what is the best format? Genome sequences are still very important resources that the scientific community uses. Are we getting to the point where a genome sequence is released into GenBank and does not warrant a publication unless its analysis advances scientific knowledge? That is just what happened to gene sequences not that long ago!
3.09.2007
Hawkeye: Finally a tool for assembly validation!
This is related to my first posting. Mike Schatz, a very talented graduate student in Computer Science at the University of Maryland College Park Center for Bioinformatics and Computational Biology (CBCB) working with Steven Salzberg, has published today a really good paper entitled "Hawkeye: a visual analytics tool for genome assemblies" in Genome Biology. Finally a tool that allows any scientist to evaluate the quality of a genome assembly. The paper describes the capabilities of the software, including assembly visualization (including traces), assembly analysis (assembly statistics), a contig viewer and a scaffold viewer, among others. The software is amazingly fast, doesn't slow down when working with large genomes, and allows for the diagnosis and validation of assembly problems. Mike even describes how Hawkeye can be used for biological investigation. Using Hawkeye, Mike was able to identify the assembly of the 6 plasmids harbored by Bacillus megaterium, one of my projects.
I think this software should become a standard that reviewers, editors and others could use to validate genome assemblies prior to publication. Now all we need is for the genome sequencing community to agree on data release standards: more than the consensus sequence will be needed!
This paper is published as "Open Access" and Hawkeye is freely available as part of the AMOS package on SourceForge. Steven Salzberg's group published another paper in the same issue of Genome Biology on "computational discovery of Rho-independent transcription terminators" as well as a very interesting editorial on "Genome re-annotation: a wiki solution?", unfortunately not as Open Access.
One more thing, Hawkeye works on a Mac. A huge plus to me!!
Microbial Genome News on Technorati
I have now registered the Microbial Genome News on Technorati.
Technorati Profile
You can view my profile on Technorati by clicking on the link above.
3.08.2007
Quality Standards for Genome Sequence
I have been involved in sequencing and analyzing microbial genomes for over 5 years. I have been very fortunate to author and co-author a series of papers describing the genome sequence analysis of several organisms. During the submission and review process, one thing has always surprised me: none of the journals or editors handling the papers has ever asked to see or check the sequence itself or, more importantly, the quality of the assembly. For example, I have never been asked to provide information about the sequence read coverage over the entire genome, the quality of each base pair in the genome sequence, the criteria used to define the quality of the sequence, or even the genome sequence itself!
The reality is that there are no universal quality standards for genome sequences.
Due to ever-decreasing costs, microbial whole genome sequencing is now performed in many different sequencing centers and by individual groups. This expansion is a good thing: genomics, just like molecular biology not long ago, is becoming a tool for scientists to address fundamental questions. However, I believe that this expansion should not come at the expense of quality. While sequence data from genome projects is made available to all scientists through GenBank, it is no longer sufficient for just the final consensus sequence to be made available on project publication. The assembly and the quality scores that underlie each base call within the consensus sequence must also be made available. This information is critical to evaluate the overall quality of the work.
The NCBI Assembly Archive is a resource where sequence data, quality scores, sequence chromatograms (even pyrosequencing flows) and assembly data can be uploaded to a publicly accessible central repository. However, this or any other such initiative can only be successful if the scientific community populates the repositories with both finished and draft genome sequence data in compliance with an accepted community standard. To date, only a few sequencing centers have deposited data into the Assembly Archive (TIGR, JCVI, IMBGA and CRA).
The broader microbial genomics community, together with scientific funding and publishing bodies, needs to meet and develop new data release standards for microbial genomes. This is even more important as new sequencing technologies are driving costs down, making it affordable for many to sequence genomes. These standards need to account for these new technologies.
For any particular genome, these standards could embrace the timely release of trace, contig and associated quality scores as well as the consensus sequence, using the timeline agreed to by the sequencing center and the funding agency for that genome project. The standard could also define the minimum sequence quality and coverage required for the release or publication of complete and draft microbial genome sequence data. There is an enormous amount of sequence data pending, planned and anticipated. Complete and open access to the underlying quality information, as well as the consensus sequence, will be needed to best capitalize on this forthcoming deluge.
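To make "minimum sequence quality and coverage" concrete, here is a minimal Python sketch of the kind of check a reviewer could run if the quality scores and read data were released along with the consensus. The Phred relationship (Q = -10 log10 of the error probability) is standard. The thresholds used here (8x mean coverage, Q40 per consensus base) and all function names are placeholders chosen for illustration, not an agreed community standard, and the data is a toy example.

```python
from typing import Dict, List, Sequence


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def mean_coverage(read_lengths: Sequence[int], genome_length: int) -> float:
    """Average sequence read coverage: total bases sequenced / genome length."""
    return sum(read_lengths) / genome_length


def low_quality_positions(consensus_quals: Sequence[int], min_q: int) -> List[int]:
    """0-based consensus positions whose quality score falls below `min_q`."""
    return [i for i, q in enumerate(consensus_quals) if q < min_q]


def quality_report(
    read_lengths: Sequence[int],
    consensus_quals: Sequence[int],
    min_coverage: float = 8.0,  # placeholder threshold, not a real standard
    min_q: int = 40,            # placeholder threshold, not a real standard
) -> Dict[str, object]:
    """Summarize coverage and per-base quality against the placeholder thresholds."""
    coverage = mean_coverage(read_lengths, len(consensus_quals))
    weak = low_quality_positions(consensus_quals, min_q)
    return {
        "mean_coverage": coverage,
        "low_quality_positions": weak,
        "passes": coverage >= min_coverage and not weak,
    }


# Toy data: a 1,000-base "genome" covered by 100 reads of 100 bases each,
# with two weak consensus positions at the very end.
reads = [100] * 100
quals = [60] * 998 + [20, 35]

report = quality_report(reads, quals)
print(report)
print(f"Q40 means an expected error rate of {phred_to_error_probability(40):.4%}")
```

None of this is possible from the consensus sequence alone, which is exactly why the underlying reads and quality scores need to be deposited too.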