Friday SNPpets

27 January, 2012 (09:10) | SNPpets | By: Mary

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • RT @BGI_Events: Important question, & great primer RT @deannachurch: From @kbradnam:. ‘When is a genome finished?’ http://tinyurl.com/7rypaxe  //nice! [Mary]
  • Because of meeting @papermantis at @scio12 conference, today I am checking out PATRIC (or Pathosystems Resource Integration Center) which describes itself as “providing rich data and analysis tools for all bacterial species in the selected NIAID category A-C priority pathogens list.” So far I am impressed & looking forward to digging into it deeper as I have time. [Jennifer]
  • @OpenHelix: DNA Day essay contest announced! Via @DNAday Also teaching+learning resources from @GeneticsSociety http://t.co/PMEOyQt1 ht @geneticmaize [Mary]
  • Things that make you go hmmmm….. RT @phylogenomics: Am wondering – will GINA cover studies of microbes living in and on people http://t.co/JaWuFTKN #UCDCitSci [Mary]
  • Interesting example of why integration of data across resources is hard. Chemistry issue, but true of all sorts of bio and gene related things. Hat tip Antony Williams on G+. See the post Challenges of data integration. [Mary]
  • FYI, from ExPASy News: “Due to maintenance work, many ExPASy proteomics services will be unavailable from Sunday January 29th to Wednesday February 1st, 2012. These services include PROSITE, ENZYME, Protein Spotlight, World-2DPAGE, Swiss-2DPAGE and proteomics tools such as ProtParam, Compute pI/MW.” [Jennifer]
  • The watermelon map! Woot! @francfue: A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome http://t.co/PkNQIcY0 [Mary]
  • Testify! @KamounLab: “Given proper training and demystification biologists are perfectly capable of working their own #bioinformatics” http://t.co/OJ9IWMgV [Mary]
  • RT @mary_carmichael: Start of the Human Circuit Project? @broadinstitute launches effort to catalog all biochemical wiring in human cells: http://t.co/tm8wnMDa [Mary (not Carmichael)]
  • More #scio12 goodies: @genome_gov: Check out ‘science online’ genomic medicine session on wiki: http://t.co/sKjbjfml Thanks @MishaAngrist [Mary]
  • RT @yokofakun: [delicious] miRdSNP: a database of disease-associated SNPs and microRNA target sites on 3′UTRs of human genes #t… http://t.co/Ccq1seEL [Mary]

 

What’s the answer? (web site usage stats)

26 January, 2012 (09:13) | What's the Answer? | By: Mary

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question:

Major bioinformatics website usage statistics

Does anyone know of links to the web usage statistics over time of the following major bioinformatics portals?

  • The UCSC Genome Browser
  • Ensembl
  • Galaxy
  • NCBI
  • any others that I’ve left off this list

Many thanks, Casey

–Casey Bergman

There is a selected answer that has some great data on the UCSC Genome Browser usage, but there’s still room for more answers if you have some data on these publicly available sites. Have a look, and share your knowledge if you have some links.

Angry at the Genome? Sigh. I rant.

25 January, 2012 (12:40) | General Science, Genomics Research | By: Mary

If you had a healthy, safe, and nourishing childhood, you probably remember this fondly for many reasons. But among the most basic of reasons is that it was simple. Answers were easy: food comes from mom, the house is just there, and friends were all doing the same thing as you were–school, play, homework.

It would be very cool if life was simple. But it’s not. And neither is biology. But I think that being “Angry at the Genome” is unfair to the genome.

I understand that we all hope for simple answers, and simple solutions to medical problems. But only cranks can promise that (buy my product to clear your toxins and prevent cancer! vaccines and GMOs cause autism! chiropractors fix your DNA! optimize your DNA with quantum something!).

The genome had millions of years to get to the state that it’s in today. And humans have done a lot of outbreeding, unfortunately–er, well, from a controlled experimental perspective, that is. To presume that we should have figured out human health in the last hundred years–and especially the last 10–is really simplistic. And wishful thinking.

We have made–and continue to make–progress. We continue to refine models. We continue to improve technology and generate more data. And I’m sorry that your medical issue hasn’t been solved yet. But it’s not because people aren’t trying, or that the answers don’t exist, or that there’s a conspiracy involved. We just have a limited number of flashlights and financial resources to point at these problems right now.

I am more optimistic than I’ve been in a long time, actually. The secrets we are unearthing in possible mechanisms and pathways are tremendous–if preliminary. The possibilities of getting to the roots of individual patient situations with whole genome sequencing are huge–if preliminary. Our new technologies may let us expand our range of model organisms (where the answers are more simple and fit the models better) and may offer us excellent leads that we can follow up in the notoriously outbreeding humans.

I wish it was more simple–and that MDs had factual answers and effective treatments as appealing as the (wrong) ones the cranks offer. I wish more was “actionable” right now. But it’s not. And it may take some time. And there are hazards still, and situations that we need to address (like privacy and misuse of genomic information). And there will be cranks–and I support more regulation to protect patients from them. But we are rowing in the right direction. That’s all we can do right now.

Video Tip of the Week: Biological Sequence Analysis I @ NIH

25 January, 2012 (10:09) | Tip of the Week | By: Trey

Well, more than a tip, a lecture. We haven’t done a tip today because we are in the middle of a grant application (time is limited), and this is an excellent video we’d like more people to see. Mary posted the first lecture, The Genomic Landscape circa 2012, in a series given at NIH. As the course description mentions, “The lectures are geared at the level of first year graduate students, are practical in nature, and are intended for a diverse audience.” Having watched the first and seen most of this second, I’d say this is for first-year graduate students and research scientists who are delving more into genomics and bioinformatics. You can view the course syllabus here and learn more about the course and topics.

Today, the second lecture in the series, entitled “Biological Sequence Analysis I”, is up on GenomeTV and the lecture list:

In this video he goes through the basics of sequence alignment, discussing such topics as similarity vs. homology, global vs. local alignments, scoring matrices, BLAST and blah. The introduction is pretty straightforward and excellent for those just starting out, and a good review for those of us who know these concepts but need refreshing.
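As a rough illustration of the global vs. local distinction (this is not taken from the lecture), here is a minimal Python sketch. The function names and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration; real tools use substitution matrices such as BLOSUM or PAM and more refined gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: both sequences aligned end to end."""
    # prev holds the previous row of the dynamic programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-scoring pair of sub-regions."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

The only real difference between the two is the floor at zero and taking the best cell rather than the last one: a local alignment is free to ignore poorly matching flanks, while a global alignment has to pay for them.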

It’s 90 minutes long, and worth every minute.

Quick Links:

Course page http://www.genome.gov/12514288

Direct link to YouTube for this lecture: http://www.youtube.com/watch?v=Ud_6VpX5AgI

Reference:

 
Green, E., Guyer, M., Manolio, T., & Peterson, J. (2011). Charting a course for genomic medicine from base pairs to bedside Nature, 470 (7333), 204-213 DOI: 10.1038/nature09764

What’s in dbSNP? MassGenomics illustrates it

24 January, 2012 (10:12) | Genomics Research, Genomics Resource News | By: Mary

One of the things we find as we do our workshops around the country is that a lot of people assume that dbSNP has just SNPs–meaning Single Nucleotide Polymorphisms = 1 nucleotide that is variant in a given spot. But in fact there’s more going on in dbSNP. This led to them changing their header this past year to indicate that:

We have also pointed out that the UCSC Genome Browser has indicated this for years by calling its dbSNP track “simple nucleotide polymorphisms”, and some people are surprised to realize that:

So when I saw the link to this post at MassGenomics this morning via twitter (hat tip to @brent_p) I went to have a look.

The Current State of dbSNP

It’s quite a nice overview of the current breakdown of the variants in dbSNP in the 135 build. Go read. And have a look at that chart of the growth of dbSNP. Wowsa. And we’re just getting started, really….
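
To make the point about dbSNP holding more than single-nucleotide changes concrete, here is a small Python sketch that sorts variants into broad classes from their reference and alternate alleles. The `classify_variant` function and the class labels are illustrative assumptions, not dbSNP’s own vocabulary or API.

```python
def classify_variant(ref, alt):
    """Return a coarse variant class from reference and alternate alleles.

    Alleles are plain strings of bases; "-" or "" stands for an absent allele.
    """
    ref = ref.replace("-", "")
    alt = alt.replace("-", "")
    if len(ref) == len(alt):
        if ref == alt:
            return "no change"
        if len(ref) == 1:
            return "single nucleotide variant"
        return "multi-nucleotide variant"
    # Lengths differ: a pure insertion or deletion keeps the shorter
    # allele as a prefix of the longer one.
    if len(ref) < len(alt) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex insertion/deletion"
```

For example, `classify_variant("A", "G")` falls in the single-nucleotide class, while `classify_variant("-", "TG")` is an insertion, exactly the kind of record people do not expect to find under the name “SNP”.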

The Genomics Landscape circa 2012 lecture by Eric Green (with video)

23 January, 2012 (10:13) | Genomics Research | By: Mary

There is a course running right now at NHGRI that covers a timely set of genomics topics. Called Current Topics in Genome Analysis 2012 (CTGA), it’s going to hit a great range of aspects of the current landscape–and offer a look at the future–in genomics.

On the course page you can download the slides, and watch the YouTube videos of the lectures. I have had a chance to watch only the first one so far, and I’ll summarize it below. But be sure to check back to the Genome.gov YouTube page or the course page to explore them all as they come along. Here’s the first one, which is important to set the frame for the series.

In the first lecture, Eric Green sets the framework for the course and gives a terrific overview of where we came from prior to having a reference sequence, where we are today, and where we need to go. Much of the same content was covered in the paper (below) so if you want to supplement your understanding of this with all the references that’s a great place to look. A number of the slides used are figures directly from that paper if you want more details and a closer look at the data.

The first part of the seminar covers the historical foundations of the field. It also does a great job of illustrating the major accomplishments since the “end” of the genome project. And he does a survey of the state of the field of genomics. He sets this up in a series of “5 steps in the path to genomic medicine” that begins at about 20 minutes in.

Step 1 covers important genomic elements and human evolution. He highlights projects such as ENCODE and Genome 10k among others. Step 2 covers aspects of human genetic variation, and includes such projects as HapMap and 1000 Genomes. Step 3 addresses the basis for genomic diseases–and he explains GWAS and describes successes of these types of experiments. He touches briefly on the “missing heritability” and the need to have more sequences to get at these tougher problems. I liked the way he set up his pie charts that described the challenges and the reasons we haven’t had as much impact on the non-monogenic + coding sequence sorts of health issues. Step 4 covers the new (next-generation) technology that’s racing us through the new sequencing phase, and it brings us to the final point, Step 5: routine analysis of genome sequence. He admits this is further out at this point, and demonstrates the fire hose of data we are facing. He acknowledges that the largest bottleneck in genomics right now is this analysis step. He references other steps to come, of course, as well.

In each of these steps he talks about people who are coming in to expand on these topics in subsequent lectures in the series. Those really sound terrific, and I intend to check them out.

He summarizes this section by saying that we have tremendous amounts of data, great technologies, and incredible opportunities in front of us. And that we are well poised for a revolution in genomic medicine–but that challenges remain. The last 20 minutes or so are spent on a gaze into the future, with those opportunities and challenges. In this section he also references the fact that in sheer cell numbers you are outnumbered by your bacteria 10 to 1, and how important the human microbiome project is going to be.

Green frames this talk–and this series–as being heavily weighted to “optimists” for genomics. I’m certainly in that category, and I think this is going to be a valuable set of talks. And he also notes that it will be heavily weighted to human genomics and human health–but we know a lot of this technology and knowledge is penetrating other areas too. This particular seminar sets out the foundation for the course, and is definitely worth your time. Check them all out at the GenomeTV channel as they appear.

 

Quick Links:

Course page http://www.genome.gov/12514288

Direct link to YouTube for this lecture: http://youtu.be/GLwCs370IGI

Reference:

Green, E., Guyer, M., Manolio, T., & Peterson, J. (2011). Charting a course for genomic medicine from base pairs to bedside Nature, 470 (7333), 204-213 DOI: 10.1038/nature09764

Friday SNPpets

20 January, 2012 (09:15) | SNPpets | By: Mary

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the answer? (big questions!)

19 January, 2012 (08:34) | What's the Answer? | By: Mary

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question is more of a discussion, but I thought it was interesting to read what people thought in their different focus areas. And the “meaning of life” tag cracked me up:

What are the big questions that bioinformatics and computational biology will be answering in the next few years?

We all know that bioinformatics and computational biology are here to stay and that their impact on scientific thought and direction will be increasing in the future. Since there is a lot of interdisciplinary research being undertaken by folks who participate in this forum, I am curious as to what big and interesting biological problems folks think will be best solved either directly by computational approaches or in an integrated computational and bench science environment. Of course, I know that the answer to this is “everything”, but I am really curious about specific questions in your field of interest.

Sean Davis

I added what I think are some major issues and likely wins, but it was neat to see the other thoughts people had too. Go check them out, and add your own thoughts if you have some ideas.

Video tip of the week: OpenHelix App on SciVerse to Extend Research

18 January, 2012 (09:00) | General Science, Tip of the Week | By: Jennifer


We’ve all seen the discussions – on twitter, in journals, lots of places – on how to collect, store, find and use all the data that is and will be generated. Here at OpenHelix we believe that there is a gold mine of bioscience data that is being vastly underutilized, and our goal is to help make that data more accessible to researchers, clinicians, librarians, students and anyone else who is interested in science.

We pursue that goal in a variety of ways, including this blog with its weekly tips, answers and other posts; our online tutorial materials on over 100 different biological databases and resources; and our live trainings, many of which are sponsored by resource providers such as the UCSC Genome Browser group.

In today’s tip I will introduce you to another one of our efforts to “extend research” by showing you a glimpse of an OpenHelix app that we designed for the SciVerse platform, which Elsevier has described as an “ecosystem providing workflow solutions to improve scientist productivity and help them in their research process”. This app scans a ScienceDirect journal article for any database names or URLs that we train on, and then displays a list of such resources in the window of the app. A researcher can use this list to go from a research article to our training on how to use the resource, and to the resource itself. We believe this type of integration will help extend research by making it easier to find, access and use data associated with a paper. If you have access to articles through ScienceDirect, and you try out our app, please comment here & let us know what you think, or suggest future enhancements. Also you could consider reviewing it for the app gallery. Thanks!
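
The general idea of scanning article text for known resource names or URLs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KNOWN_RESOURCES` table, its entries, and the `find_resources` function are hypothetical, and they do not reflect how the actual OpenHelix app or the SciVerse platform is implemented.

```python
import re

# Hypothetical lookup table: resource name -> URL fragment to look for.
KNOWN_RESOURCES = {
    "UCSC Genome Browser": "genome.ucsc.edu",
    "dbSNP": "ncbi.nlm.nih.gov/snp",
    "Ensembl": "ensembl.org",
}


def find_resources(article_text):
    """Return the sorted names of known resources mentioned by name or URL."""
    found = set()
    lowered = article_text.lower()
    for name, url_fragment in KNOWN_RESOURCES.items():
        # Match the name as a whole word or phrase, ignoring case.
        name_pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(name_pattern, lowered) or url_fragment in lowered:
            found.add(name)
    return sorted(found)
```

Calling `find_resources` on a paragraph that mentions the UCSC Genome Browser by name and Ensembl only by URL would list both, which is the behavior a reader would want from a “resources in this paper” panel.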

Quick links:

SciVerse Hub http://www.hub.sciverse.com

SciVerse Application Gallery http://www.applications.sciverse.com

OpenHelix SciVerse App Description http://bit.ly/xtGcco

References:
Reference shown in Tip (subscription required): Mortensen, H., & Euling, S. (2011). Integrating mechanistic and polymorphism data to characterize human genetic susceptibility for environmental chemical risk assessment in the 21st century Toxicology and Applied Pharmacology DOI: 10.1016/j.taap.2011.01.015

OpenHelix Reference (free from PMC here): Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box Briefings in Bioinformatics, 11 (6), 598-609 DOI: 10.1093/bib/bbq026

SciVerse Reference (subscription required): Bengtson, J. (2011). ScienceDirect Through SciVerse: A New Way To Approach Elsevier Medical Reference Services Quarterly, 30 (1), 42-49 DOI: 10.1080/02763869.2011.541346

New ENCODE track data available

16 January, 2012 (09:16) | Genomics Resource News | By: Mary

On Friday I caught this announcement from the ENCODE mailing list. Fresh new data for your mining pleasure!

Histone Modifications by ChIP-seq from ENCODE/University of Washington

This track shows genome-wide maps of histone modifications associated
with active promoters (H3K4me3), repressed regions (H3K27me3), and
active transcription (H3K36me3) in 57 cell types, as identified by ChIP-seq.

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUwHistone
----------------------------------------------------------------------
CpG Methylation by Methyl 450K Bead Arrays from ENCODE/HAIB

This track displays the methylation status of specific CpG dinucleotides
in 61 cell types as identified by the Infinium Human Methylation 450
Bead Array platform.  In general, methylation of CpG sites within a
promoter causes silencing of the gene associated with that promoter.

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeHaibMethyl450
----------------------------------------------------------------------
RNA Subcellular CAGE Localization from ENCODE/RIKEN

This track from the ENCODE Transcriptome group shows 5' cap analysis
gene expression (CAGE) tags and clusters.
A total of 34 Experiments were conducted in 12 cell lines and one tissue
(prostate), with RNA extracted from 6 isolated cellular compartments and
in whole cell.

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRikenCage

----------------------------------------------------------------------
Affymetrix Exon Array from ENCODE/University of Washington (Release 2)

This track from the ENCODE Transcriptome group shows human tissue
microarray data from the Affymetrix Human Exon 1.0 GeneChip.
Release 2 of this track shows experiments in 14 additional cell lines.

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUwAffyExonArray
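
All four links in the announcement follow the same pattern: the `hgTrackUi` page with a `db` (assembly) and a `g` (track name) parameter. If you want to generate such links for a list of tracks, a minimal Python sketch might look like this. The function name `track_settings_url` and the `ENCODE_TRACKS` list are my own naming for illustration; the base URL, parameter names, and track identifiers are copied from the links above.

```python
from urllib.parse import urlencode

BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgTrackUi"

# Track identifiers taken from the announcement above.
ENCODE_TRACKS = [
    "wgEncodeUwHistone",
    "wgEncodeHaibMethyl450",
    "wgEncodeRikenCage",
    "wgEncodeUwAffyExonArray",
]


def track_settings_url(track, assembly="hg19"):
    """Build the track settings URL for one track on a given assembly."""
    return BASE_URL + "?" + urlencode({"db": assembly, "g": track})


if __name__ == "__main__":
    for track in ENCODE_TRACKS:
        print(track_settings_url(track))
```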

If you want to learn more about how to explore the ENCODE data, check out the freely available ENCODE tutorial materials, sponsored by the ENCODE team at UCSC, here: http://www.openhelix.com/ENCODE

ENCODE project page: http://genome.ucsc.edu/ENCODE/