A History of Bioinformatics (told from the Year 2039)

A week or so back I was watching the chatter around the #ISMB / #BOSC2014 meeting, and saw a number of amusing and intriguing comments about Titus Brown’s keynote talk.

You can see a lot of chatter about it in the Storify. I was delighted to soon see this follow-up tweet:

I didn’t have time to watch it right away, but when I did, I really enjoyed it. It’s worth your time if you have some interest in the directions of this field. It’s not easy to pull off a talk as if you were 25 years in the future. It’s also rife with danger, as people might later use pieces of it against you. Lincoln Stein wrote an amusing follow-up to a prediction talk he gave in 2003, entitled: Bioinformatics: Gone in 2012 (follow-up piece linked below). Or it could just end up so embarrassingly off-target that you’ll look like some of the folks that Titus highlights in the talk, whose predictions about future technologies were pretty…um…well, you’ll see. But it’s a clever way to think about the future that we want, and what the path to get us there could look like.

SPOILERS: Here are some of my favorite tidbits, mostly for my own notes:

  • Bioinformatics sweatshops [I fear this too]
  • California has disappeared [egads, but...]
  • MicrosoftElsevier [snicker]
  • Universities have collapsed [hmm, not convinced on this]
  • Pioneering appointment of Phil Bourne: “NIH finally realized that training was important” [~20min; oh, please let this come true]
  • the problems of “Glam Data” [contrast to "glam journals" today]
  • in the future, because of better education, 80% of the US will accept evolution [from your lips to...wait...]
  • ~33min, interesting look at the actual outcomes of techno-progress and how they diverged from predictions, via Heinlein’s “Where To?” with 4 curves of predicted human progress. [Heh, I'm in this argument a lot, this could be handy: piece + chart linked below]
  • “I have no idea what I’m doing, but I’m trying new things.” [~38min, about forging uncharted directions in a young field]
  • At the end, ~56min: “Let the crazy people do the crazy things. See what happens.” [Testify.]

Boy, the pressure is on Phil Bourne to solve everything. This is a recurring theme at every genomics and bioinformatics event I see lately…I wish him luck sorting it out. The good news from this talk is that he seems to have done it.

The slides are here, along with talk notes for the Bioinformatics Open Source Conference (2014), at Titus’ blog.

References:

Stein L.D. (2008). Bioinformatics: alive and kicking, Genome Biology, 9 (12) 114. DOI: http://dx.doi.org/10.1186/gb-2008-9-12-114

Heinlein R. (1952). Where to?, Galaxy Magazine, February 13-22. ["Your personal telephone will be small enough to carry in your handbag." Well, he nailed that one.]

{Sorry, had to republish to get it into the ResearchBlogging queue. RB was down yesterday.}

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Heh:

 

What’s The Answer? (data sharing with Bittorrent)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted Biostar item is a new feature, and the developers are looking for your input and testing to see whether it’s a feature you might use.

Forum: Data sharing via Bittorrent is coming to Biostar

Hello Everyone,

We are adding bittorrent data sharing to Biostars.  Help us identify bugs and issues by creating a few torrents and adding them to posts on the test site. Also feel free to comment and provide suggestions and feedback. The description of how it works is at:

http://test.biostars.org/info/data/

An example post with data can be seen at:

http://test.biostars.org/p/101/

A few details on how it works:

  1. Torrents can get attached to posts, answers or comments
  2. A post may have multiple torrents attached.
  3. Biostars will attempt to connect the IP number of the Bittorrent peer connection to the IP number of the Biostar user account. This allows you to see who the person that shares the data is.
  4. Anonymous users cannot create torrents but they may share existing datasets.
  5. Data may be shared without making it visible on Biostar (although this should not be considered a secure way to share data)

(note: the test site will not log you into your old account since the emails are protected so don’t report that as an issue)

Istvan Albert

The feature seems to be well received, but some people have raised a concern: certain institutions block Bittorrent access because of past misuse. So if you want to try it out, or have concerns, let ’em know over there.
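For anyone curious what “creating a torrent” involves, here is a minimal sketch of the standard BitTorrent metainfo format: the file is cut into fixed-size pieces, each piece is SHA-1 hashed, everything is serialized with bencoding, and the torrent’s identity (its info-hash) is the SHA-1 of the bencoded `info` dictionary. This is a generic illustration, not how Biostars builds its torrents (the post doesn’t say); the tracker URL, file name, and the helper names `bencode` and `make_metainfo` are hypothetical.

```python
import hashlib


def bencode(value):
    """Encode ints, str/bytes, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are length-prefixed: b"4:spam"
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings sorted in raw byte order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name, data, announce, piece_length=16384):
    """Build a single-file .torrent payload (hypothetical helper) and its info-hash."""
    # Concatenate the 20-byte SHA-1 digest of every fixed-size piece.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,
    }
    # The info-hash peers use to find each other is the SHA-1 of the bencoded info dict.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return bencode({"announce": announce, "info": info}), info_hash
```

Because peers verify every downloaded piece against these hashes, anyone can help seed a shared dataset without being able to alter its contents.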

Video Tip of the Week: VectorBase, for invertebrate vectors of human pathogens

I wish I had been clever enough to coordinate this week’s Video Tip of the Week with “Mosquito Week” a couple of months back. There was a bunch of chatter at that time about this infographic that was released by Bill Gates, which illustrated the contribution of various human-killing species. The mosquito was deemed: The Deadliest Animal in the World. Jonathan Eisen took issue with the numbers, however, noting that if you are consistent about the way you count disease vectors, humans come out on top (or, bottom, I guess, in this category). Still, Eisen noted, mosquitoes are important and demand attention. But there are lots of other vectors to keep in mind as well.

Luckily, the team at VectorBase is on it. VectorBase has been providing information on invertebrate vectors of human pathogens for a long time. They collect data on a variety of species, including mosquitoes but also many others: ticks, lice, flies, etc. Check out their list of organisms here: https://www.vectorbase.org/organisms . They have information not only on basic biology, but also on the key problem of insecticide resistance.

We’ve been fans of VectorBase for years, and have highlighted them in the past, after a site redesign a couple of years ago, and a few other times with various other news tidbits. But I was delighted to discover recently that they have a new overview video which is my favorite kind to highlight in these tips. If you are new to a resource, a brief overview is the most helpful way to understand the kinds of data and tools you’ll see at their site. They have a lot of other slide/PDF tutorials as well, which focus on specific tools and features that will supplement an overview. But in our experience, a video overview is a bit more tempting when you are first becoming acquainted with a resource.

So here I’ve embedded the VectorBase overview, which you can also find here: https://www.vectorbase.org/tutorials/tour. The slides to accompany it are also available there.

So have a look at VectorBase’s important collection of species data and tools. You can also read more about their foundations and directions in their publications, including the one below. I keep up with news about their new features from their newsletter, but you can also see other types of community outreach strategies over at their site.

Quick link:

VectorBase: www.vectorbase.org

Reference:

Megy K., D. Lawson, D. Campbell, E. Dialynas, D. S. T. Hughes, G. Koscielny, C. Louis, R. M. MacCallum, S. N. Redmond, A. Sheehan et al. (2012). VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics, Nucleic Acids Research, 40 (D1) D729-D734. DOI: http://dx.doi.org/10.1093/nar/gkr1089

Bonus video: The Gates blog hosted this highly-produced video about mosquito bites and their impact.

Friday SNPpets

This is AWESOME:

(you can see the Hi-res version here: http://helikoid.si/ismb14/zitnik-zupan-ismb14.png )

What’s the Answer? (non-PhD bioinformatics job skills)

This week’s highlighted post was popular, and offers some chatter on the state of the field with regard to employment opportunities. It’s also the kind of question that’s hard to answer from the literature.

Question: I’m applying for a non-PhD bioinformatics position in your lab. What do you look for?

I’ve been lurking here for years and I’d like to cover a topic that isn’t covered that much.

Bioinformatics is a tough field to not have a PhD. Nonetheless, research positions do exist where only a bachelors is required and research experience is also stated as between 0-2 years. I’d like to give a hypothetical situation that describes a good percent of such applicants to these positions. The motivation here is to survey what are ultimately core requirements for these positions and what is maybe considered “bells and whistles”.

I’m fresh out of college and I have a BS and/or Masters in Bioinformatics along with ~two years research in a lab. I’m applying to your lab, what are you looking for? And what requirement(s) can you excuse or not weight that heavily?

Edit. Sort of a related question, is requiring knowing hadoop and also the biochemistry/biophysics behind RNA-seq at the same time an outrageous expectation for a non-Phd?

scical

Everyone has been following the drama (and the graphs) about how many PhDs there are versus how many academic jobs. Certainly not everyone needs a PhD, and this seems a valid and useful question. It got some thoughtful answers from potential employers too. Check out the discussion.

Video Tip of the Week: Google Genomics, API and GAbrowse

This week’s video tip comes to us from Google; it’s about their participation in the “Global Alliance for Genomics and Health” coalition. The Global Alliance is aimed at developing genomic data standards for interoperability, and they’ve been working on creating the framework (the background links in the references below provide further details). It has over 170 members, and one of these members is Google. Although Google talked about this earlier this year when they joined the group, details about the directions and specific tools have only recently begun to emerge. Google’s efforts made the mainstream news recently with their announcement of a project to examine genomic data associated with autism.

Although this video doesn’t cover a single specific tool as we usually do, it provides more detail about this framework for building tools, which is important to understand. In the video I also learned about a new browser developed under this project; I had a quick look at it, and I describe it below.

The browser they reference is called GAbrowse (I assume that means Global Alliance browse), but there’s not a lot of detail. Its “about” dialog box says this:

GABrowse is a sample application designed to demonstrate the capabilities of the GA4GH API v0.1.

Currently, you can view data from Google, NCBI and EBI.

  • Use the button on the left to select a Readset or Callset.
  • Once loaded, choose a chromosome and zoom or drag the main graph to explore Read data.
  • Individual bases will appear once you zoom in far enough.

The code for this application is in GitHub and is a work in progress. Patches welcome!

I kicked the tires a bit, but it’s clearly not fully fleshed out at this point. When I tried to zoom out from the nucleotide level it went up a bit, but eventually I hit a point that says “This zoom level is coming soon!” So there’s certainly more to come, and a lot more functionality would be necessary. But it’s early, and it’s just a demo. I have no idea if it’s intended to become a stand-alone public browser.
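I haven’t dug into GAbrowse’s GitHub code, so the following is only a hypothetical sketch of the common genome-browser pattern behind behavior like “individual bases will appear once you zoom in far enough”: choose what to draw from how many bases fall on each screen pixel. The `Viewport` class, the `render_mode` function, and both thresholds are invented for illustration and are not taken from GAbrowse.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
BASE_LETTERS_MAX_BPP = 0.1  # ~10 pixels per base: room to draw a letter
READS_MAX_BPP = 50.0        # beyond this, individual reads blur together


@dataclass
class Viewport:
    start: int     # 0-based start coordinate on the chromosome
    end: int       # exclusive end coordinate
    width_px: int  # width of the drawing area in pixels

    @property
    def bases_per_pixel(self) -> float:
        return (self.end - self.start) / self.width_px


def render_mode(view: Viewport) -> str:
    """Pick a drawing strategy from the current zoom level."""
    bpp = view.bases_per_pixel
    if bpp <= BASE_LETTERS_MAX_BPP:
        return "bases"     # zoomed in far enough to show individual nucleotides
    if bpp <= READS_MAX_BPP:
        return "reads"     # draw each aligned read as a bar
    return "coverage"      # needs a summarized view instead of raw reads
```

In a design like this, the most zoomed-out level is the expensive one, because it needs precomputed summaries (such as coverage) rather than raw reads; that would be a plausible reason for a demo to leave it as “coming soon.”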

So if you are interested in the issue of cross-compatibility of human genomic data (as far as I can tell this is all human-centric, and I’d like to see a wider conversation on this), it’s probably worth knowing what Google is offering here. You should also be aware of what the Global Alliance is working on. Below I’ve added some of the publications and media I’ve seen about their efforts.

Hat tip to Can Holyavkin on Google+ for the link to the video.  https://plus.google.com/u/0/114690993717100405711/posts/gwNy5E7E6Vb?cfem=1

Quick links:

Global Alliance for Genomics and Health: http://genomicsandhealth.org/

Google genomics: https://developers.google.com/genomics/

GAbrowse: http://gabrowse.appspot.com

References:
(2013). Global Alliance to Create Standards For Sharing Genomic Data, American Journal of Medical Genetics Part A, 161 (9) xi-xi. DOI: http://dx.doi.org/10.1002/ajmg.a.36168

Callaway E. (2014). Global genomic data-sharing effort kicks off, Nature, DOI: http://dx.doi.org/10.1038/nature.2014.14826

White paper 2013: http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf

Framework for Responsible Sharing of Genomic and Health-Related Data – DRAFT # 7 http://genomicsandhealth.org/our-work/work-products/framework-responsible-sharing-genomic-and-health-related-data-draft-7

Terry S.F. (2014). The Global Alliance for Genomics & Health, Genetic Testing and Molecular Biomarkers, 18 (6) 375-376. DOI: http://dx.doi.org/10.1089/gtmb.2014.1555 [available here from GA: http://genomicsandhealth.org/files/public/gtmb%252E2014%252E1555%5B2%5D.pdf]

Friday SNPpets

What’s the Answer? (mutation nomenclature)

When I touched on the variation tools at NCBI for this week’s tip, I didn’t go into detail on how the specific variations are designated. But I happened to be looking through the Biostar questions for this week’s highlight, and noticed that someone was not familiar with how the ClinVar mutations are denoted. So I thought maybe others would find that useful information as well.

Question: ClinVar Mutation representations and Descriptions

I was looking into ClinVar data for getting mutation lists. There were mutations which were in the form GENE:c.*** representing they are CDS mutations and GENE:p.*** representing the amino acid changes.

What are those in the following forms represent?

  1. m.***
  2. GENE:n.***
  3. GENE:g.***
  4. nsv***

Example:

TBC1D24:c.1143-6C>T – CDS mutation

NP_002760.1:p.Cys139Ser –  Protein mutation

m.1606G>A ??

U43746.1:n.2241A>G ??

NC_000023.11:g.53254331_53296102dup41772 ??

nsv513787 ??

vigprasud

Have a look at the answers at Biostar. Zhaorong’s answer is correct. This nomenclature is certainly a bit cryptic if you aren’t familiar with the Human Genome Variation Society (HGVS) system. It’s worth looking into the background and framework for this if it’s data you are likely to be working with. The history of this strategy goes back quite a ways, as you can see from their publication list. Below I’ve added a reference that I think helps explain the structure if you are new to it.
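To make the pattern concrete, here is a minimal sketch that sorts the question’s examples by their HGVS reference-type prefix (the letter before the dot: c. coding DNA, g. genomic, m. mitochondrial, n. non-coding transcript, p. protein, r. RNA). It only recognizes the prefix and does not validate the change itself; the full grammar is much richer (see the EBNF paper in the references). The dbVar pattern assumes the standard accession prefixes (nsv/esv for variant regions, nssv/essv for variant calls). The function name `describe` is hypothetical.

```python
import re

# HGVS reference-sequence type codes (the subset relevant to the examples).
HGVS_PREFIXES = {
    "c": "coding DNA",
    "g": "genomic DNA",
    "m": "mitochondrial DNA",
    "n": "non-coding DNA (transcript)",
    "p": "protein",
    "r": "RNA",
}

# Optional "REFERENCE:" part, then a one-letter type code, a dot, and the change.
HGVS_RE = re.compile(r"^(?:(?P<ref>[^:\s]+):)?(?P<type>[cgmnpr])\.(?P<change>\S+)$")
# dbVar structural-variant accessions are not HGVS expressions at all.
DBVAR_RE = re.compile(r"^[ne]s?sv\d+$")


def describe(variant: str) -> str:
    """Say which kind of sequence a variant description refers to."""
    m = HGVS_RE.match(variant)
    if m:
        ref = m.group("ref") or "(no reference given)"
        return f"{HGVS_PREFIXES[m.group('type')]} change '{m.group('change')}' on {ref}"
    if DBVAR_RE.match(variant):
        return "dbVar structural variant accession (not an HGVS expression)"
    return "unrecognized"


for v in ["TBC1D24:c.1143-6C>T", "NP_002760.1:p.Cys139Ser", "m.1606G>A",
          "U43746.1:n.2241A>G", "NC_000023.11:g.53254331_53296102dup41772",
          "nsv513787"]:
    print(v, "->", describe(v))
```

Note that the first example is not strictly a coding-sequence change: in c. notation an offset such as `1143-6` points into the intron, six nucleotides before the first base of the exon that starts at coding position 1143.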

For even more help in understanding why getting nomenclature right is so crucial, check out the recent paper below on naming just the TP53 variations. This is a gene with clinical relevance, and if you are aiming treatments at mutated TP53 you have to be sure you are getting the right one. Understanding how to define mutations isn’t just a trivial nuisance: it can matter in the clinic, and this will only become more important as we get sequence from more tumors and other clinical situations. I think this paper makes the point about the complexity and the need for standardization.
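One small example of why standardization matters: the same protein change can be written with three-letter codes (p.Cys139Ser, as in the Biostar question) or one-letter codes (p.C139S), and a naive text comparison treats them as different variants. Below is a minimal sketch of normalizing simple protein substitutions before comparing them. It only handles the plain `p.Xxx123Yyy` form; predicted changes in parentheses, frameshifts, and other HGVS forms are rejected. The function name `to_one_letter` is hypothetical.

```python
import re

# Standard three-letter to one-letter amino acid codes; "Ter" is a stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Simple missense/nonsense substitution only, e.g. "p.Cys139Ser".
P_SUB_RE = re.compile(r"^p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})$")


def to_one_letter(expr: str) -> str:
    """Convert 'p.Cys139Ser' to 'p.C139S'; raise ValueError for anything else."""
    m = P_SUB_RE.match(expr)
    if not m:
        raise ValueError(f"not a simple three-letter protein substitution: {expr}")
    ref = THREE_TO_ONE.get(m.group("ref"))
    alt = THREE_TO_ONE.get(m.group("alt"))
    if ref is None or alt is None:
        raise ValueError(f"unknown amino acid code in: {expr}")
    return f"p.{ref}{m.group('pos')}{alt}"
```

Raising an error on anything unrecognized, rather than guessing, is deliberate: in a clinical setting a silently mangled variant name is worse than no conversion at all.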

References:
Laros J.F.J., J.T. den Dunnen & P.E.M. Taschner (2011). A formalized description of the standard human variant nomenclature in Extended Backus-Naur Form, BMC Bioinformatics, 12 (Suppl 4) S5. DOI: http://dx.doi.org/10.1186/1471-2105-12-s4-s5

Soussi T. & P.E.M. Taschner (2014). Recommendations for Analyzing and Reporting TP53 Gene Variants in the High-Throughput Sequencing Era, Human Mutation, 35 (6) 766-778. DOI: http://dx.doi.org/10.1002/humu.22561