What’s the Answer? (new Ensembl stuff)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

Increasingly, the old method of email announcements of new features is going the way of the dinosaur (but not fast enough for me). Here’s an example of the way to do this kind of outreach now, beyond the mailing list–using the “News” category at Biostars.

News: Ensembl 76 is out

We’re pleased to announce that the newest Ensembl release, e76, is out. The new release features:

Read more about our new release on our blog.

Emily_Ensembl

The new Ensembl release has some features you should definitely know about. I love that cell line piece–I went to check it out right away. I’d love to see more resources post their news like that. Anyway–go have a look.

Video Tip of the Week: Immune Epitope DB (IEDB)

This week’s tip was inspired by the recent NHGRI workshop on the future directions for funding and resourcing of genomics-related projects. The workshop, titled “Future Opportunities for Genome Sequencing and Beyond: A Planning Workshop for the National Human Genome Research Institute”, brought together a lot of influential folks on this topic, and had them noodle on the priorities and major gaps in this arena that should get more attention going forward.

Much of the meeting was live-streamed, which was really great. You can see the video segments on the workshop page, and sometimes the slides are available there too. One of the great things about this meeting was that there’s so much excitement about what scientists want to do, and all the terrific ideas that are out there. One of my personal favorites was the Human Cell Atlas presented by Aviv Regev. I’d love to work on that. I loved working on the Adult Mouse Anatomical Dictionary and Gene Expression Database at Jax.

But for today’s focus, I’ll turn to a totally different aspect of genomics research that intrigues me–the immune system. As an undergraduate in microbiology and immunology, I was fascinated by the fact that microbes and their teeny genomes could wreak havoc on large mammals (Ebola–I mean, seriously, it’s not that big). And that the hosts have developed the mix-and-match adaptable response and antibody system to do battle–clever stuff, as long as it doesn’t turn into an autoimmune situation… But this could also be turned to good use if you want to battle cancer cells with immunotherapies. So when David Haussler’s talk brought that back around–the idea that the genomics of the immune response is complex and not yet well characterized–I connected with that idea as well. And it struck me that I had never featured the Immune Epitope Database before, which Haussler had mentioned in his talk. It was also noted that this is an interesting system because it requires wrangling a hybrid of proteomics and genomics information. And if this is a direction that NHGRI will emphasize, it’s important to know what’s out there, and think about the ways to go forward.

So here’s Haussler’s talk to set the foundation, but there’s another video about the database I’ll point to below.

In this talk he mentioned NetMHC for peptide binding prediction as well, and ImmPort at NIAID. There was a quick mention of an unfunded prototype UCSC immunobrowser to keep an eye out for. And for the most part these resources aren’t new–you can find a number of publications that go back and describe the foundations and development over the years. And it seems to be a good solid foundation, and with appropriate support can continue to keep this important information coming.
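If you haven’t used a peptide binding predictor before, it may help to picture the input. The Lundegaard and Nielsen reference below describes tools trained on 9mers, so here is a minimal sketch of how a protein sequence can be chopped into overlapping 9-residue windows before scoring. This is only an illustration: the function name `peptide_windows` and the toy sequence are made up for this post, and the code does not call NetMHC or IEDB in any way.

```python
from __future__ import annotations


def peptide_windows(protein: str, length: int = 9) -> list[tuple[int, str]]:
    """Return (1-based start position, peptide) for every overlapping window.

    Hypothetical helper for illustration only; it does no binding prediction.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    protein = protein.strip().upper()
    # A sequence shorter than the window yields an empty list.
    return [
        (start + 1, protein[start:start + length])
        for start in range(len(protein) - length + 1)
    ]


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQRQISFVK"  # made-up sequence, not a real protein
    for position, peptide in peptide_windows(toy_sequence):
        print(position, peptide)
```

Each of those windows is the kind of candidate a predictor would then score against a chosen MHC allele; the scoring itself is the hard part, and that is what the real tools are for.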

To learn more about IEDB, you can access their documentation, which includes a whole list of video tutorials. Here I’ll highlight the intro/overview one–but there are others that offer specific guidance on other tasks. I can’t embed this one, so the link will take you over to the video at their site.

Click the image to visit the video page.

So have a look at the IEDB resources, and think about the future directions of this important aspect of genomics.

Quick links:

NHGRI workshop: http://www.genome.gov/27558042

IEDB: http://www.iedb.org/

Intro IEDB video: http://www2.immuneepitope.org/videos/site_overview.cfm

NetMHC: http://www.cbs.dtu.dk/services/NetMHC/

ImmPort: http://immport.niaid.nih.gov/

References:

Vita R., J. A. Greenbaum, H. Emami, I. Hoof, N. Salimi, R. Damle, A. Sette & B. Peters (2010). The Immune Epitope Database 2.0, Nucleic Acids Research, 38 (Database) D854-D862. DOI: http://dx.doi.org/10.1093/nar/gkp1004

Kim Y., Z. Zhu, D. Tamang, P. Wang, J. Greenbaum, C. Lundegaard, A. Sette, O. Lund, P. E. Bourne, M. Nielsen et al. (2012). Immune epitope database analysis resource, Nucleic Acids Research, 40 (W1) W525-W530. DOI: http://dx.doi.org/10.1093/nar/gks438

Lundegaard C. & M. Nielsen (2008). Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers, Bioinformatics, 24 (11) 1397-1398. DOI: http://dx.doi.org/10.1093/bioinformatics/btn128

Bhattacharya S., L. Gomes, P. Dunn, H. Schaefer, J. Pontius, P. Berger, V. Desborough, T. Smith, J. Campbell, E. Thomson et al. (2014). ImmPort: disseminating data to the public for the future of immunology, Immunologic Research, 58 (2-3) 234-239. DOI: http://dx.doi.org/10.1007/s12026-014-8516-1

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer? (real time collaborative coding)

This week’s highlight is a new feature from the “Tools” category. I’ve talked about CodersCrowd before–I think it’s a nice teaching tool. But on Biostars the other day I noticed an announcement of a new feature that takes it even further–a way to collaborate on the code in real time.

I’m just going to copy the texty part here; go over to Biostars to see the example image.

Tool: Real Time Programming for Bioinformatics (and for fun)

Hello all,

I wanted to share the latest development of CodersCrowd with fellow coders here, and let you know that I add a real time programming capability, using the same editor, which is suitable for mentoring, teaching or team oriented coding session.

[big sample image here]

All you have to do is to hit the “collaborate” button when you want to start a new live coding session, and invite your team with the link given to you.

This capability is made possible using the awesome togetherjs from Mozilla

Along with the possibility to run the code using Docker containers this bring a real fun when using CodersCrowd

As always, criticisms are more than useful so dont hesitate, and contributors you’re more than welcome

More about this here http://blog.coderscrowd.com/real-time-programming-for-bioinformatics-and-for-fun/

Rad

There aren’t any comments from folks yet, but it got good up-votes so I hope people are checking it out.

Video Tip of the Week: EpiViz Genome Browsing (and more)

This is the browser I’ve been waiting for. Stop what you are doing right now and look at EpiViz. I’ll wait.

I spend a lot of time looking at visualizations of various types of -omics data, from a number of different sources. I’ve never believed in the “one browser to rule them all” sort of thing–I think it’s important for groups to focus on special areas of data collection, curation, and visualization. Although some parts can be reused and shared, of course, some stuff just should be viewed with certain species or strategies that don’t always end up nicely in a “track” of data that you can slap on some browser.

My dreams of this began in earnest with the Caleydo tools I’ve been talking about for a long time. Years ago I began imagining genome browser data in one panel, pathway maps in the nearby one, TF motifs, an OMIM page loaded up, and other stuff that was all part of my train-of-thought on some issue. The Caleydo team has continued on this path, and their EnRoute and Entourage tools get part of the way there too. You can do some of that with the nifty BioGPS layouts. I also love the idea of looking at multiple genomic regions at the same time, in the manner that the Multi-Image Genome viewer (MIG) enables.

So we are getting closer and closer. And this EpiViz tool is an excellent demonstration of how to combine necessary genome track data visualizations and other analysis strategies into one viewer. It also allows other types of data to come in, with the Data-Driven Documents tools. You should read the paper, you should try out their software, and have a look at this overview video the EpiViz team has provided to get started.

Off we go. More like this please.

Quick links:

EpiViz Browser example: http://epiviz.cbcb.umd.edu/

EpiViz main site: http://epiviz.github.io/

References:

Chelaru F., L. Smith, N. Goldstein & H. C. Bravo (2014). Epiviz: interactive visual analytics for functional genomics data, Nature Methods, PMID: http://www.ncbi.nlm.nih.gov/pubmed/25086505

What’s the Answer? (SNPs in promoters)

This week’s highlighted question is one that I hear a lot in workshops.

Question: Database for finding SNPs (Mutation-deletion/SNPs) In the promotor region of any gene

Is there any online database in which I can find the mutations in the promotor region of oncogenes. Cosmic and other databases are not good for that purpose.Any one ???

vrun.bnsl

The only answer over there has several nice options–go have a look. We’ve talked about some of them, but I should really think about doing tips-of-the-week on at least one of them. Stay tuned.
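Whichever resource you end up using, the underlying step is the same: decide what counts as the “promoter region” for your gene, then keep the variants that fall inside it. Here is a rough sketch of that filtering step. Everything in it is an assumption for illustration: the `Gene` class, the function names, the toy coordinates, and especially the window of 1000 bases upstream and 100 bases downstream of the transcription start site, which is an arbitrary choice and not a standard definition. It does not query any real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (1-based coordinates)."""
    name: str
    chrom: str
    tss: int     # transcription start site
    strand: str  # "+" or "-"


def promoter_window(gene, upstream=1000, downstream=100):
    """Return (start, end), 1-based inclusive, around the TSS.

    "Upstream" depends on strand: lower coordinates for "+",
    higher coordinates for "-".
    """
    if gene.strand == "+":
        start, end = gene.tss - upstream, gene.tss + downstream
    elif gene.strand == "-":
        start, end = gene.tss - downstream, gene.tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end


def variants_in_promoter(gene, variants, upstream=1000, downstream=100):
    """Keep (chrom, position, identifier) tuples that fall in the window."""
    start, end = promoter_window(gene, upstream, downstream)
    return [
        variant for variant in variants
        if variant[0] == gene.chrom and start <= variant[1] <= end
    ]


if __name__ == "__main__":
    toy_gene = Gene("TOY1", "chr1", 5000, "+")  # made-up gene
    toy_variants = [("chr1", 4500, "var_a"), ("chr1", 5200, "var_b")]
    print(variants_in_promoter(toy_gene, toy_variants))
```

The strand handling is the part people most often trip over in workshops: for a minus-strand gene, the promoter sits at higher coordinates than the TSS.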

Video Tip of the Week: Biodalliance browser with HiSeq X Ten data

Drama surrounding the $1,000 genome erupts every so often, and earlier this year when the HiSeq X Ten setup was unveiled there was a lot of chatter–and questions: Is the $1,000 genome for real? And some push-back on the cost analysis: That “$1000 genome” is going to cost you $72M. A piece that offers a nice framework for the field of play is here: Welcome to the $1,000 genome: Mick Watson on Illumina and next-gen sequencing. Aside from the media flurry, though, what matters is the data. And not many people have had access to the data yet.

Via Gholson Lyon, I heard about access to some:

A set of collaborators (The Garvan Institute of Medical Research, DNAnexus and AllSeq) have provided a test data set from the X Ten. I’ll let them describe this effort:

Take advantage of this unique opportunity to explore X Ten data.

The Garvan Institute of Medical Research, DNAnexus and AllSeq have teamed up to offer the genomics community open access to the first publicly available test data sets generated using Illumina’s HiSeq X Ten, an extremely powerful sequencing platform.  Our goal is to provide sample data that will allow you to gain a deeper understanding of what this technological advancement means for your work today and in the future.

My focus won’t be this data itself–but if you are interested in many of the technical aspects of this system and their process, have a listen to this informative presentation by Warren Kaplan from Garvan:

The sample data is derived from a cell line, the GM12878 cells. These cells are from the Coriell Repository here: Catalog ID: GM12878. Conveniently, this is one of the Tier 1 cell lines from the ENCODE project too, so there is other public data out there on this cell line–which I have explored in the past and knew some things about.

There are 2 different data sets of the sequence in the download files, and one of them is available in the browser to view. I’m sure the Genoscenti will be all over the downloadable files. But because I’m always interested in new visualizations, I wanted to explore the genome browser they made available. Although I had heard of Biodalliance before, we hadn’t highlighted it as a tip, so I thought that would be interesting to explore. Biodalliance is a flexible, embeddable, extensible system that’s worth a look on its own, besides delivering this test data. And if you come by at a later date and the X Ten data is no longer available, go over to their site for nice sample data sets. Their “getting started” page has a nice intro to the features.

In the video, I’ll just take a quick test drive around some of the visualization features with the X Ten GM12878 data. I’ll look at a couple of sample regions, one with the SOD1 gene just to illustrate the search and the tracks. And I’ll look at a region that I knew from the previous ENCODE CNV data had a homozygous deletion to see how that looked in this data set. (If you want to look for deletions later, search for the genes OR2T10 or UGT2B17.)
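In a browser, a homozygous deletion shows up as a stretch where the read coverage simply drops to nothing. If you download the files and want to hunt for the same pattern outside the browser, the idea can be sketched as scanning per-base depth for long runs of zero. This is a toy version under loud assumptions: the function name, the `min_length` threshold, and the input format (a plain list of depths for one region) are all made up here, and real deletion callers also consider mapping quality, normalization and much more.

```python
def zero_coverage_runs(depths, region_start=1, min_length=1000):
    """Return (start, end) coordinates, inclusive, of runs of zero depth.

    depths: list of per-base read depths for one region (hypothetical input).
    region_start: genomic coordinate of the first entry in depths.
    min_length: shortest run, in bases, worth reporting (arbitrary default).
    """
    runs = []
    run_start = None  # offset where the current zero run began, if any
    for offset, depth in enumerate(depths):
        if depth == 0:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            if offset - run_start >= min_length:
                runs.append((region_start + run_start, region_start + offset - 1))
            run_start = None
    # Close a run that reaches the end of the region.
    if run_start is not None and len(depths) - run_start >= min_length:
        runs.append((region_start + run_start, region_start + len(depths) - 1))
    return runs


if __name__ == "__main__":
    toy_depths = [30, 28, 0, 0, 0, 31, 0, 0]  # made-up depths
    print(zero_coverage_runs(toy_depths, region_start=100, min_length=3))
```

A stretch with no reads can also mean the region is hard to map, so treat any hit as a candidate to go look at in the browser, not as a call.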

Note: the data is time-sensitive–apparently it’s only available until September 30, 2014. So get it while it’s hot, or browse around now.

Quick Links:

Test data site: http://allseq.com/x-ten-test-data

Biodalliance browser software details: http://www.biodalliance.org/

References:

Down T.A. & T. J. P. Hubbard (2011). Dalliance: interactive genome viewing on the web, Bioinformatics, 27 (6) 889-890. DOI: http://dx.doi.org/10.1093/bioinformatics/btr020

Check Hayden E. (2014). Is the $1,000 genome for real?, Nature, DOI: http://dx.doi.org/10.1038/nature.2014.14530

Dunham I., S. F. Aldred, P. J. Collins, C. A. Davis, F. Doyle, C. B. Epstein, S. Frietze, J. Harrow, R. Kaul, J. Khatun et al. (2012). An integrated encyclopedia of DNA elements in the human genome, Nature, 489 (7414) 57-74. DOI: http://dx.doi.org/10.1038/nature11247

Garvan NA12878 HiSeqX datasets by The Garvan Institute of Medical Research, DNAnexus and AllSeq is licensed under a Creative Commons Attribution 4.0 International License

What’s the Answer? (electronic lab notebooks)

This week’s highlighted question actually started on Twitter, and led me back to Biostars. I saw this question come across:

And I was interested in several of the answers. But one of the great things was the answer from Pierre–links to Biostars–with several different discussions of this.

This is a resource with history and depth! And although those answers were some time ago, they offer useful thoughts about the features to consider when making a choice. So that kind of institutional memory can be really helpful.

But I was also interested in the other answers–including DokuWiki, “universal open-source Electronic Laboratory Notebook” (referenced below), Labguru, and other people’s less formal solutions and suggestions.

Reference:

Voegele C., N. Robinot, J. McKay, P. Damiecki & L. Alteyrac (2013). A universal open-source Electronic Laboratory Notebook, Bioinformatics, 29 (13) 1710-1712. DOI: http://dx.doi.org/10.1093/bioinformatics/btt253