Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer? (3D structures with mutations)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted question was about visualizing variations in linear graphics as well as in 3D structural representations.

Question: Map genetic mutations to protein domain/structure

I am trying to map genetic mutations to protein domain/structure. Ideally, I want to visualize the variants in linear protein domain diagram and 3D protein structure like the attached images. I did research, but I can’t find good tools/databases for such work.

I know similar questions have been asked here like How To Create Mutation Diagram In R Or In Any Tools?. But it is only the protein domain diagram (with no 3D structure), plus the protein domain annotation there seems to be limited.

Thank you all in advance!

[Graphic over there shows an image of what the original poster wants to visualize]

mittjohns

I had recently talked about Mutation Mapper as another answer to a related question. But at that time I didn’t note that you can also get a 3D structure from there. Glad to see someone mention it as a possible answer on this new question.

Video Tip of the Week: GWATCH, for flying over chromosomes

Ok, so it’s not *just* for flying over chromosomes. There’s more to it, of course. But that’s the part of GWATCH (Genome-Wide Association Tracks Chromosome Highway) that caught my attention. I’m always looking for different ideas and strategies to visualize data, and this was the first time I drove along the whole length of a human Chromosome 9 highway, seeing the various SNPs along the way.

A post on Google+ pointed me to the GWATCH paper and software, so hat tip to Taras Oleksyk. And I was pleased to see that they’ve done a video explaining their project and demonstrating the software, so that will be this week’s Tip of the Week.

It’s not the first time I’ve seen a 3D representation of SNPs. I remember seeing that from GeneSNPs in the past. But the GeneSNPs visual option was a way to look at the features within a single gene–you could see introns, exons, and choose to view SNPs by features like “non-synonymous”, and you could examine the frequency. It was an interesting way to combine a lot of data, but captured only one limited region. GWATCH goes much wider than that, letting you scan along whole chromosomes for patterns. That said–it would be very cool to have those features, and maybe a pointer to possible promoter regions, along the roadway as well. At first I didn’t notice the gene symbol track–er, sidewalk?–along the edge of the view. But it seemed to me you could add more sidewalk, a bike lane…. Of course, then I want to add a domain bypass…. Anyway–it’s got me thinking about ways to explore.

And I’ve focused on that unusual “moving browser” for this post, but there’s more to the tool than that. There are other ways to slice the data in 2D that can be helpful for your analyses. And it’s not limited to GWAS data either. You can see more about that in both the video and their paper. So explore GWATCH more from their site, where you can load up their sample data and take it for a spin. Go to the site and click on “Active Datasets” to see the ones they’ve provided. Open one, then click on the “Highway Chromosome Browser” to select a chromosome. You can also see the other types of tools they have from there.

Quick links:

GWATCH: http://gen-watch.org/ for taking it for a spin

Reference:

Svitin A., Sergey Malov, Nikolay Cherkasov, Paul Geerts, Mikhail Rotkevich, Pavel Dobrynin, Andrey Shevchenko, Li Guan, Jennifer Troyer, Sher Hendrickson & Holli Dilks (2014). GWATCH: a web platform for automated gene association discovery analysis, GigaScience, 3 (1) 18. DOI: http://dx.doi.org/10.1186/2047-217x-3-18

Friday SNPpets


What’s the Answer? (Docker, actually…)


In kind of an amusing pairing to this week’s Tip-of-the-Week on Docker, I was looking out for good questions/discussions at Biostars when I came across this one. The discussion opens with the idea of curating a bunch of “efficient” bioinformatics programs. This is a worthy exercise in time and resources. The conversation then flows around to include Docker, but also to note that Docker may not be the right thing in every case. So have a look at how the idea percolated around–a good demonstration of crowdsourcing something useful for the community.

News: Pandora’s Toolbox – a collection of bioinformatics programs

Hello everyone,

I am developing a toolbox with collection of source codes of efficient bioinformatics programs.

They are available under the condition that you cite the individual authors and not Pandora’s toolbox.

My goal is to develop additional code so that we can mix and match them to solve various problems efficiently.

———————————–

github page -
https://github.com/homologus/Pandoras-Toolbox/

Blog posts -
Introducing ‘Pandora’s Toolbox’ and ‘Pandora’s Modules’
http://www.homolog.us/blogs/blog/2015/01/05/introducing-pandoras-toolbox-and-pandoras-modules/

An Update on Pandora’s Toolbox
http://www.homolog.us/blogs/blog/2015/01/08/an-update-on-pandoras-toolbox/
The following programs are currently included in the collection.

[list of stuff in there, go have a look over there for that set]

ugly.betty77

So, in short, more ideas for using Docker in the genomics software community. Jess’ sayin. And a nice coincidence for my blogging this week.

Video Tip of the Week: Genome assemblers and #Docker

Last fall I did a tip on Docker, which was starting to pick up a lot of chatter around the genoscenti. It was starting to look like a good solution for some of the problems of reproducibility and re-use of software in genomics–containerize it. Box it up, hand it off. There’s certainly a lot of interest and appeal in the community, but there are still some issues to resolve with rolling out Docker everywhere. However, my impression is that the Docker team and community seem interested and active in evolving the tools to be as broadly useful as possible.

So when this tweet rolled through the #bioinformatics twitter column on my Tweetdeck, I was excited to see this talk by Michael Barton (who has the best twitter handle in the field: @bioinformatics). It’s a terrific example of how Docker can be aimed at some of the problems in the bioinformatics tool space. It’s not the only option, of course. Some workflow resources like Galaxy can cover other features of genomics researchers’ needs. But as a general solution to the problems of comparing software and distributing complete working containers, Docker seems to be developing into a very useful strategy.

Here’s the video:

Although this is longer than our typical “tips”, I’d recommend that you carve out some time to watch if you are new to the idea of Docker. In case you don’t have time right now for the talk, here’s a summary. For the first 10 minutes, there’s a gentle introduction for non-genomics nerds about what sequencing is like right now. Then Michael describes how the assembler literature works–with competing claims about the “better” assembler as each new paper comes along. This includes a sample of the types of problems that assemblers are trying to tackle with different strategies.

Around 14min, we begin to look at what it’s like to be the researcher who needs to access some assembler software. Then he describes how different lab groups–like remote islands–can instantly ship their sequence data around today. But biologists are like “longshoremen for data”: they have to unload, unpack, install, and try to get all the right pieces together to make it work in a new lab. We are doing “break bulk” science right now. That was a really terrific assessment of the state of play, I thought.

If you are ok with the other pieces, you can skip to around 16min, where we get to know about a specific example of the benefits of Docker for this type of research. Michael goes on to describe how Docker has helped him to build a system to catalog and evaluate various assemblers. He developed the project called nucleotid.es (pronounced just as “nucleotides”), which he goes on to describe. It offers details about various assemblers, which have been put into containers that are easy to access and to use for comparing different software. There are examples of benchmarks, but you can also use these containers for your own assembly purposes. You can explore the site for more detail and a lot of data on the assembler comparisons that they have already. A good overview of the reasons to do this can also be found in the blog post over there: Why use containers for scientific software?

At about 25min, some of the constraints and problems are noted. Fitting Docker into existing infrastructure, and incentivising developers to create Docker containers, can be issues. But the outcomes–having a better strategy than traditional publication for reproducibility, having ongoing access to the software, and the “deduplication of agony”–seem to be worth investigating, for sure. Then Barton describes what the pipeline could look like for a researcher with some new sequence–you can use the data from a variety of assemblers to make decisions about how to proceed, rather than sifting through papers or just using what the lab next door did. And if you have a new assembler, you can use this setup to benchmark it as well.

So if you’ve been hearing about Docker, and have been concerned about access and reproducibility issues around genomics data and software, have a look at this video. It nicely presents the problems we face, and one possible solution, with a concrete example. There may be other useful methods as well–like offering a central portal for users to access multiple tools, as AutoAssemblyD has described–but that’s really for a different subset of users. For the more general problem of software comparisons, benchmarking, and access to bioinformatics tools, Docker seems to offer a useful strategy. I also did a quick PubMed check to see if Docker is percolating through the traditional publication system yet, and found that it is. I found that ballaxy (“a Galaxy-based workflow toolkit for structural bioinformatics”) is offered as a Docker image, which means that having a grasp of Docker going forward may really be useful for software users rather quickly….

Quick links:

nucleotid.es: http://nucleotid.es

Docker: http://www.docker.com

References (and in this case the slide deck):



And other useful and related items from this post:

Automating the Selection Process for a Genome Assembler, JGI Science Highlights. October 17, 2014. http://jgi.doe.gov/automating-selection-process-genome-assembler/

Veras A., Pablo de Sá, Vasco Azevedo, Artur Silva & Rommel Ramos (2013). AutoAssemblyD: a graphical user interface system for several genome assemblers, Bioinformation, 9 (16) 840-841. DOI: http://dx.doi.org/10.6026/97320630009840

Hildebrandt A.K., D. Stockel, N. M. Fischer, L. de la Garza, J. Kruger, S. Nickels, M. Rottig, C. Scharfe, M. Schumann, P. Thiel & H.-P. Lenhof (2014). ballaxy: web services for structural bioinformatics, Bioinformatics, 31 (1) 121-122. DOI: http://dx.doi.org/10.1093/bioinformatics/btu574

How to nearly derail a woman’s career: Mary-Claire King’s BRCA1 project grant

This item was floating around the twitterz this weekend. I can’t remember who pointed it out first, but I didn’t have a chance to get to it then. I was able to look for it later, and it was worth looking for. Mary-Claire King talks about some career struggles she faced right on the cusp of getting the grant that became the BRCA1 project. It’s a story of some personal agony, persistence, and a nearly magical assist from an unlikely source.

Friday SNPpets


Advice for 2015:

Molecular Medicine Tri-Con 2015, early registration ends soon (#TRICON)

Quick note about this upcoming conference in San Francisco, February 15-20: http://www.triconference.com/. OpenHelix will have a booth there, and I’ll post details about that later, but wanted to draw your attention right now to some of the content.

There’s a lot of stuff going on at this conference, but some particular talks of note to readers of this blog include some folks in the Informatics Channel you might want to hear from:

Genome and Transcriptome Analysis:

Integrating Transcriptome and Genome Sequencing to Understand Functional Variation in Human Genomes

Tuuli Lappalainen, Ph.D., Principal Investigator & Core Member, New York Genome Center; Assistant Professor, Systems Biology, Columbia University

Detailed characterization of cellular effects of genetic variants is essential for understanding biological processes that underlie genetic associations to disease. Integration of genome and transcriptome data has allowed us to characterize regulatory and loss-of-function genetic variants as well as imprinting both at the population and individual level, as well as their tissue-specificity and role in disease associations.

++++++++++++++

Stable Reference Structures for Human Genome Analysis

David Haussler, Ph.D., Distinguished Professor and Scientific Director, UC Santa Cruz Genomics Institute, University of California Santa Cruz

Currently there are many different ways to map individual patient DNA and call genetic variants relative to the human reference genome GRCh38, and on top of this, when an expanded version GRCh39 arrives, quite a bit of remapping and recalling turmoil will be created. I describe a new scheme being developed with assistance from the Global Alliance for Genomics and Health in which mapping to the reference genome and calling variants would become a precisely defined and relatively stable process, with a well-defined incremental update when the reference genome expands to a more comprehensive version. This will enable a better standardized and more accurate discourse about human genetic variation for science and medicine.

++++++++++++++

Accessible and Reproducible Large-Scale Analysis with Galaxy

James Taylor, Ph.D., Ralph S. O’Connor Associate Professor, Biology; Associate Professor, Computer Science, Johns Hopkins University

I will discuss the Galaxy framework for accessible genomic data analysis. I will particularly highlight new features of Galaxy which are enabling analysis at increasingly larger scales, including UI and backend improvements, as well as other recent improvements to Galaxy.

++++++++++++++

There’s a lot more going on as well, but this track seemed particularly well suited to our readers. Have a look.

Note: OpenHelix is a part of Cambridge Healthtech Institute.

What’s the Answer? (publishing tool papers)


This week’s highlighted forum discussion was really interesting to me. The original post lays out pros and cons of publishing software tool papers. I think all of these are useful points for discussion. It was also interesting to see what others commented and replied on the topic.

Forum: On the utility of publishing a tool paper

I’ve been considering writing an application note for the pyfaidx module (for reading/writing indexed FASTA files), but I’m not sure if the effort involved in authoring and publishing an application note is worth it. Several projects have published their work as application notes, but I’m not sure that a “me too” attitude helps here.

Reasons I can think of for publishing a tool:

  1. Citations. Obviously it’s easier for people to reference your work.
  2. Content discovery. Not everyone knows what they’re searching for, and while GitHub and Google do help here, not everyone is an SEO genius.
  3. Context for usage. Several application notes I’ve seen provide use cases or examples where the tool may provide an advantage.

Downsides:

  1. Time
  2. Publication fees
  3. Danger of producing a stale description of your software. Software development should be motivated by use cases, bugs, and user feedback. All can really change the functionality and interface of software.

Any thoughts about pros/cons of tool publications would help.

Matt Shirley

Go have a look at the discussion in full. As someone who has searched for a lot of software, only to find references to internal personal scripts, broken links and outdated personal web pages in too many cases, I certainly favor publishing in some findable, archived format somewhere. But I don’t think it has to cost much–I don’t care whether it is in a traditional journal format. There are ways around that now that would perfectly suffice for these types of smaller utilities or data sets.
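For readers wondering what a tool for “reading indexed FASTA files” actually does, here is a rough sketch of the idea. To be clear, this is not pyfaidx’s code or API. It is a minimal illustration that assumes a FASTA file with uniform line lengths within each record, and the function names `build_index` and `fetch` are made up for this example. The index stores the same per-sequence fields a samtools-style `.fai` file holds (length, byte offset, bases per line, bytes per line), so a region can be pulled with a single seek rather than reading the whole file.

```python
def build_index(fasta_path):
    """Scan the file once and return {name: (length, offset, line_bases, line_width)}."""
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Sequence name is the first word of the header line.
                name = line[1:].split()[0].decode()
                # The sequence starts at the byte right after the header.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2] = bases      # bases per full line
                    entry[3] = len(line)  # bytes per full line, newline included
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of a sequence (0-based, end exclusive)."""
    length, offset, line_bases, line_width = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    # Convert base coordinates to byte positions, skipping over newlines.
    first = offset + (start // line_bases) * line_width + start % line_bases
    last = offset + (end // line_bases) * line_width + end % line_bases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

You would call `build_index("genome.fa")` once, and then `fetch("genome.fa", index, "chr1", 8, 13)` for each region you need (the file name here is just a placeholder). Real tools add the parts this sketch leaves out, such as saving the index to disk and handling malformed files, which is part of why a findable, documented tool beats everyone’s private script.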