Video Tip of the Week: yEd Graph Editor for visualizing pathways and networks

This week’s video tip of the week closes out a series that began last month. I started to explore one gene co-expression tool, which led me to another tool for visualization, and so on. This week’s tool is the final piece that you need to know about if you want to create the kind of interaction/network diagrams used in the modeling of a system that I covered last week.

The yEd Graph Editor is different from some of the other tools in this series. As a corporate product, it doesn’t have the kind of scientific paper trail that some academic tools will. But if you search Google Scholar for “yEd Graph Editor” you’ll see that people from a wide range of disciplines have used it for their research projects. I first learned about yEd when I was using Cytoscape, and saw that some of the choices for layouts were based on the yEd features. This short overview video from the yWorks folks will explain what some of those layout styles are.

[Image: yFiles layout options in Cytoscape]

As you can see in this video, the use of yEd is not only for biological interactions–it can do a whole lot of graphing that is entirely unrelated to biology. But the features work for biological networks, and you can customize the graphics to represent your own topic of interest.
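yEd’s native file format is GraphML, an XML format for graphs, so one way to get your own network into yEd is to generate a GraphML file from a script. Here is a minimal sketch in Python using only the standard library. The node names and edges are hypothetical, and the file carries only plain GraphML (no yFiles-specific styling), so as far as I know yEd will draw nodes in its default style and you may need its Properties Mapper to show the “label” data as node labels.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)  # serialize GraphML as the default namespace


def _q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def to_graphml(nodes, edges, directed=True):
    """Return a minimal GraphML document as a string.

    nodes: dict mapping node id -> display label
    edges: iterable of (source_id, target_id) pairs
    """
    root = ET.Element(_q("graphml"))
    # Declare a string-valued node attribute called "label".
    ET.SubElement(root, _q("key"), {"id": "d0", "for": "node",
                                    "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, _q("graph"), {
        "id": "G", "edgedefault": "directed" if directed else "undirected"})
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, _q("node"), {"id": node_id})
        ET.SubElement(node, _q("data"), {"key": "d0"}).text = label
    for i, (source, target) in enumerate(edges):
        ET.SubElement(graph, _q("edge"),
                      {"id": f"e{i}", "source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical three-node pathway.
xml_text = to_graphml({"n0": "Receptor", "n1": "Kinase", "n2": "TF"},
                      [("n0", "n1"), ("n1", "n2")])
print(xml_text)
```

Save the returned string to a file ending in .graphml, open it in yEd, and then try the different layouts from the Layout menu on it.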

There are longer videos with more detail on the use cases for yEd. This one uses a sample flow chart to illustrate the basic editing features. It quickly covers many helpful aspects of establishing and editing a visualization.

You can also find videos from folks who use yEd for their projects on YouTube, some of which might be more specific for a given field of research. But these should give you the basics of why yEd can be used for the types of projects that you saw in the previous tips with Virtually Immune and BioLayoutExpress3D. And like I noted with Virtually Immune, you can get your hands on the files in the Pathway Models collection, and launch a yEd file to go into the features with a detailed example. The complexity you can generate with these models is astonishing.

I wasn’t able to locate a reference specifically for yEd, but Google Scholar shows that lots of people have used the yEd Graph Editor across a wide range of research topics. So if you want to see whether someone in your research area has used yEd, you may find some examples there. If you are going to explore the BioLayout and Virtually Immune tools, it will help to understand the framework. And as I mentioned with Cytoscape, understanding yEd helped me to grasp the layout options there too. So try out yEd for pathway and network visualization if you need those types of representations in your research and presentations. It’s free to download and use.

Quick links:

yEd Graph Editor: http://www.yworks.com/yed

yEd Graph Editor Manual: http://yed.yworks.com/support/manual/index.html

References:

Wright D.W., Tim Angus, Anton J. Enright & Tom C. Freeman (2014). Visualisation of BioPAX Networks using BioLayout Express3D, F1000Research, DOI: http://dx.doi.org/10.12688/f1000research.5499.1

Smoot M.E., K. Ono, J. Ruscheinski, P.-L. Wang & T. Ideker (2010). Cytoscape 2.8: new features for data integration and network visualization, Bioinformatics, 27 (3) 431-432. DOI: http://dx.doi.org/10.1093/bioinformatics/btq675

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer? (tidy data format)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted post at Biostar is about “tidy data”. Ah, quite the concept. The day when data becomes tidy will be one to celebrate. Anyway, I think it’s a worthwhile discussion to have, and I’m looking forward to the comments as this develops. If you have thoughts, please bring them over there too.

Usually I highlight most of the question here, but this time there are pieces that are too large–examples of format issues–so I’ll just give you the bullet and send you over to Biostar to read the whole thing.

Forum: Principles of Tidy Data (Hadley Wickham) and the VCF format

Hadley Wickham, the author of ggplot and many other popular R packages, has recently published a very good paper regarding the principles of tidy data. This article introduces a new library called tidyr, and also proposes a standard for formatting and organizing data before data analysis.

I personally think that the principles proposed in the article are very good, and that they help a lot in data analysis. Some of these are already adopted by many ggplot2/plyr users, as you need a data frame in a long format in order to produce most of the plots.

My question is whether it would make sense to apply these principles to bioinformatics. In particular, if we look at the VCF format, it fails at least two of the rules mentioned in the paper:

- “3.1. Column headers are values, not variable names”  (because individuals are encoded on distinct columns)

- “3.2. Multiple variables stored in one column” (because each genotype column contains the status of one or more alleles, plus its coverage, etc.)

For instance, let’s take the example from the VCF 4.0 specification:

[examples here]

[More discussion of the issues within samples, so go read over there]

What do you think? Will we all convert to tidy VCF in the far future?

–Giovanni M Dall’Olio

So, tidy VCF. What do you think? Some people are already musing about it. Discuss over there.
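To make Giovanni’s two points concrete, here is a minimal Python sketch of what a “tidy” reshaping of VCF genotype columns could look like: each sample column is melted into rows (rule 3.1), and each colon-separated FORMAT entry becomes its own row (rule 3.2). The two-variant, two-sample VCF is a toy example with made-up values, the function name is hypothetical, and a real parser (multi-allelic sites, INFO fields, missing values) would need much more care.

```python
# Toy two-sample VCF excerpt; all values are made up for illustration.
TOY_VCF = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "20\t14370\trs1\tG\tA\t29\tPASS\tDP=14\tGT:DP\t0|0:1\t1|0:8\n"
    "20\t17330\t.\tT\tA\t3\tq10\tDP=11\tGT:DP\t0|1:3\t0/0:5\n"
)

# Columns that identify a variant; repeated on every tidy row.
KEY_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT"]


def vcf_to_tidy(text):
    """Reshape VCF genotype columns into one row per (variant, sample, field).

    Sample columns are melted into a 'sample' variable, and each
    colon-separated FORMAT entry becomes its own 'field'/'value' pair.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header = [name.lstrip("#") for name in fields]
            continue
        record = dict(zip(header, fields))
        format_keys = record["FORMAT"].split(":")
        samples = header[header.index("FORMAT") + 1:]
        for sample in samples:
            # zip() stops at the shorter list, so trailing FORMAT fields
            # that a sample omits are simply skipped.
            for key, value in zip(format_keys, record[sample].split(":")):
                row = {col: record[col] for col in KEY_COLUMNS}
                row.update(sample=sample, field=key, value=value)
                rows.append(row)
    return rows


for row in vcf_to_tidy(TOY_VCF):
    print(row)
```

Note that the GT values still pack two alleles and a phasing symbol into one string, so a stricter reading of rule 3.2 would split those as well. The long table also repeats the variant columns on every row, which is the usual storage cost of a tidy layout.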

Reference:
Wickham H.W. (2014). Tidy Data, Journal of Statistical Software, 59 (10). http://www.jstatsoft.org/v59/i10

Video Tip of the Week: “Virtually Immune” computational immune system modeling

This week’s video tip of the week is the next in a series. It began when I took a look at GeneFriends, and their option to output the data for use in BioLayout Express3D. So of course we had to then take a look at BioLayout. While I was exploring BioLayout, I came across Virtually Immune. This project contains intricate network diagrams of immune-system related responses which you can load into BioLayout and explore. It is a very neat way to get further in your understanding of BioLayout functions, as well as being an amazing example of how to model a key system for human health. Here is their video overview:

Virtually Immune is developing computational models of the behavior of immune system responses, in part to help reduce the use of animal models. As part of a CrackIT project challenge, they developed a model of Influenza A lifecycle and macrophage responses that you can explore to help understand the goals of the project. On their “about” page, the overall goal includes:

By enabling scientists to run in silico experiments we hope to help them to model infectious and inflammatory disease-associated processes and thereby accelerate the development of therapeutic agents. In so doing we hope this resource will assist in the reduction and refinement of the use of animals in immunological research.

Their text-based tutorial walks you through the basic steps of building the kinds of models they have: read the literature, draw the pathway you want to represent, initialize the conditions, and then simulate with BioLayout Express3D. The last step, Verify, means you go back to the bench and see if your computational model predictions make sense. Hopefully refining your ideas computationally can streamline the work in the lab.
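The “initialize, then simulate” steps are easier to picture with a toy model. Virtually Immune’s pathway models are run inside BioLayout, and I believe its simulations are Petri-net based; the Python below is only a generic, minimal token-passing (Petri-net-style) sketch of the idea, not BioLayout’s implementation. The place and transition names (virus, receptor, complex, signal) are hypothetical.

```python
import random


def simulate(initial_marking, transitions, max_steps=1000, seed=0):
    """Fire randomly chosen enabled transitions until none can fire.

    initial_marking: dict place -> token count (the initial conditions)
    transitions: list of dicts with 'in' and 'out' maps of place -> tokens
    Returns the final marking and the marking after every step.
    """
    rng = random.Random(seed)
    marking = dict(initial_marking)  # leave the caller's dict untouched
    trace = [dict(marking)]
    for _ in range(max_steps):
        enabled = [t for t in transitions
                   if all(marking.get(p, 0) >= n for p, n in t["in"].items())]
        if not enabled:
            break  # nothing can fire: the model has come to rest
        chosen = rng.choice(enabled)
        for place, n in chosen["in"].items():
            marking[place] -= n
        for place, n in chosen["out"].items():
            marking[place] = marking.get(place, 0) + n
        trace.append(dict(marking))
    return marking, trace


# Hypothetical toy pathway: a virus particle binds a receptor to form a
# complex; the complex triggers a signal and releases the receptor.
TRANSITIONS = [
    {"name": "bind", "in": {"virus": 1, "receptor": 1}, "out": {"complex": 1}},
    {"name": "signal", "in": {"complex": 1}, "out": {"signal": 1, "receptor": 1}},
]
start = {"virus": 5, "receptor": 2, "complex": 0, "signal": 0}
final, trace = simulate(start, TRANSITIONS)
print(final)
print(len(trace) - 1, "transitions fired")
```

Whatever order the transitions fire in, this toy model always comes to rest with all five virus tokens converted into signals and both receptors free again; the trace is what you would compare against bench results in the Verify step.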

To get the best sense of the capabilities of this project, go to their Pathway Models page. From here you can load up any of the examples in BioLayout and look around. When you hover over a pathway, a “Show Me” button will appear near the bottom; clicking it will load the data in a larger format that you can explore. At the bottom of the new page, you can click the BioLayout button to visualize this in 3D.

If you aren’t researching immune system features, that’s fine. But it will still help you to understand how pathways relevant to your work could be modeled.

Quick links:

Virtually Immune: http://www.virtuallyimmune.org/

Virtually Immune tutorial: http://www.virtuallyimmune.org/tutorial/

BioLayout Express3D: http://www.biolayout.org/

Reference:

[can't find one for Virtually Immune yet; will attach one if I find it in the future]

Enright, A., & Ouzounis, C. (2001). BioLayout–an automatic graph layout algorithm for similarity visualization. Bioinformatics, 17 (9), 853-854 DOI: 10.1093/bioinformatics/17.9.853

Theocharidis A., Stijn van Dongen, Anton J Enright & Tom C Freeman (2009). Network visualization and analysis of gene expression data using BioLayout Express3D, Nature Protocols, 4 (10) 1535-1550. DOI: http://dx.doi.org/10.1038/nprot.2009.177 *cough* access from their publications page…

Wright D.W., Tim Angus, Anton J. Enright & Tom C. Freeman (2014). Visualisation of BioPAX Networks using BioLayout Express3D, F1000Research, DOI: http://dx.doi.org/10.12688/f1000research.5499.1

Oxford plots from the gibbon genome paper

A while back I talked about the software in the gibbon genome paper. I went through to try to pull out as much of the software as I could as sort of a catalog of a representative genome project. Of course, there was a lot in there. Some of it, though, consisted of unpublished code.

One of the figures I liked very much, because it conveyed a lot of information quickly, was Figure 2 from the main paper, with the Oxford plots for comparison and then the view of the phylogenetic tree. I mused about whether this was available somewhere, and I contacted the team to find out. Javier Herrero has been really terrific about answering my questions and getting back to me with more details. The plot code was an internal script, and the tree layout wasn’t a special tool, just a graphical arrangement done by hand later.

So, knowing my interest in this software, Javier let me know the other day that he’s put the code for the plots on GitHub. You can access it yourself there. Note: it requires the eHive and Kent libraries. It makes the dot plots, but you would still have to lay out the tree by hand.

But now you can plot these types of comparisons if you want to try it out.
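If you want a feel for what goes into these plots before installing eHive and the Kent libraries, here is a minimal Python sketch of the two usual views, built on my own assumptions rather than anything taken from Javier’s repository: a dot plot that places each orthologous gene pair at (position in genome A, position in genome B) after laying the chromosomes end to end, and an Oxford grid that counts ortholog pairs for each pair of chromosomes. The chromosome names, lengths and gene pairs are made up.

```python
from collections import Counter


def chromosome_offsets(chrom_lengths):
    """Start coordinate of each chromosome when laid end to end on one axis."""
    offsets, total = {}, 0
    for name, length in chrom_lengths:
        offsets[name] = total
        total += length
    return offsets


def dotplot_points(ortholog_pairs, lengths_a, lengths_b):
    """Convert ((chrom_a, pos_a), (chrom_b, pos_b)) pairs to x/y coordinates."""
    off_a = chromosome_offsets(lengths_a)
    off_b = chromosome_offsets(lengths_b)
    return [(off_a[ca] + pa, off_b[cb] + pb)
            for (ca, pa), (cb, pb) in ortholog_pairs]


def oxford_grid(ortholog_pairs):
    """Count ortholog pairs for every (chromosome in A, chromosome in B) cell."""
    return Counter((ca, cb) for (ca, _), (cb, _) in ortholog_pairs)


# Made-up genomes as (chromosome name, length) and made-up ortholog positions.
genome_a = [("chr1", 100), ("chr2", 50)]
genome_b = [("chrA", 80), ("chrB", 70)]
pairs = [
    (("chr1", 10), ("chrA", 12)),
    (("chr1", 60), ("chrB", 20)),
    (("chr2", 5), ("chrB", 60)),
]
print(dotplot_points(pairs, genome_a, genome_b))
print(oxford_grid(pairs))
```

Drawing the points as a scatter plot, with grid lines at the chromosome offsets, gives the familiar picture: a rearrangement shows up as a run of points that jumps to a different chromosome row.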

Quick link:

Oxford plots: https://github.com/jherrero/oxford-plots

Reference:

Carbone L., R. Alan Harris, Sante Gnerre, Krishna R. Veeramah, Belen Lorente-Galdos, John Huddleston, Thomas J. Meyer, Javier Herrero, Christian Roos, Bronwen Aken & Fabio Anaclerio & al. (2014). Gibbon genome and the fast karyotype evolution of small apes, Nature, 513 (7517) 195-201. DOI: http://dx.doi.org/10.1038/nature13679

What’s The Answer? (missing applications, revisited)

This week’s highlighted post is actually a trip down memory lane. It floated back up to the top recently because someone raised the question again:

Hi all,

Since the last post in this thread is almost 4 years old, I am just curious. What has already been solved, what has changed and, more importantly, Which Application Is Truly Missing In Bioinformatics today?

One of the things I see is still the need for some data format standards. Another is the lack of global standards for how to build data analysis pipelines.

I am curious about your thoughts.

klemen

Bioinformatics moves very fast in some ways, yet in other ways the same old problems remain. It was kind of interesting to look over the things we all desired years ago, and think about where we are since then.

Original post question:

Question: Which Application Is Truly Missing In Bioinformatics?

It’s a simple & straight question. Just think about an app that, when you found it, your first thought would be – “OMG!!! That’s it” – or something like – “I wish I could have found/written/idealized it before”. It doesn’t need to be a bioinformatical Swiss Army knife or a MacGyver paper clip. Just something that would make your life much happier/easier.

My example is quite simple. I really wish that some sort of Monte Carlo Simulator of Generic Urn Models (population genetics rlz!) would just appear on the net, with a nice, clean and well documented API (written in C) and bindings for my favorite scripting languages. That’s what I really miss, right now. What’s your story?

Jarretinha
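Jarretinha’s example is easy to make concrete. The classic urn model in population genetics is neutral Wright-Fisher drift: each generation, the next set of gene copies is drawn with replacement from an urn whose composition is the current allele counts. Below is a minimal Monte Carlo sketch in Python (not the C library with bindings he asked for) that estimates how often a neutral allele drifts to fixation; the population size, starting count and number of replicates are arbitrary choices for illustration.

```python
import random


def fixation_probability(n_copies, start_count, replicates, seed=0):
    """Monte Carlo estimate of the chance a neutral allele reaches fixation.

    Each generation draws n_copies gene copies with replacement from an urn
    whose allele frequency equals the current generation's frequency
    (neutral Wright-Fisher drift). A replicate ends when the allele is lost
    (count 0) or fixed (count n_copies).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = start_count
        while 0 < count < n_copies:
            freq = count / n_copies
            # Each draw picks the allele with probability freq; True counts as 1.
            count = sum(rng.random() < freq for _ in range(n_copies))
        if count == n_copies:
            fixed += 1
    return fixed / replicates


# Theory says a neutral allele fixes with probability equal to its starting
# frequency, here 6/20 = 0.3, so the estimate should land close to that.
print(fixation_probability(n_copies=20, start_count=6, replicates=2000))
```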

So go over and walk down memory lane. It’s an interesting kind of institutional memory for a specialist group to look back on, the sort of thing you don’t necessarily capture through the formal science routes.

Video Tip of the Week: BioLayout Express3D for network visualizations

My previous Video Tip of the Week highlighted the GeneFriends tool. With GeneFriends you can search for co-expression of genes in RNA-seq data sets. But you can take these results further and visualize them with the BioLayout Express3D tool, so I wanted to bring more details about BioLayout in this tip since we haven’t covered it before.

BioLayout isn’t a new tool; it’s been around for some time. The first published report of it appeared in 2001. Their publications page reflects their progress over the years, including a new paper recently put out for open peer review (very nifty, kudos on that). BioLayout keeps getting new features as it is under active development, and it keeps incorporating key standards like BioPAX that are important for the interoperability of tools in this space. You can learn more about BioPAX and related standards from the ‘COmputational Modeling in BIology’ NEtwork (COMBINE) site.

This video tip will highlight their overview video to give you a taste of what BioLayout Express can do. But they have a page with more videos that can take you further on understanding and using the features of the software.

There’s a Nature Protocols paper that they produced a few years back that helped me to grasp what they want to accomplish and how to work with BioLayout. Although some of the details will have changed, I like these kinds of papers as a way to approach the concepts of working with the tools, so I’ve included that below as well. You can access it from their publications page.
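The core idea in that protocol, as I understand it, is to turn an expression matrix into a network: each gene is a node, and two genes are joined by an edge when their expression profiles correlate above a chosen threshold (BioLayout then clusters the result with MCL). Here is a minimal Python sketch of just the network-building step; the gene names, profiles and the 0.9 cutoff are made up, and BioLayout’s own implementation handles far larger matrices and more options.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unlinked
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Edges (gene1, gene2, r) for every gene pair with r >= threshold."""
    edges = []
    for (g1, x), (g2, y) in combinations(sorted(profiles.items()), 2):
        r = pearson(x, y)
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Made-up expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}
print(coexpression_edges(profiles))
```

Raising or lowering the threshold trades network density against noise, which is one reason these networks can grow to very large numbers of nodes and edges.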

BioLayout Express can handle very large data sets, with correspondingly large numbers of nodes and edges. Their publications page also shows how some researchers have used the tool to advance their research. I like it when tool providers offer these kinds of published examples; it helps to see how people really are using the tools.

Quick links:

BioLayout Express3D: http://www.biolayout.org/

References:

Enright, A., & Ouzounis, C. (2001). BioLayout–an automatic graph layout algorithm for similarity visualization. Bioinformatics, 17 (9), 853-854 DOI: 10.1093/bioinformatics/17.9.853

Theocharidis A., Stijn van Dongen, Anton J Enright & Tom C Freeman (2009). Network visualization and analysis of gene expression data using BioLayout Express3D, Nature Protocols, 4 (10) 1535-1550. DOI: http://dx.doi.org/10.1038/nprot.2009.177 *cough* access from their publications page…

Wright D.W., Tim Angus, Anton J. Enright & Tom C. Freeman (2014). Visualisation of BioPAX Networks using BioLayout Express3D, F1000Research, DOI: http://dx.doi.org/10.12688/f1000research.5499.1

NCBI to hold two-day genomics hackathon in January

Because this came to my email on the Wednesday before the holiday, I thought some people who might like to attend could miss it. So I wanted to boost the signal a bit by re-posting it. It came from the NCBI Announcement mailing list, so check there if you want to see the whole thing; I’m excerpting just some of it here. Note that there is an application step.

From January 5th to 7th, NCBI will host a genomics hackathon focusing on advanced bioinformatics analysis of next generation sequencing data. This event is for students, postdocs and investigators already engaged in the use of pipelines for genomic analyses from next generation sequencing data. Working groups of 5-6 individuals will be formed for DNA-Seq/multiomics, RNA-Seq, metagenomics and Epigenomics. These groups will build pipelines to analyze large datasets within a cloud infrastructure.

Organization:
After a basic organizational session, teams will spend 2.5 days analyzing a challenging set of scientific problems related to a group of datasets. Students will analyze and combine datasets in order to work on these problems. This course will take place on the NIH main campus in Bethesda, Maryland.

Datasets:
Datasets will come from the public repositories housed at NCBI. During the course, students will have an opportunity to include other datasets and tools for analysis. Please note, if you use your own data during the course, we ask that you submit it to a public database within six months of the end of the event.

Products:
All pipelines and other scripts, software and programs generated in this course will be added to a public GitHub repository designed for that purpose. A manuscript outlining the design of the hackathon and describing participant processes, products and scientific outcomes will be submitted to an appropriate journal.

Application:
To apply, complete the form linked below (approximately 10-15 minutes to complete). Applications are due December 1st by 5pm EST.

Participants will be selected from a pool of applicants; prior students will be given priority in the event of a tie. Accepted applicants will be notified on December 10th by 9am EST, and have until December 12th at noon to confirm their participation. Please include a monitored email address, in case there are follow-up questions.

[some stuff removed here, with requirements, pre-reqs, and some other details on the actual event stuff. See full version here.]

* Genomics hackathon application form: https://docs.google.com/forms/d/1isJT0Ns-5MHX8mH4xQnDEFbhlu4HombXspQQaADQoec/viewform

Hack away.
