Thanksgiving week, light posting. One holiday genome (cranberries).

Nobody reads the blog this week each year; everyone is traveling or napping, at least in the US. So I’ll just bring a holiday genome I came across recently: cranberries. The cranberry is one of very few native North American fruits that are widely cultivated. I went looking to see if a genome paper was out yet, and there it was:
The American cranberry: first insights into the whole genome of a species adapted to bog habitat

As I was reading about the project, I thought I should know a bit more about specifically how they are grown. I’ve seen the flooded harvesting images, but I didn’t know what happened prior to that–the “bog habitat”. Conveniently, one of the research sites had links to some interesting videos of how cranberries are farmed. Sand–really–sand is the foundation of the fields. These dead-looking vines are laid out, and then partially buried in the sand. In a few years you will get cranberries. It’s kind of astonishing to actually see it–it looks so barren and lifeless at first.

Planting cranberries was the part new to me, and that video is posted here, but there are several more that include harvesting and shipping.

This genome project has also been added to the Sequenced Plant Genomes wiki that James Schnable maintains at CoGePedia: And it’s on the phylogenetic tree right near the blueberries (another North American native) on that page.

Cranberry genetics and genomics research site: . They also link to other groups involved in this work, but this is the one where I found the video.

Another fun fact for you to share at the dinner table: Probing Question: What is a heritage turkey?

“Some of these varieties were the progenitors of our current commercial turkeys, and they are fairly closely related to them genetically,” explains Hulet. “Today’s commercial turkeys are white because people didn’t like the little dots of pigment left on the skin after the feathers are pulled out, so breeders selected for a white-skinned turkey.” The white color is more natural for chickens, he explains, “while it’s a mutation for turkeys.”

Enjoy your mutant foods this holiday season.

Back to regular posting next week.


Polashock J., Ehud Zelzion, Diego Fajardo, Juan Zalapa, Laura Georgi, Debashish Bhattacharya & Nicholi Vorsa (2014). The American cranberry: first insights into the whole genome of a species adapted to bog habitat, BMC Plant Biology, 14 (1) 165. DOI:

Oy. I worry about this with cell line studies a lot. Mis-IDed + contaminated.

Via NCBI Announce mailing list:

NCBI BioSample includes curated list of over 400 known misidentified and contaminated cell lines

The NCBI BioSample database now includes a curated list of over 400 known misidentified and contaminated cell lines. Scientists should check this list before they start working with a new cell line to see if that cell line is known to be misidentified.

Continuous cell lines are used widely in research as model systems for normal cellular processes and disease states. However, as noted by many (e.g. PubMed 23235867, 20143388, 19003294, 18072586, and 17522957), cell line cross-contamination or misidentification represents a serious and widespread problem, and researchers should take great care to check that their cell line is what they think it is. Cell lines can be easily mislabeled or become overgrown by cells derived from a different individual, tissue or species.

This problem is so common it is thought that thousands of misleading and potentially erroneous papers have been published using cell lines that are incorrectly identified (PubMed 20448633). The first step in combating this problem is to make sure your cell line is not on the list of known misidentified and cross-contaminated cell lines. Detailed information about how to test your cell lines is provided by the International Cell Line Authentication Committee.

NCBI BioSample curated list of misidentified and contaminated cell lines:

Articles on cell line cross-contamination and misidentification in PubMed mentioned above:

The International Cell Line Authentication Committee:
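The "check the list first" step is easy to sketch, assuming you have already copied the names from the curated list into a plain Python list. Everything here is hypothetical: the helper names `normalize` and `flag_listed` are mine, and the cell line names are placeholders, not real entries from BioSample. Name matching is only a first pass and is no substitute for the authentication testing the announcement points to.

```python
import re


def normalize(name):
    """Lowercase a cell line name and drop spaces, hyphens and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def flag_listed(my_lines, listed_names):
    """Return the names from my_lines that match an entry in listed_names."""
    listed = {normalize(n) for n in listed_names}
    return [name for name in my_lines if normalize(name) in listed]


# Placeholder names only; a real check would use the curated list itself.
listed_names = ["Example-Line 1", "EXAMPLE LINE 7"]
my_lines = ["example line-1", "Other-Line 3"]

flagged = flag_listed(my_lines, listed_names)
print(flagged)
```

Stripping punctuation catches spelling variants, but it could also produce false matches between genuinely different lines, so treat any hit as a prompt to look closer.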

I also worry about SNVs and all sorts of other issues within the cell lines. When the first data was coming out on CNVs in the ENCODE cell lines, I found duplications, and homozygous and heterozygous deletions, that would have concerned me if I were working on certain pathways. If I were still doing cell biology, I’d sequence my cell line of choice before I did another experiment with it. Below I’ve linked to the PubMed reference they provided in the body.


American Type Culture Collection Standards Development Organization Workgroup ASN-0002. (2010). Cell line misidentification: the beginning of the end, Nature Reviews Cancer, 10 (6) 441-448. DOI:

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…



What’s The Answer? (genomics is not special, stop reinventing the wheel)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted Biostar post is one of the most interesting ones I’ve seen in a while. It started with a provocative premise, and this provoked a number of really fascinating responses and discussion. To lure you over there, here’s a tweet that captures the initial post:

(and this generated some chatter on twitter, if you follow the time stamp you can see that)

One of the responses resounded across the genoscenti as well:

I think those short summaries are better than me bringing the post over here like I usually do. You should read the whole thing in situ, with the responses. So just go over from the links in the tweets, or from here.

Heh. This is what’s great about forums. This is way better than you get in the stuffy mainstream literature (with the exception of Dan Graur).

Video Tip of the Week: GeneFriends

It was just a little tweet, with hardly any information about the function or purpose of the resource mentioned. But the cute name drove a lot of people to take a look at GeneFriends from our blog recently, so I figured it was worth highlighting this tool as our Video Tip of the Week.

So here’s the original tweet, hat tip to Jack Scanlan:

I admit, I looked too. I had imagined something like a personal genomics matching site, but that’s not what it is. GeneFriends is a tool that uses gene co-expression data to try to identify which genes are “friends” with other genes in networks. These can be known genes, or they can be uncharacterized genes. The current implementation is for human data.

GeneFriends is not a new tool; the original implementation, built on microarray-based data sets, came out some time ago, and that earlier version holds 3000 data sets. But their new paper describes a different version, now done with RNA-seq data. The paper says there are over 4000 RNA-seq samples from 240 studies, via the SRA database. In the new paper they describe the criteria for selection and their strategy for calling co-expression. They state that their goal is to help unearth leads on annotation for uncharacterized genes, and this also includes non-coding RNA sequences.

GeneFriends employs a RNAseq based gene co-expression network for candidate gene prioritization, based on a seed list of genes, and for functional annotation of unknown genes in humans.
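To make the "friends" idea concrete, here is a minimal sketch of seed-based co-expression ranking. This is not the GeneFriends method: their papers describe their own criteria for calling co-expression, and I have swapped in a plain Pearson correlation for illustration. The gene names, the expression values, and the function names `pearson` and `rank_friends` are all made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def rank_friends(expression, seeds):
    """Rank non-seed genes by their mean correlation with the seed genes."""
    scores = {}
    for gene, profile in expression.items():
        if gene in seeds:
            continue
        total = sum(pearson(profile, expression[s]) for s in seeds)
        scores[gene] = total / len(seeds)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy expression values across five samples (hypothetical, not real data).
toy_expression = {
    "SEED_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "SEED_B": [2.0, 4.1, 5.9, 8.2, 10.0],
    "UNKNOWN_1": [1.1, 1.9, 3.2, 3.9, 5.1],
    "UNKNOWN_2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "UNKNOWN_3": [3.0, 3.0, 3.0, 3.0, 3.0],
}

ranked = rank_friends(toy_expression, ["SEED_A", "SEED_B"])
for gene, score in ranked:
    print(gene, round(score, 3))
```

The gene that rises and falls with the seeds lands at the top, the flat one in the middle, and the one that moves in the opposite direction at the bottom. An uncharacterized gene near the top of such a list is a lead for annotation, which is the kind of hint the tool is meant to surface.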

There is a short video with their foundation and philosophy about the GeneFriends tool:

Another video goes a bit further and illustrates an example of the functionality. On the site you can try this yourself with the handy “show example” buttons they have. In addition to what you’ll find at their site, they also demonstrate that you can bring your results over to the BioLayout tool to work with them further. They also note that you can upload the results into Cytoscape.

It’s pretty straightforward to use the basic features of GeneFriends, but there is additional detail on the underpinnings from their “about” page. The papers below also cover the foundations and their new directions. You should also be aware of the limitation of the RNA-seq data that they discuss in the new paper. But check it out to see if you can discover some new relationships among transcripts of interest with GeneFriends.

Quick links:

GeneFriends main page:

GeneFriends previous microarray version:

van Dam S., Rui Cordeiro, Thomas Craig, Jesse van Dam, Shona H Wood & João de Magalhães (2012). GeneFriends: An online co-expression analysis tool to identify novel gene targets for aging and complex diseases, BMC Genomics, 13 (1) 535. DOI:

van Dam S., T. Craig & J. P. de Magalhaes (2014). GeneFriends: a human RNA-seq-based gene and transcript co-expression database, Nucleic Acids Research, DOI:

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s The Answer? (mobile bioinformatics apps)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted discussion is about mobile apps. The original post sought some suggestions on what might be a useful mobile app. I would have to say the community seemed…er…underwhelmed with the thought of mobile apps for stuff. But that said, maybe there is a killer app out there waiting to happen. Do you have any ideas on what you’d want to see on a mobile device?

Forum: Bioinformatics Mobile App

Hi Everyone,

We are in the process of creating bioinformatics mobile applications. Rather than common app we want to give app for scholars and scientist for them to access the data wherever they and whenever they want.

Please give your suggestions and recommendations to pick the area or functionalties need to be implemetned.



I thought the discussion was interesting, even if nothing came immediately to mind. Although I recently had some fun with the PDB mobile app, it was mostly to look at cool structures while I was bored in a queue somewhere. I also know that one time at a dinner party the TimeTree app came in handy for looking for a date for a last common ancestor. But I can’t think of much heavy lifting I’d want to do on a small screen. But if you have some ideas, do share them over there.

Video Tip of the Week: UpSet about genomics Venn Diagrams?

Who can forget the Banana Venn? It was one of the most talked-about visualizations in genomics that I’m aware of.

So, yeah–#NotSureWhatItMeansButDontCare, and the extended Storify of the responses are still worth reading. It even got the wider tech media’s attention: Just look at that banana genome Venn diagram, by Cory Doctorow. I remember trying to follow the diagram for about 20 minutes before I gave up. But I still loved it for its audacious attempt to genesplain. It was impenetrable. But seriously intriguing. It was awarded the title of “best genomics Venn Diagram ever” by Jonathan Eisen.

It also spawned other examples. The loblolly pine genome folks did one of their own. Recently I actually had to look up what a jujube looked like to see if it resembled the Venn its genome team just delivered. Um, sorta, maybe, but I don’t know if that was the goal or just a happy coincidence of a kinda oval fruit. However, I did catch a fun discussion on the actual origin of the species GO Venn. Currently the evidence points to the rat genome team, although the original published image lacks whiskers and eyes:

So as amusing as this has all been, one team took another approach to this issue. They wondered if this Venn craze was the best way to tackle this data, or if there were more effective and interactive ways to explore this sort of data. Some data set visualization tools may not be right for a task. One problem is scaling Venn diagrams to capture the full range of features that genomics folks want to illustrate. They are now prepared to UpSet the applecart. In their intro video to UpSet, they summarize with this:

I’ve talked about the terrific data visualization tools around the Caleydo project a number of times. They are developing really useful and intuitive strategies for looking at numerous types of data, and you can see our previous posts on StratomeX, LineUp, Entourage and enRoute (the combo of genomics data and pathways here is particularly nifty). They work really hard with the theories and techniques of data visualization, and implement effective ways to explore data. They recently looked across various genomics data papers to see how data sets were being used, and they attempt to encourage good behavior with the right visualizations to make the necessary points (Points of View reference below):

Understanding the tasks that the diagrams are meant to support and being aware of the data structure are required to find an appropriate representation.

They also have tried to help. UpSet, for visualization of intersecting sets, is one of their new efforts, championed by Alexander Lex, with the other team members. Looking for both effective and efficient representation of the types of data genomics researchers need, this interactive tool is a really nice way to explore which items belong in which subset. And, of course, which ones don’t. But that’s just the beginning. With this tool you can easily spot the intersections, query for ones you are interested in, and sort in various ways. There are ways to explore the attributes and elements for the items as well. The other great thing about the Caleydo team is that they make nice intro videos. I’ll embed the overview one as this week’s Video Tip of the Week, but they have a shorter basic intro one as well. In this video the examples include Simpsons characters and movie data sets, but it will certainly allow you to quickly grasp the utility of this tool. But there’s a lot more to it as well. Read the UpSet paper linked below (and you will spot a copy of the notorious banana Venn, in fact, which inspired their thoughts on a better way to illustrate sets). It has a lot of nice guidance on set theory and will help you think about the appropriate uses of different representations.
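If you want a feel for the bookkeeping behind "which items belong in which subset", here is a small sketch in plain Python. It is not UpSet’s code, just my own illustration of grouping items by the exact combination of sets they appear in and sorting the groups by size. The species labels and family IDs are invented toy data, and `exclusive_intersections` is a hypothetical helper name.

```python
from collections import defaultdict


def exclusive_intersections(named_sets):
    """Group items by the exact combination of sets that contain them."""
    membership = defaultdict(set)
    for name, items in named_sets.items():
        for item in items:
            membership[item].add(name)

    groups = defaultdict(set)
    for item, names in membership.items():
        groups[frozenset(names)].add(item)

    # Largest groups first; ties broken alphabetically by set names.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))


# Toy gene family assignments (hypothetical, not taken from any genome paper).
families = {
    "banana": {"fam1", "fam2", "fam3", "fam4"},
    "rice": {"fam1", "fam2", "fam5"},
    "sorghum": {"fam1", "fam2", "fam6", "fam7"},
}

result = exclusive_intersections(families)
for combo, items in result:
    print(sorted(combo), len(items))
```

Each row of the output corresponds to one region of a Venn diagram, except that here adding a fourth or tenth set just adds rows instead of making the picture unreadable.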

The github pages have more help, documentation, and a link to try out an installation with your own data. I also recently had the chance to meet Alexander at a talk he gave, and I know he’s interested in knowing what other visualization challenges are problems in genomics, and would be interested in any feedback you have on the tools.

My dreams for this tool: it would be embeddable in journal articles. So I could see the data as the team presented it, but then also be able to explore the underlying stuff. And if it could be a sort of a “session” so I could snap back to the original view. And I wish I could embed an image faintly on the background….

Quick links:


Live version to kick the tires:

Caleydo tools overall project:


D’Hont A., France Denoeud, Jean-Marc Aury, Franc-Christophe Baurens, Françoise Carreel, Olivier Garsmeur, Benjamin Noel, Stéphanie Bocs, Gaëtan Droc, Mathieu Rouard, Corinne Da Silva et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants, Nature, 488 (7410) 213-217. DOI:

Lex A., Gehlenborg N., Strobelt H., Vuillemot R.V. & Pfister H. (2014). UpSet: Visualization of Intersecting Sets, IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), DOI: TBD

Lex A. and Nils Gehlenborg (2014). Points of view: Sets and intersections, Nature Methods, 11 (8) 779-779. DOI:

Gibbs R.A., George M. Weinstock, Michael L. Metzker, Donna M. Muzny, Erica J. Sodergren, Steven Scherer, Graham Scott, David Steffen, Kim C. Worley, Paula E. Burch, Geoffrey Okwuonu et al. (2004). Genome sequence of the Brown Norway rat yields insights into mammalian evolution, Nature, 428 (6982) 493-521. DOI:

Genome Editing with CRISPR-Cas9, nifty animation

I saw this come across my twitter feed the other day, and as a nice Friday afternoon diversion I posted it to Google+. I was surprised how popular it was. So I thought–hey, I have a blog too. Let’s put it there…. So grab some coffee and watch, a nice gentle way to get your Monday underway.

This animation depicts the CRISPR-Cas9 method for genome editing – a powerful new technology with many applications in biomedical research, including the potential to treat human genetic disease. Feng Zhang, a leader in the development of this technology, is a faculty member at MIT, an investigator at the McGovern Institute for Brain Research, and a core member of the Broad Institute. Further information can be found on Prof. Zhang’s website at .

Images and footage courtesy of Sputnik Animation, the Broad Institute of MIT and Harvard, Justin Knight and pond5.

The publications page at the Zhang lab has some nice examples of CRISPR, including that knockin mouse one with cancer modeling applications. I’ve been meaning to get that but don’t have a subscription to Cell, so that was handy.

Platt R., Sidi Chen, Yang Zhou, Michael J. Yim, Lukasz Swiech, Hannah R. Kempton, James E. Dahlman, Oren Parnas, Thomas M. Eisenhaure, Marko Jovanovic, Daniel B. Graham et al. (2014). CRISPR-Cas9 Knockin Mice for Genome Editing and Cancer Modeling, Cell, 159 (2) 440-455. DOI:

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Er, what bottle? For upcoming bioinformatics nerd holiday parties.