Friday SNPpets

17 May, 2013 (07:41) | SNPpets | By: Mary

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

 

What’s the Answer? (unglamorous tasks)

16 May, 2013 (08:02) | What's the Answer? | By: Mary

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Heh. Ok, so it’s not all glamour–this discussion made me laugh. Posed in the forum:

Forum: What do you waste your time on

Like in everything else we do, the 80%-20% rule works great in bioinformatics. What is your 80% of the time wasted on? I can suggest: 1. converting accession numbers between databases 2. parsing output The bottom line of the post is how to avoid these time consuming tasks.

Asaf

Converting accession numbers and file formats was pretty popular. But the pointing-to-documentation one made me chuckle too. Go see the full spectrum of bioinformatics glory duties.

 

Video Tip of the Week: Influenza Research Database (IRD)

15 May, 2013 (08:33) | Tip of the Week | By: Mary

It may not be what you traditionally think of as flu season, but lately there’s been a great deal of talk about some viruses that are concerning public health officials and infectious disease specialists. You might have heard of the H7N9 situation in China, and the novel coronavirus (NCoV) in France that made headlines.

But researchers are working all the time to understand, characterize, and evaluate viral sequences. They will access a number of different tools to do so. We talked last month about GISAID and EpiFlu as our Tip of the Week, and how the special access agreement they have developed has encouraged some otherwise reluctant governments to share the newest sequence data. So if you want the most current sequences–you would turn to EpiFlu.

There are other virus resources that you should investigate too. Another important site is the IRD, or Influenza Research Database. They have developed an extensive repository of many flu sequences, and have provided a wide range of tools to help researchers investigate and evaluate the data. In addition, they have incorporated some tools that provide novel analyses of the underlying data. This includes predictions of cytotoxic T-cell epitopes, and a sequence feature variant type analysis that they mention in their recent paper.

For this week’s video tip, I include the first of their eight videos, which will help you to understand their organization and tools. But be sure to keep going for the other seven that they offer at their YouTube pages.

Be sure to also read the paper that they recently published–it has a nice overview of their tools and strategies, and it walks through a typical evaluation as a use case. And if you want other virus data besides flu, check out the companion site, the Virus Pathogen Database and Analysis Resource (ViPR). You’ll see a similar organization but with a wider range of sequences available. They have a separate thread of videos for the ViPR tools as well.

Quick links:

IRD, Influenza Research Database: http://www.fludb.org/

Other viruses, not just flu, are in the Virus Pathogen Resource (ViPR), which offers a similar structure and tools: http://viprbrc.org

References:
Squires, R., Noronha, J., Hunt, V., García-Sastre, A., Macken, C., Baumgarth, N., Suarez, D., Pickett, B., Zhang, Y., Larsen, C., Ramsey, A., Zhou, L., Zaremba, S., Kumar, S., Deitrich, J., Klem, E., & Scheuermann, R. (2012). Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses, 6 (6), 404-416. DOI: 10.1111/j.1750-2659.2011.00331.x

Pickett, B., Sadat, E., Zhang, Y., Noronha, J., Squires, R., Hunt, V., Liu, M., Kumar, S., Zaremba, S., Gu, Z., Zhou, L., Larson, C., Dietrich, J., Klem, E., & Scheuermann, R. (2011). ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Research, 40 (D1). DOI: 10.1093/nar/gkr859

Decoding Annie Parker: film about the BRCA hunt

10 May, 2013 (18:12) | Genomics Research | By: Mary

I didn’t know that this film was even in the works.

I know there’s controversy over the patents, but you have to acknowledge that the underlying science was really important. And I’m rather pleased to see a woman scientist in film. Looking forward to seeing it somewhere.

Here’s the film website: http://decodingannieparker.com/

++++++++++++++++

Hat tip to David Bachinsky via Twitter.

++++++++++++++++

EDIT: I just wanted to add some information that’s breaking now about Angelina Jolie’s recent announcement of her double mastectomy due to her BRCA1 testing. And here’s a good piece about some context for that: A Cautionary Perspective On Angelina Jolie’s Double Mastectomy

Friday SNPpets

10 May, 2013 (07:44) | SNPpets | By: Mary

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

 

 

What’s the Answer? (data access #fail)

9 May, 2013 (07:55) | What's the Answer? | By: Mary

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question got a lot of votes! Sadly, it’s far too common. And probably most people have run into this problem in bioinformatics and genomics at some point. Sometimes it’s data, sometimes it’s software.

Question: What if authors do not share their published data ?

Dear biostars,

I am wondering what do you do if any author do not share his/her published data with you ? Usually I send an email request after checking their supplemental and public database (ex: GEO) links or others in their paper carefully. Most of the time they do respond to my requests. However sometimes I never get any response from either first or corresponding author regrading sharing their data even after multiple requests. It is really frustrating as the guidelines of that journal (Cell, Nature) clearly state that the data should be public. Do you have any similar experience ? If you write any complaint to the editor, does it work ?

Thanx in advance
Sorry that it is not really a bioinformatics question

repinementer

Zev Kronenberg gives an excellent stepwise answer. Be sure to look at step zero! But there are a couple of other items you can read through as well. Go have a look at the answers.

But there’s no excuse for this now. With FigShare, GitHub, and other existing options for sharing data, software, and other materials, the situation needs to improve. Reviewers need to demand this, and editors have to insist on this. And those who don’t deliver will likely be subject to some public shaming–because that’s the last step when all else fails (see Zev’s step 7).

Tip of the Week: Transfac (and HGMD, Proteome, etc)

8 May, 2013 (05:22) | Tip of the Week | By: Trey

BioBase is a provider of expert-curated biological databases. Two well-known BioBase databases are TransFac and HGMD. Both have publicly available data (see previous links), but if you go to the BioBase site, you’ll find there is also subscription-based access with more features. HGMD is the Human Gene Mutation Database and “represents an attempt to collate known (published) gene lesions responsible for human inherited disease.” TransFac, on the other hand, “provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.” As you can tell from a search of our blog, HGMD is often cited as a good location for human disease data, as TransFac is for transcription factor binding sites (TFBS).

BioBase has a series of video tutorials for both TransFac and HGMD (and more for the other databases such as Proteome, Genome Trax and ExPlain). For this week’s tip of the week, we’ve embedded two video tutorials.

The first explains MATCH, an analysis tool in TransFac that predicts binding sites for transcription factors in a particular DNA sequence.

 

 

The second video tip is a quick tutorial on how to get started with searching HGMD.

 

If you are interested in advanced searching of these two databases, or Genome Trax, Proteome or ExPlain, check out the video tutorials from BioBase.
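For readers who have not worked with positional weight matrices before, here is a minimal sketch of the general idea behind matrix-based binding site prediction. It is not the MATCH algorithm and does not use TransFac data: the count matrix, the pseudocount, the uniform background, and the score cutoff are all made up for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical 4-position count matrix (one dict per position).
# These counts are invented for illustration; they are NOT TransFac data.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, min_score):
    """Slide the matrix along the sequence and report windows scoring at least min_score."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= min_score:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    # The cutoff of 4.0 is an arbitrary assumption for this toy matrix.
    for start, window, score in scan("TTACGTGG", pwm, min_score=4.0):
        print(start, window, round(score, 2))
```

Real tools differ in how they normalize scores and choose cutoffs, so treat this only as a way to see what "scanning a sequence with a matrix" means before watching the tutorial.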

Related Tutorials: DGV: Database of Genomic Variants, DBTSS

Friday SNPpets

3 May, 2013 (07:15) | SNPpets | By: Mary

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • RT @BioinfoTools: Some thoughts on science journal Nature offering statisticians to their authors: http://t.co/5jhJnEbGbS
  • Ha ha ha! RT @dgmacarthur: @genomesunzipped @NeuroMinded Yes: every base is sacred, every base is great. When a base gets wasted Ewan gets quite irate.
  • RT @carlzimmer: Here’s a paper title I’d like to see: “The Genome of the Silver Maple Is Like the Genomes of Every Other Frigging Maple, Except More Boring”
  • But then this: RT @VinJLynch: @carlzimmer I’d like to see “White Oak Genome Provides Unique Insights into Bourbon”…
  • RT @ensembl: Nice balanced article on the #ENCODE controversy. http://t.co/4uYc7QClO0
  • RT @phidias51: Just figured out how @utopiadocs connects you w/other researchers. Click Altmetrics, see who tweeted the article, follow them.
  • Snorf (about the 1000bullgenomes.com project) RT @phylogenomics: OMG @atulbutte we need to do all the microbiomes from these bulls – it would be the 1000bullshits project
  • RT @marc_rr: The strategies which a class of students uses to annotate bacterial genomes with #bioinformatics and #biocuration: http://t.co/9yrk7vdBIm
  • RT @phylogenomics: The Tree of Life:  The need for a phylogeny driven genomic encyclopedia of eukaryotes http://t.co/9offu8LA9M
  • RT @_inundata: Dear every journal,
    1. Open a @GitHub org account.
    2. If authors have a repo for a paper, fork it after acceptance & link it to it.

 


What’s the Answer? (cancer data discrepancies)

2 May, 2013 (07:40) | What's the Answer? | By: Mary

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question highlights the cancer data issues, and since I did a cancer database as “Tip of the Week” that got some decent interest, I thought I’d keep with the theme. Maybe some researchers who are more familiar with the cancer data sets will have some insights.

Question: cBio portal vs. Oncomine: Difference in samples and expression data

I was looking for expression levels for two genes involved in serous ovarian cancer data from TCGA via CBio portal. Based on a Z-score threshold of 2.0, I found the following percentage of samples (cases) have expression levels affected (UP or Down)

Case Set: Tumors with mRNA data (Agilent microarray): All samples with mRNA expression data (489 samples) CCNE1 – 11% CDK12 – 7%

For same dataset in Oncomine (I am using the free version):

TCGA Ovarian (517 samples <- this number is higher than what is reported in TCGA ovarian cancer publication) expression data is provided as log-2 median intensity and the in Oncomine shows that higher expression level of CCNE1 and CDK12 expression level is correlated with different grades – for example Grade 3 tumor (Grade 3 (431 samples) have higher expression level of both genes.

I have also noticed that the dataset 517 samples were assigned as No Associated Paper 2011/03/24. I am wondering if the data is reffering to this paper on TCGA ovarian cancer dataset. http://www.nature.com/nature/journal/v474/n7353/full/nature10166.html

I am wondering why such a discrepancy or am I missing something here.

PS. I have posted this question on both Oncomine and cBio list, but did not receive any responses yet. I am wondering if anyone here with experience on one of the platform could provide insight to this.

The other issue that interested me was the support problem. This is a skilled super-user trying hard to do it right–contacting the support teams of the sites, and getting no response. I think that’s one of the most frustrating things about this arena. Some projects are well resourced for user support. Some are not. But if smart users can’t figure out what’s going on with your site’s data, your resource isn’t as useful as you think it is. I wish support was valued more. But if you know what’s up–go over and offer an answer.
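For anyone trying to reason about the numbers in the question, here is a rough sketch of how a "percentage of samples altered at a Z-score threshold of 2.0" can be computed. It assumes the z-score is taken against the mean and standard deviation of all supplied samples; the reference population a given portal actually uses may differ, and the function name and the expression values are hypothetical.

```python
from statistics import mean, pstdev


def fraction_altered(values, threshold=2.0):
    """Fraction of samples whose z-score is at least `threshold` away from zero.

    Assumption: z-scores are computed against all supplied samples.
    A portal may use a different reference set, which changes the result.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No variation between samples, so nothing can be called up or down.
        return 0.0
    altered = sum(1 for v in values if abs((v - mu) / sigma) >= threshold)
    return altered / len(values)


if __name__ == "__main__":
    # Invented expression values for one gene across 20 samples.
    hypothetical_expression = [10.0] * 18 + [20.0, 0.0]
    print(f"{fraction_altered(hypothetical_expression):.0%} of samples altered")
```

Because the result depends on which samples are included and on the reference used for the z-score, a 489-sample set and a 517-sample set reported on a different scale (log2 median intensity) would not be expected to give directly comparable percentages.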

Video Tip of the Week: My Cancer Genome

1 May, 2013 (08:17) | Genomics Research, Tip of the Week | By: Mary

There are a lot of cancer database resources out there. Most of the ones we’ve focused on have been the data repository types: TCGA, ICGC, caBIG, COSMIC, Cancer Genome Workbench, UCSC Cancer Genomics Browser, and of course big repositories like GEO. Researchers will need these sources of data to locate key alterations in cancer cells and tissues, and to evaluate changes with treatment conditions. But these are possibly not the most useful places for clinicians faced with a specific sample, or for patients trying to understand their situations. As more and more tumor sampling data becomes available, direct and specific access to actionable pieces of information will be crucial.

The My Cancer Genome site aims to serve that actionable end of the data spectrum. It has been in development for a while, but the recent story in the New York Times reminded me of it: Variations on a Gene, and Tools to Find Them. So for this week’s Video Tip of the Week, I bring you a look at the My Cancer Genome resources. They have a nice intro video that I will include here. It highlights features that I wouldn’t have been able to access–the part that links patient records + mutations + the curated detailed pages about the mutations and relevant studies. The public has access to that last part, but you wouldn’t be able to see the electronic health record part from the public side.

Papers are coming out that describe the deposition of information into the My Cancer Genome site. You can learn more about the philosophy and strategy behind cataloging the clinically relevant somatic mutations in the recent paper about the DIRECT (DNA-mutation Inventory to Refine and Enhance Cancer Treatment) project. A tab at that site shows you the initial data associated with that, from non-small cell lung cancer (NSCLC) mutations in the Epidermal Growth Factor Receptor (EGFR). And as more of this data comes along we’ll see it grow, of course. Seems a good step in translational medicine. So have a look at the useful and evidence-based information about specific cancer-related variations they are collecting.

Another feature is a search option to find clinical trials–by disease or by gene. I don’t think I’ve seen a gene-specific search for this kind of information before. This could be useful for people who need access to new treatment options if they have specific mutation data about their own tumors.

Have a look at My Cancer Genome, and think about where we are going with this data. I hope that the new cancer genomics data will really help drive appropriate and effective treatment strategies.

Quick links:

My Cancer Genome site: http://www.mycancergenome.org/

NYT article: Variations on a Gene, and Tools to Find Them

References:

Swanton, C. (2012). My Cancer Genome: a unified genomics and clinical trial portal. The Lancet Oncology, 13 (7), 668-669. DOI: 10.1016/S1470-2045(12)70312-1

Yeh, P., Chen, H., Andrews, J., Naser, R., Pao, W., & Horn, L. (2013). DNA-Mutation Inventory to Refine and Enhance Cancer Treatment (DIRECT): A Catalog of Clinically Relevant Cancer Mutations to Enable Genome-Directed Anticancer Therapy. Clinical Cancer Research, 19 (7), 1894-1901. DOI: 10.1158/1078-0432.CCR-12-1894

My Cancer Genome. 2013. http://www.mycancergenome.org (Accessed 4/30/2013).