Tag: ENCODE

New viewing options in the UCSC Genome Browser

30 August, 2010 (09:25) | Genomics Research, Genomics Resource News | By: Mary

Last week there were a couple of announcements from the UCSC Browser team that I wanted to talk about. Both of them affect the visualizations you can do in the Browser.

New Drag and Reorder Functionality Released

....It is now possible to rearrange the order of tracks within the browser image.
To reorder tracks, click-and-hold the side label or gray mini-button of a single
track and drag the highlighted track to a new position within the image....

I’m going to do a quick movie of what that means–it has no audio just to keep it smaller and quick. But this will allow you to move the tracks you want closer together without going to the configuration page to do it now. It also means you may have to use the Default Tracks and Default Order buttons to go back to what the original views are. Keep that in mind if someone else shares a computer with you.

I want to mention a couple of glitches in this, though: some people have reported that the track order in their stored sessions and custom tracks is being altered. And when you upload tracks now you don’t have the same configuration options, so you’ll notice that on your saved upload items.  Here’s a word from the team from the discussion mailing list:

We have noted your bug that some sessions that had tracks that were re-ordered using the old paradigm are now out of order. We are currently testing a fix and hope to have it out in a week or so.

New ENCODE Integrated Regulation track released

There are huge challenges in visualizing the wealth of ENCODE data that’s now coming out, and UCSC is actively developing new strategies and methods to manage the visualization needs. They have now added a new “super-track” to visualize some of the data. I can’t link to this email as it is only for registered mailing list members, but here’s a piece of it. The whole thing is in the “News” item on the UCSC home page right now: http://genome.ucsc.edu/

The ENCODE Data Coordination Center at UCSC is pleased to announce the release
of the ENCODE Integrated Regulation super-track, a collection of regulatory
tracks containing state-of-the-art information about the mechanisms that turn
genes on and off at the transcription level.

Individual tracks within the set show enrichment of histone modifications
suggestive of enhancer and promoter activity, DNAse clusters indicating
open chromatin, regions of transcription factor binding, and transcription
levels. When viewed in combination, the complementary nature of the data
within these tracks has the potential to greatly facilitate our understanding
of regulatory DNA. (To view a browser session showing the ENCODE Integrated
Regulation super-track, see

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Example1&hgS_otherUserSessionName=hg18EncReg.
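If you want to share one of your own saved sessions the same way, the link above is just the hgTracks page with three query parameters. Here is a minimal sketch in Python of how such a link could be put together. It uses only the parameter names and values visible in the URL above; the helper name `session_url` is made up for illustration and is not part of any UCSC tool.

```python
from urllib.parse import urlencode

# Base address copied from the session link quoted above.
HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def session_url(user_name, session_name):
    """Build a shared-session link (hypothetical helper, not a UCSC API)."""
    params = {
        "hgS_doOtherUser": "submit",
        "hgS_otherUserName": user_name,
        "hgS_otherUserSessionName": session_name,
    }
    # urlencode escapes any characters that are not safe in a URL.
    return HGTRACKS + "?" + urlencode(params)


# The user and session names from the announcement.
example_url = session_url("Example1", "hg18EncReg")
print(example_url)
```

With the names from the announcement this reproduces the link above (without the trailing period). I am assuming other saved sessions follow the same pattern, so test a generated link in your browser before you hand it out.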

I know these came out last week, but I like to take a little while to explore the new features and get used to them before I speak to them–especially when they are significant alterations of the functionality. And it gives a bit of time to see if there are issues developing around the use of them.

We will update our training materials with these new features soon.  For the freely available materials sponsored by UCSC start here: http://www.openhelix.com/ucsc

Ok, really, I’m going to blog again…

19 July, 2010 (15:07) | General Science | By: Mary

Sorry for the sparseness of late. We were all  over the place doing UCSC Genome Browser (we do intro + advanced), ENCODE, and Galaxy workshops.  At NIH we also did IMG and VISTA (Man, that security at NIH is fierce….).  Trey is still on the road, in fact, doing the training in Morocco.

Ok, you couldn’t be there…but all of those trainings are available on our web site right now, except for ENCODE. The same material we do in the online materials is what we do in workshops. The only one of those that requires subscription is IMG. And you won’t find ENCODE as a stand-alone tutorial yet–but that’s coming. We have now sent the script to the studio and we’ll be assembling that soon.

I do want to mention one thing that we think is interesting, and we see in almost every training we do. Nearly every time, more than half of the attendees at our trainings are female. Based on what you read about women falling out of the pipeline in science, you’d think there would be no way we’d even get 50%. But generally it is more than half women in these trainings. (We have the data if anyone can think of a way we can use that to get a grant :) )

Our current theory is that women are more likely to admit they could use the training (something like asking for directions…you know…?).  Or do men prefer documentation? We don’t know. What’s your theory?

Friday SNPpets

26 March, 2010 (08:00) | SNPpets | By: Trey

We are going to try a new Friday feature. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. We are going to start posting them in a Friday feature of a list of snippets of information we call “SNPpets” :D (cute, huh?). So without further ado…

Quick Reference Cards for teaching and outreach

23 March, 2010 (11:25) | Genomics Research | By: Mary

We know there are a number of different ways that scientists and students become familiar with genomics software.   Some of it comes from the traditional publication routes–like the very handy NAR Database issue.  Or like the Current Protocols papers we’ve done recently.  We have these online tutorials that people use in various ways: some teach themselves by watching the video and working the exercises, some download the matching slide sets and run local workshops (our catalog: some are free/sponsored and green icons indicate that; red indicates subscription required). Librarians are using them to become “embedded” in courses in some cases.

A less-well-known type of material we have is the Quick Reference Card.  These are printed cards with URLs, hints, tips, definitions, shortcuts–for stuff that you may want a quick reminder of: where a feature is located, or how to use it.  People who run the local workshops will sometimes write to us to get a set for their courses.  They are great to give out at conferences to raise awareness of the software.

We have these cards for several resources for which we also have free sponsored training videos + slides + exercises: UCSC Genome Browser (2 cards–intro and table browser), Galaxy, and our newest: RCSB PDB and SGKB.  You can go to this form and order them, and we’ll send them out.

I bring this up today because we just received word from Ensembl that they have created a card that we can distribute as a PDF.  You can print it up and put it on the wall near the computer as a handy reminder of some features and tools at Ensembl.  Click the image to download the PDF, or go directly to the link below.

Summary:

Order OpenHelix printed cards for resources: http://www.openhelix.com/cgi/qrcOrder.cgi

Ensembl PDF card download: Ensembl_card_march2010.pdf

Tip of the Week: Year of Tips, part deux

30 December, 2009 (07:47) | Tip of the Week | By: Mary

As Trey posted last week in part I, we’ve been doing tips-of-the-week for two years now. We have completed over 100 little tidbit introductions to various resources*.  At the end of the year we are doing a summary post to collect them all.  If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

I’ve got the second half of this year to summarize–July through December 2009. Check them out!

July 2009

7/1  UCSC wiki annotations UCSC has created a way for anyone to add annotations to their genes of interest using a wiki.

7/8  CellMiner from NCI A relational database for cancer cell line data.

7/15  ENCODE data at UCSC The Data Coordination Center, or DCC, for the human ENCODE data is introduced, with guidance on how to access ENCODE data in the UCSC Genome Browser.

7/22  It’s a duplicate Check out Deja Vu, a program to assess scientific abstracts for similarity, including duplication or possible plagiarism events.

7/29 VirusMINT The Molecular Interaction Database that we love added a component for Virus protein interactions.

August 2009

8/5  Genomic Encyclopedia of Bacteria & Archaea (GEBA) This piece on the strategy of choosing new genomes to sequence was introduced in August.  If this is of interest to you, also check out the recent publication about this project that we talked about here.

8/12  NCBI’s New BioSystems Resource Wherein NCBI takes on the storage and representation of biological network data.

8/19  PLAN2L for Arabidopsis literature A helpful tool for literature searching for Arabidopsis.

8/26  Acetylome, String and a new database An introduction to the Phosida database, for phosphorylation and acetylation information.

September 2009

9/2 A mouse for all reasons Learn about the knockoutmouse.org site, which provides information and reagents for generating knockout mice.

9/9  TARGeT A resource for identifying relationships between transposable elements and genes from sequence submissions.

9/16  The National Center for Biomedical Ontology This central repository for Ontologies can be really helpful for bioinformatics software project developers.

9/23  JBrowse, a game changer? A look at a new strategy in genome browsing software.

9/30  Finding the right genomics resource This is a quick introduction to OpenHelix’s own new interface for searching for training movies and materials!

October 2009

10/7  NCBI Makeover! The overhaul of the NCBI interface–with a walk through memory lane at some of its previous incarnations–is provided.

10/14  Getting flanking sequence A quick look at how to use Galaxy to obtain adjacent sequences, which is a question we are asked frequently in training situations.

10/21  SwissVar, a New Genotype-phenotype Resource from SIB Explore ‘a portal to Swiss-Prot diseases and variants.’

10/28  Sol Genomics Network Do you like tomatoes, eggplants, potatoes, peppers, and other members of the Solanaceae? If so, check out this resource.

November 2009

11/4  CHOP CNV database If you are curious about copy-number variations, be sure to explore the collection at CHOP.

11/11  GeVo and Genome Comparison A nifty tool for genomic comparisons.

11/18  FABLE, text mining for literature on human genes A literature mining resource to improve your searching.

11/25  Got tips for us? We opened the floor for suggestions of tools to look at as we celebrated the Thanksgiving holiday.  We’ll still take suggestions–we love to explore new resources!

December 2009

12/2  RCSB PDB Comparison Tool Compare sequences for structural similarities with this handy widget.

12/9  GRAIL for prioritizing SNPs Use a list of SNPs to identify associated genes, and then sweep the literature for leads on the processes that might be involved.

12/16  GenomePad Check out this very cool iPhone app for exploring the UCSC Genome Browser.

12/23  Tip of the Week: Year (2nd!) of Tips The summary of tips from the first half of the year.

*for the vast majority of resources we introduce in our tips, we have no financial relationship with the provider or developer. The ones we do are listed here.

One year of ENCODE data

1 December, 2009 (11:15) | Genomics Research, Genomics Resource News | By: Mary

We’ve talked about the ENCODE data before, and you can see a number of entries about the project with the ENCODE tag.  But last week I came across the ENCODE paper in the Nucleic Acids Research Advanced Access collection, so it seemed like a good time to review some of the information about this project.

ENCODE stands for ENCyclopedia Of DNA Elements.  It is one of the big data projects wrangled by NHGRI.  There was a pilot phase project to explore the utility and methods of assessing in extensive detail 1% of the genome–looking beyond the known and predicted genes at many more aspects of the genome.  After the results of the pilot phase were in, the project was examined again, certain choices were made on how to proceed, and the scale-up or production phase ensued.

The paper from the UCSC team describes the framework for the scale-up phase, starting with a focus on the choices that were made for cell types and data types that are used for the ongoing work. Table 1 is a nice summary overview of that to give you a sense of the scope.

They go on to describe some of the issues around housing and displaying the data from these projects.  UCSC is the DCC, or Data Coordination Center, for the data.  It often required new strategies to display the different cell types and data sets. One point they mention is that the methodology for several aspects of the project changed after the pilot: there was much more next-gen sequencing short-read data coming out of the scale-up.  What this might mean for you even if you don’t care about human data or this project specifically: if you are trying to figure out nice ways to display your next-gen data you may find nice examples of strategies in this collection.  As we’ve done training on the UCSC Genome Browser and ENCODE we found people were certainly interested in the data from that perspective.

The 3 main ways to interact with the data are provided next: the regular browser, the Table Browser, and downloading every bit of it, if you like.  A major difference in the regular browser from the pilot phase is that since now the data is genome wide, the ENCODE tracks can be integrated fully with all the other data as any other track.  Since it isn’t set off as a special project with limited coverage, you now will find ENCODE tracks in the track sections where they would be expected to be found–such as regulation, or expression, depending on the data type.  The pilot ones were in separate ENCODE track group areas.  Now you just have to look for the ENCODE icon next to the tracks to know they are part of this project.

They also stress the Data Use Policy, which includes free access to the data but under the Fort Lauderdale sort of embargo strategy.  If you are going to use the data (and they want you to make discoveries, so please do) just keep an eye on the time stamp of the embargo and properly cite those sources.  There’s more detail on that on the Data Policy page.

The paper also references the OpenHelix tutorials on the UCSC Genome Browser and ENCODE data.  UCSC sponsors us to provide the training freely, and you can access three tutorials on our site:

  • Introduction, for an overview of how the main browser works, with display features and definitions for menus and such.
  • Additional Tools, this has tools associated with the UCSC Browser and this is where you’ll find the ENCODE section.   Or you can view the ENCODE section separately here in a previous post about it (and I added it below again too).  It covers much of the same material that the paper does and should supplement your reading nicely.

You can download the slides and use them in your own talks, use the exercises for students or workshops, or just point folks to the materials if you like.

One other note: there is a separate DCC for the modENCODE project with Drosophila and C. elegans, and we touch on that in a post here.

Stand-alone ENCODE tutorial section: http://www.openhelix.com/downloads/jing/encode/encode_movie.html

Rosenbloom, K., Dreszer, T., Pheasant, M., Barber, G., Meyer, L., Pohl, A., Raney, B., Wang, T., Hinrichs, A., Zweig, A., Fujita, P., Learned, K., Rhead, B., Smith, K., Kuhn, R., Karolchik, D., Haussler, D., & Kent, W. (2009). ENCODE whole-genome data in the UCSC Genome Browser. Nucleic Acids Research. DOI: 10.1093/nar/gkp961

Busting an Embargo

5 October, 2009 (10:56) | General Science, Genomics Research | By: Mary

Not me–and not one of the press embargoes.  I’m talking about a data embargo.   While on the way to a workshop this week I was reading my paper issue of Science on the flight.  And I was intrigued by the story of what happened when a data embargo was broken.  The story is: Paper Retracted Following Genome Data Breach, and it is the story of data from dbGaP being published before the authors were permitted to publish on it.

The scientist who helped to develop our dbGaP tutorial had alerted me to this story (hat tip to Cyndy :) ), because she knew how the dbGaP data access system worked.  In fact, let me quote part of our tutorial that explains it very clearly on slide 12:

Next is the linked study title, followed by the Embargo Release date for each study. Investigators contributing data to dbGaP may retain the exclusive right to publish analyses of their datasets for a defined period of time. Prior to the Embargo Release date, other investigators may be granted access to download and analyze data, but they may not seek publication of their results until after this time.

There’s a great and risky feature of these large-scale data projects.  Investigators are asked by the NIH data sharing rules to submit data to the appropriate repository even before they’ve had a chance to publish on it.  The risk is people will scoop the submitters.  And that’s apparently what happened in this case.

We’ve also spoken to data embargo issues in the context of the ENCODE project.  In fact, one segment of our tutorial on ENCODE covers that issue.  As more and more “big data” projects roll out in this manner, there are likely to be more of these issues cropping up.  I think PNAS had a good idea–adding an item to their author checklist that specifies whether data is under embargo rules.  (Oh, and they retracted the paper and you can see the stub here.) But I think it’s also up to the projects and databases to explain the data embargoes clearly.  The people associated with the big data projects understand the rules, but I don’t know that it has percolated through the scientific end-user community fully.  We’re trying to help get the word out with ENCODE and dbGaP in our training materials, but I know the process varies by project.  I think this episode offers a nice “teachable moment” for this.  I’ll be referring to it in future workshops, for sure.

So keep an eye out for this as you use “big data” resources.  But use them–don’t let this dissuade you. Just keep an eye on the calendar.

dbGaP: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap

UCSC Genome Browser (with ENCODE data): http://genome.ucsc.edu/ and http://genome.ucsc.edu/ENCODE/

Next-gen sequencing, with cartoons!

6 August, 2009 (11:16) | Genomics Research | By: Mary

Mike the Mad Biologist points to a nice article that describes aspects of the next-generation sequencing technologies with some helpful animations to illustrate the different styles. Mike goes on to describe that the sequencing itself isn’t the rate limiting step–the assembly and analysis steps are the hurdles really.

The dust certainly hasn’t settled on the strategies for that at this time–and as Mike describes the challenges may vary by species, but we are keeping an eye on some of the software that is being used (see Next-gen sequencing issue in Bioinformatics and Curious about short read sequencing? among others here).

This data is turning up in databases now (see this ENCODE data at the UCSC Genome Browser as just one example), and will continue to flood in at dramatic rates.  And the same technologies are being used for analysis of other aspects of biology (not just sequencing new species and individuals)–such as promoter binding or nucleosome positioning or RNA protein binding.  So it is worth taking a look at the underlying technology to understand what’s being sequenced.

Mike’s post: The Future of Bacterial Genomics: It’s Not the Sequencing, It’s the…

and the Wellcome Trust article he describes is: Genomics – the next generation

Tip of the Week: ENCODE data at UCSC

15 July, 2009 (08:00) | Genomics News, Genomics Research, Genomics Resource News, Tip of the Week | By: Mary

ENCODE stands for Encyclopedia of DNA Elements.  This is a major project to examine the sequence elements of the human genome in extensive detail, involving many groups in the US and beyond, coordinated and sponsored by the NHGRI.  Last year as this was brewing I did a series of “Tips of the Week” about the ENCODE data that included the pilot project and the production phase data that is being wrangled by the folks at the UCSC Genome Browser and displayed in that framework.  Well, now that data is rolling in, and you have access to the pre-publication data being generated. In this Tip of the Week we offer background on the project and look specifically at some of that data.

We have created a more formal presentation for this which you can watch here as a stand-alone movie.  It is longer than we usually have as a “tip”, but we think it is important to understand the background and foreground to understand how to get the most out of the data.

Congratulations to the ENCODE DCC team at UCSC that is led by Kate Rosenbloom for making this data available.  And thanks to the ENCODE project teams for generating this data.

This material is also embedded in our complete package of UCSC materials that are freely available here.  If you aren’t already familiar with the UCSC Genome Browser you may want to explore that first.  The ENCODE piece is a segment of the “Additional Tools” tutorial.  All of our slides are there and you can download and use the slides to teach students.  There is also an exercise that walks you through a sample of the data in the exercise download.

As more pieces of the project come along we expect to enhance this movie and it will touch on many of the data types available as part of this large project.  We’ll add additional exercises.  We hope you can make use of the hot-off-the-projects (before the presses!) data that can help your research.

ENCODE at NHGRI: http://www.genome.gov/10005107

ENCODE DCC site: http://genome.ucsc.edu/ENCODE/

Complete training suite on UCSC Genome Browser (ENCODE portion is in the Additional Tools section at this time): http://www.openhelix.com/cgi/tutorialInfo.cgi?id=76

ENCODE data at UCSC stand-alone movie direct link: http://www.openhelix.com/downloads/jing/encode/encode_movie.html

ENCODE wants your input on the data release policy

24 September, 2008 (01:01) | Genomics News, Genomics Research, Genomics Resource News, Tip of the Week | By: Mary

This week’s Tip of the Week is a bit different than some of the others that I have done in the past. I’m going to take you through parts of a document–the newly released draft of the Data Release Policy for ENCODE (go over to this page at NHGRI and get a copy of the document). I know–you expect software from us. But I will also show you a bit of software at the end, if you can stick with me for that. OK?

We’ve been talking about the ENCODE projects about once a month lately. We are hoping to raise awareness and understanding about the framework, foundations, and goals for ENCODE. That’s because a TON of genome-wide data is going to be collected and offered to researchers worldwide as this project progresses. And as we proceed I’ll be showing you how to access that data in the UCSC Genome Browser, since UCSC is the DCC (or data coordination center) to wrangle the human data around ENCODE.

However, if you are going to use ENCODE data, you need to know about the guidelines for using that data. That’s what I’ll cover today. And I’ll also give you a peek at some of the first data to come through the process at UCSC on the test server*. It is a sample of ChIP-Seq data from HudsonAlpha that I’ll use as an example.

In short, this data policy tries to balance the needs of the users of this publicly-funded data with those of the scientists who are generating this data. They are proposing a 9-month non-scoop window: the providers will release the data and have 9 months to submit their manuscripts on it. In the meantime, you can look at the data and start to use it. But in general, they ask that you don’t submit a paper without the consent of the ENCODE team in that window. The appendix offers a couple of nice scenarios about the appropriate use of the data so it helps to clarify this.
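To make the 9-month window concrete, here is a small sketch in Python of the calendar arithmetic. It is only an illustration: I am assuming the window is counted in whole calendar months from the release date, the release date used below is made up, and the function names are mine. The time stamp shown with the data and the policy document itself are what actually count.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (for example Feb), clamp to its last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_ends(released, window_months=9):
    """End of the non-scoop window (assumed: whole calendar months)."""
    return add_months(released, window_months)


def can_submit(released, today, window_months=9):
    """True once the assumed window has passed."""
    return today >= embargo_ends(released, window_months)


# Made-up release date, purely for illustration.
example_release = date(2008, 9, 24)
print(embargo_ends(example_release))
print(can_submit(example_release, date(2009, 1, 15)))
```

For the made-up release date of 24 September 2008 the window would end on 24 June 2009, so a submission in January 2009 would still fall inside it.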

I hope you’ll have a look at the ENCODE draft data release policy and think about using the ENCODE data. And please give NHGRI and the ENCODE team feedback on this.

*Note on the test server: this is a sandbox for developers at UCSC, the data might not all have been QCed yet, and data here should not be considered final form. But you can have a look.

There’s been some coverage of the request for comment elsewhere, too, if you want to read more about this: http://www.genomeweb.com/issues/news/149419-1.html

UCSC Genome Browser “News” item has a link to the document as well.