Tag Archives: tfbs

Video Tip of the Week: AnimalTFDB for transcription factors

Transcription factor details (and sources of information about them and their binding sites) are definitely among the most common questions we hear in our workshops. There are ways to look at predictions of binding, and for some species evidence of binding, and there are ways to look for binding motifs. But these resources vary in methods and scope, and it’s not easy to obtain collected information about transcription factors in many species.

One group has tried to change this, at least for animal transcription factors, with the AnimalTFDB. They collected and curated information about more than 70 families of TFs from 50 species, and created an interface where you can explore this collection by family, by species, and more.

This was offered as one of the sources of information from a recent query at BioStar. The poster was looking specifically for non-model organism information, and this database was one of the suggestions. But you can explore that question for other details and suggestions too.

The paper from the AnimalTFDB team provides information on other sources of TF information–including bacteria, plants, and various types of related resources too. So if you are looking for non-animal details there could be some guidance for you in there. Some of these you might see in future tips!

In the AnimalTFDB system itself, you’ll find that you have multiple ways to explore the data. From the landing page you can quickly browse to the collected data by TF family, or move right to the data organized by species instead. But there is also a standard search option with a form-based query. You can refine your search in various ways with that search form.

When you get to a transcription factor (or one of the other curated types: transcription co-factors and chromatin remodeling proteins), there will be links to many types of useful additional details, including transcripts, domains, GO terms, links to multiple related resources, and more.

So if you are interested in transcription factors, co-factors, and chromatin remodeling proteins in animal species, check out AnimalTFDB.

Quick link:

AnimalTFDB: http://www.bioguo.org/AnimalTFDB/

Zhang, H.M., Chen, H., Liu, W., Liu, H., Gong, J., Wang, H. & Guo, A.Y. (2011). AnimalTFDB: a comprehensive animal transcription factor database, Nucleic Acids Research, 40 (D1) D149. DOI: 10.1093/nar/gkr965

What’s the answer? (non-model org TFs)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question highlights another issue with transcription factor binding site data. Increasingly people are seeking out this data, and this time it’s not for human or a well-studied model organism. As we broaden out with more species sequence data, this will also be another big need.

This week’s question: List of TF and TFBS from a non-model species


How can I find the TF and TFBS from a non-model species (in my case the cow). Maybe is it possible to infer them with the human TF and TFBS ? My goal is to detect the TF from differentially expressed genes. and maybe the differentially expressed TF regulating differentially genes.

Other related question : how to know which gene encode a TF ?

Thanks a lot,



Again, I found a TF resource that was new to me, so I appreciated the answers. Check them out.

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • Wait, where is the $100 genome exactly? I must have missed a tweet somewhere RT @deanhendrix: If we can sequence a human genome for $100, why can’t we openly publish for $100? http://t.co/JUIg5HnZ via @thePeerJ #openaccess #cmonnow  [Mary]
  • RT @genetics_blog: NAR: Most conserved non-coding sequences are regulatory factor binding sites http://t.co/kcFwPeZ7 #bioinformatics [Mary]
  • RT @bffo: Very cool, new cystoscope App store http://t.co/u8gbLiAk #Bioinformatics [Mary]
  • Heh–fair point: RT @phylogenomics: At end @mjpallen says “Radio has survived alongside TV” in reference to how traditional microbiology will survive alongside genomics #SAMG12 [Mary]
  • RT @mem_somerville: Was looking up old story, realized that Nurse, Venter, Collins all have motorcycles. Want to see them race. (must be a gene…) [Mary]
  • @OpenHelix: RT @genome_gov: #microbiome Mapping of human microbiome produces insights, surprises. http://t.co/TYiXCvgX [Mary]
  • And then… RT @matthewherper: Was The Human Microbiome Project A Waste Of Money? – Forbes http://t.co/Crb9eFzo via @sharethis [Mary]
  • And then…snorf:
  • Oooh boy: RT @drbachinsky: Hay Festival 2012: Dull middle-aged scientists should not get grants, says DNA pioneer James Watson via @Telegraph http://t.co/1iGAfFof [Mary]
  • I want to photosynthesize. RT @Argent23: The next twist in the Elysia story. Seems like >50 algal chloroplast genes were transferred into the slug genome! http://t.co/fDJ4iJYa [Mary]

Special Bonus item: if science had tabloids—you have to go see Francis Collins and Fred Sanger “Hot Pics!” http://fakescience.tumblr.com/omgscience

Video Tip of the Week: TFBS using Mapper

Need to explore transcription factor binding sites (TFBS)? If you are reading this, you might know already, but just to recap:

Transcription is regulated through the binding of transcription factor proteins to specific cis-level regulatory sites in the DNA. The nature of this regulation depends on the transcription factor. For example, some proteins activate transcription by recruiting RNA polymerase, some repress transcription by suppressing this recruitment, and others insulate proximal regions from the activity of nearby transcriptional activators or repressors. A key characteristic of each transcription factor protein is its DNA binding domain. Each DNA binding domain recognizes and interacts with DNA that matches a specific nucleotide pattern, or motif.
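
To make the idea of a motif a bit more concrete, here is a minimal sketch of how a motif can be represented as a position weight matrix and used to scan a sequence. The counts, the sequence, and the threshold are all made up for illustration; real matrices come from resources like the databases discussed in this post, and real tools use more careful statistics than this.

```python
import math

# Hypothetical position count matrix for a 4-bp motif (made-up counts,
# not taken from any real database). One column per motif position.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [1, 0, 9, 1],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 7],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    length = len(next(iter(counts.values())))
    matrix = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        matrix.append(
            {
                b: math.log2((counts[b][i] + pseudocount) / total / background)
                for b in "ACGT"
            }
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


PWM = log_odds_matrix(COUNTS)
# Assumed toy sequence and threshold; only the two exact AGCT matches pass.
hits = scan("TTAGCTAAGCTGG", PWM, threshold=4.0)
for start, window, score in hits:
    print(start, window, score)
```

The threshold is the judgment call here: set it lower and you pick up more candidate sites, but more of them will be noise.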

Determining these TFBS can help elucidate the regulation of a gene, determine the cause of disease, and more. There are some very good transcription factor binding site databases and prediction tools available. Two that come to mind are Transfac and JASPAR. There are other databases you might want to take a look at, such as UniProbe, ORegAnno (which also has a UCSC track), oPOSSUM, hPDI and many others. UCSC Genome Browser has a track of computationally derived conserved (human/mouse/rat) TFBS and ENCODE TFBS determined by ChIP-seq (of which you can find a mega-table here at FactorBook). PAZAR is a compilation of TF data from many small databases. ORegAnno has a page of additional databases and tools for TFBS and regulatory regions. Each of these has different strengths, weaknesses and data. So, get cracking :D.

The database and search tool I will focus on in this tip of the week is Mapper. Mapper uses TFBS from Transfac and JASPAR and maps them to genomic locations for several species. Using “the search power of profile hidden Markov models (HMMs),” Mapper includes a database of pre-computed TFBS locations and an on-the-fly search engine for TFBS. Additionally, there is rSNPs, a handy tool designed to identify SNPs which have a significant effect on the score of a TFBS.
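
If you are wondering what "a significant effect on the score of a TFBS" might look like in practice, here is a rough sketch. To be clear, this is not how rSNPs is implemented (Mapper is built on profile HMMs); it uses a simple log-odds matrix with made-up values, and the function names are hypothetical. It just compares the best motif score around a position before and after a base substitution.

```python
# Hypothetical log2-odds scores for a 4-bp motif, one dict per motif position.
# The values are invented for illustration only.
MOTIF = [
    {"A": 1.4, "C": -0.8, "G": -1.8, "T": -0.8},
    {"A": -1.8, "C": -1.8, "G": 1.7, "T": -1.8},
    {"A": -1.8, "C": 1.5, "G": -1.8, "T": -0.8},
    {"A": -0.8, "C": -0.8, "G": -0.8, "T": 1.2},
]


def best_site_score(sequence, motif):
    """Highest motif score over all windows of the sequence.

    Assumes the sequence is at least as long as the motif.
    """
    width = len(motif)
    return max(
        sum(motif[i][base] for i, base in enumerate(sequence[s:s + width]))
        for s in range(len(sequence) - width + 1)
    )


def snp_effect(sequence, position, alt_base, motif):
    """Score change (alt minus ref) when alt_base replaces the base at position."""
    width = len(motif)
    # Only windows that overlap the SNP can change, so look at that region only.
    left = max(0, position - width + 1)
    right = min(len(sequence), position + width)
    ref_local = sequence[left:right]
    offset = position - left
    alt_local = ref_local[:offset] + alt_base + ref_local[offset + 1:]
    return best_site_score(alt_local, motif) - best_site_score(ref_local, motif)


reference = "TTAGCTAA"  # toy sequence containing one strong AGCT site
delta = snp_effect(reference, position=3, alt_base="T", motif=MOTIF)
print(f"score change: {delta:.2f}")  # negative means the SNP weakens the site
```

A large negative change suggests the variant disrupts a predicted site, while a change near zero suggests it probably doesn't matter for that motif. What counts as "significant" is a threshold you would have to choose.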

Today’s tip of the week will focus on the database and rSNPs and a basic intro to using these.

Marinescu, V., Kohane, I., & Riva, A. (2005). MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics, 6 (1). DOI: 10.1186/1471-2105-6-79

(HT to Biostar and answers found here)

Tip of the week: ORegAnno for regulatory annotation

Lately we’re getting a lot of questions about ways to analyze the promoters and other regulatory aspects of genes. And for a while we were mostly pointing to the prediction data that was available in the UCSC Genome Browser’s TFBS Conserved track. TFBS Conserved is a track of computationally predicted transcription factor binding sites (TFBS) which are conserved across human/mouse/rat and based on Transfac v7.0 by BioBase.  As they say in the track description, it’s important to know this:

The data are purely computational, and as such not all binding sites listed here are biologically functional binding sites.

Though this is useful, people have been wanting more evidence based on real binding and/or activity data. Today’s tip will talk about 2 ways to get other data–beyond computational predictions. First we’ll explore ORegAnno so you’ll understand the data sources, and then we’ll also look at that data in the context of the UCSC Genome Browser and some useful data from the ENCODE project.

ORegAnno is the Open Regulatory Annotation Database, a community literature curation project for regulatory information. Anyone can participate in the curation–they provide helpful curation tools and automated cross-linking and checking features that make it easier. You would register, curate, and the data becomes available to anyone. And with the curator tools that are available the data becomes loaded into projects that coordinate with ORegAnno–including the track at the UCSC Genome Browser of ORegAnno data.

In the paper published in NAR 2008, they stated this:

The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species.

So that’s a nice set with traceable data that’s not just computational predictions. In the tip I’ll show one example of Stat1 binding, in human, near the Il10 gene. If you look at that record, you’ll see several pieces of evidence that support this data and a link to the publication that offers it.

Now, if you look at ORegAnno data over in the UCSC Genome Browser, you could compare it to the computational predictions, or to TFBS data from other projects such as the ENCODE ChIP-seq data sets (Yale TFBS and HAIB, for example; note: you may have to go back an assembly because the ENCODE data is not all on the current assembly at this time). This is what I show in the movie: I take an ORegAnno annotated item and visualize it with the TFBS Conserved predictions and with some ENCODE project data. So you get all 3 types of data with a few clicks.

So there are several ways to look for TFBS data–some of it computational predictions, some literature curation, and some big data stuff from the ENCODE teams. All of them have strengths and caveats. Computational predictions may be genome wide and independent of a given cell or tissue type, but are subject to the constraints of the algorithms. Community literature curation can offer quality evidence, but may be selected by interested groups and not as broadly representative of the genome-wide situation. Big data projects can be genome-wide and have evidence in some cell types, but may be in progress and subject to checking as they are pre-publication data.  But effectively using them all could help you to understand regulation of genes that you might be interested in.
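
If you export these kinds of tracks as coordinate tables, combining them mostly comes down to checking which intervals overlap. Here is a minimal sketch of that check. The coordinates, labels, and track names are invented for illustration and are not real ORegAnno, TFBS Conserved, or ENCODE records, and it assumes every track is on the same assembly.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    label: str


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def supporting_evidence(query, tracks):
    """For each track, list the labels of the intervals that overlap the query."""
    return {
        name: [iv.label for iv in intervals if overlaps(query, iv)]
        for name, intervals in tracks.items()
    }


# Hypothetical literature-curated site and two other evidence tracks.
curated = Interval("chr1", 1000, 1020, "curated_site_1")
tracks = {
    "predicted": [
        Interval("chr1", 1005, 1015, "pred_A"),
        Interval("chr1", 5000, 5010, "pred_B"),
    ],
    "chip_seq": [
        Interval("chr1", 900, 1300, "peak_1"),
        Interval("chr2", 1000, 1020, "peak_2"),
    ],
}

evidence = supporting_evidence(curated, tracks)
print(evidence)
```

A site backed by all three kinds of evidence is a stronger candidate than one backed by a prediction alone, but keep in mind that a ChIP-seq peak is usually much wider than the binding site itself.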

Quick Links:

ORegAnno: http://www.oreganno.org/

Biobase and Transfac: http://www.gene-regulation.com/pub/databases.html

UCSC Genome Browser: http://genome.ucsc.edu/

ENCODE data at UCSC: http://genome.ucsc.edu/ENCODE/

Griffith, O., Montgomery, S., Bernier, B., Chu, B., Kasaian, K., Aerts, S., Mahony, S., Sleumer, M., Bilenky, M., Haeussler, M., Griffith, M., Gallo, S., Giardine, B., Hooghe, B., Van Loo, P., Blanco, E., Ticoll, A., Lithwick, S., Portales-Casamar, E., Donaldson, I., Robertson, G., Wadelius, C., De Bleser, P., Vlieghe, D., Halfon, M., Wasserman, W., Hardison, R., Bergman, C., Jones, S., & The Open Regulatory Annotation Consortium. (2007). ORegAnno: an open-access community-driven resource for regulatory annotation Nucleic Acids Research, 36 (Database) DOI: 10.1093/nar/gkm967