Transcription factors, in a somewhat simplified definition, are proteins that regulate transcription by binding to specific sequence elements in regulatory regions of the genome, such as promoters and enhancers.
TFClass is a classification of eukaryotic transcription factors based on the characteristics of their DNA-binding domains (DBDs). It comprises four general levels (superclass, class, family, subfamily) and two levels of instantiation (genus and molecular species). Two of these six levels (subfamily and molecular species) are optional. More detailed explanations of the classification scheme and its criteria are given here. The ontology is available in the Turtle format.
So far, TFClass comprises human transcription factors (TFs) and their mammalian orthologs. An earlier version of the full classification of human TFs as an HTML document can also be obtained here, that of their mouse orthologs here.
The ontology is shown in the left part of the interface. To browse the classification, an individual taxon can be opened by clicking on the arrowhead in front of it; clicking again closes all taxa that have been opened. Above the tree, there is a search box with an auto-complete function for the factor names. The link underneath leads to the freely accessible TRANSFAC® FACTOR table.
In the right (main) part of the interface, additional information about the selected node is provided. For a Superclass (node with a one-digit identifier), a general definition is given, together with the number of biological species that have known TFs within this superclass and the number of genera defined within it.
For a Class (node with a two-digit identifier), a definition is also provided, followed by the number of species and TF genera subsumed here. A sequence logo illustrates the most conserved residues in the corresponding type of DBD. The table underneath gives links to the FASTA files with all sequences of the TFs of this node, on the left for the DBDs, on the right for the full-length proteins. Underneath are thumbnails of the corresponding phylogenetic trees of the whole sequence set (top) or for a “slim selection” of only four species, usually human, mouse, cow and Monodelphis.
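A downloaded FASTA file can be read with a few lines of standard-library Python. The following is a minimal sketch that only assumes the generic FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name `read_fasta` and the file name in the usage note are hypothetical, and the header is kept as an opaque string because the exact header layout of the TFClass files is not described here.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a dict mapping header text to sequence.

    Assumption: generic FASTA layout only. Headers are not parsed further,
    and a repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the finished record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records
```

With a local copy saved under a hypothetical name such as `dbd_sequences.fasta`, the function can be called as `read_fasta(open("dbd_sequences.fasta"))`, since it accepts any iterable of lines.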
A Family (or Subfamily; three- or four-digit identifiers, respectively) node is structured like a Class entry (see above), except that instead of a definition, an (idealized) consensus DNA sequence may be given to which TFs of this family preferably bind, followed by a link to the list of predicted binding sites in the human, mouse, cow or dog genome (see below).
A Genus node (five-digit identifier) has as “General” information only the number of biological species that have a known TF of this group. Underneath, all these species are listed along with the available links to external databases and a sketch visualizing the position of the DBD within the canonical molecule and its splice variants as defined by UniProt. Above this list, links are given to jump straight to the human, mouse or rat entry. The order of entries in the list can be customized and will be preserved throughout the active session.
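The relation between identifier length and rank described above can be sketched in a few lines of Python. This is only an illustration: it assumes that each "digit" of an identifier is a dot-separated numeric component (for example `2.3.1`), which is not spelled out on this page, and the names `RANKS` and `rank_of` are hypothetical.

```python
# Ranks in order of increasing identifier length, as described above.
RANKS = ("Superclass", "Class", "Family", "Subfamily", "Genus")


def rank_of(identifier: str) -> str:
    """Return the rank implied by the number of identifier components.

    Assumption: components are separated by dots and are purely numeric.
    """
    parts = identifier.strip().split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a numeric dotted identifier: {identifier!r}")
    if len(parts) > len(RANKS):
        raise ValueError(f"too many components: {identifier!r}")
    return RANKS[len(parts) - 1]


if __name__ == "__main__":
    # Made-up identifiers, used only to show the mapping.
    for example in ("1", "1.2", "1.2.3", "1.2.3.4", "1.2.3.4.5"):
        print(example, "->", rank_of(example))
```

Identifiers with more than five components are rejected here because ranks below the genus are not described in the text above.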
Show predicted binding sites ("Seed sites"): Linked to (sub)family entities, if present. Clicking this option opens a selection list of all position weight matrices that are connected with any of the TFs in this (sub)family. Choosing one of them opens the complete list of potential binding sites that (a) are located in the 1 kb upstream region of human genes, (b) are conserved among human, mouse, dog and cow, and (c) belong to the x% best-scoring sites (x is chosen by the user, with a default of 5%; the score is the Match score, as described in Kel et al., 2003).
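The three selection criteria (a) to (c) can be sketched as a simple filter. This is not the implementation behind the interface, only an illustration under stated assumptions: the `Site` record and its fields are hypothetical, a higher score is assumed to be better, the percentage cutoff is assumed to apply to the sites that already pass (a) and (b), and the cutoff is rounded up so that at least one site is kept.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List

# Species in which a site must be conserved, per criterion (b).
REQUIRED_SPECIES = frozenset({"human", "mouse", "dog", "cow"})


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one predicted binding site."""
    gene: str
    upstream_offset: int          # distance upstream of the gene start, in bp
    conserved_in: FrozenSet[str]  # species in which the site is conserved
    score: float                  # matrix match score (higher assumed better)


def select_seed_sites(sites: List[Site], top_percent: float = 5.0,
                      max_upstream: int = 1000) -> List[Site]:
    """Apply criteria (a), (b) and (c) in that order."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    candidates = [
        s for s in sites
        if 0 < s.upstream_offset <= max_upstream   # (a) within 1 kb upstream
        and REQUIRED_SPECIES <= s.conserved_in     # (b) conserved in all four
    ]
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda s: s.score, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_percent / 100))  # (c) top x%
    return ranked[:keep]
```

Applying the percentage after the positional and conservation filters is a design choice made for this sketch; the page does not state which set of sites the percentage refers to.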
Protein expression (human only): Displayed in a separate overlay. Based on the information provided by the Human Protein Atlas, expression sources (organs, tissues) have been linked to the Cytomer ontology.
When referring to this classification, please cite:
Wingender, E., Schoeps, T., Haubrock, M., Krull, M. and Dönitz, J.:
TFClass: expanding the classification of human transcription factors to their mammalian orthologs.
Nucleic Acids Res. 46, D343-D347 (2018). doi: 10.1093/nar/gkx987
Previous publications:
Wingender, E., Schoeps, T., Haubrock, M. and Dönitz, J.:
TFClass: a classification of human transcription factors and their rodent orthologs.
Nucleic Acids Res. 43, D97-D102 (2015). doi: 10.1093/nar/gku1064
Wingender, E., Schoeps, T. and Dönitz, J.:
TFClass: An expandable hierarchical classification of human transcription factors.
Nucleic Acids Res. 41, D165-D170 (2013). doi: 10.1093/nar/gks1123
Wingender, E.: Criteria for an updated classification of human transcription factor DNA-binding domains.
J. Bioinform. Comput. Biol. 11, 1340007 (2013). doi: 10.1142/S0219720013400076
Wingender, E.: Classification scheme of eukaryotic transcription factors.
Mol. Biol. 31, 584-600 (1997); Mol. Biol. Engl. Tr. 31, 483-497 (1997).