I’m relatively new to Drupal internals and most of the work I do is on the database side. While searching for information on Drupal’s schema, I found very little. During my research, I put together an ER diagram of the schema installed by Drupal 7 (D8 is very similar with only 3 extra tables at time of writing) and decided to share my work. Note that the relationships I discuss here are based on the foreign key documentation that exists in core and my understanding of what I believe other relationships could be. Corrections and comments are very much welcome.

Overview

I’ll start off by showing my complete ER diagram below. You will see I grouped tables I found to be related in colored boxes. The image below is just meant to give a general overview of the schema. I will be diving into different parts of the schema in this post. I created this diagram using MySQL Workbench and the model can be downloaded from here if someone wishes to open this up in Workbench. This gist also shows the ALTER TABLE SQL statements that would need to be issued to actually create these foreign keys in MySQL. I would not recommend doing this right now with Drupal as many things would break.

Full ER Diagram.

Without delving into the relationships and details of this diagram, lets first cover some basic details. A stock install of Drupal 7 results in 73 tables being created. 10 of those tables are used for caching purposes:

Caching TableDescription
cachecaches items not separated out into their own cache tables
cache_blockthe block modules can cache already built blocks here
cache_bootstrapdata required during the bootstrap process can be cached in this table
cache_fieldstores cached field values
cache_filtercaches already filtered pieces of text
cache_formcaches recently built forms and their storage data
cache_imagecaches information about image manipulations that are in progress
cache_menucaches router information as well as generated link trees
cache_pagecaches compressed pages served to anonymous users
cache_pathcaches path aliases

11 tables are created which do not relate to any other tables:

Table NameDescription
actionsstores action information
batchstores details about batches (processes that run in multiple HTTP requests)
blocked_ipsstores a list of blocked IP addresses
floodcontrols the threshold of events, such as the number of contact attempts
queuestores items in queues
rdf_mappingstores custom RDF mappings for user-defined content types
semaphorestores semaphores, locks, and flags
sequencesstores IDs
systemcontains a list of all modules, themes, and theme engines that are or have been installed
url_aliascontains a list of URL aliases for Drupal paths
variablestores variable/value pairs created by Drupal core or any other module or theme

The 21 tables listed above are self-explanatory and I’m not going to discuss them any further in this post. They also are independent in that these tables have no relationships with other tables.

Field Related Tables

There are 8 tables installed with core related to fields and field storage:

Table NameDescription
field_data_bodystores details about the body field of an entity
field_revision_bodystores information about revisions to body fields
field_data_comment_bodystores information about comments associated with an entity
field_revision_comment_bodystores information about revisions to comments
field_data_field_imagestores information about images associated with an entity
field_revision_field_imagestores information about revisions to images
field_data_field_tagsstores information about tags associated with an entity
field_revision_field_tagsstores information about revisions to taxonomy terms/tags associated with an entity

While I was initially tempted to have these tables related to node, that would not really be correct since these tables are related to an entity. In D7, entities can be other objects besides nodes, such as users or comments. The entity_type column in these tables reflects that reality. These tables can be stored in other storage systems such as MongoDB due to the field storage API introduced in Drupal 7.

There are 2 other tables related to fields: field_config and field_config_instance. These tables store field configuration information. I believe a row in field_config_instance cannot (well at least should not) exist without the correspondong field_id in the field_config table. Hence, the one-to-many relationship from field_config to field_config_instance is an identifying relationship.

Small Groups of Tables

There are a number of groups you will notice in the full ER diagram that are made up of 2 to 3 tables. Zooming in on 4 of those groups, we can see those tables more clearly:

Zooming in on small groups.

One thing you will notice is that some relationships are shown with a solid line whereas others use a dotted line. MySQL Workbench represents identifying relationships with a solid line and non-identifying relationships with a dotted line. If you are unfamiliar with those terms, the standard defintions are:

  • identifying relationship - the foreign key attribute is part of the child’s primary key attribute.
  • non-identifying relationship - the primary key attributes of the parent must not become primary key attributes of the child.

This stack overflow answer from Bill Karwin contains a good discussion on these topics.

Now lets discuss those groups in more detail.

Registry Group

I grouped the registry and registry_file tables together. These tables are used for implementing the code registry in Drupal. A one-to-many relationship exists from registry_file to registry and this relationship is an identifying relationship. A filename should not appear in the registry table that is not present in the registry_file table.

Image Group

I grouped the image_styles and image_effects tables together. These tables store configuration options for image styles and effects. A one-to-many relationship exists from image_styles to image_effects and this relationship is a non-identifying relationship.

date_format Group

There are three tables about date formats in Drupal. date_format_type is a lookup table that stores configured date format types. After a stock install of Drupal 7, three date format types exist:

  • long
  • medium
  • short

A one-to-many relationship exists from this lookup table to both date_formats and date_format_locale.

In practice, this would be problematic. For example, a new date format can be created by an adminstrator. In D7, this results in the system_date_format_save function being called. This function will insert a row in the date_formats table that will not have a corresponding type (type will be listed as custom).

You will also notice the locked column is redundant in the date_formats table. I submitted a patch to change this.

File Group

I grouped the file_managed and file_usage tables into 1 group. These tables store information about uploaded files and information for tracking where a file is used.

I believe a 1-to-1 relationship exists from file_managed to file_usage and that this is an identifying relationship.

User Related Tables

There are quite a few tables that store user related information. Below is a figure where I zoom in on those tables.

User tables.

As you can see, the tables directly associated with users are watchdog, sessions, and authmap. These tables are in a one-to-many relationship from users. The functionality these tables provide is:

Table NameDescription
authmapstores distributed authentication mapping
sessionsstores information about a users session
watchdogcontains logs of all system events

There are then two tables that are in a many-to-many relationship with users that link this table with other groups. One of these is the users_roles table. This table links users with role. The role table is then in a one-to-many relationship with the role_permission table. The other many-to-many table is shortcut_set_users. This table links users with shortcut_set.

The tables for the menu system are not really related to users but I placed the group close by since the menu_links table maintains a one-to-many relationship with the shortcut_set table. While the tables for the menu system do not appear to be related, I do believe a relationship exists there. In particular, I think that the menu_link table has relationships to both the menu_router and menu_custom tables. The router_path column in menu_links could reference the router column in menu_router and the menu_name column in menu_links could reference the menu_name in the menu_custom table. Right now however, after a stock install of D7, a row with a menu name that is not present in menu_custom will be created in menu_links.

The menu system tables and a description of what they do is below.

Table NameDescription
menu_customholds definitions for top-level custom menus
menu_linkscontains the individual links within a menu
menu_routermaps paths to various callbacks

Node Related Tables

Node is one of the most central concepts in Drupal so as you can imagine, many tables are related to that concept. First off, a high level overview of the tables related to the node table are shown below.

Node tables.

Tables that are directly related to node are node_revision, node_access, and node_type. The node_type table is in many-to-many relationship with node and block_node_type. node_revision is in a many-to-one relationship with node as is node_access. The node_access table has only 1 row upon initial installation and references a non-existent node. An issue has been created to address this.

The tables directly related to node and a description of what they do is below.

Table NameDescription
node_accessidentifies which realm/grant pairs a user must possess in order to view, update, or delete specific nodes
node_revisionstores information about each saved version of a node
node_typestores information about all defined node types

Taxonomy Tables

Four tables in the stock schema are related to taxonomy. These tables are shown in the figure below.

Taxonomy tables.

First of all, the taxonomy_index table is in a many-to-many relationship with the node and taxonomy_term_data tables. The taxonomy_vocabulary table has a one-to-many relationship with the taxonomy_term_data table. The taxonomy_term_data table in turn has 2 1-to-many relationships with the taxonomy_term_hierarchy table.

A description of the taxonomy tables is given below.

Table NameDescription
taxonomy_indexmaintains de-normalized information about node/term relationships
taxonomy_term_datastores term information
taxonomy_term_hierarchystores the hierarchical relationship between terms
taxonomy_vocabularystores vocabulary information

Block Tables

The main table in this group is block. It has three directly related tables in one-to-many relationships: block_node_type, block_role, and block_custom.

Blocks tables.

A description of the blocks tables is given below.

Table NameDescription
blocksstores block settings
block_customstores the contents of custom-made blocks
block_node_typestores information that sets up display criteria for blocks based on content type
block_rolestores access permissions for blocks based on user roles

Search Tables

The relationships for the search tables I am a little unsure of. I believe they are as shown in the figure below.

Search tables.

The relationship I’m most unsure of here are between search_total and search_index. I don’t think the one-to-many relationship I have in place from search_total to search_index is correct.

A description of the search tables is given below.

Table NameDescription
search_datasetstores items that will be searched
search_indexstores the search index and associates words, items, and scores
search_node_linksstores items that link to other nodes
search_totalstores search totals for words

Tables That Relate Nodes to Users

There are three tables in many-to-many relationships between node and users:

Table NameDescription
commentstores comments and associated data
historystores a record of which users have read which nodes
node_comment_statisticsmaintains statistics of nodes and comments posts to show new and updated flags

The comment table could be in its own group. I decided against doing that in this ER diagram since I felt like it would have been a table by itself. Logically, I think of it as either being in the users or node group.

node_comment_statistics does also maintain a relationship with comment. This is a non-identifying relationship since a node can exist without any comments.

Conclusion

During this work, I noticed that the column definitions for many foreign key relationships are in-correct which would result in MySQL not allowing these constraints to actually be created. I created an issue and patch for this but it turns out Liam Morland is working on using foreign keys in core and also came across this around the same time as me.

Other issues I encountered have also been logged by Liam:

  • the node_access table references a non-existent node (relevant issue)
  • a set name exists in shortcut_set that does not exist in menu_links (relevant issue)

I would vote for foreign keys being used in Drupal core for a number of reasons, not least of which foreign keys aid a newcomer when trying to understand the schema installed by Drupal.

As I mentioned at the beginning of this post, any comments or corrections are very much welcome. I hope this information can prove useful to someone else besides me!



blog comments powered by Disqus

Published

02 August 2012

Category

planet drupal