Recently, I started working on a plugin that performs direct-to-Memcached replication in Drizzle. While working on it, I found that I wanted to filter replication events based on schema or table names. I implemented this in my Memcached plugin but then realized that the functionality would be better off as its own plugin, since I imagine filtering replication events will be a pretty common task. This led me to start working on a filtered replicator plugin for Drizzle. Before diving into the plugin implementation, I should mention that Jay Pipes has previously written in significant detail about the replication architecture in Drizzle. If you are not familiar with replication in Drizzle, I recommend reading that post from Jay before proceeding with this one.
Jay is currently working on documentation for replication in Drizzle, and you can track that work on the wiki page he created. It's still a work in progress, so if you really want to discover how all this works in Drizzle, I recommend having a look at the source code. Jay's work contains a copious amount of comments and is not difficult to read or understand; I highly recommend it. If you are interested in getting involved with replication development, I'm sure Jay would be more than happy to have some contributors. The best way to get started is to ping the mailing list or one of the developers on #drizzle on FreeNode to indicate your interest.
Development of the Replicator Plugin
As with any plugin in Drizzle, there are 3 files that are important for building the plugin:
  • plugin.ini
  • plugin.ac
  • plugin.am
Only the plugin.ini file is mandatory. This file is a standard ini-file that currently contains only one section - [plugin]. For the filtered replicator plugin, the plugin.ini file looked like:
[plugin]
name=filtered_replicator
title=Filtered Replicator
description=A simple filtered replicator which allows a user to filter out
            events based on a schema or table name
load_by_default=yes
sources=filtered_replicator.cc
headers=filtered_replicator.h
More information on the 3 files related to plugins is available on the plugin build system page on the Drizzle wiki. Since the replicator plugin does not depend on any external library, we don't need to worry about the other 2 plugin build files here.
Now, since we are developing a replicator, we need to be aware of the replicator API provided by Drizzle's core kernel. That API is defined in the drizzled/plugin/replicator.h include file. If we look in that file, we find the following class definition:
/**
 * Class which replicates Command messages
 */
class Replicator
{
public:
  Replicator() {}
  virtual ~Replicator() {}
  /**
   * Replicate a Command message to an Applier.
   *
   * @note
   *
   * It is important to note that memory allocation for the
   * supplied pointer is not guaranteed after the completion
   * of this function -- meaning the caller can dispose of the
   * supplied message.  Therefore, replicators and appliers
   * implementing an asynchronous replication system must copy
   * the supplied message to their own controlled memory storage
   * area.
   *
   * @param Command message to be replicated
   */
  virtual void replicate(Applier *in_applier,
                         drizzled::message::Command *to_replicate)= 0;

  /**
   * A replicator plugin should override this with its
   * internal method for determining if it is active or not.
   */
  virtual bool isActive() {return false;}
};
The above was developed by Jay and, thanks to his awesome work (with really helpful comments), it's pretty easy for us to determine what our replicator plugin needs to do. Basically, all we need to do is inherit from the Replicator class and implement the replicate() and isActive() methods, and we have a simple replicator! Thus, we will have the following class:
class FilteredReplicator: public drizzled::plugin::Replicator
{
public:
  FilteredReplicator() {}

  /** Destructor */
  ~FilteredReplicator() {}

  void replicate(drizzled::plugin::Applier *in_applier,
                 drizzled::message::Command *to_replicate);

  /**
   * Returns whether the replicator is active.
   */
  bool isActive();
};
Now, for the moment we want to filter by schema name or table name. Thus, we need a place to store the lists of schema and table names to filter. Since this is Drizzle and Drizzle is all about using the STL, we'll go with a std::vector for each of these lists. We are going to assume that the schema and table names to filter by are each specified as a comma-separated list, so we will need a method to parse a comma-separated list and populate the appropriate vector. Finally, we will also need methods for determining whether a table name or schema name should be filtered or not. Based on all this, our class definition will now look like:
class FilteredReplicator: public drizzled::plugin::Replicator
{
public:
  FilteredReplicator() {}

  /** Destructor */
  ~FilteredReplicator() {}

  void replicate(drizzled::plugin::Applier *in_applier,
                 drizzled::message::Command *to_replicate);

  /**
   * Returns whether the replicator is active.
   */
  bool isActive();

  /**
   * Populate the vector of schemas to filter from the
   * comma-separated list of schemas given. This method
   * clears the vector first.
   *
   * @param[in] input comma-separated filter to use
   */
  void setSchemaFilter(const std::string &input);

  /**
   * Populate the vector of tables to filter from the
   * comma-separated list of tables given. This method
   * clears the vector first.
   *
   * @param[in] input comma-separated filter to use
   */
  void setTableFilter(const std::string &input);

private:

  /**
   * Given a comma-separated string, parse that string to obtain
   * each entry and add each entry to the supplied vector.
   *
   * @param[in] input a comma-separated string of entries
   * @param[out] filter a std::vector to be populated with the entries
   *                    from the input string
   */
  void populateFilter(const char *input,
                      std::vector<std::string> &filter);

  /**
   * Search the vector of schemas to filter to determine whether
   * the given schema should be filtered or not. The parameter
   * is obtained from the Command message passed to the replicator.
   *
   * @param[in] schema_name name of schema to search for
   * @return true if the given schema should be filtered; false otherwise
   */
  bool isSchemaFiltered(const std::string &schema_name);

  /**
   * Search the vector of tables to filter to determine whether
   * the given table should be filtered or not. The parameter
   * is obtained from the Command message passed to the replicator.
   *
   * @param[in] table_name name of table to search for
   * @return true if the given table should be filtered; false otherwise
   */
  bool isTableFiltered(const std::string &table_name);

  std::vector<std::string> schemas_to_filter;
  std::vector<std::string> tables_to_filter;
};
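I haven't shown populateFilter() in this post, so here is a minimal sketch of how the comma-separated parsing could work. It is not the code from my branch: populate_filter_sketch is a hypothetical free function, it takes a std::string rather than a const char *, it clears the vector itself, and it assumes entries are split on plain commas with no whitespace trimming.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical free-function stand-in for populateFilter(); the member
// function in the actual plugin may be implemented differently.
// Splits `input` on commas and stores each non-empty entry in `filter`.
void populate_filter_sketch(const std::string &input,
                            std::vector<std::string> &filter)
{
  // Assumption: start from an empty list on every call.
  filter.clear();

  std::istringstream stream(input);
  std::string entry;
  while (std::getline(stream, entry, ','))
  {
    // Skip empty entries produced by inputs such as "a,,b" or "a,".
    if (! entry.empty())
    {
      filter.push_back(entry);
    }
  }
}
```

Calling it with "one,two" should leave the vector holding the two entries "one" and "two". Whether surrounding spaces should be trimmed is a design choice this sketch leaves open.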
Now that we have decided on the API for our replicator plugin, let's implement the replicate() function. This will perform the filtering of events. For this plugin, it looks pretty simple (which is a good thing!):
void FilteredReplicator::replicate(drizzled::plugin::Applier *in_applier,
                                   drizzled::message::Command *to_replicate)
{
  /*
   * We first check if this event should be filtered or not...
   */
  if (isSchemaFiltered(to_replicate->schema()) ||
      isTableFiltered(to_replicate->table()))
  {
    return;
  }

  /*
   * We can now simply call the applier's apply() method, passing
   * along the supplied command.
   */
  in_applier->apply(to_replicate);
}
Our method for checking whether a schema should be filtered or not simply uses the STL. For completeness, that method looks as follows:
bool FilteredReplicator::isSchemaFiltered(const string &schema_name)
{
  vector<string>::iterator it= find(schemas_to_filter.begin(),
                                    schemas_to_filter.end(),
                                    schema_name);
  if (it != schemas_to_filter.end())
  {
    return true;
  }
  return false;
}
There is not much more to it than that! As you can see, developing a replicator plugin does not have to be very difficult. Thanks to Jay's awesome work, it is actually fun! I am really enjoying working on my Memcached applier at the moment (so much so that I probably spend too much time thinking about it when I should be working on other things...).
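To make the filtering decision easy to try outside of Drizzle, here is a small self-contained sketch of the same logic. Everything in it is a stand-in I made up for illustration: CommandSketch replaces drizzled::message::Command, CountingApplierSketch replaces a real applier, and FilterSketch mirrors only the filtering part of FilteredReplicator. Matching is assumed to be exact and case-sensitive, as it is with std::find on strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for drizzled::message::Command: only the two
// fields the filter looks at.
struct CommandSketch
{
  std::string schema;
  std::string table;
};

// Hypothetical stand-in for an applier; it only counts what it receives.
struct CountingApplierSketch
{
  int applied;
  CountingApplierSketch() : applied(0) {}
  void apply(const CommandSketch &) { ++applied; }
};

// Mirrors the filtering decision made in replicate() above.
struct FilterSketch
{
  std::vector<std::string> schemas_to_filter;
  std::vector<std::string> tables_to_filter;

  // True if `name` appears in `names` (exact, case-sensitive match).
  static bool contains(const std::vector<std::string> &names,
                       const std::string &name)
  {
    return std::find(names.begin(), names.end(), name) != names.end();
  }

  // Drop the command if its schema or table is filtered; otherwise
  // hand it to the applier.
  void replicate(CountingApplierSketch &applier,
                 const CommandSketch &cmd) const
  {
    if (contains(schemas_to_filter, cmd.schema) ||
        contains(tables_to_filter, cmd.table))
    {
      return;
    }
    applier.apply(cmd);
  }
};
```

With "one" in the schema list, a command for schema "one" never reaches the applier, while a command for an unlisted schema and table is passed straight through.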
System Variables in a Plugin
The handling of system variables in a Drizzle plugin is not very pretty at the moment. Thankfully, Monty is working on refactoring system variables in Drizzle. You can read more about that work on the wiki page Monty created. However, for now, we are stuck with the old system. I'm going to describe what I needed to do for one system variable that specifies which schemas we should filter when filtering replication events. The system variable declaration looks as follows:
static DRIZZLE_SYSVAR_STR(filteredschemas,
                          sysvar_filtered_replicator_sch_filters,
                          PLUGIN_VAR_OPCMDARG,
                          N_("List of schemas to filter"),
                          check_filtered_schemas, /* check func */
                          set_filtered_schemas, /* update func */
                          NULL /* default */);
You can see that we specified 2 callback functions: check_filtered_schemas() and set_filtered_schemas(). These are both called when a SET command is executed on this system variable. The check_filtered_schemas() function can be used to make sure that the input is well-formed (I don't really check for that at the moment). For the moment, the check_filtered_schemas() function just copies the input string to a temporary string. Here is the code for that function (the temporary string and mutex are declared as global variables):
static int check_filtered_schemas(Session *,
                                  struct st_mysql_sys_var *,
                                  void *,
                                  struct st_mysql_value *value)
{
  char buff[STRING_BUFFER_USUAL_SIZE];
  int len= sizeof(buff);
  const char *input= value->val_str(value, buff, &len);

  if (input && filtered_replicator)
  {
    pthread_mutex_init(&sysvar_sch_lock, NULL);
    pthread_mutex_lock(&sysvar_sch_lock);
    tmp_sch_filter_string= new(std::nothrow) string(input);
    if (tmp_sch_filter_string == NULL)
    {
      pthread_mutex_unlock(&sysvar_sch_lock);
      pthread_mutex_destroy(&sysvar_sch_lock);
      return 1;
    }
    return 0;
  }
  return 1;
}
Next, we need a function to actually update the system variable. This function looks like so:
static void set_filtered_schemas(Session *,
                                 struct st_mysql_sys_var *,
                                 void *var_ptr,
                                 const void *save)
{
  if (filtered_replicator)
  {
    if (*(bool *)save != true)
    {
      filtered_replicator->setSchemaFilter(*tmp_sch_filter_string);
      /* update the value of the system variable */
      *(const char **) var_ptr= tmp_sch_filter_string->c_str();
      /* we don't need this temporary string anymore */
      delete tmp_sch_filter_string;
      pthread_mutex_unlock(&sysvar_sch_lock);
      pthread_mutex_destroy(&sysvar_sch_lock);
    }
  }
}
You can see that having updatable system variables in a plugin is a little bit tricky right now in Drizzle. I wouldn't spend too much time worrying about this at the moment though. Like I said, once Monty finishes his system variable refactoring, we won't have to write such ugly and hard-to-understand code again. I am definitely looking forward to using the refactored system variables in Drizzle!
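The check-then-update pattern above can also be sketched without any Drizzle types, which makes one detail easier to see: the pointer handed back for the variable's value has to outlive the temporary string. In set_filtered_schemas() the value is taken from tmp_sch_filter_string->c_str() just before that string is deleted, so unless the server copies the value elsewhere, that pointer would refer to freed memory. The sketch below is only an illustration under my own assumptions: StagedSettingSketch and its methods are hypothetical names, it uses std::mutex (C++11) instead of pthreads, and it takes the lock separately in each phase rather than holding it from check to update.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical holder for one string-valued setting, updated in two
// phases like the check/update callbacks above.
class StagedSettingSketch
{
public:
  StagedSettingSketch() : has_staged_(false) {}

  // "check" phase: validate the input and stage it.
  // Returns false to reject the new value.
  bool check(const char *input)
  {
    if (input == nullptr)
    {
      return false;
    }
    std::lock_guard<std::mutex> guard(lock_);
    staged_ = input;
    has_staged_ = true;
    return true;
  }

  // "update" phase: publish the staged value, if there is one.
  // The returned pointer stays valid until the next call to update(),
  // because the string it points into is owned by this object.
  const char *update()
  {
    std::lock_guard<std::mutex> guard(lock_);
    if (has_staged_)
    {
      current_.swap(staged_);
      staged_.clear();
      has_staged_ = false;
    }
    return current_.c_str();
  }

private:
  std::mutex lock_;
  std::string staged_;   // value accepted by check(), not yet visible
  std::string current_;  // value currently published
  bool has_staged_;
};
```

Keeping the published string as a member is one way to keep the value alive; another would be to let the server own a copy, if the system variable API offers that.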
Using the Plugin
My branch with the filtered replicator plugin I developed is available on Launchpad. You can build it by pulling the branch from Launchpad:
$ cd dir/to/place/branch
$ bzr branch lp:~posulliv/drizzle/filtered-replicator
$ cd filtered-replicator
$ ./config/autorun.sh && ./configure && make
After compiling the branch, we can start playing with it. The first thing we need to do is start Drizzle. That can be accomplished easily:
$ cd /dir/with/replicator/branch
$ mkdir run
$ cd run
$ ../drizzled/drizzled --no-defaults --port=9306 \
--basedir=$PWD --datadir=$PWD \
--filtered-replicator-enable --filtered-replicator-filteredschemas='first,second' \
>> $PWD/drizzle.err 2>&1 &
The above command will start drizzled along with the filtered replicator. One of the system variables associated with this replicator specifies which schemas to filter replication events by. It is possible to set this when starting the server (and likewise for the tables to filter replication events by). You will notice that we have not enabled any applier of replication events. What does this mean? Well, it means that nothing is being done with the events that are happening! Sure, I have a replicator running that filters events based on what I specify, but nothing is done with the events that pass the filter. I'm currently working on a Memcached applier that takes events and pushes them to a Memcached server to maintain a proactive cache, but that is the topic of another blog post.
Now that we have the server up and running, let's see what system variables there are related to our replicator plugin (below, we are assuming the server is still running):
$ cd /dir/with/replicator/branch
$ cd run
$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 2009.07.1067 Source distribution (filtered-replicator)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> show variables like '%replicat%';
+-------------------------------------+--------------+
| Variable_name                       | Value        |
+-------------------------------------+--------------+
| default_replicator_enable           | OFF          |
| filtered_replicator_enable          | ON           |
| filtered_replicator_filteredschemas | first,second |
| filtered_replicator_filteredtables  |              |
| innodb_replication_delay            | 0            |
+-------------------------------------+--------------+
5 rows in set (0 sec)

drizzle>
Let's modify the schemas we are filtering replication events by (after showing the actual code that performs this, we might as well do it!):
drizzle> set global filtered_replicator_filteredschemas = 'third,fourth';
Query OK, 0 rows affected (0 sec)

drizzle> show variables like '%replicat%';
+-------------------------------------+--------------+
| Variable_name                       | Value        |
+-------------------------------------+--------------+
| default_replicator_enable           | OFF          |
| filtered_replicator_enable          | ON           |
| filtered_replicator_filteredschemas | third,fourth |
| filtered_replicator_filteredtables  |              |
| innodb_replication_delay            | 0            |
+-------------------------------------+--------------+
5 rows in set (0 sec)

drizzle>
Conclusion
This plugin is still under development and I'd love any input from people. What I'd really like to know is: what kind of filters would people like to be able to specify? How flexible would people want a filtered replicator to be? Right now, it's only possible to filter by schema or table name, but I could easily add more options if I thought they would be useful to people.


Published

24 July 2009

Category

drizzle