A few weeks ago I was reading the paper Anatomy of a Database System by Hellerstein and Stonebraker. Chapter 2 of that paper discusses process models in database systems. After reading that paper, I was interested in seeing what Drizzle does in this regard so I began looking at the source code to see. Essentially, Drizzle uses the thread per DBMS worker model that is outlined in the paper where a single multi-threaded process hosts all the DBMS worker activity. Drizzle also has the concept of a pool of threads where workers are miltiplexed over a thread pool. The really nice thing about Drizzle in this regard is that the code for implementing the pool of threads is a plugin so if anyone is interested in writing their own thread scheduler, they can simply write a plugin for it. While developing an efficient scheduler might be a challenge, the mechanism for writing a plugin is pretty easy. I think that's pretty cool.
Let's discuss how a client connectection is made to Drizzle and a query is executed. I'll provide a general overview first and then delve into more details. When the Drizzle server is started, a pool of threads is created. The intial MySQL worklog for the implementation of the thread pool mechanism can be found here. The number of threads in this pool can be specified by an administrator. The pthreads API is used for the creation of threads in Drizzle. The thread pool code also utilizes the libevent API which provides a mechanism to execute a callback function when a specific event occurs on a file descriptor. During the initialization of the thread pool, 2 callback functions are registered with libevent. These callback functions are to be executed whenever a session is added or killed. Each thread created during the thread pool initialization process has a thread body which waits for a session to process using libevent. When a new connection comes in, the thread pool code adds it to a queue for libevent processing. When a libevent callback function is invoked (more information about how and when this happens below), a session is removed from the queue and placed in one of two lists depending on the current state of the session - if the session is waiting for I/O it will be added a list indicating that; otherwise, if it is ready for processing, it will be added to a list indicating this. The body of each thread creating during the thread pool initialization is continuously running a loop which looks at the list of sessions that need processing. Whenever a session is added to that list, a thread will pop it from the list and process it. This thread will then go ahead and actually execute the command which the session wants to execute.
Now, lets delve just a little bit further into how client connections are made based on the short summary given in the previous paragraph. I'll reference relevant files and methods in the Drizzle Doxygen docs as I go along when possible. The first thing we'll look at is the main() method of the server which is executed when the server starts. This method is contained in drizzled.cc. After initializing various things, the handle_connections_sockets() method is called. This method is also in the drizzled.cc file and its purpose is to handle new connections and spawn new threads to handle them. This method contains a while loop which continuously executes during the lifetime of the server waiting for new connections to come in to the server. Within this loop, a poll() system call is performed. The poll() system call waits for an event on a file descriptor to occur. In this case, the event will be a new connection. When a new connection comes in, accept() is called to accept a connection on a socket and create a new connected socket. In the drizzled.cc file, this new socket is called new_sock (funnily enough!). Once error checking on the new socket is complete, a new Session object is allocated. If this allocation fails, then the server has reached a limit on the number of sessions that can occur. If no error occurs then the new Session object is passed as a parameter to the create_new_thread() method (also in the drizzled.cc file).
The create_new_thread() method creates a new thread to handle the incoming connection. It is in this method that control actually enters the thread pool code. This occurs when the thread_scheduler.add_connection() method is called. thread_scheduler is a struct of type scheduling_st that defines the interface the scheduler plugin. When add_connection() is called on the thread_scheduler struct it calls the add_connection() function in whichever scheduler plugin is currently loaded. Since we are talking about the thread pool plugin, it will call the add_connection() function in the pool_of_threads.cc file. The add_connection() method notifies the thread pool about a new connection. A new session_scheduler object is created for that new connection. The session_scheduler class is defined in the session_scheduler.h file. This scheduler is set as the scheduler for the Session object that was passed as a parameter to the create_new_thread() method. Next, the libevent_session_add() method is called with the Session object passed as a parameter.
The libevent_session_add() method adds the Session object to a queue for libevent processing. It signals libevent by writing a byte into the session_add pipe which will trigger the callback function libevent_add_session_callback(). This callback function pops the first Session object off the queue of objects waiting for libevent processing and adds the Session object to one of two lists: 1) sessions_need_processing or 2) sessions_waiting_for_io. Which list the Session object is added to depends on the current state of the session. Once the libevent_add_session_callback() function completes, the adding of a new connection to the pool of threads is essentially complete. A session is chosen to be executed within the body of a thread runnning in the pool of threads. Each thread in the pool of threads is running with an outer loop that is defined in the libevent_thread_proc() method. Essentially, each thread in the pool of threads is running an infinite loop that examines the session_need_processing list. When the sessions_need_processing list becomes non-empty, a thread will pop the first Session object from that list and actually go ahead and process a query in that session.
The above description is not meant to be exhaustive. Actually reading through the pool of threads code is not that difficult and a grasp of what the code is doing can be easily obtained in a short period of time. I mostly wrote this for my own purposes so I had a better understanding of how it works.
While the thread pool code works well, it is not without issues. Mark Callaghan points out that when using the thread pool model in MySQL, every command sent to the server requires a pthread mutex lock/unlock pair on LOCK_event_loop. He has also logged a bug for this. Brian Aker responded to Mark's comments by saying that to get rid of this lock, you essentially need to write your own solution. This is much easier to attempt with Drizzle due to its plugin architecture that I mentioned at the beginning of this post. As he says "We have abstracted out this problem now so you can focus on solving this problem if you want". I believe that the idea is that people can write/tune thread schedulers for their own workload since a generic scheduler will not work well for every workload. With this approach, people can easily write a scheduler which is uniquely suited to their workload.
When it comes to the thread pool code, Brian also points out that the current design does not use libevent in the most optimal manner. He says "When it comes to pool of threads I think the current design misses the point of using libevent. Currently it does not yield on IO block, so in essence all it is doing it keeping you from overwhelming the operating system's scheduler and providing a completion for a given action. For small queries this is fine, but for longer running queries this is not very good (though... most queries we see are pretty short so this part is not a huge concern). It needs to be redesigned to make better use of IO, and this is something we will work on soon". Its interesting to see how memcached uses libevent as an example of seeing how another multi-threaded application uses libevent. Steven Grimm gives a brief outline in this thread of how he implemented thread support in memcached. I know Brian is currently working on a multi-threaded scheduler for Drizzle which is almost complete. He has mentioned in the past that there is a need to design a scheduler which really understands the difference between high/low and time constrained queries. I believe this is an interesting issue to think about.
In future posts, I hope to investigate the thread pool code more. In particular, I'd like to see how libevent could be used in a more optimal way and talk about some of the design considerations for a cost-based scheduler. Also, if I have time, I hope to write about how a query is processed in Drizzle i.e. what happens after connection handling.

blog comments powered by Disqus


07 March 2009