Thursday, December 27, 2012

Getting started with MySQL Proxy


The launch of MySQL Proxy has caused quite a commotion in the community. And with reason. For feature-hungry people, this is undeniably the most exciting addition to the MySQL set of tools.
If the last statement has left you baffled, because you don't see the added value, don't worry. This article aims at giving you the feeling of what the Proxy can do.
Get ready for a wonderful trip to Proxyland.

MySQL Proxy overview

mysql-proxy is a lightweight binary application standing between one or more MySQL clients and a server. The clients connect to the proxy with the usual credentials, instead of connecting to the server. The proxy acts as a man-in-the-middle between client and server.
In its basic form, the proxy is just a redirector. It gets an empty bucket from the client (a query), takes it to the server, fills the bucket with data, and passes it back to the client.
If that was all, the proxy would just be useless overhead. There is a little more I haven't told you yet. The proxy ships with an embedded Lua interpreter. Using Lua, you can define what to do with a query or a result set before the proxy passes them along.
Figure 1. MySQL Proxy can modify queries and results.
The power of the proxy is all in its flexibility, as allowed by the Lua engine.
You can intercept the query before it goes to the server, and do everything conceivable with it:
  • pass it along unchanged (default)
  • fix spelling mistakes (ever written CRATE DATAABSE?)
  • filter it out, i.e. remove it altogether
  • rewrite the query according to some policy (enforcing strong passwords, forbidding empty ones)
  • add forgotten statements (autocommit is enabled and the user sent a BEGIN WORK? You can inject a SET AUTOCOMMIT = 0 before that)
  • Much more. If you can think of it, it's probably already possible. If it isn't, blog about it: chances are that someone will make it happen.
In the same way, you can intercept the result set. Thus you can:
  • remove, modify, or add records to the result. (Want to mask passwords, or hide information from unauthorized prying eyes?)
  • make your own result sets, including column names. For example, if you allow the user to enter a new SQL command, you can build the result set to show what was requested;
  • ignore result sets, i.e. don't send them back to the client.
  • Want to do more? It could be possible. Look at the examples and start experimenting!

Key concepts

MySQL Proxy is built with an object oriented infrastructure. The main class exposes three member functions to the public. You can override them in a Lua script to modify the proxy behavior.
  • connect_server(). This function is called at connection time. You can work inside it to change connection parameters, and it can be used to provide load balancing.
  • read_query(packet). This function is called before sending the query to the server. You can intervene here to change the original query or to inject more queries into the queue. You can also decide to skip the backend server altogether and send back to the client the result you want. (E.g. given a SELECT * FROM big_table you may answer back "big_table has 20 million records. Did you forget the WHERE clause?")
  • read_query_result(injection_packet). This function is called before sending back the result in answer for an injected query. You can do something here to decide what to do with the result set (ignore, modify, send it unchanged).
By combining these three back doors to the server you can achieve a high degree of maneuverability over the server.
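To make the three hooks a bit more concrete, here is a minimal skeleton, offered as a sketch rather than as an official template. The bodies only print what they see. The `proxy = proxy or {...}` line is an assumption of mine, added so that the file also loads in a plain Lua interpreter; inside the proxy the real `proxy` table is already there.

```lua
-- skeleton.lua: the three hooks with near-empty bodies (illustrative sketch)

-- Stand-in for running outside mysql-proxy (assumption for illustration).
-- 3 is the COM_QUERY command byte of the MySQL protocol.
proxy = proxy or { COM_QUERY = 3 }

function connect_server()
  -- called when a client connects; returning nothing keeps the defaults
  print("new client connection")
end

function read_query(packet)
  -- called for every client packet, before it reaches the server
  if string.byte(packet) == proxy.COM_QUERY then
    print("query: " .. string.sub(packet, 2))
  end
  -- returning nothing passes the packet along unchanged
end

function read_query_result(inj)
  -- called only for queries that were added to the proxy queue
  print("result received for queued query " .. tostring(inj.id))
  -- returning nothing sends the result to the client unchanged
end
```

A script does not have to define all three functions. The examples in this article define only the ones they need.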

Installation

Installing the proxy is quite easy. The distribution package contains just one binary (and as of 0.5.1, also some sample Lua scripts). You can unpack that and copy it where you like.
For some operating systems it's even easier, because there are RPM packages that will take care of everything.
If your operating system is not included in the distribution, or if you want to try the bleeding edge features as soon as they leave the factory, you may get the source from the public Subversion tree and then build the proxy yourself.
It should need just a few basic actions:

 ./autogen.sh
 ./configure && make
 sudo make install 
 # will copy the executable to /usr/local/sbin

Simple query interception

As our first example, let's do an "I was there" kind of action, just to give you the feeling that you are standing where you want to be.
  1. Create a Lua file, named first_example.lua, containing the lines listed below
  2. Assuming that your database server is on the same box, launch the proxy server
  3. From a separate console, connect to the proxy server, which is like connecting to the normal server, with the difference that you will use port 4040 instead of 3306

 -- first_example.lua 
 function read_query(packet)
   if string.byte(packet) == proxy.COM_QUERY then
     print("Hello world! Seen the query: " .. string.sub(packet, 2))
   end
 end
# starting the proxy
$ mysql-proxy --proxy-lua-script=first_example.lua 
# from another console, accessing the proxy
$ mysql -u USERNAME -pPASSWORD -h 127.0.0.1 -P 4040 -e 'SHOW TABLES FROM test'
If you come back to the previous terminal window, you will see that the proxy has intercepted something for you.

Hello world! Seen the query: select @@version_comment limit 1
Hello world! Seen the query: SHOW TABLES FROM test
The first query was sent on connection by the mysql client. The second one is the one you sent.
As you can see, you are able to get in the middle, and make the proxy do something for you. For now, this something is very minimal, but we're going to see more interesting stuff in the next paragraphs.
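If you wonder about string.byte(packet) and string.sub(packet, 2) in the script: the first byte of the packet is the command type, and the rest is the query text. A small helper makes this explicit. This is only a sketch, and `split_packet` is a name I made up for the occasion.

```lua
-- Split a client packet into its command byte and its payload.
-- split_packet is a hypothetical helper, not part of the proxy API.
function split_packet(packet)
  return string.byte(packet), string.sub(packet, 2)
end

-- 3 is the command byte for a plain SQL query (COM_QUERY) in the MySQL protocol.
local COM_QUERY = 3
local command, text = split_packet(string.char(COM_QUERY) .. "SHOW TABLES FROM test")
print(command == COM_QUERY, text)
```

Inside a proxy script you would compare the first return value with proxy.COM_QUERY, exactly as first_example.lua does.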

Note on usage

Up to version 0.5.0, to use a Lua script you also need to pass the option --proxy-profiling, or else the read_query and read_query_result functions don't kick in.
Starting from version 0.5.1, this option is no longer necessary. The above mentioned functions are activated by default. Instead, a new option was introduced, to skip their usage. If you are using the proxy only for load balancing, you should now specify --proxy-skip-profiling.

Query rewriting

The more interesting stuff starts with query rewriting. To demonstrate this feature, let's choose a practical task. We want to catch queries with common typing errors and replace the misspelled word with the correct keyword. We will look for my most frequent finger twists, SLECT and CRATE.
Here is second_example.lua

 function read_query( packet )
   if string.byte(packet) == proxy.COM_QUERY then
     local query = string.sub(packet, 2)
     print ("received " .. query)
     local replacing = false
     -- matches "CRATE" as first word of the query
     if string.match(string.upper(query), '^%s*CRATE') then
         query = string.gsub(query,'^%s*%w+', 'CREATE')
         replacing = true
     -- matches "SLECT" as first word of the query
     elseif string.match(string.upper(query), '^%s*SLECT') then
         query = string.gsub(query,'^%s*%w+', 'SELECT')
         replacing = true
     end
     if (replacing) then
         print("replaced with " .. query )
         proxy.queries:append(1, string.char(proxy.COM_QUERY) .. query )
         return proxy.PROXY_SEND_QUERY
     end
   end
 end
As before, start the proxy with the option --proxy-lua-script=second_example.lua
and connect to it from a mysql client

 $ mysql -u USERNAME -pPASSWORD -h 127.0.0.1 -P 4040 
 Welcome to the MySQL monitor.  Commands end with ; or \g.
 Your MySQL connection id is 48
 Server version: 5.0.37-log MySQL Community Server (GPL)

 Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

 mysql> use test
 Database changed
 mysql> CRATE TABLE t1 (id int);         # Notice: TYPO!
 Query OK, 0 rows affected (0.04 sec)

 mysql> INSERT INTO t1 VALUES (1), (2);
 Query OK, 2 rows affected (0.01 sec)
 Records: 2  Duplicates: 0  Warnings: 0

 mysql> SLECT * FROM t1;                 # Notice: TYPO!
 +------+
 | id   |
 +------+
 |    1 | 
 |    2 | 
 +------+
 2 rows in set (0.00 sec)
Isn't it sweet? I made my usual mistakes, but the proxy was kind enough to fix them for me.
Let's look at what was reported

 received select @@version_comment limit 1
 received SELECT DATABASE()
 received CRATE TABLE t1 (id int)
 replaced with CREATE TABLE t1 (id int)
 received INSERT INTO t1 VALUES (1), (2)
 received SLECT * FROM t1
 replaced with SELECT * FROM t1
The first two queries are stuff the client needs for its purpose. Then came my first mistake, CRATE, which was graciously changed to CREATE, and in the end it received SLECT, and turned it into SELECT.
This script is quite crude, but it gives you an idea of the possibilities.
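One way of making it a bit less crude is to keep the typos in a table, so that adding a new one is a one-line change. The following is a sketch along those lines, not a replacement for the script above. The names `typos` and `fix_typos` are mine. Unlike the original, it compares the whole first word rather than its beginning, and it keeps leading whitespace.

```lua
-- Table-driven variant of the typo fixer (illustrative sketch).
-- Keys are misspelled first words in upper case, values the intended keyword.
local typos = { CRATE = "CREATE", SLECT = "SELECT" }

-- Returns the (possibly corrected) query and a flag telling whether
-- a replacement took place.
function fix_typos(query)
  local lead, word, rest = string.match(query, "^(%s*)(%a+)(.*)$")
  if not word then
    return query, false
  end
  local fixed = typos[string.upper(word)]
  if not fixed then
    return query, false
  end
  return lead .. fixed .. rest, true
end

print(fix_typos("SLECT * FROM t1"))
print(fix_typos("crate table t1 (id int)"))
```

In read_query you would call fix_typos on the query text and, when the flag is true, append the corrected query to proxy.queries and return proxy.PROXY_SEND_QUERY, as second_example.lua does.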

Query injection

Next, let's exploit one of the ad hoc features of MySQL Proxy: query injection.
It's a unique ability of this tool. When required, it can create a queue of queries, and send them to the server, after assigning to each query an ID code.
Figure 2. Query injection.
In the image, the server receives three queries, and of course it sends back three result sets. When an injection has taken place, the result set gets processed by another function, read_query_result, where you can deal with the result sets according to their ID. In the example, for ID 2 and 3 you just get something from SHOW STATUS and by comparing their values you can measure the impact of the main query on the server. Since you use the SHOW STATUS values only for internal calculation, you don't send that result set to the client (which is just as good, since the client is not expecting it), but you discard it.
Figure 3. Processing the injected queries.
The result set of the query sent by the client is duly returned. It's transparent for the client, but in between you managed to collect statistical results which are displayed on the proxy console.
For a full example, see the query injection tutorial in the Forge.
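To give an idea of the moving parts, here is a stripped-down sketch of the mechanism just described. It is not the Forge tutorial. The order of the queue (a SHOW SESSION STATUS before and after the client's query) is my assumption, and so are the constant names. The sketch only queues the queries and discards the extra results; reading the status values is left out. The stand-in `proxy` table at the top exists only so that the file runs in a plain Lua interpreter, and the numeric values in it are placeholders.

```lua
-- Query injection in miniature (illustrative sketch).

-- Stand-in for running outside mysql-proxy (assumption for illustration).
-- Inside the proxy, the real `proxy` table is used instead.
proxy = proxy or {
  COM_QUERY = 3,
  PROXY_SEND_QUERY = 1,
  PROXY_IGNORE_RESULT = 4,
  queries = {
    items = {},
    append = function(self, id, packet)
      table.insert(self.items, { id = id, packet = packet })
    end,
  },
}

-- IDs assigned to the queued queries (hypothetical names).
local CLIENT_QUERY, STATUS_BEFORE, STATUS_AFTER = 1, 2, 3

function read_query(packet)
  if string.byte(packet) ~= proxy.COM_QUERY then
    return
  end
  local show_status = string.char(proxy.COM_QUERY) .. "SHOW SESSION STATUS"
  -- Each query goes to the queue with its own ID.
  proxy.queries:append(STATUS_BEFORE, show_status)
  proxy.queries:append(CLIENT_QUERY, packet)
  proxy.queries:append(STATUS_AFTER, show_status)
  return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
  if inj.id == CLIENT_QUERY then
    -- the result the client is waiting for: pass it along unchanged
    return
  end
  -- the client never asked for the status results: drop them
  return proxy.PROXY_IGNORE_RESULT
end
```

The important point is the ID: read_query_result sees every result in the queue, and the ID is the only thing that tells it which one belongs to the client.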

Macros

Macros are just another way of using the query rewriting facility.
It's one of the most striking usages of the Proxy. You can rewrite the SQL language, or make it closer to your tastes.
For instance, many people who use the mysql command line client type cd and ls instead of use and show tables. With MySQL Proxy, they can use cd and ls and get the expected result. This juicy example of macro creation and usage is available in an early blog post. Rather than repeating all of it here, I invite you to look at Your first macros with MySQL Proxy.
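Just to show the flavor of it, a macro expander for cd and ls could look roughly like this. It is my own sketch and not the script from that blog post. The names `macros` and `expand_macro` are hypothetical.

```lua
-- Expanding shell-style macros into SQL (illustrative sketch).
-- Each macro is a function that builds the SQL text from its argument,
-- or returns nil when the macro cannot be applied.
local macros = {
  cd = function(arg)
    if arg ~= "" then
      return "USE " .. arg
    end
  end,
  ls = function(arg)
    if arg == "" then
      return "SHOW TABLES"
    end
    return "SHOW TABLES FROM " .. arg
  end,
}

-- Returns the expanded query and true, or the original query and false.
function expand_macro(query)
  local name, arg = string.match(query, "^%s*(%a+)%s*(%S*)%s*$")
  local build = name and macros[string.lower(name)]
  if not build then
    return query, false
  end
  local sql = build(arg)
  if not sql then
    return query, false
  end
  return sql, true
end

print(expand_macro("cd test"))
print(expand_macro("ls"))
```

As with the typo fixer, the expanded text would then be sent to the server from read_query in place of what the user typed.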

Creating result sets - shell commands from MySQL

The proxy receives a request from a client, and then it has to give back a result set. Most of the time, this is straightforward: pass the query to the server, get the result set, pass the result set to the client. But what happens if we need to return something that the server is not able to provide? Then we need to build a result set, which is composed of a set of column names and a two-dimensional array with the data.

Dataset creation basics

For example, if I wanted to return a warning about a deprecated feature, I could create a result set like this:

 proxy.response.resultset = {
     fields = {
         { 
            type = proxy.MYSQL_TYPE_STRING, 
            name = "deprecated feature", 
         },
         { 
            type = proxy.MYSQL_TYPE_STRING, 
            name = "suggested replacement", 
         },
     },
     rows = {
          { 
             "SHOW DATABASES", 
             "SHOW SCHEMAS" 
          }
     }
 }
 -- and then, send it to the client
 return proxy.PROXY_SEND_RESULT
The above structure, when received by the client, would be shown as

+---------------------+-----------------------+
| deprecated feature  | suggested replacement |
+---------------------+-----------------------+
| SHOW DATABASES      | SHOW SCHEMAS          |
+---------------------+-----------------------+
That's to say that you can fabricate any result set that meets your needs. For more details, see Jan Kneschke's example.
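Since the structure is an ordinary Lua table, you can build it with a helper instead of writing it by hand. Here is a sketch for the common case of one column and many rows. `make_resultset` is a name I invented, and the stand-in `proxy` table is there only to let the file run in a plain Lua interpreter.

```lua
-- Building a one-column result set from a list of strings (illustrative sketch).

-- Stand-in for running outside mysql-proxy (assumption for illustration).
-- 254 is the MySQL protocol type code for a string column.
proxy = proxy or { MYSQL_TYPE_STRING = 254 }

-- column_name: the header shown to the client
-- lines: array of strings, one per row
function make_resultset(column_name, lines)
  local rows = {}
  for _, line in ipairs(lines) do
    rows[#rows + 1] = { line }   -- every row is itself an array of values
  end
  return {
    fields = {
      { type = proxy.MYSQL_TYPE_STRING, name = column_name },
    },
    rows = rows,
  }
end

-- Inside read_query, the result would be used like this:
--   proxy.response.resultset = make_resultset("warning", { "SHOW DATABASES is deprecated" })
--   return proxy.PROXY_SEND_RESULT
```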

Shell commands from MySQL client

And now for something completely different, let's see how to use our freshly acquired knowledge to execute shell commands through the proxy. We said already that the proxy behavior can be altered with Lua scripts. And Lua is a complete language, meaning that you can do almost everything with it, including executing shell commands.
Combine this knowledge with the ability of creating data sets, and we come up with the idea of asking for shell commands from a MySQL client and having the proxy return their results as if they were normal database records.
Figure 4. Running shell commands through the Proxy.
Let's step through it, using the tutorial from MySQL Forge.
The shell tutorial script implements a simple syntax to ask for shell commands:

 SHELL command
for example,

 SHELL ls -lh /usr/local/mysql/data
  1. get the shell tutorial script. Save it as shell.lua;
  2. launch the proxy
  3. connect to the proxy

$ /usr/local/sbin/mysql-proxy --proxy-lua-script=shell.lua 

# from a different console
$ mysql -u USERNAME -pPASSWORD -h 127.0.0.1 -P 4040
Make sure that it works as a normal proxy to the database server.

 Welcome to the MySQL monitor.  Commands end with ; or \g.
 Your MySQL connection id is 49
 Server version: 5.0.37-log MySQL Community Server (GPL)

 Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

 mysql> use test
 Database changed
 mysql> show tables;
 +----------------+
 | Tables_in_test |
 +----------------+
 | t1             |
 +----------------+
 1 row in set (0.00 sec)

 mysql> select * from t1;
 +------+
 | id   |
 +------+
 |    1 |
 |    2 |
 +------+
 2 rows in set (0.00 sec)
Good. The normal operations work as expected. Now we test the enhanced features.

 mysql> shell df -h;
 +--------------------------------------------------------+
 | df -h                                                  |
 +--------------------------------------------------------+
 | Filesystem            Size  Used Avail Use% Mounted on |
 | /dev/md1               15G  3.9G  9.7G  29% /          |
 | /dev/md4              452G  116G  313G  27% /app       |
 | tmpfs                 1.7G     0  1.7G   0% /dev/shm   |
 | /dev/md3              253G  159G   82G  67% /home      |
 | /dev/md0               15G  710M   13G   6% /var       |
 +--------------------------------------------------------+
 6 rows in set (0.00 sec)
Hello shell! This is really a treat for advanced users. Once you have a way of accessing external commands, you can become quite creative.
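If you are curious about what happens behind the scenes, the core of such a script fits in two small functions: one that recognizes the SHELL syntax, and one that runs the command and turns each output line into a row. What follows is my own sketch of the idea, not the Forge script. `parse_shell` and `run_command` are hypothetical names, and io.popen, although part of standard Lua, is not available on every platform.

```lua
-- Recognizing "SHELL command" and collecting its output (illustrative sketch).

-- Returns the command text if the query uses the SHELL syntax, nil otherwise.
function parse_shell(query)
  return string.match(query, "^%s*[Ss][Hh][Ee][Ll][Ll]%s+(.+)$")
end

-- Runs the command on the host where this script runs and returns
-- one row per line of output, ready to be used as result set rows.
function run_command(command)
  local rows = {}
  local pipe = assert(io.popen(command, "r"))
  for line in pipe:lines() do
    rows[#rows + 1] = { line }
  end
  pipe:close()
  return rows
end

local command = parse_shell("shell echo hello from the proxy host")
if command then
  for _, row in ipairs(run_command(command)) do
    print(row[1])
  end
end
```

In a proxy script, the rows would go into proxy.response.resultset, with the command text as the only column name, which is what you see in the output above.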

 mysql> shell grep key_buffer /usr/local/mysql/my.cnf;
 +-----------------------------------------+
 | grep key_buffer /usr/local/mysql/my.cnf |
 +-----------------------------------------+
 | key_buffer=2000M                        |
 +-----------------------------------------+
 1 row in set (0.00 sec)
I know that I could check the same with SHOW VARIABLES, but since this is a value that can be set online, I just wanted to make sure that it was also in the configuration file.
And how is our memory situation?

 mysql> shell free -m;
 +---------------------------------------------------------------------------+
 | free -m                                                                   |
 +---------------------------------------------------------------------------+
 |              total       used       free     shared    buffers     cached |
 | Mem:          3280       1720       1560          0          9       1006 |
 | -/+ buffers/cache:        704       2575                                  |
 | Swap:         8189          2       8186                                  |
 +---------------------------------------------------------------------------+
 4 rows in set (0.08 sec)
That's not bad. Now that we are content with the status of the server, what about some leisure?
We could, for example, check the last entries on Planet MySQL.
Do you think I am babbling? Not at all. The command is quite long, but it works.

   wget -q -O - http://www.planetmysql.org/rss20.xml  \
      | perl -nle 'print $1 if m{<title>(.*)</title>}' \
      |head -n 21 | tail -n 20;
However, because the listing is so large, and nobody will remember that anyway, you should paste it into a shell script, and call it, for instance, last_planet.sh. And, here you are!

 mysql> shell last_planet.sh;
 +-------------------------------------------------------------------------------------+
 | last_planet.sh                                                                      |
 +-------------------------------------------------------------------------------------+
 | Top 5 Wishes for MySQL                                                              |
 | Open Source ETL tools.                                                              |
 | MySQL Congratulates FSF on GPLv3                                                    |
 | Query cache is slow to the point of being unusable - what is being done about that. |
 | About 'semi-unicode' And 'quasi Moon Stone'                                         |
 | My top 5 MySQL wishes                                                               |
 | Four more open source startups to watch                                             |
 | More on queue... Possible Solution...                                               |
 | MySQL as universal server                                                           |
 | MySQL Proxy. Playing with the tutorials                                             |
 | Open source @ Oracle: Mike Olson speaks                                             |
 | Quick musing on the &quot;Queue&quot; engine.                                       |
 | Distributed business organization                                                   |
 | Ideas for a MySQL queuing storage engine                                            |
 | MySQL Test Creation Tool Design Change                                              |
 | Queue Engine, and why this won' likely happen...                                    |
 | What?s your disk I/O thoughtput?                                                    |
 | My 5+ Wish List?                                                                    |
 | Top 5 best MySql practices                                                          |
 | Packaging and Installing the MySQL Proxy with RPM                                   |
 +-------------------------------------------------------------------------------------+
 20 rows in set (1.48 sec)
Shell access, and web contents from a MySQL client! Wow!

A word of caution

Having shown that you can access the shell from a MySQL connection does not imply automatically that you should always do it. Shell access is a security vulnerability, and if you want to use this feature in your server, do it for internal purposes only. Do not allow shell access to applications open to normal users. That would be asking for trouble (and finding it really fast).
You can use the shell to view things, but you could also use it to erase items.

 mysql> shell ls *.lua*;
 +---------------------+
 | ls *.lua*           |
 +---------------------+
 | first_example.lua   |
 | first_example.lua~  |
 | second_example.lua  |
 | second_example.lua~ |
 +---------------------+
 4 rows in set (0.03 sec)

 mysql> shell rm *~;
 Empty set (0.00 sec)

 mysql> shell ls *.lua*;
 +--------------------+
 | ls *.lua*          |
 +--------------------+
 | first_example.lua  |
 | second_example.lua |
 +--------------------+
 2 rows in set (0.01 sec)
Be very careful with shell access!
Be aware that the shell access you get through the Proxy refers to the host where the Proxy is running. If you install the Proxy on the same host as the database server, the two will coincide, but don't take it for granted.
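If you decide to keep shell access for internal use, one possible way of limiting the damage is to accept only a short list of programs and to refuse anything containing shell metacharacters. The sketch below shows the idea. It is an assumption of mine and not a feature of the shell tutorial script, and it reduces the risk without removing it (grep, for instance, can still read any file the proxy user can read).

```lua
-- Allowing only a few harmless programs (illustrative sketch).
-- The list of programs is just an example.
local allowed = { df = true, free = true, ls = true, grep = true }

-- Returns true only when the command starts with an allowed program
-- and contains no characters that would let the shell chain commands.
function is_allowed(command)
  if string.find(command, "[;|&%$`<>%(%)\n\\]") then
    return false
  end
  local program = string.match(command, "^%s*(%S+)")
  return program ~= nil and allowed[program] == true
end

print(is_allowed("df -h"))
print(is_allowed("rm *~"))
print(is_allowed("ls; rm -rf /tmp/x"))
```

A script would call is_allowed before running the command, and return an error or an empty result set to the client when the answer is false.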

Customized logging

I left this example for the end because, in my experience, this is the most interesting one and it has a practical, immediate use. Logs on demand are available in MySQL 5.1. But if you are stuck with MySQL 5.0, then the proxy can give you a hand.

Simple logging

To enable logging of queries into something that looks like a general log, the task is easy. Write this small portion of code into a simple_logs.lua file (or download the snippet from MySQL Forge).

 local log_file = 'mysql.log'
 local fh = io.open(log_file, "a+")

 function read_query( packet )
   if string.byte(packet) == proxy.COM_QUERY then
     local query = string.sub(packet, 2)
     fh:write( string.format("%s %6d -- %s \n", 
         os.date('%Y-%m-%d %H:%M:%S'), 
         proxy.connection["thread_id"], 
         query)) 
     fh:flush()
   end
 end
Then start the proxy with it, and connect to the proxy from some concurrent sessions.
This script will log all queries to a text file named mysql.log. After a few sessions, the log file would look like this:

 2007-06-29 11:04:28     50 -- select @@version_comment limit 1 
 2007-06-29 11:04:31     50 -- SELECT DATABASE() 
 2007-06-29 11:04:35     51 -- select @@version_comment limit 1 
 2007-06-29 11:04:42     51 -- select USER() 
 2007-06-29 11:05:03     51 -- SELECT DATABASE() 
 2007-06-29 11:05:08     50 -- show tables 
 2007-06-29 11:05:22     50 -- select * from t1 
 2007-06-29 11:05:30     51 -- show databases 
 2007-06-29 11:05:30     51 -- show tables 
 2007-06-29 11:05:33     52 -- select count(*) from user 
 2007-06-29 11:05:39     51 -- select count(*) from columns 
The log contains date, time, connection ID, and query. Simple and effective for such a short script.
Notice that there are three sessions, and their commands are not sorted by session, but by the time they were executed.
The pleasant aspect is that you don't need to restart the server to activate the general log. All you need to do is to point your applications to port 4040 instead of 3306, and you have enabled simple but functional logging.
Come to think of it, you don't need to modify or restart your applications either. You can achieve the same result without touching server or applications. Simply start the proxy on the same box where the server is located, and activate an iptables rule to redirect traffic from port 3306 to 4040 (Courtesy of Patrizio Tassone)

sudo iptables -t nat -I PREROUTING \
   -s ! 127.0.0.1 -p tcp \
   --dport 3306 -j \
   REDIRECT --to-ports 4040
Figure 5. Redirecting traffic from port 3306 to 4040.
Now you have logging enabled, and you don't have to restart the server or to touch your applications!
When you are done, and you don't need logs anymore, remove the rule (-D instead of -I) and kill the proxy.

sudo iptables -t nat -D PREROUTING \
   -s ! 127.0.0.1 -p tcp \
   --dport 3306 -j \
   REDIRECT --to-ports 4040

More customized logging

The simple and effective logging script from the previous section is tempting, but it's really basic. We have had a glimpse at the Proxy internals, and we have seen that we can get better information, and these logs can be much more interesting than a bare list of queries.
For example, we would like to report if a query was successful or rejected as syntax error, how many rows were retrieved, how many rows were affected.
We know all the elements to reach this goal. The script will be a bit longer, but not much.

 -- logs.lua
 assert(proxy.PROXY_VERSION >= 0x00600,
  "you need at least mysql-proxy 0.6.0 to run this module")

 local log_file = os.getenv("PROXY_LOG_FILE")
 if (log_file == nil) then
   log_file = "mysql.log"
 end

 local fh = io.open(log_file, "a+")
 local query = "";
In the global part of the script, we check that we're using an appropriate version of the Proxy, since we are using features that are not available in version 0.5.0.
Then we set the file name, taking it from an environment variable, or assigning the default value.

 function read_query( packet )
   if string.byte(packet) == proxy.COM_QUERY then
     query = string.sub(packet, 2)
     proxy.queries:append(1, packet )
     return proxy.PROXY_SEND_QUERY
   else
       query = ""
   end
 end
The first function does little work. It appends the query to the proxy queue, so that the next function will be triggered when the result is ready.

 function read_query_result (inj)
   local row_count = 0
   local res = assert(inj.resultset)
   local num_cols = string.byte(res.raw, 1)
   if num_cols > 0 and num_cols < 255 then
     for row in inj.resultset.rows do
       row_count = row_count + 1
     end
   end
   local error_status =""
   if res.query_status and (res.query_status < 0 ) then
       error_status = "[ERR]"
   end
   if (res.affected_rows) then
       row_count = res.affected_rows
   end
   --
   -- write the query, adding the number of retrieved rows
   --
   fh:write( string.format("%s %6d -- %s {%d} %s\n", 
     os.date('%Y-%m-%d %H:%M:%S'), 
     proxy.connection["thread_id"], 
     query, 
     row_count,
     error_status))
   fh:flush()
 end
In this function we can check if we are dealing with a data manipulation query or a select query. If there are rows, the function counts them, and the result is printed in braces to the log file. If there are affected rows, then this is the number that is reported. We also check if there was an error, in which case the information is returned in brackets, and finally all gets written to the log file. Here is an example:

 2007-06-29 16:41:10     33 -- show databases {5} 
 2007-06-29 16:41:10     33 -- show tables {2} 
 2007-06-29 16:41:12     33 -- Xhow tables {0} [ERR]
 2007-06-29 16:44:27     34 -- select * from t1 {6} 
 2007-06-29 16:44:50     34 -- update t1 set id = id * 100 where c = 'a' {2} 
 2007-06-29 16:45:53     34 -- insert into t1 values (10,'aa') {1} 
 2007-06-29 16:46:07     34 -- insert into t1 values (20,'aa'),(30,'bb') {2} 
 2007-06-29 16:46:22     34 -- delete from t1 {9}
The first, second, and fourth lines say that the queries returned respectively 5, 2, and 6 rows. The third one says that the query returned an error. The fifth line reports that 2 rows were affected by the UPDATE command. The following lines all report the number of affected rows for INSERT and DELETE statements.
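Since every line follows the same layout, the log is easy to digest with another small Lua script, for example to list only the failed queries. Here is a sketch of a parser for one line. `parse_log_line` is a hypothetical name, and the pattern assumes the exact format written by logs.lua above.

```lua
-- Parsing one line of the customized log (illustrative sketch).
-- Expected layout: date time, connection ID, "--", query, {rows}, optional [ERR].
function parse_log_line(line)
  local stamp, thread_id, query, rows, flag = string.match(line,
    "^%s*(%d+%-%d+%-%d+ %d+:%d+:%d+)%s+(%d+) %-%- (.*) {(%d+)} ?(%S*)%s*$")
  if not stamp then
    return nil   -- not a line written by logs.lua
  end
  return {
    time = stamp,
    thread_id = tonumber(thread_id),
    query = query,
    rows = tonumber(rows),
    is_error = (flag == "[ERR]"),
  }
end

local entry = parse_log_line("2007-06-29 16:41:12     33 -- Xhow tables {0} [ERR]")
print(entry.thread_id, entry.query, entry.rows, entry.is_error)
```

The query part is matched greedily, so a query that itself contains something like {5} is still split at the last pair of braces on the line.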

Note on the examples

The examples provided with this article have been tested with a few different operating systems. The code is still in alpha stage, though, so it may happen that data structures, options, and interfaces change, until the feature set is stabilized.

What's next

At the end of this long excursus, I feel that I have barely scratched the surface. MySQL Proxy is this, and much more. There are features that I have not touched, and that should require appropriate coverage, with some benchmarking. Also, I did not get into much detail with the architecture. Somebody will cover that as well.
Expect more articles about MySQL Proxy, covering load balancing, replication specific features, benchmarks, and especially a MySQL Proxy cookbook, as soon as the community gathers enough recipes to justify the title.
As a last item for this article, I would like to say THANKS, Jan Kneschke, for creating MySQL Proxy!
