MEMORY engine and Handler% status variables via gdb

While working through my previous blog post, rows_examined UNION Handler% … ?!?, I learned about some unexpected behavior of the MEMORY (HEAP) storage engine and how it interacts with Handler_read_rnd_next. In particular, even a “deleted” row (one that never really made it into the table to begin with) still must be read over when scanning the table, thereby incrementing Handler_read_rnd_next when one might not expect that to happen. I thought this would be an interesting opportunity to use gdb to really watch what the server was doing when this situation is encountered. I’m far from being a gdb expert. In fact, I’ve probably only used it a dozen times or so. Nevertheless, I found it pretty straightforward to get things set up and to understand how to make my way around. (I’ve probably missed out on some shortcuts and some useful functionality here; hopefully I’m not leading people too far astray!) Fortunately, mysqld binaries are “not stripped” these days. That means that debugging symbols are included in the binary, which gives gdb the ability to associate the activities of a running program with the lines in the source code used to build it. Verify that this is the case using file, which should say “not stripped” at the end of its output:

$ file ./bin/mysqld
./bin/mysqld: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

Because we’ll need (and want!) to look at the server source code while we do this, we need to download a copy of the source. It is crucial that the source code you have exactly matches the source code used to compile the version of the server you’re using. I’m using MariaDB for this testing, so I need to get the version of MariaDB source used to build the version of MariaDB I’m using. I happen to be using MariaDB 5.2.4, so I grabbed mariadb-5.2.4.tar.gz from http://downloads.askmonty.org/MariaDB/5.2.4/. It’s also possible to use bzr to grab a copy of the source, if you’re inclined to set that all up, or even to browse the source online on Launchpad. I know that I need revision #2894, because that’s the highest revision number in the ChangeLog for MariaDB 5.2.4. You can also see the “Tag” of mariadb-5.2.4 on this revision in the Launchpad revision history. Once you have obtained the source code that corresponds to the version of MySQL, MariaDB, or Percona server (SkySQL supports all of them!) that you’re running, fire up the server, find its PID, and attach gdb. Make sure nobody else is using the server while you’re doing this; I’m not taking any steps to narrow the scope of my gdb session to a particular thread. (On Ubuntu starting with 10.10, you either need to run gdb as root or you need to alter the setting of /proc/sys/kernel/yama/ptrace_scope.)

$ gdb -p $( cat ./data/ub-dev.pid )
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
[...snip...]
(gdb)

When you attach to a running process with gdb, it stops the process. You can then set breakpoints, print data structures and variables, et cetera. We have a mission, though, and that is to try to look at what happens when the MEMORY engine causes Handler_read_rnd_next to be incremented. In order to find any of this, it’s important to remember that the MEMORY was historically called HEAP, and it retains that name in the server source code. Let’s see what we can find out about Handler_read_rnd_next:

$ pwd
/home/kolbe/dev/src/mariadb-5.2.4
$ grep -r Handler_read_rnd_next * | grep -v mysql-test
sql/mysqld.cc:  {"Handler_read_rnd_next",    (char*) offsetof(STATUS_VAR, ha_read_rnd_next_count), SHOW_LONG_STATUS},

I eliminated results that match “mysql-test” because there are lots of tests that look at the value of Handler_read_rnd_next and those are of no interest to us. We see that Handler_read_rnd_next only shows up once in the entire source tree, and that it appears to act as a more human-friendly alias for a lower-level status counter called ha_read_rnd_next_count. The various storage engines are implemented by way of code that lies in their respective subdirectory of the storage subdirectory of the source tree. We are interested only in the MEMORY engine here, which is called “heap” internally (and historically). Let’s look for ha_read_rnd_next_count, but confine our search to the storage/heap directory. I’ll use some options to grep to get some context around any matches.

$ grep -n -A 10 -B 10 ha_read_rnd_next_count storage/heap/*
storage/heap/ha_heap.cc-338-  return error;
storage/heap/ha_heap.cc-339-}
storage/heap/ha_heap.cc-340-
storage/heap/ha_heap.cc-341-int ha_heap::rnd_init(bool scan)
storage/heap/ha_heap.cc-342-{
storage/heap/ha_heap.cc-343-  return scan ? heap_scan_init(file) : 0;
storage/heap/ha_heap.cc-344-}
storage/heap/ha_heap.cc-345-
storage/heap/ha_heap.cc-346-int ha_heap::rnd_next(uchar *buf)
storage/heap/ha_heap.cc-347-{
storage/heap/ha_heap.cc:348:  ha_statistic_increment(&SSV::ha_read_rnd_next_count);
storage/heap/ha_heap.cc-349-  int error=heap_scan(file, buf);
storage/heap/ha_heap.cc-350-  table->status=error ? STATUS_NOT_FOUND: 0;
storage/heap/ha_heap.cc-351-  return error;
storage/heap/ha_heap.cc-352-}
storage/heap/ha_heap.cc-353-
storage/heap/ha_heap.cc-354-int ha_heap::rnd_pos(uchar * buf, uchar *pos)
storage/heap/ha_heap.cc-355-{
storage/heap/ha_heap.cc-356-  int error;
storage/heap/ha_heap.cc-357-  HEAP_PTR heap_position;
storage/heap/ha_heap.cc-358-  ha_statistic_increment(&SSV::ha_read_rnd_count);

We can see that the ha_heap::rnd_next function is the only place the HEAP engine modifies the ha_read_rnd_next_count. This function is pretty straightforward. Let’s set a breakpoint right after heap_scan is called, so that we can see what’s inside the file variable after the new row has been read.

(gdb) break ha_heap.cc:350
Breakpoint 1 at 0x7d0458: file ha_heap.cc, line 350.

We’ll also cd inside of gdb to the location of the matching file so that we can look at the source code while we’re in the debugger.

(gdb) cd /home/kolbe/dev/src/mariadb-5.2.4/storage/heap
Working directory /home/kolbe/dev/src/mariadb-5.2.4/storage/heap.

Alright, back on the MySQL side, we need some sample data to work with. You’ll want to be using GNU screen or a separate terminal window or something in order to be able to easily look at both the MySQL client and your gdb session. Before we’ll be able to execute any statements in MySQL, we have to tell gdb to let the server continue running.

(gdb) continue
Continuing.

Now, some sample data and a SELECT to kick things off.

mariadb 5.2.4> create table h1 (id int) engine=memory;
Query OK, 0 rows affected (0.03 sec)

mariadb 5.2.4> insert into h1 values (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mariadb 5.2.4> select * from h1;

You’ll notice that the SELECT statement doesn’t return. That’s because the server hit the breakpoint we set in gdb!

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;

Now, we can take a look around at some data structures to see what is going on. The ha_heap::rnd_next function had called heap_scan(file, buf), so let’s see what file and buf are.

(gdb) print buf
$1 = <optimized out>
(gdb) print file
$2 = (HP_INFO *) 0x435c4a0

Well, buf isn’t something we’ll be able to use, because the compiler decided it wasn’t actually going to bring that into scope here. file, on the other hand, is a pointer to a HP_INFO struct (see include/heap.h). We can dereference and print it right at the gdb command-line:

(gdb) set print pretty on
(gdb) print *file
$4 = {
  s = 0x432a490, 
  current_ptr = 0x438bd30 "37501", 
  current_hash_ptr = 0x0, 
  current_record = 0, 
  next_block = 3, 
  lastinx = -1, 
  errkey = -1, 
  mode = 2, 
  opt_flag = 0, 
  update = 50, 
  lastkey = 0x435c768 "321b", 
  recbuf = 0x435c768 "321b", 
  last_find_flag = HA_READ_KEY_EXACT, 
  parents = {0x0 <repeats 65 times>}, 
  last_pos = 0x0, 
  lastkey_len = 0, 
  implicit_emptied = 0 '00', 
  lock = {
    owner = 0x43026d0, 
    next = 0x0, 
    prev = 0x432a5e0, 
    lock = 0x432a590, 
    cond = 0x0, 
    status_param = 0x0, 
    debug_print_param = 0x436ec00, 
    type = TL_READ, 
    priority = 0
  }, 
  open_list = {
    prev = 0x0, 
    next = 0x4373550, 
    data = 0x435c4a0
  }
}

A couple things look useful right away, current_record and current_ptr especially. s also looks interesting (and it is!), and we can print it to see what it is, just like we did with file:

(gdb) print file->s
$5 = (HP_SHARE *) 0x432a490
(gdb) print *file->s
[...snip...]

It turns out file->s is a pretty large data structure, so I’m not going to include it here. You can browse the definition of HP_SHARE in include/heap.h. We won’t be looking into it during this exercise. We saw earlier that the return value of heap_scan was assigned to error:

(gdb) print error
$9 = 0

You don’t have to take my word for it, though; gdb will gladly print the source code that surrounds the current execution context:

(gdb) list
345
346     int ha_heap::rnd_next(uchar *buf)
347     {
348       ha_statistic_increment(&SSV::ha_read_rnd_next_count);
349       int error=heap_scan(file, buf);
350       table->status=error ? STATUS_NOT_FOUND: 0;
351       return error;
352     }
353
354     int ha_heap::rnd_pos(uchar * buf, uchar *pos)

We’ll keep an eye on error going forward. For now, though, let’s continue execution of the program.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;

It stops again right away. There are several rows in the table, and this function is called for every additional row the server reads. Let’s make sure error is still 0 and let’s see if anything has changed in file since we last looked at it:

(gdb) print error
$12 = 0
(gdb) print *file
$11 = {
  s = 0x432a490, 
  current_ptr = 0x438bd40 "37502", 
  current_hash_ptr = 0x0, 
  current_record = 1, 
  next_block = 3, 
  lastinx = -1, 
  errkey = -1, 
  mode = 2, 
  opt_flag = 0, 
  update = 50, 
  lastkey = 0x435c768 "321b", 
  recbuf = 0x435c768 "321b", 
  last_find_flag = HA_READ_KEY_EXACT, 
  parents = {0x0 <repeats 65 times>}, 
  last_pos = 0x0, 
  lastkey_len = 0, 
  implicit_emptied = 0 '00', 
  lock = {
    owner = 0x43026d0, 
    next = 0x0, 
    prev = 0x432a5e0, 
    lock = 0x432a590, 
    cond = 0x0, 
    status_param = 0x0, 
    debug_print_param = 0x436ec00, 
    type = TL_READ, 
    priority = 0
  }, 
  open_list = {
    prev = 0x0, 
    next = 0x4373550, 
    data = 0x435c4a0
  }
}

Two things jump out, current_ptr is now "37502" (it was "37501" before), and current_record is 1 (it was 0 before). Let’s keep going.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$13 = 0
(gdb) print *file
$14 = {
  s = 0x432a490, 
  current_ptr = 0x438bd50 "37503", 
  current_hash_ptr = 0x0, 
  current_record = 2, 
  next_block = 3, 
  lastinx = -1, 
  errkey = -1, 
  mode = 2, 
  opt_flag = 0, 
  update = 50, 
  lastkey = 0x435c768 "321b", 
  recbuf = 0x435c768 "321b", 
  last_find_flag = HA_READ_KEY_EXACT, 
  parents = {0x0 <repeats 65 times>}, 
  last_pos = 0x0, 
  lastkey_len = 0, 
  implicit_emptied = 0 '00', 
  lock = {
    owner = 0x43026d0, 
    next = 0x0, 
    prev = 0x432a5e0, 
    lock = 0x432a590, 
    cond = 0x0, 
    status_param = 0x0, 
    debug_print_param = 0x436ec00, 
    type = TL_READ, 
    priority = 0
  }, 
  open_list = {
    prev = 0x0, 
    next = 0x4373550, 
    data = 0x435c4a0
  }
}

OK, same as before. There was no error, current_ptr is now "37503" and current_record is 2.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$15 = 137

Hey, we have an error! What went wrong? Well, nothing. We got to the end of the table. Take a look at include/my_base.h:

389-#define HA_ERR_OLD_FILE             132     /* old databasfile */
390-#define HA_ERR_NO_ACTIVE_RECORD 133 /* No record read in update() */
391-#define HA_ERR_RECORD_DELETED       134     /* A record is not there */
392-#define HA_ERR_RECORD_FILE_FULL 135 /* No more room in file */
393-#define HA_ERR_INDEX_FILE_FULL      136     /* No more room in file */
394:#define HA_ERR_END_OF_FILE  137     /* end in next/prev/first/last */
395-#define HA_ERR_UNSUPPORTED  138     /* unsupported extension used */
396-#define HA_ERR_TO_BIG_ROW   139     /* Too big row */
397-#define HA_WRONG_CREATE_OPTION      140     /* Wrong create option */
398-#define HA_ERR_FOUND_DUPP_UNIQUE 141        /* Dupplicate unique on write */
399-#define HA_ERR_UNKNOWN_CHARSET       142    /* Can't open charset */

The next time we let the program continue, we get our result set back on the client.

(gdb) c
Continuing.
mariadb 5.2.4> select * from h1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (10 min 29.13 sec)

Our breakpoint was hit 4 times, one for each row in the table and one final time for EOF. But I set out trying to figure out what would happen in the case of a “deleted” row in the MEMORY table. Let’s test that out, now.

mariadb 5.2.4> delete from h1 where id=2;

Oops, that triggers our breakpoint! We can disable it like this:

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007d0458 in ha_heap::rnd_next(unsigned char*) at ha_heap.cc:350
        breakpoint already hit 28 times
(gdb) disable breakpoint 1
(gdb) c
Continuing.

Back in the client, you should see that the row was deleted successfully:

mariadb 5.2.4> delete from h1 where id=2;
Query OK, 1 row affected (48.27 sec)

We want to SELECT from the table again, but our breakpoint is disabled. Instead of disabling it, we could have just kept hitting continue to keep gdb working through the file, or even done continue 4 to get all the way through it, since we know that we hit that breakpoint 4 times scanning through our 3-row table. We’ve lost our gdb command line now, though, and we won’t get it back until a breakpoint is hit or we manually interrupt the program we’re debugging. (The normal way to do that is to hit Ctrl-C, but for whatever reason that doesn’t seem to work most of the time when debugging mysqld.) Another technique is to send mysqld the TRAP signal (you may need to do it more than once!). Once we have the (gdb) command prompt back, we’ll re-enable our breakpoint.

$ kill -TRAP $( cat ./data/ub-dev.pid )
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7f8e7a7c6740 (LWP 26113)]
0x00007f8e78ec4913 in select () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) enable breakpoint 1
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007d0458 in ha_heap::rnd_next(unsigned char*) at ha_heap.cc:350
        breakpoint already hit 28 times
(gdb) c
Continuing.

Now, let’s SELECT from the table again and look at some of the data that was interesting the first time through.

mariadb 5.2.4> flush status;
Query OK, 0 rows affected (0.00 sec)

mariadb 5.2.4> select * from h1;
Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$16 = 0
(gdb) print file->current_ptr
$17 = (uchar *) 0x438bd30 "37501"
(gdb) print file->current_record
$18 = 0
(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$19 = 134
(gdb) print file->current_ptr
$20 = (uchar *) 0x438bd40 ""
(gdb) print file->current_record
$21 = 1

The first row came through as we expected, but the second resulted in an error. The snippet of include/ha_base.h I included above shows that 134 corresponds to HA_ERR_RECORD_DELETED. There it is, our deleted record (id=2), just where we would’ve expected to find it.

(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$22 = 0
(gdb) print file->current_ptr
$23 = (uchar *) 0x438bd50 "37503"
(gdb) print file->current_record
$24 = 2
(gdb) c
Continuing.

Breakpoint 1, ha_heap::rnd_next (this=0x436f4e8, buf=<optimized out>) at ha_heap.cc:350
350       table->status=error ? STATUS_NOT_FOUND: 0;
(gdb) print error
$25 = 137
(gdb) print file->current_record
$27 = 3
(gdb) c
Continuing.

We can see that there are still 4 calls to ha_heap::rnd_next (each of which increments Handler_read_rnd_next), even though only 2 rows are returned to the client.

mariadb 5.2.4> select * from h1;
+------+
| id   |
+------+
|    1 |
|    3 |
+------+
2 rows in set (4 min 4.66 sec)

mariadb 5.2.4> show status like 'Handler_read_rnd_next%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_rnd_next | 4     |
+-----------------------+-------+
1 row in set (0.00 sec)

SHOW STATUS hits our breakpoint, too, which means SHOW STATUS is creating a temporary MEMORY table! Again, you’ll either need to continue through the breakpoint or disable it. So, there it is. We just used gdb to look at the activities of a running MariaDB server (though as I said, this will apply equally to MySQL and Percona servers) as it retrieved rows from a (non-indexed) MEMORY table. I was actually surprised how straightforward it was to do this. I look forward to spending more time tracing things like this to gain a better understanding of the server source, and I hope that this might encourage others to do the same!