Comments - Migrating from 'innodb' checksum to 'crc32' checksum

11 months, 1 week ago Marko Mäkelä

As you can read in MDEV-25105, no action is needed when you are migrating to the MariaDB 10.2 default innodb_checksum_algorithm=crc32 or the MariaDB 10.5 default (MDEV-19534) innodb_checksum_algorithm=full_crc32 that was introduced in MariaDB 10.4.

The full_crc32 checksum was introduced, because the old approach to checksums was insecure. In the old design, any page of a data file can use any of the previously supported checksum algorithms. The older checksums were extremely slow to compute and insecure; as a work-around someone got the idea to make it possible to disable page checksums. The crc32 checksum is a little better if you ignore MDEV-17958, but it does not cover the entire data page and it is being computed as an insecure way (an exclusive OR of several CRC-32C checksums).

Before MariaDB 10.4, the most secure option was to set innodb_checksum_algorithm=strict_crc32 so that only the crc32 checksum is allowed on any data pages. A corrupted page could accidentally carry a checksum that is valid when using one of the other algorithms.

The full_crc32 checksum format is identified by a flag in the first page of the data file. If a file was created in this format, all its data pages must always use the same checksum, no matter what innodb_checksum_algorithm was later set to. Another guarantee is that all unused parts of data pages will be filled with NUL bytes.

Data files that were created before MySQL 5.1.48 or MariaDB 5.1.48 could contain arbitrary garbage in unused parts of data pages. Even though we try to keep this in mind, sometimes bugs like MDEV-27800 happen. The only way to convert files to the secure innodb_checksum_algorithm=full_crc32 format is to rebuild the tables. I believe that the system tablespace can only be converted by dumping and reloading the entire database.

It is not feasible to implement logic in a tool like innochecksum to try to determine which parts of each page might be unused and would need to be zeroed out, because this cannot always be determined by looking at an individual data page in isolation from others.

 
11 months, 1 week ago Keshan Nageswaran

Thank you for the detailed explanation.

The issue that I am facing right now. I am trying to upgrade MariaDB from 10.4.10 to 10.11.4. The checksum algorithm used in 10.4.10 is innodb. The checksum we are trying to use in 10.11.4 is full_crc32. We take physical backups in 10.4.10. The restore process includes innochecksum utility to compare checksums generated with stored checksums to identify any damages to data pages.

When I try restore of the backup taken in 10.4.10 with innodb checksum to MariaDB 10.11.4 with full_crc32 checksum. Checksum comparison fails for each innodb tables in my database at the first page complaining as following;

{
Fail: page::0 invalid
Exceeded the maximum allowed checksum mismatch count::1 current::0
Error: Page 0 checksum mismatch, can't continue.
}

As mentioned previously we use innochecksum utility to make this comparison. The initial understanding is that innochecksum utility has removed deprecated checksum algorithms from use (innnodb, none). Hence, it will not be supported. MDEV-25105

Then we decided to migrate the checksum algorithm to a algorithm(crc32) supported in innochecksum utility and MariaDB 10.4.10 (full_crc32 was not considered as it was not introduced in 10.4.10 version), then, take physical backups. Hence, the physical backups taken would be compatible with MariaDB 10.11.4 if we use crc32/full_crc32 algorithm.

My followup questions:

  • how can I rebuild all innodb tables during migration of innodb_checksum_algorithm configuration to have all blocks updated and not wait until it gets updated over time?
  • Ideal scenario that I would like to achieve is to use 10.4.10 backups in 10.11.4 with the migrated checksum algorithm (innodb->full_crc32). How can I achieve it?
  • I had the same understanding as you mentioned that no change required. What could be the reason for the failure during checksum validation at first page?
  • Is it because mismatch of checksum algorithms used (backup has innodb and validating server has crc32/full_crc32 assigned in configuration) ?
  • In that case, what is recommended? How we can check for integrity of data during restore?
  • can write option specified in innochecksum utility be used to rewrite the new checksum algorithm for all .ibd tables if there is no way to rebuild all blocks at once?
 
11 months, 1 week ago Marko Mäkelä

I realize that it might be helpful to add back the innodb and none checksum algorithms to the innochecksum utility. This would require some code refactoring, duplicating some InnoDB source code instead of compiling that code with special options for inclusion in <<code>innochecksum<</code>>.

On the other hand, you could use the innochecksum utility from an older release to verify that the page checksums are valid and to convert them to the crc32 checksum. If you do not care about the verification step, then you should be able to blindly rewrite the page checksums to the crc32 format.

Still, I think that the best solution would be to rebuild all InnoDB tables, simply by OPTIMIZE TABLE. To see which tables are not in the full_crc32 format yet, try the following trick from MDEV-32230: SELECT name FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES WHERE row_format IS NULL;

I am afraid that the only way to convert the InnoDB system tablespace to the full_crc32 format would be to rebuild the entire database from a logical dump. The next best thing that you could do is to use innochecksum to convert each page to the crc32 checksum, and then start the server with innodb_checksum_algorithm=strict_full_crc32 so that new files will be created in the full_crc32 format, and for any old-format files only the crc32 page checksums will be allowed.

 
11 months, 1 week ago Keshan Nageswaran

Thank you for this Marko. The solutions provided are of real help for us. One last question, does the innodchecksum utility support full_crc32 validation?

"and then start the server with innodb_checksum_algorithm=strict_full_crc32 so that new files will be created in the full_crc32 format, and for any old-format files only the crc32 page checksums will be allowed."

In my understanding from the above statement, I believe we would end up having mix of crc32(old pages) and full_crc32(new pages), will this cause any issues during checksum validation with innodchecksum utility?

 
11 months, 1 week ago Ian Gilfillan

No, innochecksum does not fully support full_crc32 - see MDEV-26907. If innochecksum is a required part of your workflow, you could stick with crc32.

 
11 months, 1 week ago Keshan Nageswaran

Noted and Thank you Ian!

 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.