Dealing with ROWIDs When Migrating to MariaDB

spacer

ROWID is an Oracle proprietary feature and all Oracle DBAs use them at times, and you also often see them used in applications. ROWID is often misunderstood and misused. The first half of this blog aims to explain what ROWIDs are and how they work, and the second half discusses – importantly – some ideas on how to migrate applications that rely on ROWID from Oracle to MariaDB.

What is a ROWID?

A ROWID is three things in Oracle:

  1. It is a value that references a particular row.
  2. It is a data type.
  3. And, it is a hidden column of every table. Yes, every table has a ROWID column, but it is hidden and will only show in a SELECT statement if requested.

ROWID values

A ROWID value is actually a reference to the physical location of a row in the Oracle database, which means that if a row has to be moved around for some reason, the ROWID will change. Current implementations of Oracle maintain a ROWID that is 10 bytes long, but you usually see the ASCII representation of it, which is 18 characters.

ROWID attributes

Let me explain a few important attributes of ROWIDs:

  • Oracle has supported ROWIDs basically always.(that’s not an attribute?)
  • The format of a ROWID has changed a bit over time, but not much.
  • The ROWID is physical and identifies the OBJECT of the row (which in the cases we look at here is always a table), the TABLESPACE of the row, the BLOCK where the row is stored and the ROW NUMBER inside the block.(Looking at the format, we might assume that ROWIDs are unique not only for a table but across tables. Dangerously, however, this assumption is wrong!)

Dissecting a ROWID

This section really isn’t strictly necessary, and if you don’t think that knowing the internals of a ROWID is going to make you popular at parties, you might want to skip it. If you’re bold enough to dive into the weeds, then here we go! A ROWID consists of 10 bytes, but it is usually displayed as an 18 characters long Base-64 encoded string. However, please note that I am only discussing old-style proper ROWIDs here, not UROWIDs (discussed later in this blog), and I am also discussing ROWIDs the way they look now…which has been the same since Oracle 8i.

ROWID Example

Let us have a look at an example

SQL> SELECT rowid FROM customer WHERE cust_id = 1;
ROWID
------------------
AAAm0HAAEAAAlMEAAA

The first 6 characters, AAAm0H, tells us which table this is, but as I am not going to bore you with how Base64-encoding works (which, trust me, is really not a good conversational ice breaker at parties), I will instead use a set of functions that Oracle provides as part of the ROWID package. Let’s extract the object from this ROWID and see if it matches the table we are working on:

SQL> SELECT DBMS_ROWID.ROWID_OBJECT('AAAm0HAAEAAAlMEAAA') object FROM dual;
OBJECT
----------
158983
SQL> SELECT object_name FROM user_objects WHERE data_object_id = 158983;
OBJECT_NAME
--------------------------------------------------------------------------------
CUSTOMER

That looks right. Let us now figure out in which file this row is stored:

SQL> SELECT DBMS_ROWID.ROWID_RELATIVE_FNO('AAAm0HAAEAAAlMEAAA') object FROM dual;
OBJECT
----------
4
SQL> SELECT file_name FROM dba_data_files ddf JOIN user_tables ut
ON ddf.tablespace_name = ut.tablespace_name
WHERE ut.table_name = 'CUSTOMER'
AND ddf.relative_fno = 4;
FILE_NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/XE/users.dbf

The last two fields of the ROWID are more difficult to illustrate, but I trust you get the point by now.

ROWID for Clustered Tables

In Oracle, a table cluster is a set of tables that share a common key, called a cluster key. The syntax is fairly self-explanatory:

SQL> CREATE CLUSTER cust_clu(cust_id NUMBER);
SQL> CREATE INDEX cust_ix ON CLUSTER cust_clu;
SQL> CREATE TABLE customer(cust_id NUMBER NOT NULL PRIMARY KEY,
cust_name VARCHAR(50)) CLUSTER cust_clu(cust_id);
SQL> CREATE TABLE orders(order_id INT NOT NULL PRIMARY KEY, cust_id INT NOT NULL)
CLUSTER cust_clu(cust_id);

Here we created a cluster and then two tables in that cluster. In a table cluster like this, the physical position of the cluster key is the ROWID, so both tables will share the same ROWIDs. Like this:

SQL> INSERT INTO customer VALUES(1, 'Joe Bloggs');
SQL> INSERT INTO orders VALUES(1, 1);
SQL> SELECT rowid FROM customer;
ROWID
------------------
AAAmz1AAEAAAlK0AAB
SQL> SELECT rowid FROM orders;
ROWID
------------------
AAAmz1AAEAAAlK0AAB

As you can see, the two rows in the two tables have the same ROWID because they share the same cluster key.

ROWID for Index Organized Tables

Mention IOT to an Oracle DBA and he/she will not think “Internet of Things”, he/she will think Index Organized Tables. IOT in Oracle are tables that are pre-sorted according to the primary key. In a normal table, indexes and data are stored separately, with indexes being sorted and referencing data that is stored elsewhere. With IOT, a table is like a big index in-and-of-itself, where all columns are included and all rows are sorted according to the said primary key.

What does this mean? Well, it basically means that the physical location of a row may change as said rows need to be sorted somewhere else, typically due to an UPDATE being processed that affects one of the primary key columns. Let’s look at an example. First, let’s create an IOT (a table, not an “Internet of Things”) and put some data in it:

SQL> CREATE TABLE order_item(order_id NUMBER NOT NULL,
item_id NUMBER NOT NULL,
product_id NUMBER NOT NULL,
amount NUMBER NOT NULL,
PRIMARY KEY(order_id, item_id)) ORGANIZATION INDEX;
SQL> INSERT INTO order_item VALUES(1, 1, 1, 5);
SQL> INSERT INTO order_item VALUES(1, 2, 2, 1);
SQL> INSERT INTO order_item VALUES(2, 1, 1, 15);
SQL> INSERT INTO order_item VALUES(3, 1, 3, 1);
SQL> SELECT order_id, item_id, rowid FROM order_item;
ORDER_ID    ITEM_ID ROWID
---------- ---------- ------------------
1          1 *BAECUzMCwQICwQL+
1          2 *BAECUzMCwQICwQP+
2          1 *BAECUzMCwQMCwQL+
3          1 *BAECUzMCwQQCwQL+

Then, we assume that for order_id 2  we want to change item_id from 1 to 2, maybe we are to add another line as line 1 or so, and look at what happens to our ROWID:

SQL> UPDATE order_item SET item_id = 2 WHERE order_id = 2;
SQL> SELECT order_id, item_id, rowid FROM order_item;
  ORDER_ID    ITEM_ID ROWID
---------- ---------- ------------------
         1          1 *BAECUzMCwQICwQL+
         1          2 *BAECUzMCwQICwQP+
         2          2 *BAECUzMCwQMCwQP+
         3          1 *BAECUzMCwQQCwQL+

Have a look at the ROWID column for the row where order_id is 2, you see that it has changed as the row has moved. This may not be much of an issue, except that it is a good reason for why you should not store the ROWID in a referencing table. Also, if you SELECT a ROWID to use it in a later UPDATE, then you need to make sure that you lock that row, else this may also be an issue.

ROWID for Partitioned Tables

Just like with ROWID for Index Organized Tables, rows in partitioned tables can move around when columns that are part of the partitioning key are updated.

The UROWID

The UROWID datatype or Universal ROWID, is a data type that arrived along with Oracle version 9. Whereas ROWID only worked for native Oracle tables and some other cases, UROWID is much more flexible and can reference rows in any kind of tables. The disadvantage is that UROWID is cumbersome to use.

Advantages and disadvantages of using ROWID and UROWID

The main advantage of using ROWID, or UROWID for that matter, is that it is the fastest way to access a particular row as this, at least for normal tables, is the physical address of a row. This led to applications using ROWIDs, like this in PL/SQL:

DECLARE
r ROWID;
BEGIN
SELECT rowid INTO r FROM orders WHERE orderid = 1 FOR UPDATE;
UPDATE orders SET cust_id = 2 WHERE rowid = r;
END;

The FOR UPDATE clause is to ensure that the row isn’t touched between the SELECT and the UPDATE. This is a perfectly legit way of using ROWID. But once it was accepted that using ROWID is the “best” (in this case “best” means fastest) way of accessing a row, ROWID began being used for things it shouldn’t be used for. Like this:

 

ROWID-blog-img1
 

Is this good? If we have gotten all data from an order and want to look up the customer, it is going to be as fast as possible, right? And the same goes for a JOIN between these two tables, real fast.

No, actually this is a seriously bad idea. Rows may move, for example, by virtue of operational tasks. If the customer table is TEMPORAL, the ROW might change and this will not work. You really can no longer use PARTITIONING for the customer table if you expect to use a column that is likely to change in the partitioning expression. Oracle developers, please NEVER do this, i.e. do not store a ROWID in a table for any reason, unless you know really well what you are doing and have considered all the drawbacks of a design like this.

An annoying aspect of this is that if you do go ahead and use ROWID like this, it is likely to work just fine for a while. Issues will not be caught in testing or anything.

ROWID in Summary

Now you know more than you will probably ever need to know about what a ROWID is. But there are also some things you can learn from this. Like that the ROWID is deeply rooted in how Oracle works. Changing the format of a ROWID would be a major undertaking in terms of software development for Oracle. And if a ROWID contains the data it contains, it also imposes a bunch of restrictions on the workings of Oracle. Like if a table uses some format that isn’t necessarily using blocks like we are used to, that is maybe possible but many things will be very different and there will be some severe restrictions. Or tables in a remote database to the current instance? And Indexes, as you might have guessed now, use a ROWID to reference the indexed rows, this is also affected by all this. But ROWID is what it is, and it is restrictive, so why not migrate to something else? Like MariaDB?

Migrating ROWID to MariaDB

So now we know about what a ROWID is and how it is used. Now you may be wondering  how do I migrate applications that use ROWID?

Starting in MariaDB 10.3, we added an Oracle mode that enables compatibility with a subset of PL/SQL.

While MariaDB supports most common functions in it’s Oracle mode, MariaDB doesn’t support ROWID specifically, but there are some different ways of dealing with ROWID for applications that have been migrated from Oracle to MariaDB. We will now have a look at a few different means of working with this and the pros and cons of these means.

Common to all these examples is that we are emulating the attribute of the ROWID pseudo-column of being hidden unless asked for by using the INVISIBLE column attribute introduced in MariaDB 10.3.

Example Schema and Code  to Migrate

In the following examples, I use the very basic customer table  with some data to illustrate my point. In Oracle syntax, it  looks like this:

CREATE SEQUENCE cust_seq;
CREATE TABLE customer(cust_id NUMBER NOT NULL PRIMARY KEY,
cust_name VARCHAR2(50) NOT NULL);
INSERT INTO customer VALUES(cust_seq.NEXTVAL, 'Joe Bloggs');
INSERT INTO customer VALUES(cust_seq.NEXTVAL, 'John Doe');

Note that the cust_id column is using the NUMBER datatype, which is not a good practice as this means that any kind of numeric values can be used in this column. This is not an uncommon shortcut in Oracle though, and we assume that the values actually used are INTEGERS, in particular as we have a SEQUENCE generating them.

The PL/SQL code that we are looking to migrate looks like this:

CREATE OR REPLACE PROCEDURE set_cust_name(id NUMBER, name VARCHAR2)
AS
  cust_row VARCHAR2(18);
BEGIN
  SELECT rowid INTO cust_row FROM customer WHERE cust_id = id FOR UPDATE;
  IF cust_row IS NOT NULL THEN
     UPDATE customer SET cust_name = name WHERE rowid = cust_row;
  END IF;
END;

The goal here is to be able to retain the PL/SQL code. If the procedure would use the Oracle proprietary ROWID data types, then this needs to be changed to a BIGINT in MariaDB. One issue is that if we use BIGINT to substitute ROWID it will not fit into a VARCHAR(18) if that is used for storing an Oracle ROWID. If this is an issue, i.e. if the substitute ROWID will get very high, then I suggest migrating to use a BIGINT for the ROWID instead of extending the VARCHAR to the required VARCHAR(20). In all honesty though, I would not think that this is an issue except in some extreme edge cases.

Using AUTO_INCREMENT to generate ROWIDs

You can have only one auto_increment column per table but as we assume here that we use Oracle style sequences to generate primary keys, auto_increment is free for us to use for something else, in this case as a substitute key, possibly unique. To be able to use auto_increment in InnoDB in MariaDB, the column also has to be indexed and this makes sense, because in MariaDB, this ROWID will be synthetic and actually a physical location but just an integer value, so to look up a row we need an index for this to be reasonably fast.

The corresponding table definition in MariaDB would then be like this:

CREATE TABLE customer(cust_id INTEGER NOT NULL PRIMARY KEY,
cust_name VARCHAR(50) NOT NULL,
rowid BIGINT NOT NULL AUTO_INCREMENT INVISIBLE UNIQUE);

This will allow the procedure above to run unmodified. If the ROWID datatype is used, then only this needs to be changed to a BIGINT. In general, you are likely better off using BIGINT here anyway, but that is a different story.

Pros

  • This is easy to implement.
  • Low overhead due to the auto_increment feature being very effective.

Cons

  • The PRIMARY KEY column cannot be an auto_increment
  • The ROWID must be materialized.

Reusing the PRIMARY KEY as a ROWID

Another way of dealing with a generated ROWID is to reuse the primary key as it stands as a ROWID by using MariaDB VIRTUAL COLUMNS:

CREATE TABLE customer(cust_id INTEGER NOT NULL PRIMARY KEY,
cust_name VARCHAR(50) NOT NULL,
rowid INTEGER AS (cust_id) INVISIBLE UNIQUE);

Note that we have to index the ROWID column here, despite the fact that we know this is just a copy of the PRIMARY KEY column. We know this, but the optimizer cannot know it.

Pros

  • Easy to implement and use
  • The ROWID cannot be updated.

Cons

  • The UNIQUE index has to be materialized.
  • Requires that the PRIMARY KEY is a single column and that it has an appropriate data type.

Using a SEQUENCE to generate ROWID

If we assume a table has been migrated to MariaDB to use and AUTO_INCREMENT column as the PRIMARY KEY already, then we can use a SEQUENCE for the ROWID, this is the opposite of the first case above, more or less

CREATE SEQUENCE cust_rowid_seq;
CREATE TABLE customer(cust_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
cust_name VARCHAR(50) NOT NULL,
rowid BIGINT NOT NULL DEFAULT (NEXT VALUE FOR cust_rowid_seq) INVISIBLE UNIQUE);

Pros

  • Completely independent from the PRIMARY KEY.
  • Very flexible

Cons

  • Not as good performance as when using AUTO_INCREMENT
  • Both value and the index need to be materialized

Using UUID or UUID_SHORT as a ROWID

The functions UUID and UUID_SHORT generate a semi-unique value that can potentially be used to replace a ROWID. The issue is that it is guaranteed to be unique, but if the application can deal with it, this is an option:

CREATE TABLE customer(cust_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
cust_name VARCHAR(50) NOT NULL,
rowid BIGINT UNSIGNED NOT NULL DEFAULT (UUID_SHORT()) INVISIBLE UNIQUE);

You can use UUID() and SYS_GUID() also, but you have to change the datatype accordingly in that case. (SYS_GUID() was added with MariaDB Server 10.6.) There are some restrictions with regards to uniqueness for all of these functions, so I recommend you read up on that before using this method to generate a ROWID substitute.

Pros

  • The value is opaque, displaying no attribute that might tempt someone to use them for any other reason beyond as a ROWID substitute.
  • There are no concurrency issues with generating a UUID

Cons

  • The generated value is a long string in the case of UUID() and SYS_GUID().
  • There are constraints in terms of how unique the generated values are.

Updating the ROWID

Some examples above allow the ROWID to be updated. This is typically not a big deal as the ROWID column has a unique index on it, so an update can only be done to some alternative UNIQUE value. If you still think this is an issue though, you can always create a trigger to prohibit the ROWID to be updated.

CREATE TRIGGER cust_rowid_trg BEFORE UPDATE ON customer FOR EACH ROW
BEGIN
IF new.rowid != old.rowid THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ROWID should not be updated';
END IF;
END;

ROWID Questions and Answers

Q: Can you create an index on the ROWID column in a table?

A: No you cannot, and it would be meaningless all the same. An index just points to a row using said ROWID. In other words, even if you could index a ROWID, all it would do is to slow access down.

Q: Is ROWID in the SQL-standard?

A: No, it is not. The SQL-Standard has different means of achieving the most common use for ROWID, which is UPDATE or DELETE of a previously SELECTed row. This is by using the WHERE CURRENT OF cursor construct.

Q:  What is the difference between ROWNUM and ROWID?

A: These are two completely different things. ROWNUM is the order the row is fetched in a SELECT statement and is transient and cannot be used to identify a row. ROWID is the physical (in some cases logical) location of a row in the database.

Q: Must you use the ROWID datatype for ROWIDs?

A: No, that is not necessary, the most common representation beyond the ROWID type is to use a VARCHAR2(18). A ROWID is a 10-byte number (which when represented as a Byte64 encoded string ends up being 18 characters) for current versions of Oracle. Do not assume this though, unless you know what you are doing as this isn’t covering UROWID.

Q: Can you update the ROWID of a row in a table?

A: No, this makes no sense at all.

 

Happy SQL’ing
/Karlsson