Comments - MySQL "Wishlist" Session from an online travel agency

12 years, 9 months ago Michael Widenius
> DROP TABLE is slow
  • This was a bug in MySQL with compressed tables; it has been fixed by Oracle and the fix is already in MySQL 5.1.
> Scaling replication
  • This is being worked on in MySQL 5.6. As soon as that code is stable we can use it as a base for MWL#181: Parallel replication of group-committed transactions. (Project hours are not easy to specify until 5.6 is stable.)
> UTF-8 issues, MEMORY engine limitations

I added MWL#203 "Add efficient support for BLOB and VARCHAR to HEAP/MEMORY"

This should solve this issue as well as other cases where data goes to disk unnecessarily.

I estimate this to be around 60 hours of development time + 16 hours of testing.

> The "Adminlogging problem"
> 
> Many of our writes need to track the previous values so that they
> can be shipped to an auditing database elsewhere on a different
> machine. Many ways to solve that problem exist, most suck.
>
> Using RBR binlog, parse that output: We are currently parsing
> "mysqlbinlog -vv" output. A binlog parsing library exposed to the
> usual scripting languages would make that much easier. Also, the
> binlog does not contain all data needed for an audit log
> (application level user information needs to be embedded somehow).

We have 'kind of' a C library for this that is used by mysqlbinlog. It's far from easy to make the binary log available to scripting languages, as the structure is quite complex and each event (and there are many of them) has its own format. Any ideas of how to do this are appreciated (even a suggested interface would help).

It is, however, relatively easy to conditionally add more data to the binary log. What information would you like us to add?

Feel free to add what you like to MWL#204

> We can log the statement including all comments in the statement
> once, and have each row point back to the statement that caused this
> row to be changed. You'd have the change from the RBR record, and
> could get the higher level app user from a comment in the
> statement.

> I like. Binlog is written anyway, so no additional overhead. -- Kris

MariaDB already has the statement stored in the binary log, so this may already be partly solved. See http://kb.askmonty.org/en/annotate_rows_log_event

If not, please update MWL#204 with what you need!
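
For example, with row annotation enabled (a sketch; variable name and behaviour as documented at the link above, table and values illustrative), the application-level user can be embedded in a comment and will travel with the row events:

SET GLOBAL binlog_annotate_row_events=1;
UPDATE /* app_user=12345 */ accounts SET balance=balance-10 WHERE id=7;

mysqlbinlog then prints the annotated statement, comment included, next to the RBR rows of that transaction.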

> Specialized statements? "SELECT FROM DELETE/SELECT FROM UPDATE":
> Write statements returning a result set of old values. Actually
> INSERT FROM DELETE/INSERT FROM UPDATE would be needed to make it
> transactional.

I tried to find an existing definition for this (I am sure some other database has this) but could not find it.

The best I have come up with so far, for single-table deletes, is to use the DB2 syntax:

DELETE from (SELECT column_list from single_table WHERE ... ORDER BY)

another option would be

DELETE FROM table_name WHERE ... [LIMIT #] SELECT expression_list

I have added the above to: MWL#205 DELETE with result set
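
To make the intent concrete, a hypothetical session using the first proposed syntax (nothing here is implemented; it is only a sketch of MWL#205, with an illustrative table) could look like:

DELETE FROM (SELECT id, payload FROM job_queue WHERE done=1 ORDER BY id);
-- would delete the matching rows and return the id and payload of each
-- deleted row to the client, as a normal result set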

Doing UPDATEs is a bit harder from a syntax point of view. Let's think about that when the DELETE problem is solved. The LAST_VALUE() function Eric is working on may solve this.

> MySQL 5.5. provides an Audit plugin API for this. Check it out,
> write UDP sender into eventlogging as such a plugin. It is easy, and
> does exactly what we want. Also, this is the least ugly way to solve
> this.

That is also an option.

> Replication synchronisation (Global Unique Id)

> At the moment we build our replication hierarchies at most 2 levels
> deep, and use a host language wrapper around SHOW MASTER STATUS and
> SELECT MASTER_POS_WAIT() to create waypoints where needed. This does
> not work across more than 1 hop (2 levels), as master binlog
> positions do not translate past one slave.  A mechanism that allows
> to create such waypoints across any number of hops would be useful
> from a correctness perspective. It would also allow us more
> flexibility in how we build our replication hierarchies.

A long time ago we created MWL#31 "global transaction id" to solve this issue, but it is on hold for the moment (we are waiting for a developer at another company to post the solution he has worked on for a long time before continuing).

Lately Kristian has been working on a sub part of this: MWL#175 "Unique ID in binlog event metadata"

The above should likely solve your problems. Please comment!
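
For reference, the single-hop waypoint technique quoted above looks roughly like this (binlog file name and position are illustrative):

SHOW MASTER STATUS;
-- on the master, after the writes to wait for; returns e.g.
-- File='mysql-bin.000042', Position=1234

SELECT MASTER_POS_WAIT('mysql-bin.000042', 1234, 60);
-- on the slave; blocks until that master position has been applied,
-- with a 60 second timeout

The position is only meaningful on the direct master, which is why this breaks past one hop; a global transaction id would give a waypoint that is valid anywhere in the hierarchy.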

> Large table upload causes replication lag
>
> Some events are known independent events, and can cause large
> replication lag due to their sheer size. For example, sometimes we
> upload large new tables to a master, and the data download then
> delays normal processing in all slaves. It would be nice for a
> mechanism to exist to leverage additional connections to download
> this out-of-band.

I have now created MWL#206 for all replication tasks that give more parallelism.

I have added the following tasks as sub tasks for the above:

  • MWL#207 Replicate LOAD DATA INFILE 'out of band'
  • MWL#208 Replicate ALTER TABLE in parallel

> We shard data. In some cases, for some users who care less
> about speed and more about reporting, it would be convenient to have
> a way to unshard data. Is FederatedX a solution? Can we have
> replication so that a slave can be slave to multiple masters,
> provided the data sets of the masters are nonoverlapping?

Kristian has created MWL#201 "Multi-source replication" to handle this.

> Making slaves in a non-sucky way
> Right now we mylvmbackup a slave, restore it to a new box, recover
> the InnoDB and connect it to the master. We'd like to have a command
> that can be issued to an empty database, will download the data from
> a master (or a co-slave), and then attach the new slave to the
> master (using the configuration provided on the command line or by
> the co-slave). At the SQL level. With progress report.

> And please have the data move at disk speed, no index creation, ship
> binary index data from the co-slave or master. This can be fast! --

For this I don't have a simple solution. I have, however, recorded it as MWL#209 "Setting up a new slave easily" to allow Kristian to comment.

> Stable snapshot
> 
> "FLUSH TABLES WITH READ LOCK" can still break, if a checkpoint
> occurs under the lock.
> Can we have FLUSH TABLES WITH READ LOCK AND NO CHECKPOINT and a
> FORCE CHECKPOINT for InnoDB, or have FLUSH TABLES force this
> checkpoint when applied to InnoDB?

MariaDB 5.3 already supports the command:

FLUSH TABLES WITH READ LOCK AND DISABLE CHECKPOINT;
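
A typical stable-snapshot flow would then be (a sketch):

FLUSH TABLES WITH READ LOCK AND DISABLE CHECKPOINT;
-- take the filesystem / LVM snapshot here
UNLOCK TABLES;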

> Server restart required (Replication)
>
> About every replication config change requires a server restart to
> be picked up. Please fix!

I have added MWL#210 "Make all replication variables changeable without server restart"

The main problem is that some of the variables are very hard to change, so it's best to take these one by one.

I have added all replication variables to MWL#210. Can you mark those that are most important to make changeable?

> RBR conflict resolution SP
> In RBR, when there is any kind of RBR problem, call a SP. The SP
> gets all required data made available somehow, can define a
> resolution policy. 4 Outcomes: Choose old, choose new, fail (error)
> and synthesize (merge) new row.

I have added this as MWL#211 "Conflict resolution for RBR replication".

This one is however not an easy task. Kristian needs to write a high level description for this.

> Compressed binlog
> 
> Write the binlog optionally in a compressed fashion at the event level.

I have created MWL#212 "Compressed binary log" for this.

> Async binlog shipping

Already in MySQL 5.5

> On binlog rotation, run a SP asynchronously. Note that execution of
> for example a shell command through a UDF must not lock anything! 
> Provide sufficient environmental data to the SP (binlog names old
> and new, binlog position for old and new log, etc).

I have added MWL#213 "Run SP on binary log rotation". (Not much to add beyond the above.)

The main problem is to run this asynchronously. We do have a thread designed for things like this, so it may not be that hard.

> Use tables, not files
> 
> Get rid of master.info, relay.info. These should be InnoDB tables in mysql.*.

This is already done in MySQL 5.6

> Less critical wishes
> Async query API so that we can map/reduce at the client level

Kristian has done an excellent specification for this at: MWL#192 "optional nonblocking client API"

Would that satisfy your needs?

> Provide an API in mysqlclient.so so that we can write application
> side map/reduce query-shards-in-parallel actions.

To be able to do this, I would need a spec of the API from you.

> Make Indexes first level objects
>
> An index should be able to exist as an object that is not tied to a
> physical table. That would allow for indices that are global to a
> partition, and for functional indexes or indices on virtual columns.

This is close to impossible to do. Functional indexes are notably easier to do.

I have recorded this in MWL#214 "Make Indexes first level objects". I also added MWL#215 "Functional indexes".

> ALTER TABLE progress report
>
> ALTER TABLE can take a long time. We know how to go into $datadir
> and find the secret files and observe them, but we'd rather see
> progress in SHOW PROCESSLIST.
> done while you wait (*closes emacs*). -- Monty

I got a little too ambitious in trying to do it right, but this is now done in MariaDB 5.3.

It's documented here http://kb.askmonty.org/en/progress-reporting and I expect that a lot of MySQL clients will support this soon!
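
For example (behaviour as documented at the link above), while the ALTER TABLE runs in another connection:

SHOW PROCESSLIST;
-- the Progress column shows how far the ALTER TABLE has come, e.g. 42.7%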

> Fast ALTER TABLE
>
> Can we have proper fast ALTER TABLE? That would include online index
> drop/create, and versioned schema plus version numbers at the page
> level or the row level. Rows would be upgraded to the most recent
> schema version not at ALTER TABLE time, but whenever their row/page
> is being dirtied for whatever reason. -- Kris, Herald.

As far as I know, some work has been done in MariaDB 5.5 to make ALTER TABLE fast with InnoDB for ADD/DROP INDEX. We will have to come back to this question when MariaDB 5.5 is ready and we can look at the code.

In Aria it would be easy to add columns and drop non-key columns instantly. For other storage engines this is probably harder.

> There is the facebook hack for online ALTER TABLE.

I hope that for the near future the Facebook solution will be good enough... (at least it's a workable solution).

> Nested transactions / In TXN flag
>
> "I would like to have nested transactions. They would make some
> things I need to solve a lot easier (layering in code, lower layers
> with their own TXNs being called by upper layers with or without a
> TXN going on)."

MWL#137 covers this.

Before doing this, we would need to do a deeper investigation to create a proper low level description of this. The current estimate for doing the low level design is 24 hours.

> "I would like to see a viable definition of nested transactions. I
> would like to know what it means when an inner TXN does a COMMIT and
> then the outer TXN does a ROLLBACK. Are there any other ways to
> solve your layering problem?"

Does MWL#137 cover this? Why can't you use SAVEPOINTs instead?
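
A sketch of how SAVEPOINTs can handle the layering (the inner layer uses a savepoint instead of its own transaction):

BEGIN;
-- outer layer work
SAVEPOINT inner_layer;
-- inner layer work; on failure it undoes only its own changes:
ROLLBACK TO SAVEPOINT inner_layer;
-- outer layer continues and decides the final outcome
COMMIT;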

> Another group asked about solutions for the same layering problem,
> but in a different way: "We would like to have a flag or status
> variable that tells code if it currently is inside a transaction,
> and optionally, if that is an implied (autocommit = 0) or explicit
> (autocommit = 1 and BEGIN WORK was issued) transaction."

Based on our discussion, we have added to MariaDB a variable '@@in_transaction', which is 1 if you have executed a BEGIN statement or have accessed a table while autocommit is disabled.
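
For example:

SELECT @@in_transaction;  -- 0 (autocommit, no open transaction)
BEGIN;
SELECT @@in_transaction;  -- 1
COMMIT;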

Hopefully this should solve your problem.

> Compressed MyISAM/Aria
>
> We are using MyISAM in some places for mostly append-only tables and
> very large sizes (20 TB and more target size). We are considering
> Aria for this, if Aria is comparable in space efficiency to MyISAM,
> and has nicer recovery after an unplanned shutdown.

The new page format with auto recovery uses a little more space than MyISAM but not much.

When compressing it with aria_pack it uses exactly the same space as MyISAM.

> MyISAM tables we are using are being compressed with
> myisampack/myisamchk when they are finished with being written
> to. This process is at the moment cumbersome, as it involves a FLUSH
> TABLES and external commands being run outside of the DB server.
>
> Having an SQL command that online packs MyISAM/Aria tables would be
> useful, as this would handle open file handles and other stuff more
> nicely, and also can be controlled from the client using only SQL.

This is relatively easy to do. I have recorded it as MWL#216.

> GET_LOCK() improvements
>
> "At the moment GET_LOCK() in limited to one lock per
> connection. Acquiring a second lock while holding one releases the
> initial lock. This has been done in order to avoid deadlocking
> problems, we presume.

Correct.
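
For illustration, the current one-lock-per-connection behaviour:

SELECT GET_LOCK('lock_a', 10);  -- returns 1; lock_a is now held
SELECT GET_LOCK('lock_b', 10);  -- returns 1, but lock_a is silently released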

> Having a deadlock detection system in place, and being able to pick
> up multiple locks at once or gradually would be useful."
> 
> There are voices (mine, for one) which state that if you are using
> GET_LOCK instead of InnoDB row locks then you are doing it
> wrong. Using GET_LOCK more is a step in the wrong direction,
> technology-wise.

As I helped solve one of your GET_LOCK() usages (for priority queues) I would like to know if the above is still relevant.

I did add one task regarding this: MWL#217 Get list of active GET_LOCK

> "Some large join take ages to execute. Probably due to tmp tables to disk."
> 
> Probably better off with Hash Joins? Show query examples, please. -- Monty

Another issue could be that you are using blobs, in which case the join will use disk-based tables. In this case the Aria storage engine for internal temp tables may help. Making MEMORY tables capable of handling BLOBs and big VARCHARs (MWL#203) may also help.

> PERFORMANCE_SCHEMA, MySQL 5.5 stuff plus addons. WL Item for better
> EXPLAIN exists. -- Monty
> 
> Please link this item here. -- Kris

The worklog items we have for explain are:

  • MWL#51 Add EXPLAIN for UPDATE/DELETE
  • MWL#182 Explain running statements
  • MWL#110 Make EXPLAIN always show materialization separately (done)
  • MWL#111 Make EXPLAIN show where subquery predicates are in the WHERE clause

I added an overview task MWL#218 "Better EXPLAIN" to link to all EXPLAIN tasks.

I was under the impression that Igor had yet another EXPLAIN task. I will ask him about this.

> "Better statistics for queries."

Added MWL#219 "Query statistics" that should handle most of the requested things.

> Optimizer woes
> 
> "Unquoted numbers cause weird query plans." -- Ruud
> 
> Yes, they have to, thanks to numeric casting in SQL. This is not
> really solvable. -- Monty

An example:

SELECT * from t1 where character_key=10;

The above key would match the following strings:

'9.999990000000099999', '10', ' 10', '10a', '10.01' etc

In theory MariaDB could internally replace the above comparison of a character key with:

where character_key > "" and character_key < CHAR(ASCII(32)+1) and character_key >= "9.99999" and character_key <= "10.5" and character_key=10;

The above would however only work reasonably for constants, not when comparing an integer key with a character key.

> In MongoDB (gasp!) it can be made such that a query that uses a full
> scan instead of an index is an error.

In MariaDB/MySQL you can use the following variables to catch bad queries:

SQL_SAFE_UPDATES=1   ; Don't allow UPDATE/DELETE that doesn't use keys.
SQL_SELECT_LIMIT=#   ; Automatic upper level limit for SELECT
SQL_MAX_JOIN_SIZE=#  ; Give an error for queries that may touch many rows.

Saying that table scans are not allowed is not practical: for tables with only a few rows, a table scan is the best access method.
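
For example, to catch unindexed UPDATE/DELETE statements and runaway SELECTs in a session (values are illustrative):

SET SESSION SQL_SAFE_UPDATES=1;
SET SESSION SQL_SELECT_LIMIT=10000;
SET SESSION SQL_MAX_JOIN_SIZE=1000000;

UPDATE t1 SET a=1 WHERE non_indexed_column=5;
-- now fails with an error instead of scanning the whole table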

MySQL can also log queries not using an index in the slow log, which is good. But can we optionally make such implied casts cause an error? That would make that SQL easier to find in development.

If the above variables are not good enough, we could add a variable
SQL_FORCE_INDEX_USAGE= 0 | 1 | 2

0 is normal.
1 would give an error if you are not specifying a usable index for all tables.
  It would still allow MariaDB to do a table scan if that's the best method.
2 would give an error for any SELECT that would use a table scan on a normal
  (not internal temporary) table.

I have recorded this in MWL#220 Disallow table scan in SELECT

> Using memcached as a backend

Can you please remind me of what this was?

a) Creating a storage engine that can access memcache
b) Storing results of selects in memcache
c) Access a storage engine with the memcache protocol (this is implemented
   in InnoDB in MySQL 5.5)

Something else?

> Moving ibd files physically

Added

> InnoDB should be able to do this natively, via ALTER TABLE ... DETACH/ATTACH.

I added task: MWL#221 "move InnoDB data files between servers"

But I don't have anyone available that could do this task just now.

> More data load issues
> 
> SELECT INTO OUTFILE LOCAL? We do have LOAD DATA INFILE LOCAL.
> 
> Yes, but as implemented it is a security risk, also messes with
> firewall config due to the way it connects. -- Kris
> Does not. -- Monty
> 
> Ok, I will write this up in a better way. -- Kris

Please explain what the issue is. LOAD DATA LOCAL INFILE works as follows:

  • The client sends the LOAD DATA LOCAL INFILE command to the server on the connection.
  • Instead of an 'ok' packet, the server sends back a packet that contains the requested file name.
  • The client recognizes this special packet, opens the file, and sends the data to the server through the normal connection it uses to pass queries.
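
For example (file and table names illustrative):

LOAD DATA LOCAL INFILE '/tmp/users.csv' INTO TABLE users
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

The file is read by the client and streamed over the already-open connection; the server never connects back to the client, so no extra firewall holes should be needed.
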
> About every second innodb config change requires a server restart to
> be picked up. Please fix!

MWL#222 "Make all innodb variables changeable without server restart"

It would be nice to get a list of the variables that you find most important. (As XtraDB is not our core competence, fixing these takes a little more effort than other tasks.)

> INFORMATION_SCHEMA
> 
> Faster I_S. Consider a server with one million tables. Make I_S not
> suck on this.

MWL#223 "Handle millions of tables (add data dictionary)"

> GRANTS, ROLES, LDAP
>
> We would like to be able to specify roles (e.g. developer, webuser,
> admin) and then assign a set of roles to users. One role is the
> default role picked up on login, the others can be picked up and
> dropped with a command.

We have already tried to do this twice at MySQL AB. Maybe it's time to get it right this time.

MWL#198.

I will ask Sergei to add some more meat for this worklog.

> The database should be able to work fine with one million users, one
> million roles. Do not keep all data in memory, load the required
> info on login into a cache (size changeable without restart). On
> login, load the requested data, if it is not in cache already.

I created: MWL#224 "Handle millions of users & roles"

I will ask Sergei to do some estimates of this.

> Views
> 
> Views are parsed for every access, never cached.

MWL#225 Cache VIEWs

> TABLESPACE
> 
> Proper tablespace management a la Oracle. -- SimonM
> 
> Why? Please state benefits, spec it. -- Monty
> 
> Simon replied:
> 
> My thoughts: We have database instances which have data ranging
> from: 20G to 2TB and some archive db servers containing up to 40TB
> of data target size.

Thanks for a clear explanation. However, adding true tablespaces to InnoDB is probably one to two man-years of work and not practical for us to do.

> Tablespaces, if done right, can expose an API internally, that all
> storage engines can use instead of talking to the filesystem
> directly. That would unify the actual storage management of all
> storage engines.

The problem is that file systems will always evolve faster than any internal implementation of table spaces, so the benefit will not be that great.

> Get rid of MyISAM in mysql.*
> 
> You should be able to use any general storage engine in mysql.*, e.g. InnoDB.

MWL#226 "Replace the MYISAM mysql.* tables with Aria / XtraDB"

For this task the time estimate is very approximate. We would need to do some testing to be able to give a correct estimate.

> Get rid of log tables in mysql.*
>
> The mysql.* schema is a config repository. It was a mistake to place
> potential log tables in here, e.g. the general log and slow query
> log tables.

Yes, they are not good, but you don't need them and should not use them (performance is terrible), so I don't think this should be a big problem.

> Timezone issues
> 
> Currently, timezones are part of the session, not part of the
> column. Data types exist that take timezone data from the session
> into account (broken!), other data types never do that.

MWL#227 Add time types that contain a timezone

> Dynamic ENUM

MWL#228 Dynamic ENUM

This is a very specific task for you. We can do it, but I would prioritize it lower than other, more general tasks.

> Multiple Buffer pool
> 
> InnoDB multiple buffer pools, but with the size of each, and which
> pool a page goes to, configured by the DBA at the table level
> (instead of random/hash as in 5.5).

I added MWL#229, but this is not a trivial task...

> InnoDB temp tables
>
> Discussed internally: is it not possible to use InnoDB temporary
> tables or similar for this, perhaps in an explicit tempdb (like
> Sybase)?

I assume the question is whether one could use InnoDB temporary tables as internal temporary tables inside MariaDB.

This is not trivial to do because:

The interface for the internal temporary tables is a bit different from the one for normal tables (not much, but still).

The internal temporary tables can do things that normal engines cannot:

  • Do unique constraints over all fields (no key length limitations).
  • Handle NULL differently from the normal context (in GROUP BY, NULLs are equal; see the example below).
  • The memory for MEMORY tables is more dynamic; it's allocated when needed and freed for other things when not needed.
  • MEMORY has a low footprint (except for VARCHAR) and is optimized for non-transactional usage. I don't know how well InnoDB temporary tables are optimized for non-transactional, non-crash-safe usage.
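
To illustrate the NULL point above: in GROUP BY all NULL values fall into one group, even though NULL = NULL is not true in ordinary comparisons:

CREATE TABLE t (a INT);
INSERT INTO t VALUES (NULL), (NULL), (1);
SELECT a, COUNT(*) FROM t GROUP BY a;
-- returns two groups: (NULL, 2) and (1, 1)
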
> That is: tables are always temporary, cleaned up on server restart,
> writes to tables do not need to be (should not be) flushed.  That
> would make the MEMORY engine thing from above irrelevant.

I think that it would be notably more work to use InnoDB for internal temporary tables than to fix the MEMORY engine. With InnoDB there is also a chance that in the end we would find that it doesn't work...

 