Comments - Data Sampling: Techniques for Efficiently Finding a Random Row
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.
This has long bugged me about SQL.
Could one write a UDF that would use an RNG to select rows by their internal row number to avoid a full table scan?
I've also noticed that RAND() appears to be biased. It will often return from within a small subset of records over and over, which seems to change every day, leading me to suspect they're using a Linear Congruential RNG seeded with the date.
There is no hidden "row number". A UDF can't do better than use the PRIMARY KEY. For MyISAM, the location of a row is the byte offset into the .MYD; there is no list of these to probe. For InnoDB, one should provide a PK; if you don't there is an inaccessible 6-byte number that acts sort of like AUTO_INCREMENT; again, not practical.