Comments - Data Sampling: Techniques for Efficiently Finding a Random Row

 
3 years, 8 months ago Darcey Lloyd

1. Get no' of rows in table 2. Gen rand number(s) between 1 and no of rows (no duplicates) 3. Return row(s)

No sorting should be required, why would that even be an idea? Amazing that these are still the solutions in use... Should have been internal feature long ago.

Just add SELECTRAND * FROM xyz WHERE x = y LIMIT 10 etc

 
8 years, 2 months ago Jan Steinman

This has long bugged me about SQL.

Could one write a UDF that would use an RNG to select rows by their internal row number to avoid a full table scan?

I've also noticed that RAND() appears to be biased. It will often return from within a small subset of records over and over, which seems to change every day, leading me to suspect they're using a Linear Congruential RNG seeded with the date.

 
7 years, 10 months ago Rick James

There is no hidden "row number". A UDF can't do better than use the PRIMARY KEY. For MyISAM, the location of a row is the byte offset into the .MYD; there is no list of these to probe. For InnoDB, one should provide a PK; if you don't there is an inaccessible 6-byte number that acts sort of like AUTO_INCREMENT; again, not practical.

 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.