What does SELECT do in the background ?

Hi. I am new in the MariaDB (and RDBMS) world. I am writing an API layer "on top" of MariaDB and Sphinx, to help porting applications written for another (rather specific and non-standard) backend. In order to make this API layer work in a way that is not totally inefficient with MariaDB, I would need some understanding of what happens in the background when one does a SELECT. Basically, the question is : if I execute the following statement :

SELECT col1,col2,col3,col4,...,coln FROM table1 WHERE (condition) ..

and it happens that 1000 rows match the (condition), does MariaDB immediately retrieve the corresponding row data and build a full data representation of the result set in memory ? or does it wait (in some kind of "lazy" mode) until I really start retrieving the rows contents before it actually retrieves the data from the table ?

In other words, imagine that there are a lot of columns in the table, and some of them are very large (such as "medium text" or "long text"), would it be more efficient to first do a SELECT with only 1 column, and then when I am sure of the result set, do the "real" SELECT with the real columns needed ?

Or is this a stupid question in the context of MariaDB and RDBMS (and why) ?

Answer Answered by Stephane VAROQUI in this comment.

Your SQL query will be place into a thread that will live the time of your connection into the database server.

That thread will parse the query and compute an execution plan (QP) and run it.

The QP ask statistics to the underlying tables and indexes to decide ( brut force) what plan cost less.

You can see the plan running :

EXPLAIN SELECT col1,col2,col3,col4,...,coln FROM table1 WHERE (condition) ..

You need to take care that your conditions are covered by an index that will promote a plan that using the index have direct access to those 1000 thousand records

Without the index the plan will push all table records to the SQL thread that will reject records that don't match your conditions. DBA are calling this a full table scan and try to limit this with good indexing.

The result will be put into a network buffer and send to your client. This is costly CPU as well so thread does not really stop working until the plan is finished.

In many case resultset need to be sorted: ORDER BY, GROUP BY, DISTINCT, that are not covered by an index. In that case the result will be fully BUFFERED into a memory table or a disk base table (MyISAM engine type) when the result can't be put in memory : sizing or data type reasons (BLOB does not like memory tables and vis versa)

if you wan't to force the server to buffer the resulset into a temporary table just add SQL_BUFFER_RESULT after the SELECT. This will free up a table lock while the resultset is being sent to the client.

Some API calls also enable this same server buffering feature.

In MySQL your thread is living in what is call the SQL layer and your data and indexes are manage by what is call the STORAGE ENGINE layer. Each storage engine use some memory buffers to cache indexes and table data. so the first execution of your query will load data from disk and then retrive it later from memory if the all dataset feat into those buffer. Monitoring the hit ratio of those buffers is a requirement for the overall performance and setting them correctly is a must do : read about innodb_buffer_pool_size , and key_buffer. There are mainly the only variables that need to be change according to memory of your hardware.

There is an other cache call the query cache (QC) this one stay in the SQL LAYER and will store the resultset of your query until the undelying tables get some changes. If the query come again the result will be served from the QC and not get executed again.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.