Comments - What does SELECT do in the background ?
Content reproduced on this site is the property of its respective owners,
and this content is not reviewed in advance by MariaDB. The views, information and opinions
expressed by this content do not necessarily represent those of MariaDB or any other party.
Your SQL query will be place into a thread that will live the time of your connection into the database server.
That thread will parse the query and compute an execution plan (QP) and run it.
The QP ask statistics to the underlying tables and indexes to decide ( brut force) what plan cost less.
You can see the plan running :
EXPLAIN SELECT col1,col2,col3,col4,...,coln FROM table1 WHERE (condition) ..
You need to take care that your conditions are covered by an index that will promote a plan that using the index have direct access to those 1000 thousand records
Without the index the plan will push all table records to the SQL thread that will reject records that don't match your conditions. DBA are calling this a full table scan and try to limit this with good indexing.
The result will be put into a network buffer and send to your client. This is costly CPU as well so thread does not really stop working until the plan is finished.
In many case resultset need to be sorted: ORDER BY, GROUP BY, DISTINCT, that are not covered by an index. In that case the result will be fully BUFFERED into a memory table or a disk base table (MyISAM engine type) when the result can't be put in memory : sizing or data type reasons (BLOB does not like memory tables and vis versa)
if you wan't to force the server to buffer the resulset into a temporary table just add SQL_BUFFER_RESULT after the SELECT. This will free up a table lock while the resultset is being sent to the client.
Some API calls also enable this same server buffering feature.
In MySQL your thread is living in what is call the SQL layer and your data and indexes are manage by what is call the STORAGE ENGINE layer. Each storage engine use some memory buffers to cache indexes and table data. so the first execution of your query will load data from disk and then retrive it later from memory if the all dataset feat into those buffer. Monitoring the hit ratio of those buffers is a requirement for the overall performance and setting them correctly is a must do : read about innodb_buffer_pool_size , and key_buffer. There are mainly the only variables that need to be change according to memory of your hardware.
There is an other cache call the query cache (QC) this one stay in the SQL LAYER and will store the resultset of your query until the undelying tables get some changes. If the query come again the result will be served from the QC and not get executed again.
That is a very nice overview, exactly the kind of information I was looking for. Plenty of information there to digest.
Many thanks