Algoritmi per le Join basati sui blocchi

Stai visualizzando una vecchia versione di questo article. Visualizza la versione più recente.

Nelle versioni di MariaDB/MySQL precedenti alla 5.3 era implementato solo un algoritmo per le Join basato sui blocchi: il Block Nested Loop (BNL), loop sui blocchi annidati. Può essere utilizzato per le Inner Join. MariaDB 5.3 e successivi migliorano l'implementazione delle Join BNL e fornisce una varietà di algoritmi basati sui blocchi adatti alle Inner Join, Outer Join e Semi-Join. Questi algoritmi in MariaDB impiegano un join buffer che immagazzina i record del primo operando join prima di iniziare a cercare le corrispondenze nel secondo operando.

Questa pagina documenta i vari algoritmi di Join basati sui blocchi.

  • Block Nested Loop (BNL) join
  • Block Nested Loop Hash (BNLH) join
  • Block Index join known as Batch Key Access (BKA) join
  • Block Index Hash join known as Batch Key Access Hash (BKAH) join

Block Nested Loop Join

La differenza più importante tra l'implementazione del BNL in MariaDB 5.3 e le precedenti versioni è che il primo usa un nuovo formato per i record scritti nel buffer delle join. Questo nuovo formato ha i seguenti vantaggi:

  • un uso più efficiente dello spazio nel buffer per i valori NULL e per i tipi di lunghezza variabile (come VARCHAR);
  • supporto per i cosiddetti buffer incrementali delle Join, risparmiando spazio nel buffer per le Join a più vie;
  • l'algoritmo viene usato anche per le outer join e le semi-join.

Come funziona il Block Nested Loop Join

L'algoritmo effettua un'operazione join fra le tabelle t1 e t2 secondo il seguente schema.
I record del primo operando sono scritti nel buffer delle join uno ad uno, finché il buffer è pieno. I record nel secondo operando sono letti dalla tabella fisica o temporanea uno ad uno. Per ogni record r2 letto dalla tabella t2, viene scensito il buffer delle join; per ogni record r1 del buffer in cui r2 corrisponde a r1, la concatenazione dei campi interessati di r1 e r2 viene aggiunta al risultato della join parziale corrispondente.
Per leggere i record di t2 viene eseguita una scansione completa della tabella, una scansione completa di un indice o una lettura di un intervallo nell'indice. Aolo i record che soddisfano la consizione relativa alla tabella t2 vengono controllati, per sapere corrispondono ai record del buffer delle join.
Quando la scansione della tabella t2 termina, un'altra porzione dei record del primo operando riempe il buffer, e corrisponde a quei record che sono stati estratti da t2.
Il buffer riempe nuovamente e scansisce il recondo operando, cercando corrispondenze nel buffer delle join, e via così fino a quando i record nel primo operando non sono esauriti.
Il numero totale delle scansioni del secondo operando equivale al numero di scritture nel buffer delle join.

Uso più efficiente dello spazio nel buffer delle join

Per i valori NULL non viene usato alcuno spazio all'interno del buffer delle join.
Tutti i valori di tipi di lunghezza variabile non vengono più riempiti con degli zeri.

Incremental join buffers

If we have a query with a join of three tables t1,t2,t3 such that table t1 is joined with table t2 and the result of this join operation is joined with table t3 then two join buffers can be used to execute the query. The first join buffer B1 is used to store the records comprising interesting fields of table t1, while the second join buffer B2 contains the records with fields from the partial join of t1 and t2. The interesting fields of any record r1 from B1 are copied into B2 for any record record r1,r2 from the partial join of t1 and t2. One could suggest storing in B2 just a pointer to the position of the r1 fields in B1 together with the I interesting fields from t2. So for any record r2 matching the record r1 the buffer B2 would contain a reference to the fields of r1 in B1 and the fields of r2. In this case the buffer B2 is called incremental. Incremental buffers allow to avoid copying field values from one buffer into another. They also allow to save a significant amount of buffer space if for a record from t1 several matches from t2 are expected.

Using join buffers for simple outer joins and semi-joins

If a join buffer is used for a simple left outer join of tables t1 and t1 t1 LEFT JOIN t2 ON P(t1,t2) then each record r1 stored in the buffer is provided with a match flag. Initially this flag is set off. As soon as the first match for r1 is found this flag is set on. When all matching candidates from t2 have been check, the record the join buffer are scanned and for those of them that still have there match flags off null-complemented rows are generated. The same match flag is used for any record in the join buffer is a semi-join operation t1 SEMI JOIN t2 ON P(t1,t2) is performed with a block based join algorithm. When this match flag is set to on for a record r1 in the buffer no matches from table t2 for record r1 are looked for anymore.

Using join buffers for nested outer joins and semi-joins

Block Hash Join

Block based hash join algorithm is a new option to be used for join operations in MariaDB 5.3. It can be employed in the cases when there are equi-join sub-condition for the joined tables, in the other words when equalities of the form t2.f1= e1(t1),...,t2.fn=en(t1) can be extracted from the full join condition. As any block based join algorithm this one used a join buffer filled with the records of the first operand and looks through the records of the second operand to find matches for the records in the buffer.

How Block Hash Join works

For each refill of the join buffer and each record r1 from it the algorithm builds a hash table with the keys constructed over the values e1(r1),...en(r1). Then the records of t2 are looked through. For each record r2 from t2 that the condition pushed to the table t2 a hash key over the fields r2.f1,..., r2.fn is calculated to probe into the hash table. The probing returns those records from the buffer to which r2 matches. As for BNL join algorithm this algorithm scans the second operand as many time as many refills of the buffer occur. Yet it has to look only through the records of one bucket in the hash table when looking for the records to which a record from t2 matches, not through all records in the join buffer as BNL join algorithm does. The implementation of this algorithm in MariaDB builds the hash table with hash keys at the very end of the join buffer. That's why the number of records written into the buffer at one refill is less then for BNL join algorithms. However a much shorter list of possible matching candidates makes this the block hash join algorithm usually much faster then BNL join.

Batch Key Access Join

Batch Keys Access join algorithm performs index look-ups when looking for possible matching candidates provided by the second join operand. With this respect the algorithm behave itself as the regular join algorithm. Yet BKA performs index look-ups for a batch of the records from the join buffer. For conventional database engines like InnoDB/MyISAM it allows to fetch matching candidates in an optimal way. For the engines with remote data store such as FederateX/Spider the algorithm allows to save on transfers between the MySQL node and the data store nodes.

How Batch Keys Access Join works

The implementation of the algorithm in 5.3 heavily exploits the multi-range-read interface and its properties. The interface hides the actual mechanism of fetching possible candidates for matching records from the table to be joined. As any block based join algorithm the BKA join repeatedly fills the join buffer with records of the first operand and for each refill it finds records from the join table that could match the records in the buffer. To find such records it asks the MRR interface to perform index look-ups with the keys constructed over all records from the buffer. Together with each key the interface receives a return address - a reference to the record over which this key has been constructed. The actual implementation functions of the MRR interface organize and optimize somehow the process of fetching the records of the joined table by the received keys. Each fetched record r2 is appended with the return address associated with the key by which the record has been found and the result is passed to the BKA join procedure. The procedure takes the record r1 from the join buffer by the return address, joins it with r2 and checks the join condition. If the condition is evaluated to true the joined records is sent to the result stream of the join operation. So for each record returned by the MRR interface only one record from the join buffer is accessed. The number of records from table t2 fetched by the BKA join is exactly the same as for the regular nested loops join algorithm. Yet BKA join allows to optimize the order in which the records are fetched.

Interaction of BKA join with with the MRR functions

BKA join interacts with the MRR functions respecting the following contract. The join procedure calls the MRR function multi_range_read_init passing it the callback functions that allows to initialize reading keys for the records in the join buffer and to iterate over these keys. It also passes the parameters of the buffer for MRR needs allocated within the join buffer space. Then BKA join repeatedly calls the MRR function multi_range_read_next. The function works as an iterator function over the records fetched by index look-ups with the keys produced by a callback function set in the call of multi_range_read_init. A call of the function multi_range_read_next returns the next fetched record through the dedicated record buffer, and the associated reference to the matched record from the join buffer as the output parameter of the function.

Managing usage of block-based join algorithms

Currently 4 different types of block-based join algorithms are supported. For a particular join operation each of them can be employed with a regular (flat) join buffer or with an incremental join buffer.

Three optimizer switches - 'join_cache_incremental', 'join_cache_hashed', 'join_cache_bka' – and the system variable 'join_cache_level' control what of 8 variants of the block-based algorithms can be used for join operations.

If join_cache_bka is off then BKA and BKAH join algorithms are not allowed. If join_cache_hashed is off then BNLH and BKAH join algorithms are not allowed. If join_cache_incremental is off then no incremental variants of the block-based join algorithms are allowed.

By default the switches join_cache_incremental, join_cache_hashed, join_cache_bka are set to on. However it does not mean that by default any of block-based join algorithms is allowed to be used. All of them are allowed only if the system variable join_cache_level is set to 8. This variable can take an integer value in the interval from 0 to 8.

If the value is set to 8 no block-based algorithm can be used for a join operation. The values from 1 to 8 correspond to the following variants of block-based join algorithms:

  • 1 – flat BNL
  • 2 – incremental BNL
  • 3 – flat BNLH
  • 4 – incremental BNLH
  • 5 – flat BKA
  • 6 – incremental BKA
  • 7 – flat BKAH
  • 8 – incremental BKAH

If the value of join_cache_level is set no N any of block-based algorithms with the level greater than N is disallowed.

So if join_cache_level is set to 5 no usage of BKAH is allowed and usage of incremental BKA is not allowed either while usage of all remaining variants are controlled by the settings of the optimizer switches join_cache_incremental, join_cache_hashed, join_cache_bka.

By default join_cache_level is set to 1. In other words only usage of flat BNL is allowed.

By default block-based algorithms can be used only for regular (inner) join operations. To allow them for outer join operations (left outer joins and right outer joins) the optimizer switch 'outer_join_with_cache' has to be set to 'on'. Setting the optimizer switch 'semijoin_with_cache' to 'on' allows using these algorithms for semi-join operations.

Currently only incremental variants of the block-based join algorithms can be used for nested outer joins and nested semi-joins.

Size of join buffers

Maximum size of join buffers used by block-based algorithms is controlled by the setting of the system variable 'join_buffer_size'. This value must be big enough in order the join buffer employed for a join operation to contain all interesting fields at least of one joined record.

MariaDB 5.3 introduces the system variable 'join_cache_space_limit' that limits the total memory used for join buffers in a query.

To optimize the usage of the join buffers within the limit set by 'join_cache_space_limit' one should use the optimizer switch 'optimize_join_buffer_size' set to 'on'.

To use BKA/BKAH join algorithms for InnoDB/MyISAM one must set the optimizer switch 'mrr' to 'on'. When using these algorithms for InnoDB/MyISAM the overall performance of the join operations can be dramatically improved if the optimizer switch 'mrr_sort_keys' is set 'on'.

Commenti

Sto caricando i commenti......
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.