About SphinxSE

Stai visualizzando una vecchia versione di questo article. Visualizza la versione più recente.

Abilitare SphinxSE in MariaDB

A partire da MariaDB 5.2.2, lo Storage Engine Sphinx è incluso nei sorgenti, nei binari e nei pacchetti di MariaDB. SphinxSE è compilato come plugin .so caricabile dinamicamente. Per utilizzarlo occorre eseguire un'unica volta: INSTALL PLUGIN:

MariaDB [(none)]> INSTALL PLUGIN sphinx SONAME 'ha_sphinx.so';
Query OK, 0 rows affected (0.00 sec)

Una volta installato, SphinxSE apparirà nell'elenco degli Storage Engine installati:

MariaDB [(none)]> show engines;
+------------+---------+--------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                    | Transactions | XA   | Savepoints |
+------------+---------+--------------------------------------------+--------------+------+------------+
...
| SPHINX     | YES     | Sphinx storage engine 0.9.9                | NO           | NO   | NO         |
...
+------------+---------+--------------------------------------------+--------------+------+------------+

Questo deve essere fatto una sola volta.

Nota: SphinxSE è solo la parte di Sphinx che funge da Storage Engine. Per poterlo utilizzare in MariaDB, occorre installare Sphinx.

A dispetto del suo nome, SphinxSE di per sé non gestisce i dati. In effetti non è altro che un client built-in che permette a MariaDB di comunicare con searchd, eseguire ricerche e ottenere i risultati. Tutta la parte di indicizzazione e di ricerca avviene fuori da MariaDB.

Ecco alcune applicazioni di SphinxSE:

  • port più semplice delle applicazioni FTS MariaDB/MySQL a Sphinx
  • permettere l'uso di Sphinx con linguaggi di programmazione che non dispongono ancora delle API necessarie
  • ottimizzazioni che riguardano l'elaborazione del set dei risultati di Sphinx dal lato di MariaDB (esempi; JOIN con la tabella dei documenti originali, ulteriori filtri operati da MariaDB, etc...)

Using SphinxSE

Basic Usage

To search via SphinxSE, you would need to create a special ENGINE=SPHINX "search table", and then SELECT from it with full text query put into the WHERE clause for query column.

Here is an example create statement and search query:

CREATE TABLE t1
(
    id          INTEGER UNSIGNED NOT NULL,
    weight      INTEGER NOT NULL,
    query       VARCHAR(3072) NOT NULL,
    group_id    INTEGER,
    INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/test";

SELECT * FROM t1 WHERE query='test it;mode=any';

The first three columns of the search table must have a type of INTEGER UNSINGED or BIGINT for the 1st column (document id), INTEGER or BIGINT for the 2nd column (match weight), and VARCHAR or TEXT for the 3rd column (your query), respectively. This mapping is fixed; you can not omit any of these three required columns, or move them around, or change types. Also, the query column must be indexed; all the others must be kept unindexed. Column names are ignored so you can use arbitrary ones.

Additional columns must be either INTEGER, TIMESTAMP, BIGINT, VARCHAR, or FLOAT. They will be bound to the attributes provided in the Sphinx result set by name, so their names must match the attribute names specified in sphinx.conf. If there's no such attribute name in the Sphinx search results, the additional columns will have NULL values.

Special "virtual" attribute names can also be bound to SphinxSE columns. _sph_ needs to be used instead of @ for that. For instance, to obtain the values of '@groupby', '@count', or '@distinct' virtual attributes, use '_sph_groupby', '_sph_count' or '_sph_distinct' column names, respectively.

The CONNECTION string parameter is used to specify the default searchd host, port, and indexes for queries issued using this table. If no connection string is specified in CREATE TABLE, index name '*' (ie. search all indexes) and 'localhost:9312' are assumed. The connection string syntax is as follows:

CONNECTION="sphinx://HOST:PORT/INDEXNAME"

You can change the default connection string later like so:

ALTER TABLE t1 CONNECTION="sphinx://NEWHOST:NEWPORT/NEWINDEXNAME";

You can also override all these parameters per-query.

Search Options

As seen in the example above, both query text and search options should be put into the 'WHERE' clause of the search query column (i.e. the 3rd column); the options are separated by semicolons (';') and separate names from values using an equals sign ('='). Any number of options can be specified. Available options are:

  • query - query text;
  • mode - matching mode. Must be one of "all", "any", "phrase", "boolean", or "extended". Default is "all";
  • sort - match sorting mode. Must be one of "relevance", "attr_desc", "attr_asc", "time_segments", or "extended". In all modes besides "relevance" attribute name (or sorting clause for "extended") is also required after a colon:
... WHERE query='test;sort=attr_asc:group_id';
... WHERE query='test;sort=extended:@weight desc, group_id asc';
  • offset - offset into result set, default is 0;
  • limit - amount of matches to retrieve from result set, default is 20;
  • index - names of the indexes to search:
... WHERE query='test;index=test1;';
... WHERE query='test;index=test1,test2,test3;';
  • minid, maxid - min and max document ID to match;
  • weights - comma-separated list of weights to be assigned to Sphinx full-text fields:
... WHERE query='test;weights=1,2,3;';
  • filter, !filter - comma-separated attribute name and a set of values to match:
# only include groups 1, 5 and 19
... WHERE query='test;filter=group_id,1,5,19;';

# exclude groups 3 and 11
... WHERE query='test;!filter=group_id,3,11;';
  • range, !range - comma-separated attribute name, min and max value to match:
# include groups from 3 to 7, inclusive
... WHERE query='test;range=group_id,3,7;';

# exclude groups from 5 to 25
... WHERE query='test;!range=group_id,5,25;';
  • maxmatches - per-query max matches value:
... WHERE query='test;maxmatches=2000;';
  • groupby - group-by function and attribute:
... WHERE query='test;groupby=day:published_ts;';
... WHERE query='test;groupby=attr:group_id;';
  • groupsort - group-by sorting clause:
... WHERE query='test;groupsort=@count desc;';
  • indexweights - comma-separated list of index names and weights to use when searching through several indexes:
... WHERE query='test;indexweights=idx_exact,2,idx_stemmed,1;';
  • comment - a string to mark this query in query log (mapping to $comment parameter in Query() API call):
... WHERE query='test;comment=marker001;';
  • select - a string with expressions to compute (mapping to SetSelect() API call):
... WHERE query='test;select=2*a+3*b as myexpr;';

Note: It is much more efficient to allow Sphinx to perform sorting, filtering, and slicing of the result set than to raise max matches count and use 'WHERE', 'ORDER BY', and 'LIMIT' clauses on the MariaDB side. This is for two reasons:

  1. Sphinx does a number of optimizations and performs better than MariaDB/MySQL on these tasks.
  2. Less data would need to be packed by searchd, and transferred and unpacked by SphinxSE.

SHOW ENGINE SPHINX STATUS

Starting with version 0.9.9-rc1, additional query info besides the result set can be retrieved with the 'SHOW ENGINE SPHINX STATUS' statement:

mysql> SHOW ENGINE SPHINX STATUS;
+--------+-------+-------------------------------------------------+
| Type   | Name  | Status                                          |
+--------+-------+-------------------------------------------------+
| SPHINX | stats | total: 25, total found: 25, time: 126, words: 2 | 
| SPHINX | words | sphinx:591:1256 soft:11076:15945                | 
+--------+-------+-------------------------------------------------+
2 rows in set (0.00 sec)

This information can also be accessed through status variables. Note that this method does not require super-user privileges.

mysql> SHOW STATUS LIKE 'sphinx_%';
+--------------------+----------------------------------+
| Variable_name      | Value                            |
+--------------------+----------------------------------+
| sphinx_total       | 25                               | 
| sphinx_total_found | 25                               | 
| sphinx_time        | 126                              | 
| sphinx_word_count  | 2                                | 
| sphinx_words       | sphinx:591:1256 soft:11076:15945 | 
+--------------------+----------------------------------+
5 rows in set (0.00 sec)

JOINs with SphinxSE

You can perform JOINs on a SphinxSE search table and tables using other engines. Here's an example with "documents" from example.sql:

mysql> SELECT content, date_added FROM test.documents docs
-> JOIN t1 ON (docs.id=t1.id) 
-> WHERE query="one document;mode=any";
+-------------------------------------+---------------------+
| content                             | docdate             |
+-------------------------------------+---------------------+
| this is my test document number two | 2006-06-17 14:04:28 | 
| this is my test document number one | 2006-06-17 14:04:28 | 
+-------------------------------------+---------------------+
2 rows in set (0.00 sec)

mysql> SHOW ENGINE SPHINX STATUS;
+--------+-------+---------------------------------------------+
| Type   | Name  | Status                                      |
+--------+-------+---------------------------------------------+
| SPHINX | stats | total: 2, total found: 2, time: 0, words: 2 | 
| SPHINX | words | one:1:2 document:2:2                        | 
+--------+-------+---------------------------------------------+
2 rows in set (0.00 sec)

Building snippets (excerpts) via MariaDB

Starting with version 0.9.9-rc2, SphinxSE also includes a UDF function that lets you create snippets through MariaDB. The functionality is fully similar to the BuildExcerprts API call but is accessible through MariaDB+SphinxSE.

The binary that provides the UDF is named sphinx.so and is automatically built and installed to the proper location along with SphinxSE itself. Register the UDF using the following statement:

CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'sphinx.so';

The function name must be 'sphinx_snippets', you can not use an arbitrary name. Function arguments are as follows:

Prototype: function sphinx_snippets ( document, index, words, [options] );

Document and words arguments can be either strings or table columns. Options must be specified like this: <code>'value' AS option_name</code>. For a list of supported options, refer to the BuildExcerprts() API call. The only UDF-specific additional option is named 'sphinx' and lets you specify searchd location (host and port).

Usage examples:

SELECT sphinx_snippets('hello world doc', 'main', 'world',
    'sphinx://192.168.1.1/' AS sphinx, true AS exact_phrase,
    '[b]' AS before_match, '[/b]' AS after_match)
FROM documents;

SELECT title, sphinx_snippets(text, 'index', 'mysql php') AS text
    FROM sphinx, documents
    WHERE query='mysql php' AND sphinx.id=documents.id;

More Information

More information on Sphinx and SphinxSE is available on the Sphinx website.

Commenti

 
Nessun Commento
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.
Back to Top