Mroonga Overview



Once Mroonga has been installed (see About Mroonga), its basic usage is similar to that of a regular fulltext index.



For example:


CREATE TABLE ft_mroonga(copy TEXT,FULLTEXT(copy)) ENGINE=Mroonga;

INSERT INTO ft_mroonga(copy) VALUES ('Once upon a time'),
    ('There was a wicked witch'), ('Who ate everybody up');

SELECT * FROM ft_mroonga WHERE MATCH(copy) AGAINST('wicked');
+--------------------------+
| copy                     |
+--------------------------+
| There was a wicked witch |
+--------------------------+
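
If the example above does not behave as expected, it may help to confirm that the engine is loaded and that the table was really created with it. The following is a minimal sketch using only the standard information_schema tables; it assumes the ft_mroonga table lives in the currently selected database.

```sql
-- Check that the Mroonga storage engine is available on this server.
SELECT ENGINE, SUPPORT
  FROM information_schema.ENGINES
 WHERE ENGINE = 'Mroonga';

-- Confirm which engine the example table was actually created with.
SELECT TABLE_NAME, ENGINE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME = 'ft_mroonga';
```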

Score

Mroonga can also order results by weighting (relevance score). For example, first add another record:

INSERT INTO ft_mroonga(copy) VALUES ('She met a wicked, wicked witch');

Records can then be returned in order of weighting. The newly added record contains two occurrences of the word 'wicked', so it has a higher weighting:

SELECT *, MATCH(copy) AGAINST('wicked') AS score FROM ft_mroonga 
   WHERE MATCH(copy) AGAINST('wicked') ORDER BY score DESC;
+--------------------------------+--------+
| copy                           | score  |
+--------------------------------+--------+
| She met a wicked, wicked witch | 299594 |
| There was a wicked witch       | 149797 |
+--------------------------------+--------+
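
The same score can be combined with LIMIT to keep only the best match. This is a sketch rather than a recommended pattern; with the four records inserted so far it should return the 'She met a wicked, wicked witch' row, since that record ranks first in the output above.

```sql
-- Return only the single highest-scoring match for 'wicked'.
SELECT copy, MATCH(copy) AGAINST('wicked') AS score
  FROM ft_mroonga
 WHERE MATCH(copy) AGAINST('wicked')
 ORDER BY score DESC
 LIMIT 1;
```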

Parser

Mroonga lets you choose a different parser for searching, either by specifying the parser in a COMMENT on the FULLTEXT index in the CREATE TABLE statement or, in older versions, by changing the value of the mroonga_default_parser system variable.

For example:

CREATE TABLE ft_mroonga(copy TEXT,FULLTEXT(copy) COMMENT 'parser "TokenDelimitNull"') 
  ENGINE=Mroonga;

or

SET GLOBAL mroonga_default_parser = 'TokenBigramSplitSymbol';
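
To see which parser is currently in effect, the server variables and the table definition can be inspected with standard SHOW statements. This is a minimal sketch; the LIKE pattern is deliberately broad because the exact variable names available depend on the Mroonga version.

```sql
-- List Mroonga's default parser variable(s) and their current values.
SHOW GLOBAL VARIABLES LIKE 'mroonga_default%';

-- Show the table definition, including any parser set in the index COMMENT.
SHOW CREATE TABLE ft_mroonga;
```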

The following parser settings are available:

Setting | Description
off | No tokenising is performed.
TokenBigram | Default value. Continuous alphabetical characters, numbers or symbols are treated as a token.
TokenBigramIgnoreBlank | Same as TokenBigram except that white spaces are ignored.
TokenBigramIgnoreBlankSplitSymbol | Same as TokenBigramSplitSymbol except that white spaces are ignored.
TokenBigramIgnoreBlankSplitSymbolAlpha | Same as TokenBigramSplitSymbolAlpha except that white spaces are ignored.
TokenBigramIgnoreBlankSplitSymbolAlphaDigit | Same as TokenBigramSplitSymbolAlphaDigit except that white spaces are ignored.
TokenBigramSplitSymbol | Same as TokenBigram except that continuous symbols are not treated as a token, but tokenised in bigram.
TokenBigramSplitSymbolAlpha | Same as TokenBigram except that continuous alphabetical characters are not treated as a token, but tokenised in bigram.
TokenDelimit | Tokenises by splitting on white spaces.
TokenDelimitNull | Tokenises by splitting on null characters (\0).
TokenMecab | Tokenises using MeCab. Requires Groonga to be built with MeCab support.
TokenTrigram | Tokenises in trigrams, but continuous alphabetical characters, numbers or symbols are treated as a token.
TokenUnigram | Tokenises in unigrams, but continuous alphabetical characters, numbers or symbols are treated as a token.

Examples

TokenBigram vs TokenBigramSplitSymbol

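TokenBigram fails to match partial runs of symbols that TokenBigramSplitSymbol matches, because TokenBigramSplitSymbol does not treat continuous symbols as a single token but tokenises them in bigrams. In the example below, a search for '!?' returns nothing with TokenBigram but matches the trailing '!!?!' with TokenBigramSplitSymbol.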

DROP TABLE ft_mroonga;
CREATE TABLE ft_mroonga(copy TEXT,FULLTEXT(copy) COMMENT 'parser "TokenBigram"') 
  ENGINE=Mroonga;
INSERT INTO ft_mroonga(copy) VALUES ('Once upon a time'),   
  ('There was a wicked witch'), 
  ('Who ate everybody up'), 
  ('She met a wicked, wicked witch'), 
  ('A really wicked, wicked witch!!?!');
SELECT * FROM ft_mroonga WHERE MATCH(copy) AGAINST('!?');
Empty set (0.00 sec)

DROP TABLE ft_mroonga;
CREATE TABLE ft_mroonga(copy TEXT,FULLTEXT(copy) COMMENT 'parser "TokenBigramSplitSymbol"') 
  ENGINE=Mroonga;
INSERT INTO ft_mroonga(copy) VALUES ('Once upon a time'),   
  ('There was a wicked witch'), 
  ('Who ate everybody up'), 
  ('She met a wicked, wicked witch'), 
  ('A really wicked, wicked witch!!?!');
SELECT * FROM ft_mroonga WHERE MATCH(copy) AGAINST('!?');
+-----------------------------------+
| copy                              |
+-----------------------------------+
| A really wicked, wicked witch!!?! |
+-----------------------------------+

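TokenBigram vs TokenBigramSplitSymbolAlpha

TokenBigram fails to match part of a word, because it treats continuous alphabetical characters as a single token, while TokenBigramSplitSymbolAlpha tokenises them in bigrams. In the example below, a search for 'ick' returns nothing with TokenBigram but matches every record containing 'wicked' with TokenBigramSplitSymbolAlpha.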

DROP TABLE ft_mroonga;
CREATE TABLE ft_mroonga(copy TEXT,FULLTEXT(copy) COMMENT 'parser "TokenBigram"') 
  ENGINE=Mroonga;
INSERT INTO ft_mroonga(copy) VALUES ('Once upon a time'),   
  ('There was a wicked witch'), 
  ('Who ate everybody up'), 
  ('She met a wicked, wicked witch'), 
  ('A really wicked, wicked witch!!?!');
SELECT * FROM ft_mroonga WHERE MATCH(copy) AGAINST('ick');
Empty set (0.00 sec)

DROP TABLE ft_mroonga;
CREATE TABLE ft_mroonga(copy TEXT,FULLTEXT(copy) COMMENT 'parser "TokenBigramSplitSymbolAlpha"') 
  ENGINE=Mroonga;
INSERT INTO ft_mroonga(copy) VALUES ('Once upon a time'),   
  ('There was a wicked witch'), 
  ('Who ate everybody up'), 
  ('She met a wicked, wicked witch'), 
  ('A really wicked, wicked witch!!?!');
SELECT * FROM ft_mroonga WHERE MATCH(copy) AGAINST('ick');
+-----------------------------------+
| copy                              |
+-----------------------------------+
| There was a wicked witch          |
| She met a wicked, wicked witch    |
| A really wicked, wicked witch!!?! |
+-----------------------------------+
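
The comparison above can also be run side by side without dropping and recreating ft_mroonga. The sketch below uses two hypothetical table names, ft_bigram_demo and ft_split_alpha_demo, which are not part of the original example. Going by the results shown above, the first count should be 0 and the second 3.

```sql
-- Hypothetical demo tables, one per parser, so ft_mroonga is left untouched.
CREATE TABLE ft_bigram_demo(copy TEXT, FULLTEXT(copy) COMMENT 'parser "TokenBigram"')
  ENGINE=Mroonga;
CREATE TABLE ft_split_alpha_demo(copy TEXT, FULLTEXT(copy) COMMENT 'parser "TokenBigramSplitSymbolAlpha"')
  ENGINE=Mroonga;

-- Load the same five records into both tables.
INSERT INTO ft_bigram_demo(copy) VALUES ('Once upon a time'),
  ('There was a wicked witch'),
  ('Who ate everybody up'),
  ('She met a wicked, wicked witch'),
  ('A really wicked, wicked witch!!?!');
INSERT INTO ft_split_alpha_demo(copy) SELECT copy FROM ft_bigram_demo;

-- Count the matches for the partial word 'ick' under each parser.
SELECT 'TokenBigram' AS parser, COUNT(*) AS matches
  FROM ft_bigram_demo WHERE MATCH(copy) AGAINST('ick');
SELECT 'TokenBigramSplitSymbolAlpha' AS parser, COUNT(*) AS matches
  FROM ft_split_alpha_demo WHERE MATCH(copy) AGAINST('ick');

-- Clean up the demo tables.
DROP TABLE ft_bigram_demo, ft_split_alpha_demo;
```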
