MySQL Full Text Search
? Available only for MyISAM tables
? Natural Language Search and boolean search
? Query Expansion support
? ft_min_word_len – minimum 3 char per word by default
? Extensive stop word list by default
– Containing «zero», «talk» etc
? Frequency based ranking
– Distance between words is not counted
MySQL Full Text Search Example
mysql> CREATE TABLE articles (
-> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
-> title VARCHAR(200),
-> body TEXT,
-> FULLTEXT (title,body)
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO articles (title,body) VALUES
-> ('MySQL Tutorial','DBMS stands for DataBase ...'),
-> ('How To Use MySQL Well','After you went through a ...'),
-> ('Optimizing MySQL','In this tutorial we will show ...'),
-> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
-> ('MySQL vs. YourSQL','In the following database comparison ...'),
-> ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM articles
-> WHERE MATCH (title,body)
-> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title | body |
+----+-------------------+------------------------------------------+
| 5 | MySQL vs. YourSQL | In the following database comparison ... |
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)
MySQL Full Text Search Performance
? Index (and better data) should fit in memory
– And key_buffer_size sized appropriately
? Stop words list should contain most frequent
keywords
– But make sure you do not need to search by them
? Lowering ft_min_word_len decreases
performance
– Low length keywords are often frequent
? OPTIMIZE TABLE
– updates/deletes fragment index
Avoid Count and Limit
? Avoid counting number of matches
– select title from cont where match (title,content) against ("democratic president candidate") limit 10;
? Takes 0.25 sec
– select count(*) from cont where match (title,content) against ("democratic president candidate")
? Takes 16 min 13.41 sec
? Avoid Sorting
– By separate field as «date» or by self computed relevance in boolean search
– select title,match(title) against ("post office" in boolean mode)*10 + match(content) against("post office" in boolean mode) relevance from
cont where match (title,content) against ("post office" in boolean mode)
order by relevance desc limit 10;
? Takes 3 min 38.35 sec
Be careful with WHERE and LIMIT
? Be care careful with extra where clause if it filters out a lot of matches.
– select title from cont where match (title,content) against ("global service
company") and date_added>unix_timestamp("2006-07-18 18:00:00")
limit 10;
? Takes 26 min 43.59 sec
? Large LIMIT offset slow down dramatically
– select title from cont where match (title,content) against ("computer game industry") limit 1000,10;
? Takes 9.99 sec
GROUP BY and Phrase Search
? GROUP BY requires all results so it is slow as well
– select count(*) cnt,date(from_unixtime(date_added)) from cont where
match (title,content) against ("computer game industry") group by date(from_unixtime(date_added));
? Takes 26 min 38.76 sec
– For example if we want to graph number of matches over time.
? Phrase Search makes Boolean Search pretty
slow
– select title from cont where match (title,content) against ('"bill gates will"' in boolean mode) limit 10;
? Takes 18.08 sec
– MySQL has to look at data rows which match words to check if there is
phrase match so it is especially bad for frequent words in rare
combination