Sanal.org
till 01th May 2006
Categories
.NET (40)
Art (1)
Blogs (1)
Computers (6)
Information (13)
Internet (11)
Movies (5)
Music (19)
MY SQL (5)
People (16)
Russian (20)
Security (15)
SEO (3)
SQL SERVER (8)
Technology (6)
Turkey (2)
Video (7)
More
TOP 50
Turkish News


Archive
April (2008)
March (2008)
February (2008)
January (2008)
December (2007)
Novamber (2007)
October (2007)
September (2007)
August (2007)
July (2007)
June (2007)
May (2007)
April (2007)
March (2007)
February (2007)
January (2007)
December (2006)
Novamber (2006)
October (2006)
September (2006)
August (2006)
July (2006)
June (2006)
May (2006)
April (2006)
March (2006)
February (2006)
January (2006)
December (2005)
October (2005)
April (2005)
December (1969)

189

ROR
Previous1Next

High Performance Full Text Search for Database Content (MYSQL)


MySQL Full Text Search

? Available only for MyISAM tables
? Natural Language Search and boolean search
? Query Expansion support
? ft_min_word_len – minimum 3 char per word by default
? Extensive stop word list by default
– Containing «zero», «talk» etc
? Frequency based ranking
– Distance between words is not counted

MySQL Full Text Search Example

mysql> CREATE TABLE articles (
    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    ->   title VARCHAR(200),
    ->   body TEXT,
    ->   FULLTEXT (title,body)
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO articles (title,body) VALUES
    -> ('MySQL Tutorial','DBMS stands for DataBase ...'),
    -> ('How To Use MySQL Well','After you went through a ...'),
    -> ('Optimizing MySQL','In this tutorial we will show ...'),
    -> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
    -> ('MySQL vs. YourSQL','In the following database comparison ...'),
    -> ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

 

MySQL Full Text Search Performance

? Index (and better data) should fit in memory
– And key_buffer_size sized appropriately
? Stop words list should contain most frequent
keywords
– But make sure you do not need to search by them
? Lowering ft_min_word_len decreases
performance
– Low length keywords are often frequent
? OPTIMIZE TABLE
– updates/deletes fragment  index


Avoid Count and Limit

? Avoid counting number of matches
– select title from cont where match (title,content) against ("democratic president candidate") limit 10;
? Takes 0.25 sec
– select count(*)  from cont where match (title,content) against  ("democratic president candidate")
? Takes 16 min 13.41 sec
? Avoid Sorting
– By separate field as «date» or by self computed relevance in boolean search
– select title,match(title) against ("post office" in boolean mode)*10 + match(content) against("post office" in boolean mode) relevance  from
cont where match (title,content) against ("post office" in boolean mode)
order by relevance desc  limit 10;
? Takes 3 min 38.35 sec

Be careful with WHERE and LIMIT
? Be care careful with extra where clause if it filters out a lot of matches.
– select title  from cont where match (title,content) against ("global service
company") and date_added>unix_timestamp("2006-07-18 18:00:00")
limit 10;
? Takes  26 min 43.59 sec
? Large LIMIT offset slow down dramatically
– select title  from cont where match (title,content) against ("computer game industry") limit 1000,10;
? Takes  9.99 sec

GROUP BY and Phrase Search
? GROUP BY requires all results so it is slow as well
– select count(*) cnt,date(from_unixtime(date_added))   from cont where
match (title,content) against ("computer game industry") group by date(from_unixtime(date_added));
? Takes 26 min 38.76 sec
– For example if we want to graph number of matches over time.
? Phrase Search makes Boolean Search pretty
slow
– select title   from cont where match (title,content) against ('"bill gates will"' in boolean mode) limit 10;
? Takes 18.08 sec
– MySQL has to look at data rows which match words to check if there is
phrase match so it is especially bad for frequent words in rare
combination




NMEA DATA


Disclaimer

This site is based on personal research and is believed to be accurate but there is no guarantee that any of the information is correct or suitable for any purpose. I have been told by the NMEA folks that my information is old and out of date. The current version of NMEA at the time I wrote this is 3.01 which is not described here. This site is for historical information and is not intended to be used for any official purpose. For official data please contact the NMEA web site. Please see the bottom of this article for the sources of this data.

Introduction

The National Marine Electronics Association (NMEA) has developed a specification that defines the interface between various pieces of marine electronic equipment. The standard permits marine electronics to send information to computers and to other marine equipment. A full copy of this standard is available for purchase at their web site. None of the information on this site comes from this standard and I do not have a copy. Anyone attempting to design anything to this standard should obtain an official copy.

GPS receiver communication is defined within this specification. Most computer programs that provide real time position information understand and expect data to be in NMEA format. This data includes the complete PVT (position, velocity, time) solution computed by the GPS receiver. The idea of NMEA is to send a line of data called a sentence that is totally self contained and independent from other sentences. There are standard sentences for each device category and there is also the ability to define proprietary sentences for use by the individual company. All of the standard sentences have a two letter prefix that defines the device that uses that sentence type. (For gps receivers the prefix is GP.) which is followed by a three letter sequence that defines the sentence contents. In addition NMEA permits hardware manufactures to define their own proprietary sentences for whatever purpose they see fit. All proprietary sentences begin with the letter P and are followed with 3 letters that identifies the manufacturer controlling that sentence. For example a Garmin sentence would start with PGRM and Magellan would begin with PMGN.

Each sentence begins with a '$' and ends with a carriage return/line feed sequence and can be no longer than 80 characters of visible text (plus the line terminators). The data is contained within this single line with data items separated by commas. The data itself is just ascii text and may extend over multiple sentences in certain specialized instances but is normally fully contained in one variable length sentence. The data may vary in the amount of precision contained in the message. For example time might be indicated to decimal parts of a second or location may be show with 3 or even 4 digits after the decimal point. Programs that read the data should only use the commas to determine the field boundaries and not depend on column positions. There is a provision for a checksum at the end of each sentence which may or may not be checked by the unit that reads the data. The checksum field consists of a '*' and two hex digits representing an 8 bit exclusive OR of all characters between, but not including, the '$' and '*'. A