Till now I've always had problems getting SQL Server 2005 Full Text Indexing (FTI) to perform well in real world scenarios. Recently I found it was because I wasn't implementing it correctly, so I'll post this tip which will hopefully help myself and others get it right next time.
The query is a user entered keyword search on text columns using the FTI predicates and T-SQL predicates on other relational columns. The query is over a moderately large number of and returns paged results.
FTI is basically the ability to search from text columns where the text has been indexed with natural language, relevance and ranking considerations built in. XML and other data type can also be index natively, but I haven't used that. FTI in SQL Server 2005 uses the Microsoft Search Service. This service is not actually part of SQL Server itself, it is also used by Exchange and Sharepoint. I think you can use the service directly with a Sharepoint SDK.
To use Full Text Indexing in SQL server you must enable full text indexing to the entire database and the table and columns containing the searchable text. It can all be setup from SQL Management Studio, but it does create a new physical folder in addition to the normal database MDF and LDF files. Once this has been done additional FTI predicates can be used on the selected columns.
As SQL Server Express doesn't support FTI, I think it is worth considering optionally supporting normal T-SQL similar to applications support querying using the FTI and functions. Its nice to be able develop or even deploy with SQL Server Express even though the text searching is not indexed using the Search Services. But this really depends on circumstances, I just wanted to note supporting both is an option.
I found a query similar to this example was taking ages over a large amount of data. It was unacceptable, we had to render all pages in less than 4 seconds under moderate load and we expected to get less than two seconds. We we getting search times alone in double figures for some keywords.
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER ([DocumentInformation].PublishDate ASC) as RowNumber, FROM Document JOIN DocumentInformation ON DocumentInformation.DocumentId = Document.DocumentId JOIN UserAccount Owner ON Document.OwnerId = UserAccount.UserId JOIN UserAccount Publisher ON Document.PublisherId = Publisher.UserId WHERE [Document].Published = 1 AND [Owner].IsActive = 1 AND [Publisher].IsActive = 1 AND CONTAINS([Document].DocumentText, "FTI*" ) ) x WHERE RowNumber BETWEEN 1 AND 10 ORDER BY RowNumber
Below has an additional inner query on the document table alone, using just the CONTAINS predicates on the document text. The additional tables are joined to the results and the normal T-SQL predicates are added.
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER ([DocumentInformation].PublishDate ASC) as RowNumber, FROM (SELECT *, DocumentId FROM dbo.Document WHERE CONTAINS(([Documemnt].ProductKeywords, "FTI*" )) p JOIN DocumentInformation ON [DocumentInformation].DocumentId = p.DocumentId JOIN UserAccount Owner ON Document.OwnerId = UserAccount.UserId JOIN UserAccount Publisher ON Document.PublisherId = Publisher.UserId WHERE [Document].Published = 1 AND [Owner].IsActive = 1 AND [Publisher].IsActive = 1 ) x WHERE RowNumber BETWEEN 1 AND 10 ORDER BY RowNumber
I gave this a go after reading an FTI Best Practice article that indicated the searchable text rows couldn't be excluded from a search due to the search service running in a separate process. The result of this subtle change was amazing, the searches were obviously much quicker and returned the same result set. Under load test we found this change alone improved the average execution time of all pages on the website by a massive 30%.
In addition to this the FTI best practices document also recommends the following optimizations to improve full text performance.
We ended up implementing all the recommendations above to try and get the most out of the FTI service. After all this we got search times down to about a second under light load, this was cool but we had grander plans for an in memory search using of expression trees for search filters and LINQ for querying the in memory objects. I hope to chronicle this in future posts as initial performance testing shows much quicker search times and a significant capacity load increase by removing searching from SQL Server which was previously our performance bottleneck.
In SQL Server 2008 they have apparently integrated FTI into the SQL Server process which appears to be the cause of my original performance problem. But so far I've only heard bad things, hopefully these issues will be cleaned up before I start using it.