UESPWiki:Administrator Noticeboard/Archives/Search Issues and Improvements
This is an archive of past UESPWiki:Administrator Noticeboard/Archives discussions. Do not edit the contents of this page, except for maintenance such as updating links.
The addition of the search log has quickly revealed a number of shortcomings in the current site search, the biggest of which is that some 75% of all searches performed on the site return no results. A good amount of this may be due to the "Search Titles Only" user preference defaulting to TRUE, which is probably not the best value for a new or anonymous user trying to find something. You can see a lot of searches in the Special:SearchLog (patroller permission required) which should have valid results but return none due to this.
For now I've set the searches to always ignore the user's "Search Titles Only" preference and always search both titles and text. This will consume a bit more resources but shouldn't be nearly enough to make a noticeable difference. We'll see how this changes the number of empty search results (much better in just the past 5 minutes).
The search log has also revealed the relatively poor performance of some searches. The first time a search for a term is performed (or after the MySQL query cache is invalidated) the search can easily take up to 10 seconds. This is just the time of the database query itself, so the time seen by the user will be even longer. There are a number of better search engines for MediaWiki, notably Lucene and Sphinx, and I've installed Sphinx on content3 for testing purposes. Feel free to visit http://content3.uesp.net/wiki/Main_Page and try it out. It is currently just the default installation and as such only looks for text matches and not article titles.
The main benefits of Sphinx (and/or Lucene, as they seem to have comparable feature sets) are:
- Faster...much faster. Even a "slow" Sphinx query takes only 50ms with most performing in under 10ms which is 10-1000x faster than the current search.
- The ability to customize the "Did you Mean..." search feature. For example, try searching for "lycanthorp" on content3 (note the misspelling). Custom words can be edited from within the Wiki.
- Customized stop word list. The current MySQL full text search contains an almost silly amount of stop words which are ignored in searches. A much smaller number of stop words (the, a, and, etc...) would probably make more sense in our case.
- Ability to run searches on dedicated hardware. Sphinx can be set up to run on the content servers or on a separate server altogether, making scaling much easier if/when needed.
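To illustrate the stop word point in the list above, here is a minimal sketch in Python of filtering a query against a short custom list. The list contents and the `filter_query` helper are assumptions made for illustration only; they are not how Sphinx or MySQL implement stop words.

```python
# Hypothetical short stop word list; a real list would be tuned for the wiki.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def filter_query(query, stop_words=STOP_WORDS):
    """Return the lowercased query terms that are not stop words."""
    return [term for term in query.lower().split() if term not in stop_words]


if __name__ == "__main__":
    # Only the meaningful terms remain after filtering.
    print(filter_query("Throat of the World"))
```

With a short list like this, only the most common words are dropped, so less common words that a larger built-in list would ignore stay searchable.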
The one notable downside to Sphinx/Lucene is that pages must be re-indexed in order for their content to be available for search results. The UESP Wiki is not too large, so this re-indexing is relatively quick, but page edits may still take minutes to hours to appear in searches, depending on how the re-indexing is set up to take place.
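The delay described above comes from the gap between an edit and the next index run. As a rough sketch, assuming a hypothetical mapping of page titles to last-edit times, the pages that a search would not yet reflect could be found like this (the `pages_to_reindex` helper and the sample data are made up for illustration):

```python
from datetime import datetime


def pages_to_reindex(last_edited, last_indexed_at):
    """Return the titles edited after the last index run, sorted by title."""
    return sorted(
        title for title, edited_at in last_edited.items()
        if edited_at > last_indexed_at
    )


if __name__ == "__main__":
    # Made-up sample data: one page edited before the index run, one after.
    last_run = datetime(2011, 4, 27, 1, 0)
    edits = {
        "Lore:Main Page": datetime(2011, 4, 26, 23, 0),
        "Oblivion:Oblivion": datetime(2011, 4, 27, 2, 30),
    }
    print(pages_to_reindex(edits, last_run))
```

Running the index more often shortens the list of stale pages at the cost of more indexing work.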
Comments and suggestions are welcome. There still remains a good amount of work to get Sphinx/Lucene in shape to replace the search full time, in particular the ability to search for titles and some namespace awareness like the existing search/go feature. Likely I will simply reuse the existing title search and use Sphinx for the text search. -- Daveh 01:00, 27 April 2011 (UTC)
- Sphinx for text searching works fine for me. I can't use the new search log tool myself, but if the results are that bad then I think a new search engine is worth the time investment. Legoless 01:30, 27 April 2011 (UTC)
- The title search being the default was something Nephele changed ages ago when the site was experiencing severe slowdowns - I can't find the discussion but it's archived somewhere. The new one is lightning fast, but the number of results it's returning is a bit frightening: "throat of the world" returns 1000 results, because it seems to be searching for each word rather than the term as a whole. That's going to make improving Lore pages a lot trickier. In general, though, it looks like a much better search system than the current one. rpeh •T•C•E• 18:43, 27 April 2011 (UTC)
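The difference between matching each word and matching the whole term can be sketched in Python as follows. Both helpers and the sample texts are hypothetical and only illustrate the two behaviours; they do not reproduce how Sphinx evaluates queries.

```python
def matches_any_word(text, query):
    """True if the text contains at least one of the query words."""
    words = set(text.lower().split())
    return any(term in words for term in query.lower().split())


def matches_phrase(text, query):
    """True if the query words appear in the text consecutively, in order."""
    text_words = text.lower().split()
    query_words = query.lower().split()
    n = len(query_words)
    if n == 0:
        return False
    return any(
        text_words[i:i + n] == query_words
        for i in range(len(text_words) - n + 1)
    )


if __name__ == "__main__":
    # Made-up sample texts, not real page content.
    pages = [
        "climb to the throat of the world",
        "the world of tamriel",
        "a sore throat",
        "a quiet village",
    ]
    query = "throat of the world"
    print(sum(matches_any_word(p, query) for p in pages))
    print(sum(matches_phrase(p, query) for p in pages))
```

Any-word matching hits every text that contains "the", "of", "throat", or "world", which is why a multi-word term can return far more results than phrase matching would.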