Wednesday, March 28, 2012

Normalizing full-text search rank?

I read many good postings here about the weird logic of full-text
ranking, and I hope I finally got it (if it is possible at all ;).
Yet... We built an application to search in our database in various
ways, incl. full-text search. I want to show users only "relevant"
matches, i.e. matches that score above some threshold. For other
(simple) search options one can easily return a score between 0 and
100% - for example, matching 4 of 5 parameters searched will count
80%. How about full-text search, though? I can't simply normalize the
range 0-1000 saying rank 0 = 0%, rank 1000 = 100%, since even exact
matches (e.g. searching for "salt" in a row containing just that one
single word) return ranks as low as 160.
Any ideas, thoughts, considerations are welcome!
This can't really be done. The problem is that rank changes per search
argument (SARG)/token. So searching on a word like "salt" might return a
set of relatively low rankings, but searching on a word which occurs
relatively rarely in your document set, like "anthropomorphological", would
return a higher rank.
Words/SARGs/tokens which occur relatively rarely have higher resolving power
than those which occur frequently. For instance, consider searching for a
friend named John Smith. John Smith is a very common name, so you need
something that distinguishes your John Smith from all other John Smiths,
i.e. something relatively rare, like a middle name. Ranking weights common
words/SARGs/tokens lower than those which occur more rarely, as the rarer
ones have higher resolving power and hence relevance.
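To make the "resolving power" idea concrete, here is a small Python sketch using the textbook inverse document frequency weight. This is only an illustration of the general principle and is not SQL Server's actual rank formula; the function name and the row counts are made up.

```python
import math

def idf(total_docs, docs_with_term):
    """Textbook inverse document frequency: rarer terms get larger weights.

    Illustrative only; this is NOT the formula SQL Server uses for rank.
    """
    return math.log(total_docs / docs_with_term)

# Hypothetical counts for a 10,000-row table (invented for illustration).
common = idf(10_000, 2_500)  # a word like "salt" that appears in many rows
rare = idf(10_000, 3)        # a word that appears in only a few rows

print(f"common term weight: {common:.2f}, rare term weight: {rare:.2f}")
```

The rare term gets a much larger weight than the common one, which is why a perfect match on a common word can still come back with a low rank.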
Ranking is an attempt to measure relevance. It is a statistical measure and
is useful for ordering search results. Your actual "hit" could be the first,
second, or 15th result.
The best thing to do is to order your results, pick the highest ranked
item, divide all other ranks by this value and multiply by 100.
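A minimal sketch of that calculation in Python, assuming the rows have already been fetched as (key, rank) pairs. The function name, the threshold parameter and the sample ranks are hypothetical, not part of any SQL Server API.

```python
def normalize_ranks(rows, threshold_pct=0.0):
    """Scale ranks so the top hit is 100 and drop rows below threshold_pct.

    rows: list of (key, rank) tuples, e.g. fetched from a full-text query.
    Returns a list of (key, percent) tuples ordered from best to worst.
    """
    if not rows:
        return []
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    top = ordered[0][1]
    if top <= 0:
        return []  # no meaningful ranks to scale
    scaled = [(key, 100.0 * rank / top) for key, rank in ordered]
    return [(key, pct) for key, pct in scaled if pct >= threshold_pct]

hits = [(42, 80), (17, 160), (8, 40)]  # made-up (key, rank) pairs
for key, pct in normalize_ranks(hits, threshold_pct=50):
    print(key, pct)
```

Note that the best hit always comes out as 100, so the percentage only says how a row compares with the top result of the same query. It is not an absolute relevance score and cannot be compared across different searches.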
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
|||Thanks, Hilary!
I also thought about this trick, although I am afraid it has a bad
performance impact... But it might be the only feasible
solution.
"Hilary Cotter" <hilaryk@.att.net> wrote in message news:<OiOcxB0OEHA.3300@.TK2MSFTNGP09.phx.gbl>...
> The best thing to do, is to order your results, pick the highest ranked
> item, divide all other ranks by this value and multiply by 100.
|||Again, performance with a T-SQL solution would be expensive; if it were done on the client it would not be that expensive.