Wednesday, March 7, 2012

Noise word blues

Hi,
I'm having a noise word problem that I don't seem to see in any of the other
postings.
Here is the setup:
SQL Server 2000 Standard Edition SP3 running on Windows Server 2003.
To avoid the usual noise word problems, we edited the English noise word
file so that it contains just a single space, saved it, and restarted SQL
Server and the search service (actually, we rebooted the machine).
However, some characters still return the noise word error when used in a
CONTAINS query. The ones we found tend to be single special characters
such as "?" or "_".
An example would be: WHERE CONTAINS(*, '"Pick" AND "?" AND "List"')
Is there any way to get these ignored as well, or is there a
comprehensive list of these characters so we could parse them out in the
application code? We've tried neutral word breaking and FREETEXT, and
neither worked.
Thanks in advance for any help,
Wayne Antinore
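One way to do the application-side filtering asked about here is sketched below in Python. It is a minimal sketch under an assumption: because the thread gives no comprehensive list of the offending characters, it simply drops any search term that contains no letter or digit. The helper name `build_contains_condition` is hypothetical.

```python
import re

# Matches one letter or digit (Unicode-aware). [^\W_] is "a word character
# other than underscore", so "_" on its own does not count as a word.
_WORD_CHAR = re.compile(r"[^\W_]")


def build_contains_condition(terms):
    """Hypothetical helper: build a CONTAINS search condition such as
    '"Pick" AND "List"' from user-supplied terms.

    Assumption: any term with no letter or digit (e.g. "?" or "_") is
    treated as a likely noise-word trigger and dropped, since no
    comprehensive list of such characters is known.
    """
    kept = []
    for term in terms:
        # Embedded double quotes would break the quoted-term syntax.
        term = term.replace('"', "").strip()
        if _WORD_CHAR.search(term):
            kept.append(f'"{term}"')
    return " AND ".join(kept)
```

For example, `build_contains_condition(["Pick", "?", "List"])` returns `'"Pick" AND "List"'`. If every term is dropped, the result is an empty string, which is not a valid search condition, so the caller should check for that before running the query. Passing the result as a query parameter avoids having to escape it inside a T-SQL string literal.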
Unfortunately, what you have to do is parse your content and replace these
characters with a token, e.g. replace "?" with " QuestionMark ", and then,
when someone queries on a ?, expand the query to search on QuestionMark
instead.
Replace the ? with a token that is unlikely to be searched on for its own
sake, and surround the token with white space so that it is indexed as a
separate word rather than run together with the adjacent text.
When you return the data to the client, make sure you replace the token
with ? again, or keep an unmarked copy of the content somewhere and return
that instead.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
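The token approach described in this reply can be sketched in Python as three small helpers: one applied to content before it is indexed, one applied to search terms, and one applied to results before they go back to the client. The function names are hypothetical, and so is every token except "QuestionMark", which comes from the reply.

```python
# Hypothetical token map. "QuestionMark" comes from the reply; "Underscore"
# is an illustrative assumption. Tokens must not contain any of the mapped
# characters and should not otherwise occur in real content.
TOKENS = {"?": "QuestionMark", "_": "Underscore"}


def encode_for_index(text: str) -> str:
    """Replace each special character with its token, padded with spaces
    so the token is indexed as a separate word."""
    for char, token in TOKENS.items():
        text = text.replace(char, f" {token} ")
    return text


def expand_query_term(term: str) -> str:
    """Map a search term that is a special character to its token."""
    return TOKENS.get(term, term)


def decode_for_display(text: str) -> str:
    """Undo encode_for_index before returning the data to the client."""
    for char, token in TOKENS.items():
        text = text.replace(f" {token} ", char)
    return text
```

With these helpers, `encode_for_index("Pick ? List")` yields `"Pick  QuestionMark  List"`, and a search for Pick, ? and List becomes `CONTAINS(*, '"Pick" AND "QuestionMark" AND "List"')`. The round trip in `decode_for_display` assumes the padding spaces survive storage unchanged; keeping an unmarked copy of the content, as the reply also suggests, avoids that dependency.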
Thanks Hilary,
Yuck! But thanks anyway; at least now I know there isn't a quick fix that
I'm missing.
Wayne