Showing posts with label unicode. Show all posts
Showing posts with label unicode. Show all posts

Friday, March 23, 2012

Non-Unicode to Unicode Data conversion

Hi all, we are now planning to upgrade our application from a
non-unicode version to a unicode version. The application's backend is
a SQL Server 2000 SP3.

The concern is, existing business data are stored using collation
"Chinese_PRC_CI_AS", i.e. Simplified Chinese. So I thought we need to
extract these data out to the new SQL Server which is using Unicode (I
assume it means converting them to nchar, nvarchar type of fields for I
don't enough information from the application side, or is there a
general unicode collation that will make even char and varchar types to
store data as Unicode?).

The problem is what's the best and most efficient way to do this data
conversion?
bcp? DTS? or others?

thanks a lotNew MSSQL DBA (boscong88@.gmail.com) writes:
> Hi all, we are now planning to upgrade our application from a
> non-unicode version to a unicode version. The application's backend is
> a SQL Server 2000 SP3.
> The concern is, existing business data are stored using collation
> "Chinese_PRC_CI_AS", i.e. Simplified Chinese. So I thought we need to
> extract these data out to the new SQL Server which is using Unicode (I
> assume it means converting them to nchar, nvarchar type of fields for I
> don't enough information from the application side, or is there a
> general unicode collation that will make even char and varchar types to
> store data as Unicode?).

You will have to move to nchar/nvarchar.

> The problem is what's the best and most efficient way to do this data
> conversion?
> bcp? DTS? or others?

One idea would be to create a new database on the same server, with
the (var)char columns changed to n(var)char columns, and then insert
data over. In this case you will get a conversion from the multi-byte
character set you use today. You would then move that database to the
new server with detach/attach or backup/restore.

You would not create indexes, constraints and triggers in the new
database, until you have copied the data.

Using BCP meand that you have to bounce over disk. Then again, bulk-
load is faster so it could still be faster. Here I cannot really say
that you will get a conversion, although I believe that you would.
(I have never converted Chinese text from double-byte to Unicode, so
I don't really know what works and what does not.)

As for DTS, I don't know DTS at all, so I can't say whether it's good or
not.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||New MSSQL DBA (boscong88@.gmail.com) writes:
> Hi all, we are now planning to upgrade our application from a
> non-unicode version to a unicode version. The application's backend is
> a SQL Server 2000 SP3.
> The concern is, existing business data are stored using collation
> "Chinese_PRC_CI_AS", i.e. Simplified Chinese. So I thought we need to
> extract these data out to the new SQL Server which is using Unicode (I
> assume it means converting them to nchar, nvarchar type of fields for I
> don't enough information from the application side, or is there a
> general unicode collation that will make even char and varchar types to
> store data as Unicode?).

You will have to move to nchar/nvarchar.

> The problem is what's the best and most efficient way to do this data
> conversion?
> bcp? DTS? or others?

One idea would be to create a new database on the same server, with
the (var)char columns changed to n(var)char columns, and then insert
data over. In this case you will get a conversion from the multi-byte
character set you use today. You would then move that database to the
new server with detach/attach or backup/restore.

You would not create indexes, constraints and triggers in the new
database, until you have copied the data.

Using BCP meand that you have to bounce over disk. Then again, bulk-
load is faster so it could still be faster. Here I cannot really say
that you will get a conversion, although I believe that you would.
(I have never converted Chinese text from double-byte to Unicode, so
I don't really know what works and what does not.)

As for DTS, I don't know DTS at all, so I can't say whether it's good or
not.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thanks for your help.

I am wondering I can do an in-place conversion by altering the char,
varchar, text columns of the original non-unicode DB to nchar, nvarchar
and ntext columns using the ALTER TABLE ... ALTER COLUMN commands.

I've done a bit of testing and it seems working. Just wondering
whether there are any catch of doing so. (of couse, will backup
everything before changes).|||Thanks for your help.

I am wondering I can do an in-place conversion by altering the char,
varchar, text columns of the original non-unicode DB to nchar, nvarchar
and ntext columns using the ALTER TABLE ... ALTER COLUMN commands.

I've done a bit of testing and it seems working. Just wondering
whether there are any catch of doing so. (of couse, will backup
everything before changes).|||New MSSQL DBA (boscong88@.gmail.com) writes:
> I am wondering I can do an in-place conversion by altering the char,
> varchar, text columns of the original non-unicode DB to nchar, nvarchar
> and ntext columns using the ALTER TABLE ... ALTER COLUMN commands.
> I've done a bit of testing and it seems working. Just wondering
> whether there are any catch of doing so. (of couse, will backup
> everything before changes).

I can't see any problems with this.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||New MSSQL DBA (boscong88@.gmail.com) writes:
> I am wondering I can do an in-place conversion by altering the char,
> varchar, text columns of the original non-unicode DB to nchar, nvarchar
> and ntext columns using the ALTER TABLE ... ALTER COLUMN commands.
> I've done a bit of testing and it seems working. Just wondering
> whether there are any catch of doing so. (of couse, will backup
> everything before changes).

I can't see any problems with this.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Monday, March 12, 2012

non unicode

Hi,
Is DT_STR Unicode? From SSIS I am using a data conversion component
and defining the data type as DT_STR and saving the output in excel
file. I am getting an error that 'can not convert from Unicode to non-
unicode'. I thought I have already taken care of that in data
conversion.
Could you explain exactly which data conversion you prepared? The DT_STR was
specified for the input field or for the output? In this last case, which is
the input field data type?
Gilberto Zampatti
"SB" wrote:

> Hi,
> Is DT_STR Unicode? From SSIS I am using a data conversion component
> and defining the data type as DT_STR and saving the output in excel
> file. I am getting an error that 'can not convert from Unicode to non-
> unicode'. I thought I have already taken care of that in data
> conversion.
>
|||On Apr 20, 2:10 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> Could you explain exactly which data conversion you prepared? The DT_STR was
> specified for the input field or for the output? In this last case, which is
> the input field data type?
> Gilberto Zampatti
>
> "SB" wrote:
> - Show quoted text -
Hi,
DT_STR should be an input field because I am getting data from a table
using ole db source.
|||OK; supposing that input field is actually non-unicode (DT_STR could be the
right data type), verify the transformation task to be sure about the output
data type. SQL 2005 uses UNICODE by default, and you should explicitly
declare non unicode data types.
If you correct the input field definition specifying Unicode (DT_WSTR), you
should avoid such error message.
The fact that you are accessing input table via OLE DB does'nt change the
fact that QSL 2005 uses Unicode by default.
Gilberto Zampatti
"SB" wrote:

> On Apr 20, 2:10 pm, Gilberto Zampatti
> <GilbertoZampa...@.discussions.microsoft.com> wrote:
> Hi,
> DT_STR should be an input field because I am getting data from a table
> using ole db source.
>
|||On Apr 20, 3:28 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> OK; supposing that input field is actually non-unicode (DT_STR could be the
> right data type), verify the transformation task to be sure about the output
> data type. SQL 2005 uses UNICODE by default, and you should explicitly
> declare non unicode data types.
> If you correct the input field definition specifying Unicode (DT_WSTR), you
> should avoid such error message.
> The fact that you are accessing input table via OLE DB does'nt change the
> fact that QSL 2005 uses Unicode by default.
> Gilberto Zampatti
>
> "SB" wrote:
>
>
> - Show quoted text -
Thanks.
For output I have created a excel table using the statement:
CREATE TABLE `Excel Destination` (
`NAME` VARCHAR(),
)
I have tried string, text etc but no success.
Excel Destination editor:
Data Access Mode:
Table or view
Name of the excel sheet:
Excel Destination

non unicode

Hi,
Is DT_STR Unicode? From SSIS I am using a data conversion component
and defining the data type as DT_STR and saving the output in excel
file. I am getting an error that 'can not convert from Unicode to non-
unicode'. I thought I have already taken care of that in data
conversion.Could you explain exactly which data conversion you prepared? The DT_STR was
specified for the input field or for the output? In this last case, which is
the input field data type?
Gilberto Zampatti
"SB" wrote:

> Hi,
> Is DT_STR Unicode? From SSIS I am using a data conversion component
> and defining the data type as DT_STR and saving the output in excel
> file. I am getting an error that 'can not convert from Unicode to non-
> unicode'. I thought I have already taken care of that in data
> conversion.
>|||On Apr 20, 2:10 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> Could you explain exactly which data conversion you prepared? The DT_STR w
as
> specified for the input field or for the output? In this last case, which
is
> the input field data type?
> Gilberto Zampatti
>
> "SB" wrote:
> - Show quoted text -
Hi,
DT_STR should be an input field because I am getting data from a table
using ole db source.|||OK; supposing that input field is actually non-unicode (DT_STR could be the
right data type), verify the transformation task to be sure about the output
data type. SQL 2005 uses UNICODE by default, and you should explicitly
declare non unicode data types.
If you correct the input field definition specifying Unicode (DT_WSTR), you
should avoid such error message.
The fact that you are accessing input table via OLE DB does'nt change the
fact that QSL 2005 uses Unicode by default.
Gilberto Zampatti
"SB" wrote:

> On Apr 20, 2:10 pm, Gilberto Zampatti
> <GilbertoZampa...@.discussions.microsoft.com> wrote:
> Hi,
> DT_STR should be an input field because I am getting data from a table
> using ole db source.
>|||On Apr 20, 3:28 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> OK; supposing that input field is actually non-unicode (DT_STR could be th
e
> right data type), verify the transformation task to be sure about the outp
ut
> data type. SQL 2005 uses UNICODE by default, and you should explicitly
> declare non unicode data types.
> If you correct the input field definition specifying Unicode (DT_WSTR), yo
u
> should avoid such error message.
> The fact that you are accessing input table via OLE DB does'nt change the
> fact that QSL 2005 uses Unicode by default.
> Gilberto Zampatti
>
> "SB" wrote:
>
>
>
> - Show quoted text -
Thanks.
For output I have created a excel table using the statement:
CREATE TABLE `Excel Destination` (
`NAME` VARCHAR(),
)
I have tried string, text etc but no success.
Excel Destination editor:
Data Access Mode:
Table or view
Name of the excel sheet:
Excel Destination|||On Apr 20, 3:28 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> OK; supposing that input field is actually non-unicode(DT_STR could be the
> right data type), verify the transformation task to be sure about the outp
ut
> data type. SQL 2005 usesUNICODEby default, and you should explicitly
> declare nonunicodedata types.
> If you correct the input field definition specifyingUnicode(DT_WSTR), you
> should avoid such error message.
> The fact that you are accessing input table via OLE DB does'nt change the
> fact that QSL 2005 usesUnicodeby default.
> Gilberto Zampatti
>
> "SB" wrote:
>
>
>
> - Show quoted text -
Hi,
Is it alright if I declare a datatype as nvarchar 'create table (name
nvarchar(20))' and then treat it as if varchar for example functions,
inserts and join with other varchar columns etc? Except giving a
larger string size I don't have much use for unicode. They might as
well keep this type internal so that we didn't have to worry about it.|||On Apr 23, 10:41 am, SB <othell...@.yahoo.com> wrote:
> On Apr 20, 3:28 pm, Gilberto Zampatti
>
>
> <GilbertoZampa...@.discussions.microsoft.com> wrote:
>
>
>
>
>
> Hi,
> Is it alright if I declare a datatype as nvarchar 'create table (name
> nvarchar(20))' and then treat it as if varchar for example functions,
> inserts and join with other varchar columns etc? Except giving a
> larger string size I don't have much use for unicode. They might as
> well keep this type internal so that we didn't have to worry about it.- Hi
de quoted text -
> - Show quoted text -
That question was not related to SSIS package (as you have told ssis
is unicode). FWIW I was able to get the data (ssis) imported when I
used unicode for input and output data using a data conversion in
between from varchar to unicode string.

non unicode

Hi,
Is DT_STR Unicode? From SSIS I am using a data conversion component
and defining the data type as DT_STR and saving the output in excel
file. I am getting an error that 'can not convert from Unicode to non-
unicode'. I thought I have already taken care of that in data
conversion.Could you explain exactly which data conversion you prepared? The DT_STR was
specified for the input field or for the output? In this last case, which is
the input field data type?
Gilberto Zampatti
"SB" wrote:
> Hi,
> Is DT_STR Unicode? From SSIS I am using a data conversion component
> and defining the data type as DT_STR and saving the output in excel
> file. I am getting an error that 'can not convert from Unicode to non-
> unicode'. I thought I have already taken care of that in data
> conversion.
>|||On Apr 20, 2:10 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> Could you explain exactly which data conversion you prepared? The DT_STR was
> specified for the input field or for the output? In this last case, which is
> the input field data type?
> Gilberto Zampatti
>
> "SB" wrote:
> > Hi,
> > Is DT_STR Unicode? From SSIS I am using a data conversion component
> > and defining the data type as DT_STR and saving the output in excel
> > file. I am getting an error that 'can not convert from Unicode to non-
> > unicode'. I thought I have already taken care of that in data
> > conversion.- Hide quoted text -
> - Show quoted text -
Hi,
DT_STR should be an input field because I am getting data from a table
using ole db source.|||OK; supposing that input field is actually non-unicode (DT_STR could be the
right data type), verify the transformation task to be sure about the output
data type. SQL 2005 uses UNICODE by default, and you should explicitly
declare non unicode data types.
If you correct the input field definition specifying Unicode (DT_WSTR), you
should avoid such error message.
The fact that you are accessing input table via OLE DB does'nt change the
fact that QSL 2005 uses Unicode by default.
Gilberto Zampatti
"SB" wrote:
> On Apr 20, 2:10 pm, Gilberto Zampatti
> <GilbertoZampa...@.discussions.microsoft.com> wrote:
> > Could you explain exactly which data conversion you prepared? The DT_STR was
> > specified for the input field or for the output? In this last case, which is
> > the input field data type?
> > Gilberto Zampatti
> >
> >
> >
> > "SB" wrote:
> > > Hi,
> > > Is DT_STR Unicode? From SSIS I am using a data conversion component
> > > and defining the data type as DT_STR and saving the output in excel
> > > file. I am getting an error that 'can not convert from Unicode to non-
> > > unicode'. I thought I have already taken care of that in data
> > > conversion.- Hide quoted text -
> >
> > - Show quoted text -
> Hi,
> DT_STR should be an input field because I am getting data from a table
> using ole db source.
>|||On Apr 20, 3:28 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> OK; supposing that input field is actually non-unicode (DT_STR could be the
> right data type), verify the transformation task to be sure about the output
> data type. SQL 2005 uses UNICODE by default, and you should explicitly
> declare non unicode data types.
> If you correct the input field definition specifying Unicode (DT_WSTR), you
> should avoid such error message.
> The fact that you are accessing input table via OLE DB does'nt change the
> fact that QSL 2005 uses Unicode by default.
> Gilberto Zampatti
>
> "SB" wrote:
> > On Apr 20, 2:10 pm, Gilberto Zampatti
> > <GilbertoZampa...@.discussions.microsoft.com> wrote:
> > > Could you explain exactly which data conversion you prepared? The DT_STR was
> > > specified for the input field or for the output? In this last case, which is
> > > the input field data type?
> > > Gilberto Zampatti
> > > "SB" wrote:
> > > > Hi,
> > > > Is DT_STR Unicode? From SSIS I am using a data conversion component
> > > > and defining the data type as DT_STR and saving the output in excel
> > > > file. I am getting an error that 'can not convert from Unicode to non-
> > > > unicode'. I thought I have already taken care of that in data
> > > > conversion.- Hide quoted text -
> > > - Show quoted text -
> > Hi,
> > DT_STR should be an input field because I am getting data from a table
> > using ole db source.- Hide quoted text -
> - Show quoted text -
Thanks.
For output I have created a excel table using the statement:
CREATE TABLE `Excel Destination` (
`NAME` VARCHAR(),
)
I have tried string, text etc but no success.
Excel Destination editor:
Data Access Mode:
Table or view
Name of the excel sheet:
Excel Destination|||On Apr 20, 3:28 pm, Gilberto Zampatti
<GilbertoZampa...@.discussions.microsoft.com> wrote:
> OK; supposing that input field is actually non-unicode(DT_STR could be the
> right data type), verify the transformation task to be sure about the output
> data type. SQL 2005 usesUNICODEby default, and you should explicitly
> declare nonunicodedata types.
> If you correct the input field definition specifyingUnicode(DT_WSTR), you
> should avoid such error message.
> The fact that you are accessing input table via OLE DB does'nt change the
> fact that QSL 2005 usesUnicodeby default.
> Gilberto Zampatti
>
> "SB" wrote:
> > On Apr 20, 2:10 pm, Gilberto Zampatti
> > <GilbertoZampa...@.discussions.microsoft.com> wrote:
> > > Could you explain exactly which data conversion you prepared? The DT_STR was
> > > specified for the input field or for the output? In this last case, which is
> > > the input field data type?
> > > Gilberto Zampatti
> > > "SB" wrote:
> > > > Hi,
> > > > Is DT_STRUnicode? From SSIS I am using a data conversion component
> > > > and defining the data type as DT_STR and saving the output in excel
> > > > file. I am getting an error that 'can not convert fromUnicodeto non-
> > > >unicode'. I thought I have already taken care of that in data
> > > > conversion.- Hide quoted text -
> > > - Show quoted text -
> > Hi,
> > DT_STR should be an input field because I am getting data from a table
> > using ole db source.- Hide quoted text -
> - Show quoted text -
Hi,
Is it alright if I declare a datatype as nvarchar 'create table (name
nvarchar(20))' and then treat it as if varchar for example functions,
inserts and join with other varchar columns etc? Except giving a
larger string size I don't have much use for unicode. They might as
well keep this type internal so that we didn't have to worry about it.|||On Apr 23, 10:41 am, SB <othell...@.yahoo.com> wrote:
> On Apr 20, 3:28 pm, Gilberto Zampatti
>
>
> <GilbertoZampa...@.discussions.microsoft.com> wrote:
> > OK; supposing that input field is actually non-unicode(DT_STR could be the
> > right data type), verify the transformation task to be sure about the output
> > data type. SQL 2005 usesUNICODEby default, and you should explicitly
> > declare nonunicodedata types.
> > If you correct the input field definition specifyingUnicode(DT_WSTR), you
> > should avoid such error message.
> > The fact that you are accessing input table via OLE DB does'nt change the
> > fact that QSL 2005 usesUnicodeby default.
> > Gilberto Zampatti
> > "SB" wrote:
> > > On Apr 20, 2:10 pm, Gilberto Zampatti
> > > <GilbertoZampa...@.discussions.microsoft.com> wrote:
> > > > Could you explain exactly which data conversion you prepared? The DT_STR was
> > > > specified for the input field or for the output? In this last case, which is
> > > > the input field data type?
> > > > Gilberto Zampatti
> > > > "SB" wrote:
> > > > > Hi,
> > > > > Is DT_STRUnicode? From SSIS I am using a data conversion component
> > > > > and defining the data type as DT_STR and saving the output in excel
> > > > > file. I am getting an error that 'can not convert fromUnicodeto non-
> > > > >unicode'. I thought I have already taken care of that in data
> > > > > conversion.- Hide quoted text -
> > > > - Show quoted text -
> > > Hi,
> > > DT_STR should be an input field because I am getting data from a table
> > > using ole db source.- Hide quoted text -
> > - Show quoted text -
> Hi,
> Is it alright if I declare a datatype as nvarchar 'create table (name
> nvarchar(20))' and then treat it as if varchar for example functions,
> inserts and join with other varchar columns etc? Except giving a
> larger string size I don't have much use for unicode. They might as
> well keep this type internal so that we didn't have to worry about it.- Hide quoted text -
> - Show quoted text -
That question was not related to SSIS package (as you have told ssis
is unicode). FWIW I was able to get the data (ssis) imported when I
used unicode for input and output data using a data conversion in
between from varchar to unicode string.

Non localized unicode collating

I noticed that SQL Server 7 had a specific ID for collating unicode in binary ordering. I could not find find a collation in SQL Server 2000 that was clearly a non localized binary ordering.
Does anyone know what collation I want for this purpose?
Michael,
Not sure, I ran the below query and came up with this:
SQL_Latin1_General_CP437_BIN
SQL_Latin1_General_CP850_BIN
select *
from ::fn_helpcollations()
where [description] like '%binary%'
Dinesh
SQL Server MVP
--
SQL Server FAQ at
http://www.tkdinesh.com
"Michael Bauers" <anonymous@.discussions.microsoft.com> wrote in message
news:BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com...
> I noticed that SQL Server 7 had a specific ID for collating unicode in
binary ordering. I could not find find a collation in SQL Server 2000 that
was clearly a non localized binary ordering.
> Does anyone know what collation I want for this purpose?
>
|||I believe I also saw those. But it was not clear to me if that's what we are looking for. I do appreciate you looking.
|||The "*_BIN" collations like Latin1_General_BIN do provide non-localized
sorting and comparisons of Unicode data. For example, the collation
Greek_BIN behaves has the following characteristics:
- Unicode data is sorted and compared using a binary collation that only
takes into account the code point of Unicode characters (for example an
"x" is not assigned any linguistic meaning but is simply sorted as
U+0078). No two Unicode characters are considered equal.
- Non-Unicode data is stored using Greek code page 1253.
- Non-Unicode data is also sorted using a binary sort order that only
considers the character's code point value. No two non-Unicode
characters are considered equal.
In other words, the "Greek_" in "Greek_BIN" simply refers to the code
page that is used to store non-Unicode data. It doesn't change how the
binary collation sorts or compares Unicode data.
Binary sorting for non-Unicode characters is fairly intuitive; for
example, code point 0x49 sorts before 0x4a, which sorts before 0x4b, etc.
Binary Unicode sorting, however, also depends only on the character code
points, but due to an implementation quirk does not order characters in
strict code point order. The actual binary Unicode collation behavior is
discussed briefly in 816039 FIX: Code Point Comparison Semantics for
SQL_Latin1_General_Cp850_BIN (http://support.microsoft.com/?id=816039)
and is also demonstrated by the script below. Note that this behavior is
also how SQL 7.0 worked in binary Unicode string comparisons.
drop table #t1
declare @.c1 nchar(1)
declare @.c2 nchar(1)
declare @.c3 nchar(1)
declare @.c4 nchar(1)
declare @.c5 nchar(1)
set @.c1 = nchar (0x03C0)
set @.c2 = nchar (0x00F0)
set @.c3 = nchar (0x01E0)
set @.c4 = nchar (0x01E1)
set @.c5 = nchar (0xABCD)
create table #t1 (c1 nchar(2) collate Latin1_General_BIN)
insert into #t1 values (@.c1 + @.c1)
insert into #t1 values (@.c1 + @.c2)
insert into #t1 values (@.c2 + @.c1)
insert into #t1 values (@.c2 + @.c2)
insert into #t1 values (@.c3 + @.c4)
insert into #t1 values (@.c4 + @.c3)
insert into #t1 values (@.c5 + @.c5)
select c1, convert (varbinary (4), c1) as bin_string,
convert (varbinary(1), (convert (int, unicode (substring (c1, 1, 1)))
& 0xFF00) / 256) as byte2_bin,
convert (varbinary(1), (convert (int, unicode (substring (c1, 1, 1)))
& 0x00FF)) as byte1_bin,
convert (varbinary(2), substring (c1, 2, 100)) as remaining_bytes_bin
from #t1 order by c1
HTH,
Bart
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no
rights.
Thread-Topic: Non localized unicode collating
thread-index: AcQ4brcfkye8UvuwSXaGffp7bpZtog==
X-WN-Post: microsoft.public.sqlserver.server
From: "=?Utf-8?B?TWljaGFlbCBCYXVlcnM=?="
<anonymous@.discussions.microsoft.com>
References: <BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com>
Subject: RE: Non localized unicode collating
Date: Wed, 12 May 2004 15:16:04 -0700
Lines: 3
Message-ID: <3FDBAF61-39D9-4BD7-BFDF-A5E330B354F1@.microsoft.com>
MIME-Version: 1.0
Content-Type: text/plain;
charset="Utf-8"
Content-Transfer-Encoding: 7bit
X-Newsreader: Microsoft CDO for Windows 2000
Content-Class: urn:content-classes:message
Importance: normal
Priority: normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
Newsgroups: microsoft.public.sqlserver.server
Path: cpmsftngxa10.phx.gbl
Xref: cpmsftngxa10.phx.gbl microsoft.public.sqlserver.server:341014
NNTP-Posting-Host: tk2msftcmty1.phx.gbl 10.40.1.180
X-Tomcat-NG: microsoft.public.sqlserver.server
I believe I also saw those. But it was not clear to me if that's what we
are looking for. I do appreciate you looking.
From: "Dinesh T.K" <tkdinesh@.nospam.mail.tkdinesh.com>
Subject: Re: Non localized unicode collating
Date: Thu, 13 May 2004 03:24:28 +0530
Newsgroups: microsoft.public.sqlserver.server
Michael,
Not sure, I ran the below query and came up with this:
SQL_Latin1_General_CP437_BIN
SQL_Latin1_General_CP850_BIN
select *
from ::fn_helpcollations()
where [description] like '%binary%'
Dinesh
SQL Server MVP
--
SQL Server FAQ at
http://www.tkdinesh.com
"Michael Bauers" <anonymous@.discussions.microsoft.com> wrote in message
news:BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com...
> I noticed that SQL Server 7 had a specific ID for collating unicode in
binary ordering. I could not find find a collation in SQL Server 2000
that
was clearly a non localized binary ordering.
> Does anyone know what collation I want for this purpose?
>

Non localized unicode collating

I noticed that SQL Server 7 had a specific ID for collating unicode in binar
y ordering. I could not find find a collation in SQL Server 2000 that was c
learly a non localized binary ordering.
Does anyone know what collation I want for this purpose?Michael,
Not sure, I ran the below query and came up with this:
SQL_Latin1_General_CP437_BIN
SQL_Latin1_General_CP850_BIN
select *
from ::fn_helpcollations()
where [description] like '%binary%'
Dinesh
SQL Server MVP
--
--
SQL Server FAQ at
http://www.tkdinesh.com
"Michael Bauers" <anonymous@.discussions.microsoft.com> wrote in message
news:BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com...
> I noticed that SQL Server 7 had a specific ID for collating unicode in
binary ordering. I could not find find a collation in SQL Server 2000 that
was clearly a non localized binary ordering.
> Does anyone know what collation I want for this purpose?
>|||I believe I also saw those. But it was not clear to me if that's what we ar
e looking for. I do appreciate you looking.|||The "*_BIN" collations like Latin1_General_BIN do provide non-localized
sorting and comparisons of Unicode data. For example, the collation
Greek_BIN behaves has the following characteristics:
- Unicode data is sorted and compared using a binary collation that only
takes into account the code point of Unicode characters (for example an
"x" is not assigned any linguistic meaning but is simply sorted as
U+0078). No two Unicode characters are considered equal.
- Non-Unicode data is stored using Greek code page 1253.
- Non-Unicode data is also sorted using a binary sort order that only
considers the character's code point value. No two non-Unicode
characters are considered equal.
In other words, the "Greek_" in "Greek_BIN" simply refers to the code
page that is used to store non-Unicode data. It doesn't change how the
binary collation sorts or compares Unicode data.
Binary sorting for non-Unicode characters is fairly intuitive; for
example, code point 0x49 sorts before 0x4a, which sorts before 0x4b, etc.
Binary Unicode sorting, however, also depends only on the character code
points, but due to an implementation quirk does not order characters in
strict code point order. The actual binary Unicode collation behavior is
discussed briefly in 816039 FIX: Code Point Comparison Semantics for
SQL_Latin1_General_Cp850_BIN (http://support.microsoft.com/?id=816039)
and is also demonstrated by the script below. Note that this behavior is
also how SQL 7.0 worked in binary Unicode string comparisons.
drop table #t1
declare @.c1 nchar(1)
declare @.c2 nchar(1)
declare @.c3 nchar(1)
declare @.c4 nchar(1)
declare @.c5 nchar(1)
set @.c1 = nchar (0x03C0)
set @.c2 = nchar (0x00F0)
set @.c3 = nchar (0x01E0)
set @.c4 = nchar (0x01E1)
set @.c5 = nchar (0xABCD)
create table #t1 (c1 nchar(2) collate Latin1_General_BIN)
insert into #t1 values (@.c1 + @.c1)
insert into #t1 values (@.c1 + @.c2)
insert into #t1 values (@.c2 + @.c1)
insert into #t1 values (@.c2 + @.c2)
insert into #t1 values (@.c3 + @.c4)
insert into #t1 values (@.c4 + @.c3)
insert into #t1 values (@.c5 + @.c5)
select c1, convert (varbinary (4), c1) as bin_string,
convert (varbinary(1), (convert (int, unicode (substring (c1, 1, 1)))
& 0xFF00) / 256) as byte2_bin,
convert (varbinary(1), (convert (int, unicode (substring (c1, 1, 1)))
& 0x00FF)) as byte1_bin,
convert (varbinary(2), substring (c1, 2, 100)) as remaining_bytes_bin
from #t1 order by c1
HTH,
Bart
--
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no
rights.
Thread-Topic: Non localized unicode collating
thread-index: AcQ4brcfkye8UvuwSXaGffp7bpZtog==
X-WN-Post: microsoft.public.sqlserver.server
From: "examnotes"
<anonymous@.discussions.microsoft.com>
References: <BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com>
Subject: RE: Non localized unicode collating
Date: Wed, 12 May 2004 15:16:04 -0700
Lines: 3
Message-ID: <3FDBAF61-39D9-4BD7-BFDF-A5E330B354F1@.microsoft.com>
MIME-Version: 1.0
Content-Type: text/plain;
charset="Utf-8"
Content-Transfer-Encoding: 7bit
X-Newsreader: Microsoft CDO for Windows 2000
Content-Class: urn:content-classes:message
Importance: normal
Priority: normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
Newsgroups: microsoft.public.sqlserver.server
Path: cpmsftngxa10.phx.gbl
Xref: cpmsftngxa10.phx.gbl microsoft.public.sqlserver.server:341014
NNTP-Posting-Host: tk2msftcmty1.phx.gbl 10.40.1.180
X-Tomcat-NG: microsoft.public.sqlserver.server
I believe I also saw those. But it was not clear to me if that's what we
are looking for. I do appreciate you looking.
From: "Dinesh T.K" <tkdinesh@.nospam.mail.tkdinesh.com>
Subject: Re: Non localized unicode collating
Date: Thu, 13 May 2004 03:24:28 +0530
Newsgroups: microsoft.public.sqlserver.server
Michael,
Not sure, I ran the below query and came up with this:
SQL_Latin1_General_CP437_BIN
SQL_Latin1_General_CP850_BIN
select *
from ::fn_helpcollations()
where [description] like '%binary%'
Dinesh
SQL Server MVP
--
--
SQL Server FAQ at
http://www.tkdinesh.com
"Michael Bauers" <anonymous@.discussions.microsoft.com> wrote in message
news:BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com...
> I noticed that SQL Server 7 had a specific ID for collating unicode in
binary ordering. I could not find find a collation in SQL Server 2000
that
was clearly a non localized binary ordering.
> Does anyone know what collation I want for this purpose?
>

Friday, March 9, 2012

Non localized unicode collating

I noticed that SQL Server 7 had a specific ID for collating unicode in binary ordering. I could not find find a collation in SQL Server 2000 that was clearly a non localized binary ordering.
Does anyone know what collation I want for this purpose?Michael,
Not sure, I ran the below query and came up with this:
SQL_Latin1_General_CP437_BIN
SQL_Latin1_General_CP850_BIN
select *
from ::fn_helpcollations()
where [description] like '%binary%'
--
Dinesh
SQL Server MVP
--
--
SQL Server FAQ at
http://www.tkdinesh.com
"Michael Bauers" <anonymous@.discussions.microsoft.com> wrote in message
news:BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com...
> I noticed that SQL Server 7 had a specific ID for collating unicode in
binary ordering. I could not find find a collation in SQL Server 2000 that
was clearly a non localized binary ordering.
> Does anyone know what collation I want for this purpose?
>|||I believe I also saw those. But it was not clear to me if that's what we are looking for. I do appreciate you looking|||The "*_BIN" collations like Latin1_General_BIN do provide non-localized
sorting and comparisons of Unicode data. For example, the collation
Greek_BIN behaves has the following characteristics:
- Unicode data is sorted and compared using a binary collation that only
takes into account the code point of Unicode characters (for example an
"x" is not assigned any linguistic meaning but is simply sorted as
U+0078). No two Unicode characters are considered equal.
- Non-Unicode data is stored using Greek code page 1253.
- Non-Unicode data is also sorted using a binary sort order that only
considers the character's code point value. No two non-Unicode
characters are considered equal.
In other words, the "Greek_" in "Greek_BIN" simply refers to the code
page that is used to store non-Unicode data. It doesn't change how the
binary collation sorts or compares Unicode data.
Binary sorting for non-Unicode characters is fairly intuitive; for
example, code point 0x49 sorts before 0x4a, which sorts before 0x4b, etc.
Binary Unicode sorting, however, also depends only on the character code
points, but due to an implementation quirk does not order characters in
strict code point order. The actual binary Unicode collation behavior is
discussed briefly in 816039 FIX: Code Point Comparison Semantics for
SQL_Latin1_General_Cp850_BIN (http://support.microsoft.com/?id=816039)
and is also demonstrated by the script below. Note that this behavior is
also how SQL 7.0 worked in binary Unicode string comparisons.
drop table #t1
declare @.c1 nchar(1)
declare @.c2 nchar(1)
declare @.c3 nchar(1)
declare @.c4 nchar(1)
declare @.c5 nchar(1)
set @.c1 = nchar (0x03C0)
set @.c2 = nchar (0x00F0)
set @.c3 = nchar (0x01E0)
set @.c4 = nchar (0x01E1)
set @.c5 = nchar (0xABCD)
create table #t1 (c1 nchar(2) collate Latin1_General_BIN)
insert into #t1 values (@.c1 + @.c1)
insert into #t1 values (@.c1 + @.c2)
insert into #t1 values (@.c2 + @.c1)
insert into #t1 values (@.c2 + @.c2)
insert into #t1 values (@.c3 + @.c4)
insert into #t1 values (@.c4 + @.c3)
insert into #t1 values (@.c5 + @.c5)
select c1, convert (varbinary (4), c1) as bin_string,
convert (varbinary(1), (convert (int, unicode (substring (c1, 1, 1)))
& 0xFF00) / 256) as byte2_bin,
convert (varbinary(1), (convert (int, unicode (substring (c1, 1, 1)))
& 0x00FF)) as byte1_bin,
convert (varbinary(2), substring (c1, 2, 100)) as remaining_bytes_bin
from #t1 order by c1
HTH,
Bart
--
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no
rights.
Thread-Topic: Non localized unicode collating
thread-index: AcQ4brcfkye8UvuwSXaGffp7bpZtog==X-WN-Post: microsoft.public.sqlserver.server
From: "=?Utf-8?B?TWljaGFlbCBCYXVlcnM=?="
<anonymous@.discussions.microsoft.com>
References: <BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com>
Subject: RE: Non localized unicode collating
Date: Wed, 12 May 2004 15:16:04 -0700
Lines: 3
Message-ID: <3FDBAF61-39D9-4BD7-BFDF-A5E330B354F1@.microsoft.com>
MIME-Version: 1.0
Content-Type: text/plain;
charset="Utf-8"
Content-Transfer-Encoding: 7bit
X-Newsreader: Microsoft CDO for Windows 2000
Content-Class: urn:content-classes:message
Importance: normal
Priority: normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
Newsgroups: microsoft.public.sqlserver.server
Path: cpmsftngxa10.phx.gbl
Xref: cpmsftngxa10.phx.gbl microsoft.public.sqlserver.server:341014
NNTP-Posting-Host: tk2msftcmty1.phx.gbl 10.40.1.180
X-Tomcat-NG: microsoft.public.sqlserver.server
I believe I also saw those. But it was not clear to me if that's what we
are looking for. I do appreciate you looking.
From: "Dinesh T.K" <tkdinesh@.nospam.mail.tkdinesh.com>
Subject: Re: Non localized unicode collating
Date: Thu, 13 May 2004 03:24:28 +0530
Newsgroups: microsoft.public.sqlserver.server
Michael,
Not sure, I ran the below query and came up with this:
SQL_Latin1_General_CP437_BIN
SQL_Latin1_General_CP850_BIN
select *
from ::fn_helpcollations()
where [description] like '%binary%'
--
Dinesh
SQL Server MVP
--
--
SQL Server FAQ at
http://www.tkdinesh.com
"Michael Bauers" <anonymous@.discussions.microsoft.com> wrote in message
news:BA9AFE0F-4DEA-4EDA-A538-80FFDD11945A@.microsoft.com...
> I noticed that SQL Server 7 had a specific ID for collating unicode in
binary ordering. I could not find find a collation in SQL Server 2000
that
was clearly a non localized binary ordering.
> Does anyone know what collation I want for this purpose?
>