Wednesday, March 28, 2012

More Extremely weird SQL problem.

I believe SQL Server 2000 EE has a bug in it. I think this
because of the article here :
http://support.microsoft.com/default.aspx?scid=kb;en-us;274765
and my symptoms:
It started happening again. It lasted longer this time.
Here is the testing I did while the problem was happening.
INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
FROM subscriber, Listmember
WHERE Listmember.LM_C_ID = subscriber.id_email
AND ((subscriber.banned = 0
AND subscriber.id_email < 5000000)
AND (Listmember.LM_L_ID = 253))
still produced all records where id_email was > or <
5000000 . around 600K
but if I do the this query
SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
FROM subscriber, Listmember
WHERE Listmember.LM_C_ID = subscriber.id_email
AND ((subscriber.banned = 0
AND subscriber.id_email < 5000000)
AND (Listmember.LM_L_ID = 253))
it is only the records where id_email < 5000000 . I made
two identical new tables called recipient_test
and subscriber_test and filled them with the same data as
the others. Ran the above queries and they both worked
correctly. If I use the original Recipient but the new
subscriber_test table, it works, if I use Subscriber and
Recipient_test, it works. I did not create indexes or
Foreign key contstraints on the test tables. So that could
be a possibility.
Another clue is I could reconfigure the server to Max
degree of Parallelism = 1 and both worked during, then I
could reconfigure it back to MAXDOP = 0 (or 2 or 3, I tried
both) and it would go back to happening again.
So, it has something to do with MAXDOP, and maybe the
reason it happens sometimes and not others is because of
the cost of threshold for parallelism setting being 5. When
it thinks it needs parallelism it happens, and when the
server thinks it doesn't need parallelism, it doesn't.
It is embedded in a Java program call Accucast. The
Accucast software hasn't changed for two years. This only
started happening when we migrated from SQL standard SP3 on
W2K server to SQL Enterprise SP3 and Windows Server 2003.
the database we recreated. New tables, new indexes, FK's
etc. The two tables are in different filegroups. All
indexes except the pk clustered indexes on each are in
different filegroups as well.
Recipient table in recipient Filegroup
pk on recipient on rc_d_id, rc_c_id in recipient filegroup
other indexes on recipient are in recipient_ix filegroup
same set up on subscriber
id_email is pk
both pk are clustered.
It's a Dell 2650 Dual Intel 3.06mb 1mb cache Xeon , 6 gb
ECC memory, 5 Maxtor 145 Gb 10K scsi drives configure as
RAID 10 with Dell Perc4/DC (spanned raid 1) with 1 hot
spare. Windows Server 2003 all patched up. SQL Server
Enterprise Edition SP3a. Everything seems to be fine but
this. I can't figure it out.
Any ideas?
ThanksNick,
A few thoughts:
- Does the query plan actually include a hash join? You say that the tables
are quite well indexed, and this would usually cause the Query Optimizer to
use either nested loops or Merge Joins rather than hash joins, and the
article you referred to talks about problems with hash joins only.
- Have you recently run DBCC CHECKTABLE on the affected tables? This kind of
behaviour is maybe caused by a corrupted non clustered index.
These are the two things I would look at first, with the information you
have given us.
--
Jacco Schalkwijk
SQL Server MVP
"Nick" <anonymous@.discussions.microsoft.com> wrote in message
news:017601c3bf59$182ef130$a501280a@.phx.gbl...
> I believe SQL Server 2000 EE has a bug in it. I think this
> because of the article here :
> http://support.microsoft.com/default.aspx?scid=kb;en-us;274765
> and my symptoms:
> It started happening again. It lasted longer this time.
> Here is the testing I did while the problem was happening.
> INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
> FROM subscriber, Listmember
> WHERE Listmember.LM_C_ID = subscriber.id_email
> AND ((subscriber.banned = 0
> AND subscriber.id_email < 5000000)
> AND (Listmember.LM_L_ID = 253))
> still produced all records where id_email was > or <
> 5000000 . around 600K
> but if I do the this query
> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
> FROM subscriber, Listmember
> WHERE Listmember.LM_C_ID = subscriber.id_email
> AND ((subscriber.banned = 0
> AND subscriber.id_email < 5000000)
> AND (Listmember.LM_L_ID = 253))
> it is only the records where id_email < 5000000 . I made
> two identical new tables called recipient_test
> and subscriber_test and filled them with the same data as
> the others. Ran the above queries and they both worked
> correctly. If I use the original Recipient but the new
> subscriber_test table, it works, if I use Subscriber and
> Recipient_test, it works. I did not create indexes or
> Foreign key contstraints on the test tables. So that could
> be a possibility.
> Another clue is I could reconfigure the server to Max
> degree of Parallelism = 1 and both worked during, then I
> could reconfigure it back to MAXDOP = 0 (or 2 or 3, I tried
> both) and it would go back to happening again.
> So, it has something to do with MAXDOP, and maybe the
> reason it happens sometimes and not others is because of
> the cost of threshold for parallelism setting being 5. When
> it thinks it needs parallelism it happens, and when the
> server thinks it doesn't need parallelism, it doesn't.
> It is embedded in a Java program call Accucast. The
> Accucast software hasn't changed for two years. This only
> started happening when we migrated from SQL standard SP3 on
> W2K server to SQL Enterprise SP3 and Windows Server 2003.
> the database we recreated. New tables, new indexes, FK's
> etc. The two tables are in different filegroups. All
> indexes except the pk clustered indexes on each are in
> different filegroups as well.
> Recipient table in recipient Filegroup
> pk on recipient on rc_d_id, rc_c_id in recipient filegroup
> other indexes on recipient are in recipient_ix filegroup
> same set up on subscriber
> id_email is pk
> both pk are clustered.
> It's a Dell 2650 Dual Intel 3.06mb 1mb cache Xeon , 6 gb
> ECC memory, 5 Maxtor 145 Gb 10K scsi drives configure as
> RAID 10 with Dell Perc4/DC (spanned raid 1) with 1 hot
> spare. Windows Server 2003 all patched up. SQL Server
> Enterprise Edition SP3a. Everything seems to be fine but
> this. I can't figure it out.
>
> Any ideas?
> Thanks
>|||One says this for dbcc checktable:
dbcc checktable ('Recipient') with all_errormsgs
DBCC results for 'Recipient'.
There are 37121847 rows in 160756 pages for
object 'Recipient'.
DBCC execution completed. If DBCC printed error messages,
contact your system administrator.
I can't do a repair option at the moment because it is a
production db. However, would the above tell me if
something was wrong? Did the same thing for the
Subscriber table and Listmember table. All the same
message just different row count and page count.
here is the output from showplan_text on and then the
query:
StmtText
--
SET STATISTICS PROFILE ON
(1 row(s) affected)
StmtText
---
---
---
---
--
INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
SELECT DISTINCT 22000, subscriber.id_email, '2003-12-09'
FROM subscriber, Listmember
WHERE Listmember.LM_C_ID = subscriber.id_email
AND ((subscriber.banned = 0
AND subscriber.id_email < 5000000)
(1 row(s) affected)
StmtText
---
---
---
---
--
|--Index Insert(OBJECT:([accucast].[dbo].[Recipient].
[IX_Recipient]), SET:([RC_D_ID1014]=[Recipient].[RC_D_ID],
[RC_C_ID1013]=[Recipient].[RC_C_ID], [IdxBmk1012]=[Bmk1007]))
|--Sort(ORDER BY:([Recipient].[RC_C_ID] ASC,
[Recipient].[RC_D_ID] ASC, [Bmk1007] ASC))
|--Assert(WHERE:(If (NOT([Pass1011]) AND
([Expr1010] IS NULL)) then 0 else NULL))
|--Nested Loops(Left Semi Join, WHERE:
([Recipient].[RC_C_ID] IS NULL)OUTER REFERENCES:
([Recipient].[RC_C_ID]), DEFINE:([Expr1010] = [PROBE
VALUE]))
|--Clustered Index Insert(OBJECT:
([accucast].[dbo].[Recipient].[PK_Recipient]), SET:
([Recipient].[RC_CONT_ERR]=NULL, [Recipient].[RC_ADDR_ERR]
=NULL, [Recipient].[RC_READ]=NULL, [Recipient].[RC_DATE]
='Dec 9 2003 12:00AM', [Recipient].
| |--Top(ROWCOUNT est 0)
| |--Parallelism(Gather
Streams, ORDER BY:([Subscriber].[id_email] ASC))
| |--Merge Join(Inner
Join, MERGE:([Listmember].[LM_C_ID])=([Subscriber].
[id_email]), RESIDUAL:([Subscriber].[id_email]=[Listmember].[LM_C_ID]))
| |--Parallelism
(Repartition Streams, PARTITION COLUMNS:([Listmember].
[LM_C_ID]), ORDER BY:([Listmember].[LM_C_ID] ASC))
| | |--Index
Seek(OBJECT:([accucast].[dbo].[Listmember].
[IX_Listmember]), SEEK:([Listmember].[LM_L_ID]=138)
ORDERED FORWARD)
| |--Parallelism
(Repartition Streams, PARTITION COLUMNS:([Subscriber].
[id_email]), ORDER BY:([Subscriber].[id_email] ASC))
| |--Index
Seek(OBJECT:([accucast].[dbo].[Subscriber].
[IX_subscriber_1]), SEEK:([Subscriber].[banned]=0 AND
[Subscriber].[id_email] < 5000000) ORDERED FORWARD)
|--Row Count Spool
|--Clustered Index Seek(OBJECT:
([accucast].[dbo].[Subscriber].[PK_subscriber]), SEEK:
([Subscriber].[id_email]=[Recipient].[RC_C_ID]) ORDERED
FORWARD)
(14 row(s) affected)
StmtText
--
SET STATISTICS PROFILE OFF
(1 row(s) affected)
Any help would be appreciated.
Thanks.
Nick
>--Original Message--
>Nick,
>A few thoughts:
>- Does the query plan actually include a hash join? You
say that the tables
>are quite well indexed, and this would usually cause the
Query Optimizer to
>use either nested loops or Merge Joins rather than hash
joins, and the
>article you referred to talks about problems with hash
joins only.
>- Have you recently run DBCC CHECKTABLE on the affected
tables? This kind of
>behaviour is maybe caused by a corrupted non clustered
index.
>These are the two things I would look at first, with the
information you
>have given us.
>--
>Jacco Schalkwijk
>SQL Server MVP
>
>"Nick" <anonymous@.discussions.microsoft.com> wrote in
message
>news:017601c3bf59$182ef130$a501280a@.phx.gbl...
>> I believe SQL Server 2000 EE has a bug in it. I think
this
>> because of the article here :
>> http://support.microsoft.com/default.aspx?scid=kb;en-
us;274765
>> and my symptoms:
>> It started happening again. It lasted longer this time.
>> Here is the testing I did while the problem was
happening.
>> INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
>> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
>> FROM subscriber, Listmember
>> WHERE Listmember.LM_C_ID = subscriber.id_email
>> AND ((subscriber.banned = 0
>> AND subscriber.id_email < 5000000)
>> AND (Listmember.LM_L_ID = 253))
>> still produced all records where id_email was > or <
>> 5000000 . around 600K
>> but if I do the this query
>> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
>> FROM subscriber, Listmember
>> WHERE Listmember.LM_C_ID = subscriber.id_email
>> AND ((subscriber.banned = 0
>> AND subscriber.id_email < 5000000)
>> AND (Listmember.LM_L_ID = 253))
>> it is only the records where id_email < 5000000 . I made
>> two identical new tables called recipient_test
>> and subscriber_test and filled them with the same data
as
>> the others. Ran the above queries and they both worked
>> correctly. If I use the original Recipient but the new
>> subscriber_test table, it works, if I use Subscriber and
>> Recipient_test, it works. I did not create indexes or
>> Foreign key contstraints on the test tables. So that
could
>> be a possibility.
>> Another clue is I could reconfigure the server to Max
>> degree of Parallelism = 1 and both worked during, then I
>> could reconfigure it back to MAXDOP = 0 (or 2 or 3, I
tried
>> both) and it would go back to happening again.
>> So, it has something to do with MAXDOP, and maybe the
>> reason it happens sometimes and not others is because of
>> the cost of threshold for parallelism setting being 5.
When
>> it thinks it needs parallelism it happens, and when the
>> server thinks it doesn't need parallelism, it doesn't.
>> It is embedded in a Java program call Accucast. The
>> Accucast software hasn't changed for two years. This
only
>> started happening when we migrated from SQL standard
SP3 on
>> W2K server to SQL Enterprise SP3 and Windows Server
2003.
>> the database we recreated. New tables, new indexes, FK's
>> etc. The two tables are in different filegroups. All
>> indexes except the pk clustered indexes on each are in
>> different filegroups as well.
>> Recipient table in recipient Filegroup
>> pk on recipient on rc_d_id, rc_c_id in recipient
filegroup
>> other indexes on recipient are in recipient_ix filegroup
>> same set up on subscriber
>> id_email is pk
>> both pk are clustered.
>> It's a Dell 2650 Dual Intel 3.06mb 1mb cache Xeon , 6 gb
>> ECC memory, 5 Maxtor 145 Gb 10K scsi drives configure as
>> RAID 10 with Dell Perc4/DC (spanned raid 1) with 1 hot
>> spare. Windows Server 2003 all patched up. SQL Server
>> Enterprise Edition SP3a. Everything seems to be fine but
>> this. I can't figure it out.
>>
>> Any ideas?
>> Thanks
>>
>
>.
>|||Hi Nick,
The results you get indicate that there are no problems with the indexes,
which I thought could be a cause of the problem you have. If something was
wrong you would have gotten error messages in Query Analyzer so that would
have been quite clear.
The showplan shows, as I expected, that the query does not use a hash join
but uses one nested loop and one merge join. So the information in the
knowledge base article does not apply to this problem.
If you can't find any more inforamtion about it in the Knowledge Base, it
might indeed be a bug and the best thing to do would be to open a case with
Microsoft Product Support. If it is a bug you won't be charged.
--
Jacco Schalkwijk
SQL Server MVP
"Nick" <anonymous@.discussions.microsoft.com> wrote in message
news:073101c3bff4$193bf0f0$a001280a@.phx.gbl...
> One says this for dbcc checktable:
> dbcc checktable ('Recipient') with all_errormsgs
> DBCC results for 'Recipient'.
> There are 37121847 rows in 160756 pages for
> object 'Recipient'.
> DBCC execution completed. If DBCC printed error messages,
> contact your system administrator.
> I can't do a repair option at the moment because it is a
> production db. However, would the above tell me if
> something was wrong? Did the same thing for the
> Subscriber table and Listmember table. All the same
> message just different row count and page count.
>
> here is the output from showplan_text on and then the
> query:
> StmtText
> --
> SET STATISTICS PROFILE ON
> (1 row(s) affected)
> StmtText
>
>
> ---
> ---
> ---
> ---
> --
> INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
> SELECT DISTINCT 22000, subscriber.id_email, '2003-12-09'
> FROM subscriber, Listmember
> WHERE Listmember.LM_C_ID = subscriber.id_email
> AND ((subscriber.banned = 0
> AND subscriber.id_email < 5000000)
> (1 row(s) affected)
> StmtText
>
>
> ---
> ---
> ---
> ---
> --
> |--Index Insert(OBJECT:([accucast].[dbo].[Recipient].
> [IX_Recipient]), SET:([RC_D_ID1014]=[Recipient].[RC_D_ID],
> [RC_C_ID1013]=[Recipient].[RC_C_ID], [IdxBmk1012]=> [Bmk1007]))
> |--Sort(ORDER BY:([Recipient].[RC_C_ID] ASC,
> [Recipient].[RC_D_ID] ASC, [Bmk1007] ASC))
> |--Assert(WHERE:(If (NOT([Pass1011]) AND
> ([Expr1010] IS NULL)) then 0 else NULL))
> |--Nested Loops(Left Semi Join, WHERE:
> ([Recipient].[RC_C_ID] IS NULL)OUTER REFERENCES:
> ([Recipient].[RC_C_ID]), DEFINE:([Expr1010] = [PROBE
> VALUE]))
> |--Clustered Index Insert(OBJECT:
> ([accucast].[dbo].[Recipient].[PK_Recipient]), SET:
> ([Recipient].[RC_CONT_ERR]=NULL, [Recipient].[RC_ADDR_ERR]
> =NULL, [Recipient].[RC_READ]=NULL, [Recipient].[RC_DATE]
> ='Dec 9 2003 12:00AM', [Recipient].
> | |--Top(ROWCOUNT est 0)
> | |--Parallelism(Gather
> Streams, ORDER BY:([Subscriber].[id_email] ASC))
> | |--Merge Join(Inner
> Join, MERGE:([Listmember].[LM_C_ID])=([Subscriber].
> [id_email]), RESIDUAL:([Subscriber].[id_email]=> [Listmember].[LM_C_ID]))
> | |--Parallelism
> (Repartition Streams, PARTITION COLUMNS:([Listmember].
> [LM_C_ID]), ORDER BY:([Listmember].[LM_C_ID] ASC))
> | | |--Index
> Seek(OBJECT:([accucast].[dbo].[Listmember].
> [IX_Listmember]), SEEK:([Listmember].[LM_L_ID]=138)
> ORDERED FORWARD)
> | |--Parallelism
> (Repartition Streams, PARTITION COLUMNS:([Subscriber].
> [id_email]), ORDER BY:([Subscriber].[id_email] ASC))
> | |--Index
> Seek(OBJECT:([accucast].[dbo].[Subscriber].
> [IX_subscriber_1]), SEEK:([Subscriber].[banned]=0 AND
> [Subscriber].[id_email] < 5000000) ORDERED FORWARD)
> |--Row Count Spool
> |--Clustered Index Seek(OBJECT:
> ([accucast].[dbo].[Subscriber].[PK_subscriber]), SEEK:
> ([Subscriber].[id_email]=[Recipient].[RC_C_ID]) ORDERED
> FORWARD)
> (14 row(s) affected)
> StmtText
> --
> SET STATISTICS PROFILE OFF
> (1 row(s) affected)
>
> Any help would be appreciated.
> Thanks.
> Nick
> >--Original Message--
> >Nick,
> >
> >A few thoughts:
> >- Does the query plan actually include a hash join? You
> say that the tables
> >are quite well indexed, and this would usually cause the
> Query Optimizer to
> >use either nested loops or Merge Joins rather than hash
> joins, and the
> >article you referred to talks about problems with hash
> joins only.
> >- Have you recently run DBCC CHECKTABLE on the affected
> tables? This kind of
> >behaviour is maybe caused by a corrupted non clustered
> index.
> >
> >These are the two things I would look at first, with the
> information you
> >have given us.
> >
> >--
> >Jacco Schalkwijk
> >SQL Server MVP
> >
> >
> >"Nick" <anonymous@.discussions.microsoft.com> wrote in
> message
> >news:017601c3bf59$182ef130$a501280a@.phx.gbl...
> >> I believe SQL Server 2000 EE has a bug in it. I think
> this
> >> because of the article here :
> >>
> >> http://support.microsoft.com/default.aspx?scid=kb;en-
> us;274765
> >>
> >> and my symptoms:
> >>
> >> It started happening again. It lasted longer this time.
> >> Here is the testing I did while the problem was
> happening.
> >>
> >> INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
> >> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
> >> FROM subscriber, Listmember
> >> WHERE Listmember.LM_C_ID = subscriber.id_email
> >> AND ((subscriber.banned = 0
> >> AND subscriber.id_email < 5000000)
> >> AND (Listmember.LM_L_ID = 253))
> >>
> >> still produced all records where id_email was > or <
> >> 5000000 . around 600K
> >> but if I do the this query
> >>
> >> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
> >> FROM subscriber, Listmember
> >> WHERE Listmember.LM_C_ID = subscriber.id_email
> >> AND ((subscriber.banned = 0
> >> AND subscriber.id_email < 5000000)
> >> AND (Listmember.LM_L_ID = 253))
> >>
> >> it is only the records where id_email < 5000000 . I made
> >> two identical new tables called recipient_test
> >> and subscriber_test and filled them with the same data
> as
> >> the others. Ran the above queries and they both worked
> >> correctly. If I use the original Recipient but the new
> >> subscriber_test table, it works, if I use Subscriber and
> >> Recipient_test, it works. I did not create indexes or
> >> Foreign key contstraints on the test tables. So that
> could
> >> be a possibility.
> >>
> >> Another clue is I could reconfigure the server to Max
> >> degree of Parallelism = 1 and both worked during, then I
> >> could reconfigure it back to MAXDOP = 0 (or 2 or 3, I
> tried
> >> both) and it would go back to happening again.
> >>
> >> So, it has something to do with MAXDOP, and maybe the
> >> reason it happens sometimes and not others is because of
> >> the cost of threshold for parallelism setting being 5.
> When
> >> it thinks it needs parallelism it happens, and when the
> >> server thinks it doesn't need parallelism, it doesn't.
> >>
> >> It is embedded in a Java program call Accucast. The
> >> Accucast software hasn't changed for two years. This
> only
> >> started happening when we migrated from SQL standard
> SP3 on
> >> W2K server to SQL Enterprise SP3 and Windows Server
> 2003.
> >> the database we recreated. New tables, new indexes, FK's
> >> etc. The two tables are in different filegroups. All
> >> indexes except the pk clustered indexes on each are in
> >> different filegroups as well.
> >>
> >> Recipient table in recipient Filegroup
> >> pk on recipient on rc_d_id, rc_c_id in recipient
> filegroup
> >> other indexes on recipient are in recipient_ix filegroup
> >>
> >> same set up on subscriber
> >> id_email is pk
> >>
> >> both pk are clustered.
> >>
> >> It's a Dell 2650 Dual Intel 3.06mb 1mb cache Xeon , 6 gb
> >> ECC memory, 5 Maxtor 145 Gb 10K scsi drives configure as
> >> RAID 10 with Dell Perc4/DC (spanned raid 1) with 1 hot
> >> spare. Windows Server 2003 all patched up. SQL Server
> >> Enterprise Edition SP3a. Everything seems to be fine but
> >> this. I can't figure it out.
> >>
> >>
> >>
> >> Any ideas?
> >>
> >> Thanks
> >>
> >>
> >
> >
> >.
> >|||Hi Nick,
Can you report back here when you have had an answer? I'm curious.
--
Jacco Schalkwijk
SQL Server MVP
"Nick" <anonymous@.discussions.microsoft.com> wrote in message
news:0f0b01c3c006$8d231810$a401280a@.phx.gbl...
> Thanks man. I'll do that.
> >--Original Message--
> >Hi Nick,
> >
> >The results you get indicate that there are no problems
> with the indexes,
> >which I thought could be a cause of the problem you have.
> If something was
> >wrong you would have gotten error messages in Query
> Analyzer so that would
> >have been quite clear.
> >
> >The showplan shows, as I expected, that the query does
> not use a hash join
> >but uses one nested loop and one merge join. So the
> information in the
> >knowledge base article does not apply to this problem.
> >
> >If you can't find any more inforamtion about it in the
> Knowledge Base, it
> >might indeed be a bug and the best thing to do would be
> to open a case with
> >Microsoft Product Support. If it is a bug you won't be
> charged.
> >
> >--
> >Jacco Schalkwijk
> >SQL Server MVP
> >
> >
> >"Nick" <anonymous@.discussions.microsoft.com> wrote in
> message
> >news:073101c3bff4$193bf0f0$a001280a@.phx.gbl...
> >> One says this for dbcc checktable:
> >>
> >> dbcc checktable ('Recipient') with all_errormsgs
> >>
> >> DBCC results for 'Recipient'.
> >> There are 37121847 rows in 160756 pages for
> >> object 'Recipient'.
> >> DBCC execution completed. If DBCC printed error
> messages,
> >> contact your system administrator.
> >>
> >> I can't do a repair option at the moment because it is a
> >> production db. However, would the above tell me if
> >> something was wrong? Did the same thing for the
> >> Subscriber table and Listmember table. All the same
> >> message just different row count and page count.
> >>
> >>
> >> here is the output from showplan_text on and then the
> >> query:
> >>
> >> StmtText
> >> --
> >> SET STATISTICS PROFILE ON
> >>
> >> (1 row(s) affected)
> >>
> >> StmtText
> >>
> >>
> >>
> >>
> >> ---
> --
> >> ---
> --
> >> ---
> --
> >> ---
> --
> >> --
> >> INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
> >> SELECT DISTINCT 22000, subscriber.id_email, '2003-12-09'
> >> FROM subscriber, Listmember
> >> WHERE Listmember.LM_C_ID = subscriber.id_email
> >> AND ((subscriber.banned = 0
> >> AND subscriber.id_email < 5000000)
> >>
> >> (1 row(s) affected)
> >>
> >> StmtText
> >>
> >>
> >>
> >>
> >> ---
> --
> >> ---
> --
> >> ---
> --
> >> ---
> --
> >> --
> >> |--Index Insert(OBJECT:([accucast].[dbo].[Recipient].
> >> [IX_Recipient]), SET:([RC_D_ID1014]=[Recipient].
> [RC_D_ID],
> >> [RC_C_ID1013]=[Recipient].[RC_C_ID], [IdxBmk1012]=> >> [Bmk1007]))
> >> |--Sort(ORDER BY:([Recipient].[RC_C_ID] ASC,
> >> [Recipient].[RC_D_ID] ASC, [Bmk1007] ASC))
> >> |--Assert(WHERE:(If (NOT([Pass1011]) AND
> >> ([Expr1010] IS NULL)) then 0 else NULL))
> >> |--Nested Loops(Left Semi Join, WHERE:
> >> ([Recipient].[RC_C_ID] IS NULL)OUTER REFERENCES:
> >> ([Recipient].[RC_C_ID]), DEFINE:([Expr1010] = [PROBE
> >> VALUE]))
> >> |--Clustered Index Insert(OBJECT:
> >> ([accucast].[dbo].[Recipient].[PK_Recipient]), SET:
> >> ([Recipient].[RC_CONT_ERR]=NULL, [Recipient].
> [RC_ADDR_ERR]
> >> =NULL, [Recipient].[RC_READ]=NULL, [Recipient].[RC_DATE]
> >> ='Dec 9 2003 12:00AM', [Recipient].
> >> | |--Top(ROWCOUNT est 0)
> >> | |--Parallelism(Gather
> >> Streams, ORDER BY:([Subscriber].[id_email] ASC))
> >> | |--Merge Join(Inner
> >> Join, MERGE:([Listmember].[LM_C_ID])=([Subscriber].
> >> [id_email]), RESIDUAL:([Subscriber].[id_email]=> >> [Listmember].[LM_C_ID]))
> >> | |--Parallelism
> >> (Repartition Streams, PARTITION COLUMNS:([Listmember].
> >> [LM_C_ID]), ORDER BY:([Listmember].[LM_C_ID] ASC))
> >> | | |--Index
> >> Seek(OBJECT:([accucast].[dbo].[Listmember].
> >> [IX_Listmember]), SEEK:([Listmember].[LM_L_ID]=138)
> >> ORDERED FORWARD)
> >> | |--Parallelism
> >> (Repartition Streams, PARTITION COLUMNS:([Subscriber].
> >> [id_email]), ORDER BY:([Subscriber].[id_email] ASC))
> >> | |--Index
> >> Seek(OBJECT:([accucast].[dbo].[Subscriber].
> >> [IX_subscriber_1]), SEEK:([Subscriber].[banned]=0 AND
> >> [Subscriber].[id_email] < 5000000) ORDERED FORWARD)
> >> |--Row Count Spool
> >> |--Clustered Index Seek
> (OBJECT:
> >> ([accucast].[dbo].[Subscriber].[PK_subscriber]), SEEK:
> >> ([Subscriber].[id_email]=[Recipient].[RC_C_ID]) ORDERED
> >> FORWARD)
> >>
> >> (14 row(s) affected)
> >>
> >> StmtText
> >> --
> >> SET STATISTICS PROFILE OFF
> >>
> >> (1 row(s) affected)
> >>
> >>
> >>
> >> Any help would be appreciated.
> >>
> >> Thanks.
> >>
> >> Nick
> >>
> >> >--Original Message--
> >> >Nick,
> >> >
> >> >A few thoughts:
> >> >- Does the query plan actually include a hash join? You
> >> say that the tables
> >> >are quite well indexed, and this would usually cause
> the
> >> Query Optimizer to
> >> >use either nested loops or Merge Joins rather than hash
> >> joins, and the
> >> >article you referred to talks about problems with hash
> >> joins only.
> >> >- Have you recently run DBCC CHECKTABLE on the affected
> >> tables? This kind of
> >> >behaviour is maybe caused by a corrupted non clustered
> >> index.
> >> >
> >> >These are the two things I would look at first, with
> the
> >> information you
> >> >have given us.
> >> >
> >> >--
> >> >Jacco Schalkwijk
> >> >SQL Server MVP
> >> >
> >> >
> >> >"Nick" <anonymous@.discussions.microsoft.com> wrote in
> >> message
> >> >news:017601c3bf59$182ef130$a501280a@.phx.gbl...
> >> >> I believe SQL Server 2000 EE has a bug in it. I
> think
> >> this
> >> >> because of the article here :
> >> >>
> >> >> http://support.microsoft.com/default.aspx?scid=kb;en-
> >> us;274765
> >> >>
> >> >> and my symptoms:
> >> >>
> >> >> It started happening again. It lasted longer this
> time.
> >> >> Here is the testing I did while the problem was
> >> happening.
> >> >>
> >> >> INSERT INTO Recipient (RC_D_ID, RC_C_ID, RC_DATE)
> >> >> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
> >> >> FROM subscriber, Listmember
> >> >> WHERE Listmember.LM_C_ID = subscriber.id_email
> >> >> AND ((subscriber.banned = 0
> >> >> AND subscriber.id_email < 5000000)
> >> >> AND (Listmember.LM_L_ID = 253))
> >> >>
> >> >> still produced all records where id_email was > or <
> >> >> 5000000 . around 600K
> >> >> but if I do the this query
> >> >>
> >> >> SELECT DISTINCT 20000, subscriber.id_email, GETDATE()
> >> >> FROM subscriber, Listmember
> >> >> WHERE Listmember.LM_C_ID = subscriber.id_email
> >> >> AND ((subscriber.banned = 0
> >> >> AND subscriber.id_email < 5000000)
> >> >> AND (Listmember.LM_L_ID = 253))
> >> >>
> >> >> it is only the records where id_email < 5000000 . I
> made
> >> >> two identical new tables called recipient_test
> >> >> and subscriber_test and filled them with the same
> data
> >> as
> >> >> the others. Ran the above queries and they both
> worked
> >> >> correctly. If I use the original Recipient but the
> new
> >> >> subscriber_test table, it works, if I use Subscriber
> and
> >> >> Recipient_test, it works. I did not create indexes or
> >> >> Foreign key contstraints on the test tables. So that
> >> could
> >> >> be a possibility.
> >> >>
> >> >> Another clue is I could reconfigure the server to Max
> >> >> degree of Parallelism = 1 and both worked during,
> then I
> >> >> could reconfigure it back to MAXDOP = 0 (or 2 or 3, I
> >> tried
> >> >> both) and it would go back to happening again.
> >> >>
> >> >> So, it has something to do with MAXDOP, and maybe the
> >> >> reason it happens sometimes and not others is
> because of
> >> >> the cost of threshold for parallelism setting being
> 5.
> >> When
> >> >> it thinks it needs parallelism it happens, and when
> the
> >> >> server thinks it doesn't need parallelism, it
> doesn't.
> >> >>
> >> >> It is embedded in a Java program call Accucast. The
> >> >> Accucast software hasn't changed for two years. This
> >> only
> >> >> started happening when we migrated from SQL standard
> >> SP3 on
> >> >> W2K server to SQL Enterprise SP3 and Windows Server
> >> 2003.
> >> >> the database we recreated. New tables, new indexes,
> FK's
> >> >> etc. The two tables are in different filegroups. All
> >> >> indexes except the pk clustered indexes on each are
> in
> >> >> different filegroups as well.
> >> >>
> >> >> Recipient table in recipient Filegroup
> >> >> pk on recipient on rc_d_id, rc_c_id in recipient
> >> filegroup
> >> >> other indexes on recipient are in recipient_ix
> filegroup
> >> >>
> >> >> same set up on subscriber
> >> >> id_email is pk
> >> >>
> >> >> both pk are clustered.
> >> >>
> >> >> It's a Dell 2650 Dual Intel 3.06mb 1mb cache Xeon ,
> 6 gb
> >> >> ECC memory, 5 Maxtor 145 Gb 10K scsi drives
> configure as
> >> >> RAID 10 with Dell Perc4/DC (spanned raid 1) with 1
> hot
> >> >> spare. Windows Server 2003 all patched up. SQL Server
> >> >> Enterprise Edition SP3a. Everything seems to be fine
> but
> >> >> this. I can't figure it out.
> >> >>
> >> >>
> >> >>
> >> >> Any ideas?
> >> >>
> >> >> Thanks
> >> >>
> >> >>
> >> >
> >> >
> >> >.
> >> >
> >
> >
> >.
> >

No comments:

Post a Comment