Friday, March 30, 2012

More on querying remote server

i'll just keep posting the problems, as my own archive.
SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
Issues the remote query:
SOURCE:(servertest),
QUERY:(
SELECT Tbl1001."TransactionGUID"
Col1003,Tbl1001."TransactionNumber" Col1004
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
)
And tries to bring back 8.5M rows - and takes minutes to return, when really
there is only one row.
Question #1: Why doesn't it include the WHERE clause in the remote query?
If i change the query to:
SELECT TOP 1 TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
It still issues the remote query:
SOURCE:(servertest),
QUERY:(
SELECT
Tbl1001."TransactionGUID" Col1003,
Tbl1001."TransactionNumber" Col1004
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
)
But this time it only brings over 10 rows (somehow), performs a top 1
filter, and returns instantly.
Question #2: How is it bringing over 10 rows only? (hint: cursor)
If i change the query to:
SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionNumber = 9679
It issues the remote query:
SOURCE:(servertest),
QUERY:(
SELECT
Tbl1001."TransactionNumber" Col1004
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
WHERE Tbl1001."TransactionNumber"=(9679)
)
which returns only one row, and returns instantly. Obviously SQL Server can
*sometimes* do the optimization, othertimes it won't.
Question #3: Why does it include the WHERE clause in the remote query.
If i change the query to:
SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionDate = '2002-06-10 08:19:10.513'
It issues the remote query:
SOURCE:(servertest),
QUERY:(
SELECT
Tbl1001."TransactionNumber" Col1004
FROM "cmsarchivetraining"."dbo"."Transactions_90" Tbl1001
WHERE Tbl1001."TransactionDate"='2002-06-10T08:19:10.513'
)
Question #4: Why does it include the WHERE clause in the remote query?
Now i create a view:
CREATE VIEW CMSArchiveTranasctions AS
SELECT *
FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90
and issue the query:
SELECT TransactionNumber
FROM CMSArchiveTransactions
WHERE TransactionNumber = 9679
It wants to issues the remote query:
SOURCE:(SERVERTEST),
QUERY:(
SELECT
Tbl1001."TransactionNumber" Col1005
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
)
And tries to bring back 8.5M rows - and takes minutes to return, when really
there is only one row.
Question #5: Why does it not include the WHERE clause in the remote query?
If i change the query to:
SELECT TOP 1 TransactionNumber
FROM CMSArchiveTransactions
WHERE TransactionNumber = 9679
it issues the remote query:
SOURCE:(SERVERTEST),
QUERY:(
SELECT TOP 1 Col1004
FROM (
SELECT
Tbl1001."TransactionNumber" Col1004
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
WHERE Tbl1001."TransactionNumber"=(9679)
) Qry1018
)
Question #6: Why is it now including the TOP inside the remote query, when
my earlier issue of a TOP query didn't include the TOP limiter?
Question #7: Why is it now including the WHERE clause in the remote query,
when not including a TOP 1 it won't include the where clause limiter?
Question #8: Why does it include the WHERE clause when i have any TOP
limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
where clause when i don't?
So i try changing my query to:
SELECT TransactionNumber
FROM (
SELECT *
FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90) t
WHERE TransactionNumber = 9679
And it issues the remote query:
SOURCE:(SERVERTEST),
QUERY:(
SELECT
Tbl1001."TransactionNumber" Col1004
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
WHERE Tbl1001."TransactionNumber"=(9679)
)
So querying through a view it will not include the WHERE clause, but if i
include the view as a derived table, it can include the WHERE clause.
Question #9: Why does it include the WHERE clause when i query through a
derived table, and not through a view?
So i try changing the query to
SELECT TransactionNumber
FROM (
SELECT *
FROM CMSArchiveTransactions) t
WHERE TransactionNumber = 9679
So here we are, pay attention to this one. If i attempt to query the view
directly, it doesn't get optimized. Now i am going to query my view THROUGH
a derived table. What do you think it will do?
SOURCE:(SERVERTEST),
QUERY:(
SELECT Tbl1001."TransactionNumber" Col1004
FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
WHERE Tbl1001."TransactionNumber"=(9679)
)
It in fact now DOES include the where clause, but if i try to query my view
directly, it can't optimize it. This is a stunning development.
It means the rules are random, which makes it difficult to optimize
distributed queries.
Rules for including where clause:
Filter on uniqueidentifer:
Query includes TOP 1: Yes*
Query includes TOP 2: No
Query includes TOP n: No
Query includes TOP n PERCENT: No
Query doesn't include TOP: No
Filter on integer:
Query linked SQL Server: Yes
Query view that queries linked SQL Server:
Query view directly:
Query includes TOP 1: Yes
Query includes TOP 2: Yes
Query includes TOP 100: Yes
Query includes TOP 999999999: Yes
Query includes TOP 100 PERCENT: Yes
Query doesn't include TOP: No
Query view through derived table:
Query includes TOP 1: Yes
Query includes: TOP 2: Yes
Query includes: TOP n: Yes
Query includes: TOP 100 PERCENT: Yes
Query includes TOP: YesYou haven't posted any DDL. Are there any indexes on the remote table? Which
columns are indexed?
ML
http://milambda.blogspot.com/|||"ML" <ML@.discussions.microsoft.com> wrote in message
news:E89D60ED-EAEF-46F9-9F31-1D56842B277E@.microsoft.com...
> You haven't posted any DDL. Are there any indexes on the remote table?
> Which
> columns are indexed?
Before answering that, let me ask you this: why does it matter? It doesn't
seem to matter *sometimes*. And even if there aren't an indexes on the
remote tables, everyone can agree that sending any filtering criteria to a
remote server is faster than trying to filter rows after they've crossed a
link.
Even a table scan on a remote server is better than bringing the rows over a
network to be filtered.
Additionally, my post is not dealing with query performance. When SQL Server
does what it should do, the queries run fine. If SQL Server doesn't do what
it's supposed to do, the queries do not run fine. Indexes are not the
performance limitation here.
That having been said: yes.|||If you think you know your data better than the optimizer does (as might be
the case - especially with outdated statistics), you could use the REMOTE
join hint to tell the optimizer to process the join on the remote server.
Basically, the optimizer can only rely on statistics. If they are missing on
the remote site, he prefers "staying local" although filtering data remotely
might have yielded better performance. Why put a stress on a remote server i
f
poor performance is expected anyway?
But when statistics are up to date on both sites, then the optimizer still
has full control over the execution of queries (unless specified otherwise
through the use of hints) which basically means that he can still make the
wrong assumptions - and I'm as much in the dark here as you are.
ML
http://milambda.blogspot.com/|||> Basically, the optimizer can only rely on statistics. If they are missing
> on
> the remote site, he prefers "staying local" although filtering data
> remotely
> might have yielded better performance. Why put a stress on a remote server
> if
> poor performance is expected anyway?
i guess this is where common sense gets to take a back seat.

> But when statistics are up to date on both sites, then the optimizer still
> has full control over the execution of queries (unless specified otherwise
> through the use of hints) which basically means that he can still make the
> wrong assumptions - and I'm as much in the dark here as you are.
Is there anyone who could explain the bewildering optimizer choices?|||Can anyone
ever imagine
any situation
under any circumstances
in any manner
in any capacity
in any way
with any product
running on any kind of data link
fast or slow
on any database setup
existing in any universe
moving at any velocity
at any time since the big bang itself
where it would be better to filter rows after fetching them?|||Here is some SQL to create a table named "Transactions", and will contain
some fields including "TransactionDate", "TransactionNumber",
"TransactionGUID"
USE pubs
go
DROP TABLE Transactions
go
CREATE Table Transactions (
[TransactionID] [int] NOT NULL IDENTITY,
TransactionGUID uniqueidentifier NOT NULL default newid(),
[title] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[type] [char] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[pub_id] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[price] [money] NULL ,
[advance] [money] NULL ,
[royalty] [int] NULL ,
[ytd_sales] [int] NULL ,
[notes] [varchar] (200) NULL ,
[TransactionDate] [datetime] NOT NULL DEFAULT (getdate())
)
go
INSERT INTO Transactions (title, type, pub_id, price, advance, royalty,
ytd_sales, notes, TransactionDate)
SELECT
[title],
[type],
[pub_id],
[price],
[advance],
[royalty],
[ytd_sales],
[notes],
CAST(CAST([pubdate] as real) + RAND(3234)*2000 - 1000 AS datetime)
FROM Titles
CROSS JOIN (
SELECT (a.Number * 256) + b.Number AS Number
FROM master..spt_values a,
master..spt_values b
WHERE a.Type = 'p'
AND b.Type = 'p') numbers
You'll have to create the linked server yourself, you'll need two servers.
You'll also have to substitute your own values for TransactionGUID,
TransactionDate, TransactionNumber|||I would hazard a guess that the local SQL server has no knowledge of the
statistics on the remote server and can only determine which query to send
across to the remote server based on the existence of keys (possibly indexes
or unique indexes).
The question would be what indexes exist on the remote table. I believe if
you look at these closely your results will turn out to be much more
consistent than you think.
"ML" <ML@.discussions.microsoft.com> wrote in message
news:064A71B2-9B67-4F97-8635-6854A4B9551C@.microsoft.com...
> If you think you know your data better than the optimizer does (as might
be
> the case - especially with outdated statistics), you could use the REMOTE
> join hint to tell the optimizer to process the join on the remote server.
> Basically, the optimizer can only rely on statistics. If they are missing
on
> the remote site, he prefers "staying local" although filtering data
remotely
> might have yielded better performance. Why put a stress on a remote server
if
> poor performance is expected anyway?
> But when statistics are up to date on both sites, then the optimizer still
> has full control over the execution of queries (unless specified otherwise
> through the use of hints) which basically means that he can still make the
> wrong assumptions - and I'm as much in the dark here as you are.
>
> ML
> --
> http://milambda.blogspot.com/|||I am guessing to a fair extent here, but some possible (and I think likely)
explanations for this behavior...
The main thing is, I think, that SQL Server has no statistics available when
accessing a remote database and makes a best guess based on the existence of
keys.
Incidently, when working with Oracle 8 I found similar issues with their
"database links".

>Question #1: Why doesn't it include the WHERE clause in the remote query?
No Index on transactionGUID?

>Question #2: How is it bringing over 10 rows only? (hint: cursor)
10 rows is the size of the chunks retrieved. The top 1 is in the first ten
rows, so SQL Server doesnt bother retrieving any further data from the
remote database.

>Question #3: Why does it include the WHERE clause in the remote query.
Index or PK on TransactionNumber?

>Question #4: Why does it include the WHERE clause in the remote query?
Index or PK on TransactionDate?

>Question #5: Why does it not include the WHERE clause in the remote query?
Entire view is retrieved. The optimizer will sometimes do this with local
views as well, depending on how they are used.

>Question #6: Why is it now including the TOP inside the remote query, when
>my earlier issue of a TOP query didn't include the TOP limiter?
>Question #7: Why is it now including the WHERE clause in the remote query,
>when not including a TOP 1 it won't include the where clause limiter?
>Question #8: Why does it include the WHERE clause when i have any TOP
>limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
>where clause when i don't?
Top says return only X rows from the view. Because it needs to order the
results, a different execution plan is returned. This will happen with
local views also, depending on statistics and how they are used.

>Question #9: Why does it include the WHERE clause when i query through a
>derived table, and not through a view?
The local SQL server is rewriting your select to directly access the table.
This particular SQL may look different to a human, but logically, and to SQL
Server, it is identical to your 3rd example. This is perfectly consistent.
"Ian Boyd" <ian.msnews010@.avatopia.com> wrote in message
news:O$7N6IZWGHA.4424@.TK2MSFTNGP05.phx.gbl...
> i'll just keep posting the problems, as my own archive.
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
> Issues the remote query:
> SOURCE:(servertest),
> QUERY:(
> SELECT Tbl1001."TransactionGUID"
> Col1003,Tbl1001."TransactionNumber" Col1004
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
> And tries to bring back 8.5M rows - and takes minutes to return, when
really
> there is only one row.
> Question #1: Why doesn't it include the WHERE clause in the remote query?
> If i change the query to:
> SELECT TOP 1 TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
> It still issues the remote query:
> SOURCE:(servertest),
> QUERY:(
> SELECT
> Tbl1001."TransactionGUID" Col1003,
> Tbl1001."TransactionNumber" Col1004
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
> But this time it only brings over 10 rows (somehow), performs a top 1
> filter, and returns instantly.
> Question #2: How is it bringing over 10 rows only? (hint: cursor)
> If i change the query to:
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE TransactionNumber = 9679
> It issues the remote query:
> SOURCE:(servertest),
> QUERY:(
> SELECT
> Tbl1001."TransactionNumber" Col1004
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> WHERE Tbl1001."TransactionNumber"=(9679)
> )
> which returns only one row, and returns instantly. Obviously SQL Server
can
> *sometimes* do the optimization, othertimes it won't.
> Question #3: Why does it include the WHERE clause in the remote query.
> If i change the query to:
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE TransactionDate = '2002-06-10 08:19:10.513'
> It issues the remote query:
> SOURCE:(servertest),
> QUERY:(
> SELECT
> Tbl1001."TransactionNumber" Col1004
> FROM "cmsarchivetraining"."dbo"."Transactions_90" Tbl1001
> WHERE Tbl1001."TransactionDate"='2002-06-10T08:19:10.513'
> )
> Question #4: Why does it include the WHERE clause in the remote query?
> Now i create a view:
> CREATE VIEW CMSArchiveTranasctions AS
> SELECT *
> FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90
> and issue the query:
> SELECT TransactionNumber
> FROM CMSArchiveTransactions
> WHERE TransactionNumber = 9679
> It wants to issues the remote query:
> SOURCE:(SERVERTEST),
> QUERY:(
> SELECT
> Tbl1001."TransactionNumber" Col1005
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
> And tries to bring back 8.5M rows - and takes minutes to return, when
really
> there is only one row.
> Question #5: Why does it not include the WHERE clause in the remote query?
> If i change the query to:
> SELECT TOP 1 TransactionNumber
> FROM CMSArchiveTransactions
> WHERE TransactionNumber = 9679
> it issues the remote query:
> SOURCE:(SERVERTEST),
> QUERY:(
> SELECT TOP 1 Col1004
> FROM (
> SELECT
> Tbl1001."TransactionNumber" Col1004
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> WHERE Tbl1001."TransactionNumber"=(9679)
> ) Qry1018
> )
> Question #6: Why is it now including the TOP inside the remote query, when
> my earlier issue of a TOP query didn't include the TOP limiter?
> Question #7: Why is it now including the WHERE clause in the remote query,
> when not including a TOP 1 it won't include the where clause limiter?
> Question #8: Why does it include the WHERE clause when i have any TOP
> limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
> where clause when i don't?
> So i try changing my query to:
> SELECT TransactionNumber
> FROM (
> SELECT *
> FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90) t
> WHERE TransactionNumber = 9679
> And it issues the remote query:
> SOURCE:(SERVERTEST),
> QUERY:(
> SELECT
> Tbl1001."TransactionNumber" Col1004
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> WHERE Tbl1001."TransactionNumber"=(9679)
> )
> So querying through a view it will not include the WHERE clause, but if i
> include the view as a derived table, it can include the WHERE clause.
> Question #9: Why does it include the WHERE clause when i query through a
> derived table, and not through a view?
> So i try changing the query to
> SELECT TransactionNumber
> FROM (
> SELECT *
> FROM CMSArchiveTransactions) t
> WHERE TransactionNumber = 9679
> So here we are, pay attention to this one. If i attempt to query the view
> directly, it doesn't get optimized. Now i am going to query my view
THROUGH
> a derived table. What do you think it will do?
> SOURCE:(SERVERTEST),
> QUERY:(
> SELECT Tbl1001."TransactionNumber" Col1004
> FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> WHERE Tbl1001."TransactionNumber"=(9679)
> )
> It in fact now DOES include the where clause, but if i try to query my
view
> directly, it can't optimize it. This is a stunning development.
> It means the rules are random, which makes it difficult to optimize
> distributed queries.
>
> Rules for including where clause:
> Filter on uniqueidentifer:
> Query includes TOP 1: Yes*
> Query includes TOP 2: No
> Query includes TOP n: No
> Query includes TOP n PERCENT: No
> Query doesn't include TOP: No
> Filter on integer:
> Query linked SQL Server: Yes
> Query view that queries linked SQL Server:
> Query view directly:
> Query includes TOP 1: Yes
> Query includes TOP 2: Yes
> Query includes TOP 100: Yes
> Query includes TOP 999999999: Yes
> Query includes TOP 100 PERCENT: Yes
> Query doesn't include TOP: No
> Query view through derived table:
> Query includes TOP 1: Yes
> Query includes: TOP 2: Yes
> Query includes: TOP n: Yes
> Query includes: TOP 100 PERCENT: Yes
> Query includes TOP: Yes
>|||Absolutely. The same way there are cases when a full table scan is better
than using an index.
If you have 1,000 rows in a table and you add a filter on and indexed field
that returns 999 records, then SQL server will do a full table scan instead
of using the index.
If you have 1,000 rows in a remote table and you add a filter that returns
999 records, let the database that is actually going to use the data spend
the cycles filtering it. Let the remote server just do a quick IO and send
the data over the network with minimal CPU usage.
Now, if the local server has statistics on the tables in the remote
database, then it could make much better decisions regarding when to send
the filter and when to not. Because it doesn't know the remote database, it
has no idea how many rows will be returned (unless querying on a key field
maybe?) and makes a (rather useless) judgement call.
"Ian Boyd" <ian.msnews010@.avatopia.com> wrote in message
news:%23fsEP7ZWGHA.2064@.TK2MSFTNGP03.phx.gbl...
> Can anyone
> ever imagine
> any situation
> under any circumstances
> in any manner
> in any capacity
> in any way
> with any product
> running on any kind of data link
> fast or slow
> on any database setup
> existing in any universe
> moving at any velocity
> at any time since the big bang itself
> where it would be better to filter rows after fetching them?
>sql

No comments:

Post a Comment