Wednesday, March 28, 2012

More index questions

I'm learning about indexes and performance, looking at some of my queries and
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about 500
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
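In rough DDL, the table looks something like this (the names and types here
are my shorthand, not our actual schema; tblSecurities is just a stand-in for
the parent table):

    CREATE TABLE tblPrices (
        PriceID     INT IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
        SecurityID  INT NOT NULL REFERENCES tblSecurities (SecurityID),
        Price       FLOAT NOT NULL,
        UnusedValue FLOAT NULL,          -- second float, currently unused
        PriceType   INT NOT NULL,        -- only ~5 distinct values
        PriceDate   DATETIME NOT NULL,
        Notes       VARCHAR(255) NULL,   -- typically empty
        CONSTRAINT UQ_Prices UNIQUE (SecurityID, PriceType, PriceDate)
    )
    CREATE INDEX IX_Prices_Security ON tblPrices (SecurityID)
    CREATE INDEX IX_Prices_Type     ON tblPrices (PriceType)
    CREATE INDEX IX_Prices_Date     ON tblPrices (PriceDate)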
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) reported a density of "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is this
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
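To be concrete, the query is roughly of this shape (my paraphrase, using the
illustrative names from the sketch above, not the exact production SQL):

    -- Newest price per security: find the latest date per security,
    -- then join back to pick up the price itself.
    SELECT p.SecurityID, p.PriceDate, p.Price
    FROM tblPrices p
    JOIN (SELECT SecurityID, MAX(PriceDate) AS MaxDate
          FROM tblPrices
          GROUP BY SecurityID) latest
      ON latest.SecurityID = p.SecurityID
     AND latest.MaxDate = p.PriceDate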
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occurred, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and an index in this context anyway?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
> [...]
> One last question about the compound constraint. Since the first column in
> it is the fkey, should I instead make it an index and drop the fkey index?
> What is the difference between a constraint and an index in this context
> anyway?
A foreign key constraint will generate a check on insert and update to
guarantee that a related row exists in the primary key table. The
constraint does not store any data, and is no help at all in querying. In
general you should support each foreign key constraint with an index on the
foreign key table on the foreign key columns.
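For example (illustrative names; substitute your actual tables and columns):

    -- The constraint only enforces that the parent row exists:
    ALTER TABLE tblPrices
        ADD CONSTRAINT FK_Prices_Security
        FOREIGN KEY (SecurityID) REFERENCES tblSecurities (SecurityID)

    -- It builds no structure of its own, so add a supporting index:
    CREATE INDEX IX_Prices_Security ON tblPrices (SecurityID)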
Now. Your bigger question. What indexing scheme to use. First off, since
you only insert 500 rows/day, and you do it in one big batch, you should
probably change your clustered index. You will need to run Profiler to
capture a typical query workload, and this will drive your choice of index.
I suspect that the right clustered index is a unique clustered index on
either (security,date,type) or (date,security,type). Which one will depend
on the query workload. Then you will have a nonclustered index for your
primary key, and a clustered index for your alternate key. Once you have
that, rerun ITW
and see where you are.
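In sketch form, assuming the primary key constraint is named PK_tblPrices and
borrowing the illustrative names from your sketch:

    -- Recreate the primary key as nonclustered...
    ALTER TABLE tblPrices DROP CONSTRAINT PK_tblPrices
    ALTER TABLE tblPrices ADD CONSTRAINT PK_tblPrices
        PRIMARY KEY NONCLUSTERED (PriceID)

    -- ...and cluster on the alternate key instead:
    CREATE UNIQUE CLUSTERED INDEX IX_Prices_SecurityDateType
        ON tblPrices (SecurityID, PriceDate, PriceType)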

> I assume that the constraint doesn't do a table search for every insert,
> so
> there's some sort of index that it's using anyway?
Yes it uses the primary key index on the PK table, but that doesn't help
your query.
David

I did respond, quite a bit, to another post you had, but what you are asking
is a pretty generic question, since we do not have specific enough
information here to give detailed responses; so, about the best you are going
to get is a generic response...but that can be useful too.
1. Relational theory dictates that before an entity is even in 1NF (first
normal form) you MUST specify a Key. Any entity may have one or multiple
candidate keys. There may be a natural key. And, there may even be
complex, compound (composite) keys. One of these has to be the Primary Key.
The rest are alternate keys and should have Unique Key Constraints defined
on them. Oftentimes, the Primary Business Key is a lousy physical key; so,
a surrogate, Auto-Generated, IDENTITY is constructed then chosen as the
Primary Key, physically. This is fine, but keep in mind that that physical
database is just an implementation of a Logical Database Model that supports
a very specific Business Model and that NO surrogate key defines a business
key; thus, the term surrogate. The physical alternate keys, that model the
Business Keys, should always be constrained in order to enforce the logical
business rules.
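Schematically (names purely illustrative):

    CREATE TABLE tblPrices (
        PriceID    INT IDENTITY NOT NULL,   -- surrogate key
        SecurityID INT NOT NULL,
        PriceType  INT NOT NULL,
        PriceDate  DATETIME NOT NULL,
        CONSTRAINT PK_Prices PRIMARY KEY (PriceID),
        -- the business (alternate) key, constrained explicitly:
        CONSTRAINT UQ_Prices_BusinessKey
            UNIQUE (SecurityID, PriceType, PriceDate)
    )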
2. As you go through the normalization process, you will, hopefully, end up
at 3NF (third normal form), or higher, and a physical consequence of which
is that you will end up with dependent tables that must be constrained with
Foreign Key definitions. In the dependents, these FKCs should be indexed.
In the parent, they must be Uniquely constrained by a PKC or UCs. In the
child, they normally will be reasonably selective, if not highly
selective.
3. EVERY TABLE should have a clustered index defined. This is a purely
physical argument and is highly dependent on the SQL Server DBMS. Heaps,
which is what you get when a clustered index is NOT defined, perform
horribly, in terms of storage management and database maintenance. But many
have trouble choosing a proper definition. Here are some generic tips. The
key attributes should be modified rarely, if at all. Most often you
would be better served with a single attribute as opposed to a composite
definition. Choose an attribute that is queried often, but by a range
mechanism as opposed to singleton selects (medium cardinality). Then,
finally, the attribute should be as narrow as possible, which is helped by
having single column definitions, and even then, numerics are better than
characters, and narrow characters are better than wider ones. You should
examine the alternate key definitions as possible clustered index candidates
as ALL clustered indexes use UNIQUENESS, either because they are defined as
unique, or because SQL Server appends a hidden uniqueifier value to the key to
make rows unique. So, choosing an already uniquely defined key does keep
the overhead down.
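In your case, one way to apply that (illustrative names again) is to let the
alternate-key constraint itself be the clustered index, so uniqueness comes
from the key rather than a hidden uniqueifier:

    ALTER TABLE tblPrices
        ADD CONSTRAINT UQ_Prices_BusinessKey
        UNIQUE CLUSTERED (SecurityID, PriceDate, PriceType)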
4. Now, go through your tables and determine which columns are highly
selective (high cardinality), that is, if you were to query the table, with
a specific value, what fraction of the table rows would you expect to
return? The fewer, the better candidate for a non-clustered index that
attribute would be. At this stage, I would concentrate solely on single
attribute column indexes. The reason is that statistical histograms are
only stored for the most significant attribute; thus, it is very important
for that attribute to be highly selective.
The only reason having a composite index would be of value would be because of
what is known as a "covering" index. A clustered index is a covering index
by definition. This is because a covering index has all the data defining
the index stored on the index page. So, if first name is a part of a name
index, where the last name is the first defining attribute, after the
page(s) is located by a search on last name, SQL Server can also pull the
first name attribute from the index page itself instead of having to run a
bookmark lookup to the referenced data page. Not too many indexes can be
defined this way. A clustered index can because the entire data row is
stored on the clustered index page because, for a clustered index, the index
page IS the data page. Another example would be a non-clustered index
where every table attribute is used to define it, but that would be highly
redundant. So, besides the clustered index, there are not too many
candidates that could be defined for a covering index.
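A sketch of that name example (hypothetical table):

    CREATE TABLE tblPeople (
        PersonID  INT IDENTITY PRIMARY KEY,
        LastName  VARCHAR(50) NOT NULL,
        FirstName VARCHAR(50) NOT NULL
    )
    CREATE INDEX IX_People_Name ON tblPeople (LastName, FirstName)

    -- Covered: both columns live on the index pages, so no bookmark
    -- lookup to the data page is required.
    SELECT FirstName FROM tblPeople WHERE LastName = 'Markowitz'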
Moreover, having many singleton indexes provides SQL Server an opportunity to
pull several indexes to perform a set intersection routine, which is far
more efficient than having any one index with composite values.
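For instance, with separate single-column indexes on date and security
(illustrative names once more), a query such as:

    SELECT Price
    FROM tblPrices
    WHERE PriceDate = '20120328' AND SecurityID = 42

can be answered by seeking into each index for its own predicate and
intersecting the matching row locators before touching the data pages.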
5. Finally, run sp_createstats against your database. This will create
single-column statistics against every table and every column that does not
already have a supporting index. This can help you determine missed columns
that have high cardinality that could be assisted by creating a
non-clustered index. These statistics also assist the Optimizer to quickly
reject low cardinality columns.
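That is simply:

    -- Build single-column statistics on every eligible column that
    -- does not already have an index or statistics object.
    EXEC sp_createstats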
Well, I know this covered quite a bit of ground, some of it not too well, but
there is a ton of literature out there that goes into a lot of detail on
this subject. You need to dig into it. But, a word of caution, many of
these publications are just flat-out wrong. Take everything you read,
yours truly included, with some scepticism. Spend time in the Query
Analyzer. Examine the execution plans. Use hints to override the
Optimizer, not as a production practice, but to see where and how the
Optimizer behaves and why it is far more likely than you are to pick a good
execution plan.
Lastly, always make sure your statistics are up to date before you begin any
analysis work. The Optimizer is only as good as these statistics are.
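For example:

    -- Refresh statistics for one table with a full scan...
    UPDATE STATISTICS tblPrices WITH FULLSCAN
    -- ...or refresh them database-wide:
    EXEC sp_updatestats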
Sincerely,
Anthony Thomas
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
[original post quoted in full; snipped]
