Money Data Type: index

Showing posts with label index. Show all posts

Friday, March 30, 2012

More on Full Text Searching across multiple tables

Imagine you're building a search for a movie database which you wanted to index actors, quotes from scenes and movie names into a single search.... how do you accomplish this using full text search? The closest I can think of is... (for a sample search for "Walken"

select RANK, actorId as Id from FREETEXTTABLE( actors, *, 'ISABOUT (+ WALKEN + WEIGHT(1.0))') a JOIN actors b on a.[key] = b.actorid
UNION ALL
select RANK, sceneId as Id from FREETEXTTABLE( scenes, *, 'ISABOUT (+ WALKEN + WEIGHT(1.0))') a JOIN scenes b on a.[key] = b.sceneId
UNION ALL
select RANK, movieId as Id from FREETEXTTABLE( movie, *, 'ISABOUT (+ WALKEN + WEIGHT(1.0))') a JOIN movie b on a.[key] = b.movieId

But it doesn't rank correctly. Suggestions?

How do you want the ranking to work?

What do you want to return? have you tried

select RANK, actorId as Id from CONTAINSTABLE( actors, *, 'WALKEN') a JOIN actors b on a.[key] = b.actorid
UNION ALL
select RANK, sceneId as Id from CONTAINSTABLE( scenes, *, 'WALKEN') a JOIN scenes b on a.[key] = b.sceneId
UNION ALL
select RANK, movieId as Id from CONTAINSTABLE( movie, *, 'WALKEN') a JOIN movie b on a.[key] = b.movieId

The other thing to note is that RANK is not a fixed number it is relative to a number of factors, largely it is related to the batch of data a record was indexed with. So if you have records being indexed in differing batchs you can end up with odd ranking.

Bottom linke is a rank from tableA can't be used to compare with a rank from tableB

|||Ideally, i'd like to return the item(s), in order that match the query string. This means if the scene is a closer match than the movie, the scene is returned first. Each one should have the id returned which i can then construct a result from.|||

Unfortunately you can't use the rank from one index to compare with a rank from another index.

Have a lookin BOL under the heading "Understanding Ranking "

|||Right, what i'm looking for is a solution that will work across multiple tables... any suggestions?

Wednesday, March 28, 2012

More index questions

I'm learning about indexes and performance, looking at some of my queries and
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about 500
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughtly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is this
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occured, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and index in this context anyway?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
> I'm learning about indexes and performance, looking at some of my queries
> and
> comparing them with the Index Tuning Wizard (ITW) to figure things out.
> BTW, is there a way to use the Profiler to "get the distinct queries [on
> table x]"? I would really like to see every query on a table when looking
> at
> the indexes on them, but I can't figure out how to do this.
> Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an
> int
> fkey (security), two floats (one unused), a type, a date and a varchar
> notes
> (typically empty). Data is unique on fkey, type and date. We insert about
> 500
> rows a day in one big batch, so insert performance isn't _too_ much of an
> issue, but because it's a batch you tend to notice it.
> Indexes are currently on the pkey (clustered unique), fkey, type and date.
> There is also a constraint on fkey/type/date.
> My GUESS is that the common types of queries are...
> 1) price for this security and this date
> 2) prices for this date
> 3) prices for this security
> I don't really know which ones are common or not, which is why I was
> asking
> how I could find this out.
> dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
> this makes it an excellent target for an index. In reality there are only
> 5
> values in total, and the VAST majority of the records are a single one.
> Can
> anyone explain this output? Also, why didn't the ITW suggest removing it,
> when it obviously will never be used?
> ITW also suggested removing the index on date. This seems suspect, but I
> can't prove it. The reason it's suspect is that we would have roughtly 500
> prices per date, and about the same number per security. This would seem
> to
> imply that either index should be useful depending on the query. We tend
> to
> GROUP BY on the security, in order to get the newest price per security,
> which would also seem to imply that the date index would be useful -- is
> this
> possible? Can it use the fkey to extract the unique securities, the date
> to
> get the newest for all securities, and then hash match?
> Basically ITW simply doesn't have enough information for me to make up my
> mind. If I could see the queries, how often they occured, and what indexes
> they used, then it would be useful. Now, however, I don't trust the
> numbers.
> One last question about the compound constraint. Since the first column in
> it is the fkey, should I instead make it an index and drop the fkey index?
> What is the difference between a constraint and index in this context
> anyway?
A foreign key constraint will generate a check on insert and update to
guarantee that a related row exists in the primary key table. The
constraint does not store any data, and is no help at all in querying. In
generay you should support each foreign key constraint with an index on the
foreign key table on the foreign key columns.
Now. Your bigger question. What indexing scheme to use. First off since
you only insert 500 rows/day, and you do it in one big batch, you should
probably change your clustered index. You will need to run Profiler to
capture a typical query workload, and this will drive your choice of index.
I suspect that the rignt clustered index is a unique clustered index on
either (security,date,type) or (date,security,type). Which one will depend
on the query workload. Then you will have an index for your primary key,
and clustered index for your alternate key. Once you have that, rerun ITW
and see where you are.

> I assume that the constraint doesn't do a table search for every insert,
> so
> there's some sort of index that it's using anyway?
Yes it uses the primary key index on the PK table, but that doesn't help
your query.
David
|||I did respond, quite a bit, to another post you had, but what you are asking
is a pretty generic question, since we do not have specific enough
information here to give detailed repsones; so, about the best you are going
to get is a generic response...but that can be useful too.
1. Relational theory dictates that before an entity is evein 1NF (first
normal form) you MUST specify a Key. Any entity may have one or multiple
candidate keys. There may be a natural key. And, there may even be
complex, compound (composite) keys. One of these has to be the Primary Key.
The rest are alternate keys and should have Unique Key Constraints defined
on them. Oftentimes, the Primary Business Key is a lousy physical key; so,
a surrogate, Auto-Generated, IDENTITY is constructed then chosen as the
Primary Key, physically. This is fine, but keep in mind that that physical
database is just an implementation of a Logical Database Model that supports
a very specific Business Model and that NO surrogate key defines a business
key; thus, the term surrogate. The physical alternate keys, that model the
Business Keys, should always be constrained in order to enforce the logical
business rules.
2. As you go through the normalization process, you wll, hopefully, end up
at 3NF (third normal form), or higher, and a physical consequence of which
is that you will end up with dependent tables that must be constrained with
Foreign Key definitions. In the dependents, these FKCs should be indexed.
In the parent, they must be Uniquely constrained by a PKC or UCs. In the
child, they normally will be reasonably selectible, if not highly
selectible.
3. EVERY TABLE should have a clustered index defined. This is a purely
physical argument and is highly dependent on the SQL Server DBMS. Heaps,
which is what you get when a clustered index is NOT defined, perform
horibly, in terms of storage management and database maitenance. But many
have trouble choosing a proper definition. Here are some generic tips. The
attributes should not be modified but minimally at most. Most often you
would be better served with a single attribute as opposed to a composite
definition. Choose an attribute that is queried often, but by a range
mechanism as opposed to singleton selects (medium cardinality). Then,
finally, the attribute should be as narrow as possible, which is helped by
having single column definitions, but even those, numerics are better than
characters, and narrow characters are better than wider ones. You should
examine the alternate key definitions as possible clustered index candidates
as ALL clustered indexes use UNIQUENESS, either because they are defined as
unique, or because SQL Server appends a GUID to the attribute definition to
make them unique. So, choosing an already uniquely defined key does keep
the overhead down.
4. Now, go through your tables and determine which columns are highly
selectible (high cardinality), that is, if you were to query the table, with
a specific value, what fraction of the table rows would you expect to
return? The fewer, the better candidate for a non-clustered index that
attribute would be. At this stage, I would concentrate solely on single
attribute column indexes. The reason is that statistical histograms are
only stored for the most significant attribute; thus, it is very important
for that attribute to be highly selectible.
The only reason having a composite index would be of value would be becuase
what is known as a "covering" index. A clustered index is a covering index
by definition. This is because a covering index has all the data defining
the index stored on the index page. So, if first name is apart of a name
index, where the last name is the first defining attribute, after the
page(s) is located by a search on last name, SQL Server can also pull the
first name attribute from the index page itself instead of have to run a
bookmark lookup to the referenced data page. Not too many indexes can be
defined this way. A clustered index can because the entire data row is
stored on the clustered index page because, for a clustered index, the index
page IS the data page. Another example would be a non-clustered index but
where every table attribute is used to define it, but that would be highly
redundant. So, besides the clustered index, there are not to many
candidates that could be defined for a covering index.
Moreover, having many singleton indexes provides SQL Server an oportunity to
pull several indexes to perform a set intersection routine, which is far
more efficient than having any one index with composite values.
5. Finally, run sp_createstats against your database. This will create
single-column statistics against every table and every column that does not
already have a supporting index. This can help you determine missed columns
that have high cardinality that could be assisted by creating a
non-clustered index. These statistics also assist the Optimizer to quickly
reject low cardinality columns.
Well, I know this covered quite a bit of gound, some of it not too well, but
there is a ton of literature out there that goes into a lot of detail on
this subject. You need to dig into it. But, a word of caution, many of
these publications are just flat-out wrong. Take everything you read,
your's truely included, with some scepticism. Spend time in the Query
Analyzer. Examine the execution plans. Use hints to override the
Optimizer, not as a production practice, but to see where and how the
Optimizer behaves and why it is far more likely to pick an execution plan
than you are.
Lastly, always make sure your statistics are up to date before you begin any
analysis work. The Optimizer is only as good as these statistics are.
Sincerely,
Anthony Thomas
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
I'm learning about indexes and performance, looking at some of my queries
and
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about
500
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughtly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is
this
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occured, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and index in this context
anyway?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury

More index questions

I'm learning about indexes and performance, looking at some of my queries an
d
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about 50
0
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughtly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is thi
s
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occured, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and index in this context anyway
?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
> I'm learning about indexes and performance, looking at some of my queries
> and
> comparing them with the Index Tuning Wizard (ITW) to figure things out.
> BTW, is there a way to use the Profiler to "get the distinct queries [
on
> table x]"? I would really like to see every query on a table when looking
> at
> the indexes on them, but I can't figure out how to do this.
> Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an
> int
> fkey (security), two floats (one unused), a type, a date and a varchar
> notes
> (typically empty). Data is unique on fkey, type and date. We insert about
> 500
> rows a day in one big batch, so insert performance isn't _too_ much of an
> issue, but because it's a batch you tend to notice it.
> Indexes are currently on the pkey (clustered unique), fkey, type and date.
> There is also a constraint on fkey/type/date.
> My GUESS is that the common types of queries are...
> 1) price for this security and this date
> 2) prices for this date
> 3) prices for this security
> I don't really know which ones are common or not, which is why I was
> asking
> how I could find this out.
> dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
> this makes it an excellent target for an index. In reality there are only
> 5
> values in total, and the VAST majority of the records are a single one.
> Can
> anyone explain this output? Also, why didn't the ITW suggest removing it,
> when it obviously will never be used?
> ITW also suggested removing the index on date. This seems suspect, but I
> can't prove it. The reason it's suspect is that we would have roughtly 500
> prices per date, and about the same number per security. This would seem
> to
> imply that either index should be useful depending on the query. We tend
> to
> GROUP BY on the security, in order to get the newest price per security,
> which would also seem to imply that the date index would be useful -- is
> this
> possible? Can it use the fkey to extract the unique securities, the date
> to
> get the newest for all securities, and then hash match?
> Basically ITW simply doesn't have enough information for me to make up my
> mind. If I could see the queries, how often they occured, and what indexes
> they used, then it would be useful. Now, however, I don't trust the
> numbers.
> One last question about the compound constraint. Since the first column in
> it is the fkey, should I instead make it an index and drop the fkey index?
> What is the difference between a constraint and index in this context
> anyway?
A foreign key constraint will generate a check on insert and update to
guarantee that a related row exists in the primary key table. The
constraint does not store any data, and is no help at all in querying. In
generay you should support each foreign key constraint with an index on the
foreign key table on the foreign key columns.
Now. Your bigger question. What indexing scheme to use. First off since
you only insert 500 rows/day, and you do it in one big batch, you should
probably change your clustered index. You will need to run Profiler to
capture a typical query workload, and this will drive your choice of index.
I suspect that the rignt clustered index is a unique clustered index on
either (security,date,type) or (date,security,type). Which one will depend
on the query workload. Then you will have an index for your primary key,
and clustered index for your alternate key. Once you have that, rerun ITW
and see where you are.

> I assume that the constraint doesn't do a table search for every insert,
> so
> there's some sort of index that it's using anyway?
Yes it uses the primary key index on the PK table, but that doesn't help
your query.
David|||I did respond, quite a bit, to another post you had, but what you are asking
is a pretty generic question, since we do not have specific enough
information here to give detailed repsones; so, about the best you are going
to get is a generic response...but that can be useful too.
1. Relational theory dictates that before an entity is evein 1NF (first
normal form) you MUST specify a Key. Any entity may have one or multiple
candidate keys. There may be a natural key. And, there may even be
complex, compound (composite) keys. One of these has to be the Primary Key.
The rest are alternate keys and should have Unique Key Constraints defined
on them. Oftentimes, the Primary Business Key is a lousy physical key; so,
a surrogate, Auto-Generated, IDENTITY is constructed then chosen as the
Primary Key, physically. This is fine, but keep in mind that that physical
database is just an implementation of a Logical Database Model that supports
a very specific Business Model and that NO surrogate key defines a business
key; thus, the term surrogate. The physical alternate keys, that model the
Business Keys, should always be constrained in order to enforce the logical
business rules.
2. As you go through the normalization process, you wll, hopefully, end up
at 3NF (third normal form), or higher, and a physical consequence of which
is that you will end up with dependent tables that must be constrained with
Foreign Key definitions. In the dependents, these FKCs should be indexed.
In the parent, they must be Uniquely constrained by a PKC or UCs. In the
child, they normally will be reasonably selectible, if not highly
selectible.
3. EVERY TABLE should have a clustered index defined. This is a purely
physical argument and is highly dependent on the SQL Server DBMS. Heaps,
which is what you get when a clustered index is NOT defined, perform
horibly, in terms of storage management and database maitenance. But many
have trouble choosing a proper definition. Here are some generic tips. The
attributes should not be modified but minimally at most. Most often you
would be better served with a single attribute as opposed to a composite
definition. Choose an attribute that is queried often, but by a range
mechanism as opposed to singleton selects (medium cardinality). Then,
finally, the attribute should be as narrow as possible, which is helped by
having single column definitions, but even those, numerics are better than
characters, and narrow characters are better than wider ones. You should
examine the alternate key definitions as possible clustered index candidates
as ALL clustered indexes use UNIQUENESS, either because they are defined as
unique, or because SQL Server appends a GUID to the attribute definition to
make them unique. So, choosing an already uniquely defined key does keep
the overhead down.
4. Now, go through your tables and determine which columns are highly
selectible (high cardinality), that is, if you were to query the table, with
a specific value, what fraction of the table rows would you expect to
return? The fewer, the better candidate for a non-clustered index that
attribute would be. At this stage, I would concentrate solely on single
attribute column indexes. The reason is that statistical histograms are
only stored for the most significant attribute; thus, it is very important
for that attribute to be highly selectible.
The only reason having a composite index would be of value would be becuase
what is known as a "covering" index. A clustered index is a covering index
by definition. This is because a covering index has all the data defining
the index stored on the index page. So, if first name is apart of a name
index, where the last name is the first defining attribute, after the
page(s) is located by a search on last name, SQL Server can also pull the
first name attribute from the index page itself instead of have to run a
bookmark lookup to the referenced data page. Not too many indexes can be
defined this way. A clustered index can because the entire data row is
stored on the clustered index page because, for a clustered index, the index
page IS the data page. Another example would be a non-clustered index but
where every table attribute is used to define it, but that would be highly
redundant. So, besides the clustered index, there are not to many
candidates that could be defined for a covering index.
Moreover, having many singleton indexes provides SQL Server an oportunity to
pull several indexes to perform a set intersection routine, which is far
more efficient than having any one index with composite values.
5. Finally, run sp_createstats against your database. This will create
single-column statistics against every table and every column that does not
already have a supporting index. This can help you determine missed columns
that have high cardinality that could be assisted by creating a
non-clustered index. These statistics also assist the Optimizer to quickly
reject low cardinality columns.
Well, I know this covered quite a bit of gound, some of it not too well, but
there is a ton of literature out there that goes into a lot of detail on
this subject. You need to dig into it. But, a word of caution, many of
these publications are just flat-out wrong. Take everything you read,
your's truely included, with some scepticism. Spend time in the Query
Analyzer. Examine the execution plans. Use hints to override the
Optimizer, not as a production practice, but to see where and how the
Optimizer behaves and why it is far more likely to pick an execution plan
than you are.
Lastly, always make sure your statistics are up to date before you begin any
analysis work. The Optimizer is only as good as these statistics are.
Sincerely,
Anthony Thomas
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
I'm learning about indexes and performance, looking at some of my queries
and
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about
500
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughtly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is
this
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occured, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and index in this context
anyway?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury

More index questions

I'm learning about indexes and performance, looking at some of my queries and
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about 500
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughtly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is this
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occured, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and index in this context anyway?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
> I'm learning about indexes and performance, looking at some of my queries
> and
> comparing them with the Index Tuning Wizard (ITW) to figure things out.
> BTW, is there a way to use the Profiler to "get the distinct queries [on
> table x]"? I would really like to see every query on a table when looking
> at
> the indexes on them, but I can't figure out how to do this.
> Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an
> int
> fkey (security), two floats (one unused), a type, a date and a varchar
> notes
> (typically empty). Data is unique on fkey, type and date. We insert about
> 500
> rows a day in one big batch, so insert performance isn't _too_ much of an
> issue, but because it's a batch you tend to notice it.
> Indexes are currently on the pkey (clustered unique), fkey, type and date.
> There is also a constraint on fkey/type/date.
> My GUESS is that the common types of queries are...
> 1) price for this security and this date
> 2) prices for this date
> 3) prices for this security
> I don't really know which ones are common or not, which is why I was
> asking
> how I could find this out.
> dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
> this makes it an excellent target for an index. In reality there are only
> 5
> values in total, and the VAST majority of the records are a single one.
> Can
> anyone explain this output? Also, why didn't the ITW suggest removing it,
> when it obviously will never be used?
> ITW also suggested removing the index on date. This seems suspect, but I
> can't prove it. The reason it's suspect is that we would have roughtly 500
> prices per date, and about the same number per security. This would seem
> to
> imply that either index should be useful depending on the query. We tend
> to
> GROUP BY on the security, in order to get the newest price per security,
> which would also seem to imply that the date index would be useful -- is
> this
> possible? Can it use the fkey to extract the unique securities, the date
> to
> get the newest for all securities, and then hash match?
> Basically ITW simply doesn't have enough information for me to make up my
> mind. If I could see the queries, how often they occured, and what indexes
> they used, then it would be useful. Now, however, I don't trust the
> numbers.
> One last question about the compound constraint. Since the first column in
> it is the fkey, should I instead make it an index and drop the fkey index?
> What is the difference between a constraint and index in this context
> anyway?
A foreign key constraint will generate a check on insert and update to
guarantee that a related row exists in the primary key table. The
constraint does not store any data, and is no help at all in querying. In
generay you should support each foreign key constraint with an index on the
foreign key table on the foreign key columns.
Now. Your bigger question. What indexing scheme to use. First off since
you only insert 500 rows/day, and you do it in one big batch, you should
probably change your clustered index. You will need to run Profiler to
capture a typical query workload, and this will drive your choice of index.
I suspect that the rignt clustered index is a unique clustered index on
either (security,date,type) or (date,security,type). Which one will depend
on the query workload. Then you will have an index for your primary key,
and clustered index for your alternate key. Once you have that, rerun ITW
and see where you are.
> I assume that the constraint doesn't do a table search for every insert,
> so
> there's some sort of index that it's using anyway?
Yes it uses the primary key index on the PK table, but that doesn't help
your query.
David|||I did respond, quite a bit, to another post you had, but what you are asking
is a pretty generic question, since we do not have specific enough
information here to give detailed repsones; so, about the best you are going
to get is a generic response...but that can be useful too.
1. Relational theory dictates that before an entity is evein 1NF (first
normal form) you MUST specify a Key. Any entity may have one or multiple
candidate keys. There may be a natural key. And, there may even be
complex, compound (composite) keys. One of these has to be the Primary Key.
The rest are alternate keys and should have Unique Key Constraints defined
on them. Oftentimes, the Primary Business Key is a lousy physical key; so,
a surrogate, Auto-Generated, IDENTITY is constructed then chosen as the
Primary Key, physically. This is fine, but keep in mind that that physical
database is just an implementation of a Logical Database Model that supports
a very specific Business Model and that NO surrogate key defines a business
key; thus, the term surrogate. The physical alternate keys, that model the
Business Keys, should always be constrained in order to enforce the logical
business rules.
2. As you go through the normalization process, you wll, hopefully, end up
at 3NF (third normal form), or higher, and a physical consequence of which
is that you will end up with dependent tables that must be constrained with
Foreign Key definitions. In the dependents, these FKCs should be indexed.
In the parent, they must be Uniquely constrained by a PKC or UCs. In the
child, they normally will be reasonably selectible, if not highly
selectible.
3. EVERY TABLE should have a clustered index defined. This is a purely
physical argument and is highly dependent on the SQL Server DBMS. Heaps,
which is what you get when a clustered index is NOT defined, perform
horibly, in terms of storage management and database maitenance. But many
have trouble choosing a proper definition. Here are some generic tips. The
attributes should not be modified but minimally at most. Most often you
would be better served with a single attribute as opposed to a composite
definition. Choose an attribute that is queried often, but by a range
mechanism as opposed to singleton selects (medium cardinality). Then,
finally, the attribute should be as narrow as possible, which is helped by
having single column definitions, but even those, numerics are better than
characters, and narrow characters are better than wider ones. You should
examine the alternate key definitions as possible clustered index candidates
as ALL clustered indexes use UNIQUENESS, either because they are defined as
unique, or because SQL Server appends a GUID to the attribute definition to
make them unique. So, choosing an already uniquely defined key does keep
the overhead down.
4. Now, go through your tables and determine which columns are highly
selectible (high cardinality), that is, if you were to query the table, with
a specific value, what fraction of the table rows would you expect to
return? The fewer, the better candidate for a non-clustered index that
attribute would be. At this stage, I would concentrate solely on single
attribute column indexes. The reason is that statistical histograms are
only stored for the most significant attribute; thus, it is very important
for that attribute to be highly selectible.
The only reason having a composite index would be of value would be becuase
what is known as a "covering" index. A clustered index is a covering index
by definition. This is because a covering index has all the data defining
the index stored on the index page. So, if first name is apart of a name
index, where the last name is the first defining attribute, after the
page(s) is located by a search on last name, SQL Server can also pull the
first name attribute from the index page itself instead of have to run a
bookmark lookup to the referenced data page. Not too many indexes can be
defined this way. A clustered index can because the entire data row is
stored on the clustered index page because, for a clustered index, the index
page IS the data page. Another example would be a non-clustered index but
where every table attribute is used to define it, but that would be highly
redundant. So, besides the clustered index, there are not to many
candidates that could be defined for a covering index.
Moreover, having many singleton indexes provides SQL Server an oportunity to
pull several indexes to perform a set intersection routine, which is far
more efficient than having any one index with composite values.
5. Finally, run sp_createstats against your database. This will create
single-column statistics against every table and every column that does not
already have a supporting index. This can help you determine missed columns
that have high cardinality that could be assisted by creating a
non-clustered index. These statistics also assist the Optimizer to quickly
reject low cardinality columns.
Well, I know this covered quite a bit of gound, some of it not too well, but
there is a ton of literature out there that goes into a lot of detail on
this subject. You need to dig into it. But, a word of caution, many of
these publications are just flat-out wrong. Take everything you read,
your's truely included, with some scepticism. Spend time in the Query
Analyzer. Examine the execution plans. Use hints to override the
Optimizer, not as a production practice, but to see where and how the
Optimizer behaves and why it is far more likely to pick an execution plan
than you are.
Lastly, always make sure your statistics are up to date before you begin any
analysis work. The Optimizer is only as good as these statistics are.
Sincerely,
Anthony Thomas
--
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:7058F853-8697-4DBA-9225-3B544D2E59A4@.microsoft.com...
I'm learning about indexes and performance, looking at some of my queries
and
comparing them with the Index Tuning Wizard (ITW) to figure things out.
BTW, is there a way to use the Profiler to "get the distinct queries [on
table x]"? I would really like to see every query on a table when looking at
the indexes on them, but I can't figure out how to do this.
Ok, so right now I'm looking at tblPrices, ~500,000 rows with a pkey, an int
fkey (security), two floats (one unused), a type, a date and a varchar notes
(typically empty). Data is unique on fkey, type and date. We insert about
500
rows a day in one big batch, so insert performance isn't _too_ much of an
issue, but because it's a batch you tend to notice it.
Indexes are currently on the pkey (clustered unique), fkey, type and date.
There is also a constraint on fkey/type/date.
My GUESS is that the common types of queries are...
1) price for this security and this date
2) prices for this date
3) prices for this security
I don't really know which ones are common or not, which is why I was asking
how I could find this out.
dbcc SHOW_STATISTICS (tblprices, type) said "0.0". According to the docs
this makes it an excellent target for an index. In reality there are only 5
values in total, and the VAST majority of the records are a single one. Can
anyone explain this output? Also, why didn't the ITW suggest removing it,
when it obviously will never be used?
ITW also suggested removing the index on date. This seems suspect, but I
can't prove it. The reason it's suspect is that we would have roughtly 500
prices per date, and about the same number per security. This would seem to
imply that either index should be useful depending on the query. We tend to
GROUP BY on the security, in order to get the newest price per security,
which would also seem to imply that the date index would be useful -- is
this
possible? Can it use the fkey to extract the unique securities, the date to
get the newest for all securities, and then hash match?
Basically ITW simply doesn't have enough information for me to make up my
mind. If I could see the queries, how often they occured, and what indexes
they used, then it would be useful. Now, however, I don't trust the numbers.
One last question about the compound constraint. Since the first column in
it is the fkey, should I instead make it an index and drop the fkey index?
What is the difference between a constraint and index in this context
anyway?
I assume that the constraint doesn't do a table search for every insert, so
there's some sort of index that it's using anyway?
Maury

Monday, March 12, 2012

monitoring index usage with profiler

How can you setup the Profiler to monitor which indexes are being used in SQL Server 2000 ?. I'm trying with a simple ad-hoc, specifying the actual index but I can't output the object name, id ... etc anything that'll tell me which index(s) are being us
ed.
Need some help. Thanx.
You can catch the execution plan, there you can see indexes used. Just be
sure to filter events to only those you are really interested in, because
you can get a lot of info.
Dejan Sarka, SQL Server MVP
Associate Mentor
Solid Quality Learning
More than just Training
www.SolidQualityLearning.com
"cqlboy" <anonymous@.discussions.microsoft.com> wrote in message
news:C5206D68-A403-44DC-986C-305196E4CF6D@.microsoft.com...
> How can you setup the Profiler to monitor which indexes are being used in
SQL Server 2000 ?. I'm trying with a simple ad-hoc, specifying the actual
index but I can't output the object name, id ... etc anything that'll tell
me which index(s) are being used.
> Need some help. Thanx.
|||Take a look at the ITW, the docs below combined with BOL will tell you how
to gather a representative workload and then the ITW will help you with the
first pass at tuning.
311826 INF: Index Tuning Wizard Best Practices
http://support.microsoft.com/?id=311826
Index Tuning Wizard 2000,
http://msdn.microsoft.com/library/techart/itwforsql.htm
Index Tuning Wizard 7.0,
http://msdn.microsoft.com/library/te...n_sqlindex.htm
Troubleshooting the Index Tuning Wizard,
http://msdn.microsoft.com/library/ps...tools_48ro.htm
Cindy Gross, MCDBA, MCSE
http://cindygross.tripod.com
This posting is provided "AS IS" with no warranties, and confers no rights.

Wednesday, March 7, 2012

Monitor Object Access

I would like to remove an index from my database, but I need to verify
that it is not being accessed before I do that.

I tried running Profiler, but it doesn't seem to be giving me the
information I need. I put in a filter for ObjectName, but it seems to
be ignoring it - i get several entries in the profile trace, but no
value in the ObjectName column.

Is there an easy way to monitor for object access?

Thanks so much!

SQL Servef 2000 SP4
(I also have 2005, SP2, if there's a better way to do this with 2005)In SQL Server 2005 I have used the sys.dm_db_index_usage_stats dynamic
management view to check for index usage. Essentially a new row is added for
the index when it is used. The catch is that the info in
sys.dm_db_index_usage_stats is reset when the SQL Server service is
restarted.

HTH,

Plamen Ratchev
http://www.SQLStudio.com|||That is PERFECT!! Thank you so much!

On Apr 26, 12:25 am, "Plamen Ratchev" <Pla...@.SQLStudio.comwrote:

Quote:

Originally Posted by

In SQL Server 2005 I have used the sys.dm_db_index_usage_stats dynamic
management view to check for index usage. Essentially a new row is added for
the index when it is used. The catch is that the info in
sys.dm_db_index_usage_stats is reset when the SQL Server service is
restarted.
>
HTH,
>
Plamen Ratchevhttp://www.SQLStudio.com

|||PERFECT!! Thank you so much!!

On Apr 26, 12:25 am, "Plamen Ratchev" <Pla...@.SQLStudio.comwrote:

Quote:

Originally Posted by

Monitor Index Usage.

Hi,
Is there any tool out there that can monitor what indexes are being used in
SQL Server? Example a filter in SQL Server Profiler for index usage?
Marcel
mpighin wrote:
> Hi,
> Is there any tool out there that can monitor what indexes are being
> used in SQL Server? Example a filter in SQL Server Profiler for index
> usage?
I don't know a direct way at the moment, but you can get Profiler to write
text execution plans and search them for your index's name. HTH
Kind regards
robert

Friday, March 30, 2012

more questions

More on Full Text Searching across multiple tables

Wednesday, March 28, 2012

More Materialized View questions

More Materialized View questions

More index questions

More index questions

More index questions

Monday, March 12, 2012

monitoring index usage with profiler

Wednesday, March 7, 2012

Monitor Object Access

Monitor Index Usage.

Money Data Type

Blog Archive

About Me