Benchmark IT Consulting Rotating Header Image

Update Statistics Before or After an Index Rebuild?

So a Twitter friend of mine @peschkaj tweeted this morning:

“Just finished adding scripts to update stats and rebuild indexes after our nightly and weekly loads. Now on to… something else.”

So I asked the question:

“now the question is… did you update stats BEFORE or AFTER the index rebuild? :)

He replied with the answer:

“I update the stats after”

Is this the right answer?

And then it struck me :) What a great idea for a post.

Instead of just providing the answer let's go through the following scripts and find out…

Let's build ourselves a table and populate it with some sample data

SET NOCOUNT ON

DECLARE @RowID INT
SET 
@RowID 0

CREATE TABLE tblIndexStatsMaintenance(
  
ID INT,
  
NonIndexedData CHAR(2000)
  
CONSTRAINT PK_tblIndexStatsMaintenance PRIMARY KEY CLUSTERED (ID))

WHILE @RowID <> 15000
  
BEGIN
     SET 
@RowID @RowID 1
     
INSERT INTO tblIndexStatsMaintenance(IDNonIndexedData)
     
SELECT @RowID'This is Row ' CONVERT(VARCHAR(5), @RowID)
  
END

Now let's see what the statistics look like for the Primary Key

DBCC SHOW_STATISTICS ('tblIndexStatsMaintenance', PK_tblIndexStatsMaintenance) WITH STAT_HEADER

Hmmmm not very exciting… a bunch of NULLS. So now what do we do first rebuild the index or update the statistics?

Let's rebuild the index to start with

ALTER INDEX PK_tblIndexStatsMaintenance ON tblIndexStatsMaintenance REBUILD

Now what does our statistics look like?

DBCC SHOW_STATISTICS ('tblIndexStatsMaintenance', PK_tblIndexStatsMaintenance) WITH STAT_HEADER

Well look at that… rebuilding the index also updated the statistics. Awesome. But what about the statistics that are NOT part of an index?

Let's create a non-indexed statistic

CREATE STATISTICS stats_NonIndexedData ON tblIndexStatsMaintenance(NonIndexedData)

Now let's look at what that statistic looks like

DBCC SHOW_STATISTICS ('tblIndexStatsMaintenance', stats_NonIndexedData) WITH STAT_HEADER

OK looks good… now let's rebuild that index again

ALTER INDEX PK_tblIndexStatsMaintenance ON tblIndexStatsMaintenance REBUILD

and let's look at the statistic again

DBCC SHOW_STATISTICS ('tblIndexStatsMaintenance', stats_NonIndexedData) WITH STAT_HEADER

No change… well we can fix that let's run

EXEC sp_updatestats

and…

DBCC SHOW_STATISTICS ('tblIndexStatsMaintenance', stats_NonIndexedData) WITH STAT_HEADER

hmmmm…. why didn’t this statistic get updated?
Starting in SQL 2005 the sp_updatestats only updates statistics that it determines to need it. You can force it to update the statistics using UPDATE STATISTICS but that’s not the purpose of this post/example.

So back to our original question of the post: “Should you update stats before or after an index rebuild?”… you may of already known the answer when you started reading the post but you might not of fully understood why. You update statistics after an index rebuild ( @peschkaj is correct… was there any doubt? :) ).

Your index rebuild will automatically update the statistics for the index (with 100% sample size which is usually better then what you end up with when using sp_updatestats). After the index rebuild is complete you can use sp_updatestats to update the other statistics that require attention (2005+).

Enjoy!!

Post to Twitter Post to Delicious Post to Digg Post to StumbleUpon

13 Comments

  1. Glad I could be of service, sir.

    Here’s what we do in production:

    I use Michelle’s dba_indexDefragStandard_sp from SQL Server Pedia to defrag all of the indexes on tables (due to our constant replication they need it on a nightly basis). Afterwards, I have a script that runs UPDATE STATISTICS WITH FULLSCAN on all tables. This sequence of events happens once a week.

    Every night we also have several massive load processes that truncate summary tables and rebuild them from the previous day’s data. These get special attention with their own dba_indexDefragStandard_sp run and UPDATE STATISTICS WITH FULLSCAN.

    Even on our production DB the full index reorg/rebuild and stats update only takes 44 minutes. Of course, as things grow we’ll need to modify these behaviors, but for now it’s working well :)

  2. [...] Update Statistics Before or After Index Maintenance? Colin Stasiuk shows the answer with scripts you can run yourself.  After you’ve learned the right answer, go check your maintenance plans and put your tasks in the right order. [...]

  3. [...] Update Statistics Before or After an Index Rebuild? Colin Stasiuk talks about when you should update stats in relation to rebuilding/reorganizing indexes. I’m not just linking to this because I was the catalyst for his blog post, but because there are some great things in here and I learned a lot from it. [...]

  4. Amar Pogaku says:

    Colin,

    Thank you for clearly demonstrating your point with T-SQL. It’s always nice to visually affirm what you already understand theoretically :) However, I’d like to add a few caveats, which are worth considering when you are talking about maintaining the health of very large tables (1-4 B rows) or a database with dozens of large tables:

    1. Rebuilding (even with the Online option) can be very expensive overall if you have a very large (over 1TB) OLTP database that is often busy processing lot of transactions. Multiple times we noticed significant CPU spikes (taking 20-50% more) and some blocking in our production environment. Yes, it can happen even on a dedicated SQL 2005 server with 16 procs and 64 GB RAM. But you got options: use Reorganize instead. Although this method is not as effective a defrag tool and may possibly take longer to finish (depending on the amount of DML activity), you can at the least not worry about disrupting users.

    2. Also, when you are rebuilding indexes consider using the MAXDOP and SORT_IN_TEMPDB options to control the load on the server. I usually assign half the procs for index rebuilding but if you have a long slow period you might want to use all the CPUs and get done with that much faster.

    3. Another important consideration, which seems to be one of the less understood options is the STATISTICS_NORECOMPUTE flag. Part of the problem seems to be Microsoft’s wording itself and how it interconnects with DB options. However, this is an important option deserving special attention. But first a short detour: From what I gather most DBAs leave the AUTO_UPDATE_STATISTICS database option to the default setting, which is ON but in our experience we learned that this is not very desirable. (Hold back your knives dissenters, I will tell you why it might be better to turn it off in just a bit!). In our heavily transactional databases we noticed that the query plans chosen by the SQL engine are not always consistent or the best possible. On some of our frequently called procs the logical reads were varying a lot in PRD servers compared to DEV servers. But after reviewing the query plan in PRD we decided to tweak the procs using the query hint FORCE ORDER, which preserves the join order during query optimization. We noticed significant performance gains with this change. By using this hint we are asking the query optimizer to take it easy and not mess with the join order specified in the proc. In other words, behind the scenes the query optimizer is no longer issuing update stats on the fly when it judges the existing stats to be out of date. So, what’s my point? Turning auto stats off ain’t as bad as it sounds because the optimizer’s decision is well, not always the most intelligent. However, bear in mind that turning it OFF means some additional diligence on the DBA’s part because now you got to take care of updating stats. You will need a stats update job running few times a week and particularly soon after your index defrag job or after major DML activity. And that’s where STATISTICS_NORECOMPUTE comes into picture! If you choose to keep AUTO_UPDATE_STATISTICS turned ON but don’t want query optimizer altering the query plan then set STATISTICS_NORECOMPUTE to ON. By setting STATISTICS_NORECOMPUTE to ON you are basically marking these tables to be excluded from the auto update stats at the DB level. This is OK because your scheduled update stats job will take care of it during off hours.

    4. If you are Reorganizing your indexes (instead of Rebuilding) make sure you follow it up with either sp_updatestats or UPDATE STATISTICS because Reorganize unlike Rebuild doesn’t automatically update your index statistics. Take note that Alter Index will never update column stats.

    5. Finally, a note on updating stats. If you are using the highly customizable UPDATE STATISTICS method instead of sp_updatestats, which offers limited functionality you’d want to use the NORECOMPUTE option to ensure that you are not resetting auto update stats on these tables. (Same idea as STATISTICS_NORECOMPUTE option in ALTER INDEX.)

  5. Amar

    Great comments and I agree with alot of the points you brought up. My post was not an “all encompassing” how-to for EVERY environment so I hope it didn’t come across as such.

    I’ve also worked in very large environments where having auto update stats on was not the best option so you wont get any knives from me there :) hehehe but great add-on to that about the added responsibility of the DBAs when such a choice is made.

    You have some great thoughts and experiences and I think they should be shared… you should really think about putting an article together on very large tables and the “gotchas” around index rebuilds and statistics maintenance.

    Thanks again for the comment!

  6. Amar Pogaku says:

    Colin,

    I am glad you liked my comments. As you graciously suggested, I should have considered posting the caveats as an independent article instead of barging in on your post, which was obviously meant to simply prove an idea around indexes and stats!

    I am looking forward to your next research findings :)

  7. One point not addressed that might be of concern, if you’ve done a full rebuild of the index, and it uses a full scan to update the statistics, what happens if you run sp_updatestats and it decides to update the statistics based on the normal, sampled, approach? You can get good statistics turned bad. From a safety standpoint, if you’ve just rebuilt an index, best not to also update it’s statistics.

  8. Ross Mistry says:

    I was told that there isn’t a need to update statistics if you just rebuilt all of your indexes as the statistics are automatically updated. Is this True or False?

    Ross

  9. Ross – rebuilding (not reorganizing) your indexes will update ONLY the statistics for those indexes… but if you have other statistics then those stats will NOT get updated as part of your index rebuild.

    Also as Grant mentioned you’ll want to be cautious on how you update your statistics AFTER an index rebuild has taken place… cause if your sample size is smaller then a full scan then you might actually “get good statistics turned bad”

    Hopefully this helps and I’ve re-ran my code/test shown above again just to be sure but that “demo” should give you a hands-on solution to play with.

    Thanks!!

  10. BradC says:

    So for a SQL 2000 server, does this advice still apply?
    Since sp_updatestats appears to have really updated *all* stats in SQL 2000, I wouldn’t want to overwrite the “good” stats with bad ones.

    So should I do the sp_updatestats *before* the reindex on sql2000?

  11. Jay says:

    I did some testing on this, and sp_updatestats doesn’t appear to work the exact way I imagined.
    I deleted all the records where id > 2000 then rebuilt the index. I then ran sp_updatestats thinking that the stats for stats_NonIndexedData would have been updated, but i got the message below.
    [stats_NonIndexedData], update is not necessary…

    This is an example where obviously the stats should have been updated, but weren’t.

  12. Jay

    I ran the following:

    Delete from dbo.tblIndexStatsMaintenance where [ID] > 2000

    then ran:

    DBCC SHOW_STATISTICS (‘tblIndexStatsMaintenance’, stats_NonIndexedData) WITH STAT_HEADER

    which showed no change… (as expected) then ran:

    EXEC sp_updatestats

    and when I reran the SHOW_STATISTICS I saw the statistic was updated as expected.

    I re-read your comment… and I missed rebuilding the index after the deletion of the records… I re-ran to match your use case exactly and low and behold I saw the same thing you saw.

    Very weird and I can’t explain right now as to why this is. I re-ran the usecase both ways a few times and got the same results each time.

    Hopefully someone else on the thread will be able to shed some light on this one. I’ll also throw it back out there to Twitter to see if we can get some thoughts on why this would happen.

    Thanks for the comment!!

Leave a Reply

Twitter links powered by Tweet This v1.6.1, a WordPress plugin for Twitter.