Archive for the ‘ Partitioning ’ Category

Index Aligned Partitions and Resolution

Most of us are aware of partitioning in SQL Server.  It was introduced with SQL Server 2005 and then further enhanced in later releases.  The value of partitioning a table can vary.  Some use it as a means of archiving off older data, others use it for the sake of performance gains, either way, there are some important things to keep in mind when one partitions a table.

One of those items brings us to the point of the post, aligning your non-clustered indexes on the partition scheme.  This is an important rule and more often than not it should be followed.

One of the reasons that it’s important is if you ever want to switch partitions, all of the indexes associated to that object must be aligned.  ref: http://msdn.microsoft.com/en-us/library/ms187526.aspx

Another reason is around maintenance.  If the indexes are partitioned then only the active partition (based on your partitioning strategy) needs to be maintained (re-org’d).  For larger indexes, it can become prohibitive to re-build these when they become fragmented.

Probably the best reason to have your indexes aligned is that the amount of reads to satisfy your queries will significantly drop when that index is used.

I recently came across a scenario where I found that there were a couple of tables where the indexes weren’t aligned with the partition scheme.  Instead of going through every partitioned table in the database, I wrote a quick query that would tell me which objects had indexes that weren’t created on the partition scheme for the table.

This query will also show you the usage of the indexes which can be interesting as well.  Without further ado, here’s the query:

SELECT Db_name(database_id) AS dbname, 
       o.name               object_name, 
       i.name               index_name, 
       i.type_desc, 
       ds.name, 
       ds.type_desc, 
       s.user_seeks, 
       s.user_scans, 
       s.user_lookups, 
       s.user_updates, 
       s.last_user_seek, 
       s.last_user_update 
FROM   sys.objects AS o 
       JOIN sys.indexes AS i 
         ON o.object_id = i.object_id 
       JOIN sys.data_spaces ds 
         ON ds.data_space_id = i.data_space_id 
       LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s 
         ON i.object_id = s.object_id 
            AND i.index_id = s.index_id 
WHERE  o.TYPE = 'u' 
       AND i.Type in (1, 2) 
       AND o.name IN (SELECT a.name 
                      FROM   (SELECT ob.name, 
                                     ds.type_desc 
                              FROM   sys.objects ob 
                                     JOIN sys.indexes ind 
                                       ON ind.object_id = ob.object_id 
                                     JOIN sys.data_spaces ds 
                                       ON ds.data_space_id = ind.data_space_id 
                              GROUP  BY ob.name, 
                                        ds.type_desc) a 
                      GROUP  BY a.name 
                      HAVING COUNT(*) > 1) 
ORDER  BY object_name DESC  

That’s fun stuff… But then what happens when you actually want to FIX them?  If there are a lot of them, it makes for a fairly tedious task.  The code below will help.  Be absolutely certain that you check the options chosen.  For my database, I used MAXDOP = 32, ONLINE = ON, etc… Ensure that the the settings are appropriate for your environmnent… enjoy!

SET NOCOUNT ON

SELECT db_name(database_id) as DBName,o.name Object_Name,		
        i.name Index_name, 		
       i.Type_Desc as IDX_typeDesc ,DS.name , ds.type_desc
		,sch.name + '.'+o.name as SchemaName_TableName, CAST(NULL as varchar(128)) as partitonOrdinal
into #tmp1
 FROM sys.objects AS o		
      JOIN sys.indexes AS i		
  ON o.object_id = i.object_id 		
	JOIN sys.schemas sch on sch.schema_id = o.schema_id
	JOIN sys.data_spaces DS on DS.data_space_id = i.data_space_id		
	LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s ON i.object_id = s.object_id AND i.index_id = s.index_id		

 WHERE  o.type = 'u'		
   AND i.type IN (1, 2) -- Clustered and Non-Clustered indexes
   AND O.NAME in 		
	(	
	SELECT a.name from 		
		(SELECT OB.NAME, ds.type_desc 
		 FROM sys.objects OB 
			JOIN sys.indexes ind on ind.object_id = ob.object_id 
			JOIN sys.data_spaces ds on ds.data_space_id = ind.data_space_id		
		GROUP BY ob.name, ds.type_desc ) a 
  group by a.name having COUNT (*) > 1
	)	
order by Object_Name desc;

update a
	set partitonOrdinal = c.name 
	FROM #tmp1 A JOIN 
	  sys.tables          t ON A.Object_Name = t.name  
	join sys.schemas s on s.schema_id = t.schema_id
	join  sys.indexes         i 
		  on(i.object_id = t.object_id 
		  and i.index_id < 2)
	join  sys.index_columns  ic 
		  on(ic.partition_ordinal > 0 
		  and ic.index_id = i.index_id and ic.object_id = t.object_id)
	join  sys.columns         c 
		  on(c.object_id = ic.object_id 
		  and c.column_id = ic.column_id)
	WHERE I.name = a.Index_name and a.IDX_typeDesc = 'clustered'    ;    

update a
	set name = b.name, partitonOrdinal = b.partitonOrdinal
	FROM #tmp1 a join #tmp1 b on a.object_name = b.object_name where b.idx_typedesc = 'clustered';



WITH index_cte (table_name, index_name, type_desc, index_columns, index_included_columns,
filter_definition, is_unique, is_primary_key, is_unique_constraint,fill_factor,
is_padded, allow_row_locks, allow_page_locks, ignore_dup_key, data_compression_desc,
no_recompute, partitionname, partitonOrdinal)
as(
select sch.name+'.'+obj.name [table_name], i.name [index_name], i.type_desc,
REPLACE(
(select '['+col.name+'] '+
	CASE WHEN k.is_descending_key = 1 THEN 'DESC'
	ELSE 'ASC'
	END as [data()]
from sys.index_columns as k
inner join sys.columns col
		on k.object_id=col.object_id and k.column_id=col.column_id
	where k.object_id = i.object_id
	and k.index_id = i.index_id	and k.index_id<>0 and k.is_included_column<>1
	order by key_ordinal, k.column_id
for xml path(''))
,'C [','C, [') as [index_columns],
isnull(
REPLACE(
	(select '['+col.name+']' as [data()]
from sys.index_columns as k
inner join sys.columns col
		on k.object_id=col.object_id and k.column_id=col.column_id
	where k.object_id = i.object_id	and k.index_id = i.index_id and k.index_id<>0 and k.is_included_column<>0
	order by key_ordinal, k.column_id

for xml path(''))
,'] [','], [')
,'') as [index_included_columns],

	ISNULL(i.filter_definition,'') as filter_definition,
	is_unique, is_primary_key, is_unique_constraint, fill_factor, is_padded, allow_row_locks, allow_page_locks, ignore_dup_key, 
	data_compression_desc, no_recompute, a.name, a.partitonOrdinal 
from sys.indexes as i
inner join sys.objects obj
	on i.object_id=obj.object_id
inner join sys.schemas sch
	on obj.schema_id=sch.schema_id
inner join sys.partitions part
	on i.object_id=part.object_id
	and i.index_id=part.index_id
inner join sys.stats stats
	on i.object_id=stats.object_id
	and i.index_id=stats.stats_id

JOIN #tmp1 a ON a.index_name = i.name AND a.object_name = obj.name and a.type_desc = 'rows_filegroup'

where i.name is not null and obj.type<>'S' and sch.name<>'sys' and i.index_id>1
)
SELECT
'CREATE ' +
CASE is_unique
	WHEN 1 THEN 'UNIQUE '
	ELSE ''
END
+ index_cte.type_desc +' INDEX ' + index_cte.index_name COLLATE DATABASE_DEFAULT +' ON ' +table_name +' ('+index_columns+')'+
CASE index_included_columns
	WHEN '' THEN ''
	ELSE ' INCLUDE (' + index_included_columns + ') '
END+
CASE filter_definition
	WHEN '' THEN ''
	ELSE ' WHERE ' + filter_definition
END+
' 
	WITH ('+
	'FILLFACTOR = ' + CASE fill_factor WHEN 0 THEN '100' ELSE CAST(fill_factor as varchar) END+
	', PAD_INDEX = ' + CASE is_padded WHEN 1 THEN 'ON' ELSE 'OFF' END+
	', ALLOW_PAGE_LOCKS = ' + CASE allow_page_locks WHEN 1 THEN 'ON' ELSE 'OFF' END+
	', ALLOW_ROW_LOCKS = ' + CASE allow_row_locks WHEN 1 THEN 'ON' ELSE 'OFF' END+
	', IGNORE_DUP_KEY = ' + CASE ignore_dup_key WHEN 1 THEN 'ON' ELSE 'OFF' END+
	', STATISTICS_NORECOMPUTE = ' + CASE no_recompute WHEN 1 THEN 'ON' ELSE 'OFF' END+
	', DATA_COMPRESSION = ' + data_compression_desc +
	', SORT_IN_TEMPDB = ON'+
	', ONLINE = ON'+
	', MAXDOP = 32'+
	', DROP_EXISTING = ON'+
	')' + '
	 ON  ' +  partitionname + ' (' + partitonOrdinal + ')
 GO'
as 'USE Database statement goes here (TNW)'
from index_cte
 order by table_name, index_cte.type_desc, index_cte.index_name

drop table #tmp1

Statistics for partitions

As data volumes continue to expand and partitioning is further adopted in SQL Server 2008 the lack of control over statistics at the partition level will become a larger issue for more companies.

I placed this as a connect item on Microsoft’s site in June of 2009.  Per the final comment from Microsoft, they recommend that the more votes it has, the higher priority it will become.

If you are reading this post, please vote, here:  https://connect.microsoft.com/SQLServer/feedback/details/468517/update-statistics-at-the-partition-level

I’m posting the connect details below. 

I’d like to see Update Statistics modified to include partitionID in combination with the existing table or indexed view name.

With the improvements of 2k8 and partitioning, very large result sets are being seen at one of my clients that utilizes partitioning. Having to update statistics against partitions that haven’t changed in order to achieve the change on the partition that actually needs it, creates additional overhead that is not required. It also forces a change to plans that wouldn’t otherwise be altered which causes for memory churn in the proc cache.

One other related note, is that you can defrag an individual partition index, but it will not update the statistics.

Proposed Solution
Two parts:

1.)
Modify Update Statistics to include an option for declaring either a partition name or partitionID for either a table or an indexed view.
i.e…
UPDATE STATISTICS table_or_indexed_view_name
[
{
{ index_or_statistics__name }
{ index_or_statistics_name }
| ( { Partition_index_or_partition_statistics_name } [ ,...n ] )
}
]

2.) Upon completion of the defrag of an individual partition index have it update the statistics for that partition.

 
Benefits
Improved Reliability
Improved Administration
Improved Performance
Other (please provides details below)

Update Statistics for a partition

Last week, I was working with a client and we wanted to update statistics on a partition. After trying it out, then consulting with BOL and finally twitter, I determined that it could not be done. Thanks to Lara Rubbelke for her help and pointing out the second issue that is discussed below.

In addition to that problem, when a partition’s index is defragmented, statistics are not updated. I suppose an easy workaround is to update statistics, but the problem is that it updates statistics for all of the partitions and/or indexes of that object, not the specific one that was just defragmented.

So, I created a suggestion @connect.microsoft.com. I’ll post the entire subject / issue below, but please visit the site, vote and validate this request.

This is an https:// link, so if you aren’t logged into connect it will not work. That said, if you don’t have a login for this site, I’d recommend that you get one. Tons of great information can be found there. If you wish to search for it, go to connect.microsoft.com and type in “Update Statistics at the partition level” OR search for feedbackid 468517.  https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=468517

Full description and proposed solution below:

Description:
I’d like to see Update Statistics modified to include partitionID in combination with the existing table or indexed view name.

With the improvements of 2k8 and partitioning, very large result sets are being seen at one of my clients that utilizes partitioning. Having to update statistics against partitions that haven’t changed in order to achieve the change on the partition that actually needs it, creates additional overhead that is not required. It also forces a change to plans that wouldn’t otherwise be altered which causes for memory churn in the proc cache.

One other related note is that you can defrag an individual partition index, but it will not update the statistics.

Proposed Solution
Two parts:
1.)
Modify Update Statistics to include an option for declaring either a partition name or partitionID for either a table or an indexed view.
i.e…
UPDATE STATISTICS table_or_indexed_view_name
[
{
{ index_or_statistics__name }
{ index_or_statistics_name }
| ( { Partition_index_or_partition_statistics_name } [ ,...n ] )
}
]
2.) Upon completion of the defrag of an individual partition index have it update the statistics for that partition.

Benefits
Improved Reliability
Improved Administration
Improved Performance

Partitioning

Over the past few years I have implemented partitioning a number of ways.  In the world of SQL Server 2000, it was “home-made”.  Meaning that I would create a number of tables whose schema was identical, create check constraints and then link all of the data together with views.  This method had tremendous upside, but as you can imagine, the overhead ov moving data from one “partition” to another was very time intensive and expensive from an I/O perspective.

With SQL Server 2005, Microsoft introduced partitioning as part of the relation engine… This was huge news for me specifically.  I could actually do this “Right” for my clients.  As many of you know, paritioning has some limitations in SQL Server 2005 that made a great option, but the limitation of parallelism that accompanied the feature left for a bad taste in the mouths of many.

This brings us to SQL Server 2008 where, based on my testing, the parrallism issue is truly resolved and the optimizer is actually recoqnizing the partitions as it should. 

So, what have I done?  I’ve implemented it all over the place.  That is the point of this post.  At one client, it’s worked just as I imagined that it would… at another, I’ve had a bit of stumbling block.  I did not realize the impact that non-clustered indexes would have if they weren’t created with the partition.  So what I ended up with, over time were very, very, very heavily fragmented indexes.  In fact, some of them were so bad that the optimizer was choosing different paths to the data.  (That’s B.A.D.).

I thought my fix would be a tough one.  But it turned out to be uber simple.  I dropped the indexes that existed on the primary filegroup and re-created them on the partition.  Viola, piece of cake.  Everything was awesome, until about 2 weeks later when the fragmentation of those indexes caught up with me.  This left me cursing the amount of time it was going to take to rebuild these huge indexes and wondering how often I’d have to do it…

So, I stopped trying to do it my way and went to my favorite resource.  BOL.  Of course, there is a way to do this and it’s not painful.

It’s the function $PARTITION.  If you are using partitioning in your environment and you are unaware of this function, become aware. 

What this function enabled, in a very easy and straight foward manner was for me to quickly and dynamically identify which partition was the active partition.  Once I had that information, I simply wrote an alter index statement with rebuild where the partition equaled my variable value.

Code snippet below.

Declare @PartitionID INT

Select @PartitionID  =  (SELECT $PARTITION.PartitionName (getdate()));

ALTER INDEX IndexName ON owner.TableName

REBUILD Partition = @PartitionID;

    –  Note that I can utilize the getdate function with the $Partition function.  Since my partition function (key) is a datetime value.  Simple logic could be used if it were an int.

The obvious advantage to this is that when I run the rebuild index script, it is only running for the active partition.  This brings down the amount of time required for this action and enables me to run it nightly as the data becomes very fragmented from day to day in the current partition.

So, my lesson for this day?

Easy, think ahead about what will happen down the road.  Always check BOL and test, test, test. 

Happy Partitioning!