Database indexes

547 views
Skip to first unread message

Andy Butland

unread,
Aug 7, 2013, 5:02:37 PM8/7/13
to umbra...@googlegroups.com
Having had a look through the indexing on the Umbraco database, there look to be a quite a lot of fields that I would have thought might be indexed but aren't. 

Not being particularly expert in this area, I've tended to get by with some rules of thumb for deciding when to put indexes on fields in a database, generally:
  • Put an index on all fields used as foreign keys and hence likely to be involved in join operations
  • Put an index on all fields likely to be used in a WHERE criteria
  • Put an index on all fields likely to be used in ORDER BY expressions
  • Not put additional indexes on any tables that should be optimised for write operations
 I've attached a list of the ones that seem to fall into these categories but aren't indexed.

And so I was wondering what peoples thoughts are on this?  

Is it the case that the guidelines I've been using above are a bit simplistic, and that actually the indexing selections have been already analysed when the schema was created, and are considered pretty optimal already?  Or is it actually that this has been overlooked a bit, and that it would make sense to have more indexes added?

My assumption would be that other than perhaps the umbracoLog table, it would be a rare Umbraco system that wouldn't want the other tables optimised for reads, and hence for most sites such additional indexes would do no harm, and would possibly have some performance benefits.
non-indexed fields.txt

Morten Christensen

unread,
Aug 8, 2013, 3:50:17 AM8/8/13
to umbra...@googlegroups.com
Hi Andy,

This is definitely something we are looking into for the upcoming versions. It was discussed during one of the open space sessions at CodeGarden 13, which gave us some suggestions as there the main pain points might be.
Also, Mikkel Johansen posted a sql server query, on OUR umbraco, that can be used to generate stats and suggestions for indexes on running Umbraco sites - see last comment here: http://our.umbraco.org/forum/developers/extending-umbraco/42361-This-is-a-bit-dirtybut-I-need-to-mess-with-the-database
We are very interested in getting feedback from people that have LIVE 6.x sites with a lot of activity, both with regards to visitors and content editiing, that are able to run this script against the database. Once we have some data we can start analyzing where indexes would benefit our db schema.

- Morten Christensen


--
You received this message because you are subscribed to the Google Groups "Umbraco development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to umbraco-dev...@googlegroups.com.
To post to this group, send email to umbra...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/umbraco-dev/f2ea93b3-daed-4050-b0f3-44d11aa722ad%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Andy Butland

unread,
Aug 12, 2013, 3:35:35 PM8/12/13
to umbra...@googlegroups.com
Makes sense Morten.  Here's one result we have - thanks Ali.  Just reports one amend but this isn't a particularly high traffic site:

CREATE INDEX [mi_0B580441] ON [db].[dbo].[cmsDocument] ([published]) INCLUDE ([versionId])

Jeavon Leopold

unread,
Aug 13, 2013, 6:33:08 AM8/13/13
to umbra...@googlegroups.com
We don't have any very high activity sites running v6 yet, however the attached is from a average usage site, hope it's helpful.

I guess running this on a v4 db is not useful?
v6SiteResults.xlsx

Andy Butland

unread,
Aug 27, 2013, 5:14:17 AM8/27/13
to umbra...@googlegroups.com
I've attached another result - looks to be a bit more useful data in this one.
indexes-report.xlsx

Andy Butland

unread,
Aug 27, 2013, 5:27:21 AM8/27/13
to umbra...@googlegroups.com
And one more.
Results.csv

Jeffrey Schoemaker

unread,
Oct 1, 2013, 8:16:08 AM10/1/13
to umbra...@googlegroups.com
Hi Morten,

attached an Excel with two tabs for two different Umbraco 6 sites. Especially the first tab contains data about a pretty high activity site (both visitors as content editing).

Hope it helps you out!

Jeffrey
Indexes.xlsx

David Lupat

unread,
Mar 12, 2014, 7:07:16 PM3/12/14
to umbra...@googlegroups.com
Hi Morten,

Attached are data from two high activity sites in a single Umbraco 6.1.6 installation. 

Dave.
indexes.xls

karl...@gmail.com

unread,
Mar 12, 2014, 8:34:57 PM3/12/14
to umbra...@googlegroups.com
And another:

 improvement_measure_pct table create_index_statement
8954.352601 [cmsContentVersion] CREATE INDEX [mi_5631BB75] ON [thehappiesthour].[dbo].[cmsContentVersion] ([ContentId])
8261.388753 [cmsContentVersion] CREATE INDEX [mi_92E10D5C] ON [thehappiesthour].[dbo].[cmsContentVersion] ([ContentId]) INCLUDE ([id], [VersionId], [VersionDate], [LanguageLocale])
760.903086 [umbracoNode] CREATE INDEX [mi_5C3F4893] ON [thehappiesthour].[dbo].[umbracoNode] ([uniqueID])
275.2460914 [umbracoNode] CREATE INDEX [mi_11036BC3] ON [thehappiesthour].[dbo].[umbracoNode] ([nodeObjectType]) INCLUDE ([path])
176.4036591 [cmsDocument] CREATE INDEX [mi_71E86973] ON [thehappiesthour].[dbo].[cmsDocument] ([newest]) INCLUDE ([nodeId], [versionId])
161.0495198 [cmsDocument] CREATE INDEX [mi_EB7DD938] ON [thehappiesthour].[dbo].[cmsDocument] ([published]) INCLUDE ([nodeId], [versionId])

Shannon Deminick

unread,
Mar 13, 2014, 6:51:08 AM3/13/14
to umbra...@googlegroups.com, karl...@gmail.com
Would be interesting to see performance gains from indexes that don't utilize INCLUDE. We cannot create 'INCLUDE' indexes in Umbraco currently since that is only supported in MSSQL 2012+ (AFAIK)

Also, what is 'improvement_measure' telling us?

Gareth Evans

unread,
Mar 13, 2014, 7:03:45 PM3/13/14
to umbra...@googlegroups.com
Are there any stats on mysql vs microsoft sql database engines for Umbraco? I am not suggesting we drop support though.

I think that having INCLUDE would be a performance benefit for Microsoft SQL, and AFAIK it's supported all the way back to SQL2005
(for the non technical among us, it's marking extra columns to include with the index so the query can be served directly without touching the underlying db table via an index lookup - e.g. covering index - it would make the biggest difference on very large tables or very wide tables where the data is spread between multiple pages per row)

I think "improvement measure" is an internal metric about how much MSSQL thinks this index will improve the performance. I am not sure what unit it is in.

Can we not create different indexes per DB engine? e.g. generate a different create index statement for each engine? 
I would think each engine would need different indexes anyway to provide the best performance, an index that SQL server needs may not be required for mysql performance and vice versa since the query planner is completely different.


Gareth



On Thu, Mar 13, 2014 at 11:51 PM, Shannon Deminick <sdem...@gmail.com> wrote:
Would be interesting to see performance gains from indexes that don't utilize INCLUDE. We cannot create 'INCLUDE' indexes in Umbraco currently since that is only supported in MSSQL 2012+ (AFAIK)

Also, what is 'improvement_measure' telling us?

--
You received this message because you are subscribed to the Google Groups "Umbraco development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to umbraco-dev...@googlegroups.com.
To post to this group, send email to umbra...@googlegroups.com.

Shannon Deminick

unread,
Mar 13, 2014, 7:15:22 PM3/13/14
to umbra...@googlegroups.com
Oh cool, i was under the impression it was a far newer sql server feature, if it's 2005 than no problems there and yes of course we can create different index types per db type.
I'd like to ensure we don't go over-creating indexes, having too many is equally as bad as having too few since inserts and updates will be worse for performance and the db file size will grow quite a lot.

What I'd love to know though is that some have mentioned adding indexes with the INCLUDE keyword on already indexed columns. For example:
CREATE INDEX [mi_EB7DD938] ON [thehappiesthour].[dbo].[cmsDocument] ([published]) INCLUDE ([nodeId],  [versionId]) 

cmsDocument.published already has an index on that column, I would assume the performance benefits of adding yet-another-index to an already indexed column would be fairly moot. This same point goes for cmsDocument.newest. Also in the above list it mentions adding 2 indexes to the same column:
8954.352601        [cmsContentVersion]        CREATE INDEX [mi_5631BB75] ON [thehappiesthour].[dbo].[cmsContentVersion] ([ContentId]) 
8261.388753        [cmsContentVersion]        CREATE INDEX [mi_92E10D5C] ON [thehappiesthour].[dbo].[cmsContentVersion] ([ContentId]) INCLUDE ([id],  [VersionId],  [VersionDate],  [LanguageLocale]) 

The good news is that 6.2 adds this index: cmsContentVersion.ContentId but not with the INCLUDE statement. I wonder if the normal index got added what the tool that is generating these recommendations would then say - would it still recommend adding the 2nd index to that column with INCLUDE?

We'll have to explore some of these things in future versions, far too late in the game to look at implementing some of these for 7.1/6.2 but perhaps we can include some in the next minor versions.



Shannon Deminick

unread,
Mar 13, 2014, 7:21:24 PM3/13/14
to umbra...@googlegroups.com
Also note that in the previous report it mentions that umbracoNode.parentId should have an index on it for a huge perf increase - this index has existed since 6.0

David Lupat

unread,
Mar 13, 2014, 8:13:43 PM3/13/14
to umbra...@googlegroups.com
Sorry, accidentally hit "reply to author"

This is what I wrote before:
Shannon, could it be that upgrading from an older version to v6+ doesn't automatically create these indexes? I never manually checked whether there are new indexes in every version upgrade and I doubt many people do. Shouldn't it be handled by the installer / update script though?

Shannon Deminick

unread,
Mar 13, 2014, 8:22:41 PM3/13/14
to umbra...@googlegroups.com
db indexes are automatically added when we have them in a migration script. That said, if the umbracoNode.parentId index did not exist before 6.0 (not sure), there is no migration script to add that particular index for 6.0


Gareth Evans

unread,
Mar 13, 2014, 10:05:13 PM3/13/14
to umbra...@googlegroups.com
Also I think it's worth noting that without up-to-date statistics, SQL server won't always use the index since it doesn't have current statistics on whether the index will be useful or not.
SQL 2005 will automatically build those statistics when the index is created but won't keep them up to date unless AUTO_UPDATE_STATISTICS (database value) is ON [default = ON]
Additionally, it will wait for the statistics update before returning the query result unless AUTO_UPDATE_STATISTICS_ASYNC is ON [default = OFF] (when it will use out of date statistics but begin an asynchronous update)
Detail here (2005) http://technet.microsoft.com/en-us/library/ms190397(v=sql.90).aspx
So if the index was added some time ago, when the umbraco DB was first created before it was populated, and then all the data has been added via admin, if the statistics haven't been updated yet, it may still not use the index.
Apparently sql server is pretty smart about whether it should initiate a statistics rebuild but it's something to bear in mind if you've just imported a lot of data.
Of course, this is moot if you have an actual DBA who is looking after your SQL databases and ensuring things like indexes are being rebuilt and statistics updated regularly via maintenance plans.

>What I'd love to know though is that some have mentioned adding indexes with the INCLUDE keyword on already indexed columns. For example:
>CREATE INDEX [mi_EB7DD938] ON [thehappiesthour].[dbo].[cmsDocument] ([published]) INCLUDE ([nodeId],  [versionId]) 

My understanding (disclaimer: I am a developer, not a DBA) is the predicate (e.g. WHERE or ON clause) is used to identify if the index will be beneficial, via the actual columns used - in this case published. Then, if the entire select column list is in the INCLUDE then it's a covering index and the query results can be satisfied from the index alone.
If you have a column which is not in the INCLUDE in your select, then SQL will not use the covering index for the data and instead treat it as a normal index.
You of course don't want all the columns in the INCLUDE since that is redundant, it would be a duplicate of data, adding overhead and you might as well go back to the raw rows.
So that query there would improve: select nodeId,versionId from cmsDocument where published = 1 
It would also improve select id, nodeId, versionId from cmsDocument where published = 1 but the INCLUDE aspect wouldn't be any advantage, it would go back to the main db rows on disk/cache/memory to get the data for the columns.

>cmsDocument.published already has an index on that column, I would assume the performance benefits of adding yet-another-index to an already indexed column would be fairly moot. This same >point goes for cmsDocument.newest. Also in the above list it mentions adding 2 indexes to the same column:
>8954.352601        [cmsContentVersion]        CREATE INDEX [mi_5631BB75] ON [thehappiesthour].[dbo].[cmsContentVersion] ([ContentId]) 
>8261.388753        [cmsContentVersion]        CREATE INDEX [mi_92E10D5C] ON [thehappiesthour].[dbo].[cmsContentVersion] ([ContentId]) INCLUDE ([id],  [VersionId],  [VersionDate], [LanguageLocale]) 

This is likely because the query set analysed had multiple queries with the same predicate but a different set of output columns. It really depends on how much data is in the base table, if it's a LOT then it will make a bigger difference because by having two indexes, you have two potential queries that can be satisfied by the covering index, rather than both queries having to back to the table row pages on disk.

Gareth



On Fri, Mar 14, 2014 at 1:22 PM, Shannon Deminick <sdem...@gmail.com> wrote:
db indexes are automatically added when we have them in a migration script. That said, if the umbracoNode.parentId index did not exist before 6.0 (not sure), there is no migration script to add that particular index for 6.0


--
You received this message because you are subscribed to the Google Groups "Umbraco development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to umbraco-dev...@googlegroups.com.
To post to this group, send email to umbra...@googlegroups.com.

Karl Kopp

unread,
Mar 16, 2014, 7:28:24 PM3/16/14
to Shannon Deminick, umbra...@googlegroups.com
The script was pulled from here:


'improvement_measure' = 


migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure_pct


On Thu, Mar 13, 2014 at 9:51 PM, Shannon Deminick <sdem...@gmail.com> wrote:
Would be interesting to see performance gains from indexes that don't utilize INCLUDE. We cannot create 'INCLUDE' indexes in Umbraco currently since that is only supported in MSSQL 2012+ (AFAIK)

Also, what is 'improvement_measure' telling us?



--
Karl Kopp

jes.m...@gmail.com

unread,
Mar 10, 2015, 9:00:10 AM3/10/15
to umbra...@googlegroups.com
Hi,

I am new to Umbraco but have already seen a lot of badly optimized and not optimized Umbraco databases. I have optimized big databases and my starting point is always to give all tables a primary-key, to give all tables a Unique key and always make sure that all foreign-keys are - as pointed out by Andy - supported by an index. Below you find a script to create missing foreign-key indexes.

Naming conventions are usefull when trying to get an overview of the database. A common naming convention: pk_<tablename>, uk_<tablename>, fk_<tablename>_<foreignkeytablename>.

On small databases like (most) Umbraco installations the more indexes the happier? I dont recommend using scripts using information about missing indexes or Database Engine Tuning Advisor on large databases. They generate too many indexes. At least, make sure to have the foreing-key indexes in place first.

Jes

IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1

IF OBJECT_ID('tempdb..#FKTable') IS NOT NULL DROP TABLE #FKTable

--Create index temp table
CREATE TABLE #t1
(
do integer default(0),
index_name varchar(100),
index_descrip varchar(200),
index_keys varchar(200),
table_name varchar(100))

--Create FK temp table
CREATE TABLE #FKTable
(
fk_name varchar(100),
fk_keys varchar(200),
fk_keyno int,
table_name varchar(100))

--Collect and uppdate all index info
EXEC sp_msforeachtable "insert #t1 (index_name, index_descrip, index_keys) exec sp_helpindex '?'; update #t1 set table_name = '?', do = 1 where do = 0"
UPDATE #t1 SET table_name = replace(table_name , '[', '')
UPDATE #t1 set table_name = replace(table_name , ']', '')


--Collect all index info
INSERT INTO #FKTable
SELECT OBJECT_NAME(constid) AS FKName, COL_NAME(fkeyid, fkey) AS FKColumn, keyno,
s.name + '.' + OBJECT_NAME(fkeyid) AS TabName
FROM sysforeignkeys k
JOIN sys.objects c
ON k.constid = c.object_id
JOIN sys.schemas s
ON c.schema_id = s.schema_id


--If FK have two or more columns add them in one row to be able to compare with index columns.

DECLARE @FKName AS VARCHAR(200), @FKColumn as VARCHAR(100)

DECLARE FKCurusor CURSOR FOR
SELECT OBJECT_NAME(constid) AS FKName, COL_NAME(fkeyid, fkey) AS FKColumn
FROM sysforeignkeys k
JOIN sysobjects c
ON k.constid = c.id
WHERE keyno > 1
ORDER BY keyno

DELETE FROM #FKTable WHERE fk_keyno > 1

OPEN FKCurusor
FETCH NEXT FROM FKCurusor INTO @FKName,@FKColumn
WHILE (@@FETCH_STATUS = 0)
BEGIN

UPDATE #FKTable SET
fk_keys = fk_keys + ', ' + @FKColumn
WHERE fk_name = @FKName


FETCH NEXT FROM FKCurusor INTO @FKName,@FKColumn

END

CLOSE FKCurusor
DEALLOCATE FKCurusor
/*
SELECT * FROM #FKTable
ORDER BY table_name

SELECT * FROM #t1
ORDER BY table_name
*/
PRINT '
---------------------------------------------------------------------
FK MISSING Indexes
----------------------------------------------------------------------------
'

SELECT DISTINCT table_name,fk_keys, 'CREATE NONCLUSTERED INDEX [' + fk_name + '] ON ' + table_name + '(' +fk_keys + ' ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]'
FROM #FKTable f1
WHERE NOT EXISTS (
SELECT fk_name
FROM #FKTable f
INNER JOIN #t1 t
ON f.table_name = t.table_name
WHERE (f1.fk_name = f.fk_name
AND fk_keys = index_keys)
OR ( f1.fk_name = f.fk_name
AND fk_keys = SUBSTRING (index_keys, 1 ,
CASE
WHEN CHARINDEX( ',',index_keys)= 0 THEN 0
ELSE CHARINDEX( ',',index_keys) -1
END
)))
Reply all
Reply to author
Forward
0 new messages