A colleague of mine sent me this link, and wanted me to have a look at point 5:
Choose sys_guid() instead of sequences for populating meaningless PK-columns.
Traditional sequences are used for populating an ID-column.
Sequences have several disadvantages:
• Sequences have to be created, maintained and deployed.
• Sequences need PL/SQL-code to be fetched (the ‘select into’ construction is obsolete in 11g, but defaulting a column with sequence.nextval still does not work).
• When you introduce triggers to fetch the .nextval value, you introduce significant undesired performance penalties
• When moving your data from data from one environment to another (e.g. copy PRODUCTION data to ACCEPTANCE), you have to synchronize the .LASTVAL of all sequences.
• Fetching sequences can be a performance-critical in big environment (bulk fetch in Data warehouse for instance) which has to be fixed by increasing the CACHE property
Use guid's to overcome the disadvantages of sequences:
create table employees
( guid raw(32) default sys_guid() not null primary key);
I knew about the SYS_GUID() function, but didn't realize you can use it as a default value for a primary key column. That sure is convenient. Where other RDBMS all have some sort of autonumber/identity column to automatically fill an ID column, Oracle uses sequences, which is a little more cumbersome to implement. However, I found the story not really balanced. Before addressing each point and mentioning some of the disadvantages of SYS_GUID, let's see how it works. Using sequences, a simple setup can be like this:
rwijk@ORA11GR1> create sequence seq cache 1000
Here I create the sequence with "cache 1000", because I want a session to fetch 1000 numbers at once, instead of one each time. This reduces the overhead for Oracle to maintain the SYS.SEQ$ table. Gaps in the numbers might be large this way, but this shouldn't bother you. If it does bother you, you likely shouldn't be using a sequence, as they are never gap-free. A table is created without any reference to the sequence:
rwijk@ORA11GR1> create table t_seq
2 ( id number(9) primary key
3 , filler varchar2(1000)
To fill the id column with the sequence value, I prefer to just use the nextval function directly in my api. Another option is to use database triggers, but I'm on a mission to avoid those whenever possible. The next procedure represents such an api:
rwijk@ORA11GR1> create procedure p_seq
4 insert into t_seq (id,filler)
5 select seq.nextval
6 , lpad('*',1000,'*')
7 from dual
8 connect by level <= 20000
Using the SYS_GUID() function, a similar setup would look like this:
rwijk@ORA11GR1> create table t_sysguid
2 ( id raw(16) default sys_guid() primary key
3 , filler varchar2(1000)
rwijk@ORA11GR1> create procedure p_sysguid
4 insert into t_sysguid (filler)
5 select lpad('*',1000,'*')
6 from dual
7 connect by level <= 20000
One object less (the sequence), and you don't need to bother about the ID column in the insert statement. The mentioned disadvantages of sequences per point:
Sequences have to be created, maintained and deployed
While obviously true, I don't think it is a major point. A sequence never comes alone, so we are already creating, maintaining and deploying database objects. But it adds up a little indeed.
Sequences need PL/SQL-code to be fetched (the ‘select into’ construction is obsolete in 11g, but defaulting a column with sequence.nextval still does not work)
You don't necessarily need PL/SQL to use a sequence, as shown above. So this is untrue. Although it is true that you cannot default a column with sequence.nextval.
When you introduce triggers to fetch the .nextval value, you introduce significant undesired performance penalties
So don't introduce database triggers to fetch the .nextval value :-)
When moving your data from data from one environment to another (e.g. copy PRODUCTION data to ACCEPTANCE), you have to synchronize the .LASTVAL of all sequences
This is true. Very minor disadvantage though, in my opinion. If we refresh the acceptance database, we don't only refresh the data, but also the objects itself.
Fetching sequences can be a performance-critical in big environment (bulk fetch in Data warehouse for instance) which has to be fixed by increasing the CACHE property.
The solution to the possible problem is already presented here.
No advantages of sequences over SYS_GUID are mentioned, so here are some of mine:
A RAW(16) column takes up more space than a NUMBER column.
The SYS_GUID is a RAW(16) taking up 16 bytes, whereas as an ID populated by a sequences typically is a NUMBER(6), NUMBER(9) or NUMBER(12). A reasonable average is NUMBER(9) which takes up only 4 bytes. When you have foreign key columns in your table, this effect is multiplied.
Ad-hoc querying has become more cumbersome.
Which one would you rather type. This:
select id from t_sysguid where id = '48000DA3C76E41A789C7E8925A91E28A'Or this:
select id from t_seq where id = 1234
You can no longer see the order of inserts by the ID column.
When using the cache property and multiple sessions, this is neither necessarily true for sequence populated ID columns, but there is a strong correlation between the number and the time it was populated. This "information" is not there with SYS_GUID() populated columns.
Performance seems to be slightly less when using SYS_GUID()
The results of my test varied a lot. But generally it favoured the sequence approach. Here is the output of one of the more representative tests:
Run1 draaide in 1670 hsecs
Run2 draaide in 2220 hsecs
Run1 draaide in 75.23% van de tijd
Naam Run1 Run2 Verschil
STAT.recursive calls 1,162 49 -1,113
LATCH.enqueue hash chains 1,626 508 -1,118
STAT.hot buffers moved to head of LRU 1,213 75 -1,138
STAT.dirty buffers inspected 5,613 3,266 -2,347
STAT.free buffer requested 6,215 3,482 -2,733
STAT.free buffer inspected 6,678 3,577 -3,101
STAT.redo subscn max counts 4,147 753 -3,394
STAT.heap block compress 3,678 7,119 3,441
LATCH.cache buffer handles 4,922 0 -4,922
STAT.Heap Segment Array Inserts 12,512 7,156 -5,356
STAT.HSC Heap Segment Block Changes 12,582 7,156 -5,426
STAT.calls to get snapshot scn: kcmgss 6,752 152 -6,600
LATCH.checkpoint queue latch 17,879 9,818 -8,061
LATCH.cache buffers lru chain 27,793 13,845 -13,948
LATCH.object queue header operation 40,308 25,913 -14,395
STAT.consistent gets 26,810 7,302 -19,508
STAT.consistent gets from cache 26,810 7,302 -19,508
STAT.consistent gets from cache (fastpath) 19,687 145 -19,542
STAT.redo entries 42,552 71,492 28,940
STAT.db block gets from cache (fastpath) 21,080 57,202 36,122
STAT.session logical reads 98,920 144,900 45,980
LATCH.simulator hash latch 5,752 56,028 50,276
STAT.db block gets from cache 72,110 137,598 65,488
STAT.db block gets 72,110 137,598 65,488
STAT.db block changes 63,379 129,357 65,978
LATCH.sequence cache 150,100 0 -150,100
LATCH.cache buffers chains 396,737 572,383 175,646
STAT.undo change vector size 2,564,000 4,921,872 2,357,872
STAT.physical read total bytes 17,776,640 23,355,392 5,578,752
STAT.physical read bytes 17,776,640 23,355,392 5,578,752
STAT.cell physical IO interconnect bytes 17,776,640 23,355,392 5,578,752
STAT.physical IO disk bytes 17,776,640 23,355,392 5,578,752
STAT.redo size 59,361,376 66,097,504 6,736,128
Run1 latches totaal versus run2 -- verschil en percentage
Run1 Run2 Verschil Pct
679,069 707,672 28,603 95.96%
PL/SQL procedure successfully completed.
So while I am certainly not saying no to SYS_GUID(), I thought there was a little more to the story than the all-positive points in the link.
If you know some more advantages or disadvantages from one over the other, please say so in a comment.