Tips and techniques for loading large volumes of data into Oracle

– Drop indexes on the target table (unless they are needed to enforce constraints during the load, though ideally that should be sorted out as part of the process that extracts the data)
– Turn off archive logging, either at the database level or, if you have the option, just at the table level (ALTER TABLE xxx_table NOLOGGING)
– If the target table already exists, use the APPEND hint to get a direct-path insert (INSERT /*+ APPEND */ INTO xxx_table SELECT …)
– Also consider adding a PARALLEL hint if you have enough available resource on the target database (INSERT /*+ APPEND PARALLEL(xxx_table, 8) */ INTO xxx_table SELECT …); note that the session needs ALTER SESSION ENABLE PARALLEL DML for the insert itself to run in parallel
– Remember to collect statistics on the target table once it is loaded. Best practice in 10g/11g is to use DBMS_STATS with the default options, e.g. exec dbms_stats.gather_table_stats(ownname=>'xxx_owner', tabname=>'xxx_table', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, degree=>DBMS_STATS.DEFAULT_DEGREE)
– Once the target table is loaded, collect statistics and then recreate the indexes; index statistics are generated automatically as part of the index creation process (the sketch below shows how these steps fit together)
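
Putting those tips together, here's a rough sketch of what a load script might look like. All of the names (xxx_table, xxx_staging_table, xxx_table_idx1, key_col, xxx_owner) are placeholders, so treat it as an outline rather than a finished script:

ALTER SESSION ENABLE PARALLEL DML; -- needed for the insert itself to run in parallel
ALTER TABLE xxx_table NOLOGGING; -- minimise redo during the direct-path load
DROP INDEX xxx_table_idx1; -- drop non-essential indexes before loading

INSERT /*+ APPEND PARALLEL(xxx_table, 8) */ INTO xxx_table
SELECT * FROM xxx_staging_table; -- direct-path, parallel insert from the staging data
COMMIT;

-- Gather statistics with the default options, then recreate the indexes
exec dbms_stats.gather_table_stats(ownname=>'xxx_owner', tabname=>'xxx_table', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, degree=>DBMS_STATS.DEFAULT_DEGREE)

CREATE INDEX xxx_table_idx1 ON xxx_table (key_col) NOLOGGING PARALLEL 8;
ALTER TABLE xxx_table LOGGING; -- put logging back once the load is complete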

BULK COLLECT with LIMIT

Quite often these days I find myself writing update and insert statements to correct data issues we’ve found in our production dataset.

I’m working on one of the largest address databases in the UK, so performance is always at the forefront of my mind when writing these queries. Also, getting these fixes run on production requires downtime, which means both the BAU guys and management always want to know “how far along is the data fix?” and “how long does it have left to run?”.

With a “traditional” update statement with a commit at the end it is very hard, if not impossible, to answer these questions. You quickly find yourself looking at V$SESSION and associated views trying to work out the relationship between BLOCKS_CHANGED, CONSISTENT_CHANGES and so on.

Add to that the following risks with long running UPDATE and INSERT statements against millions of rows:

  • Rollback space could fill up and cause the statement to fail
  • Errors such as “ORA-30926: unable to get a stable set of rows in the source tables” can occur
  • If the job fails or has to be killed part way through, the rollback typically takes 1.5–2 times as long as the statement had been running
  • If the job is killed you lose all progress made up to that point

Fortunately we can use PL/SQL collections, together with BULK COLLECT, to get around most of these problems.

The summary process is:
1) Define a cursor that selects all records to be processed
2) Use BULK COLLECT with the LIMIT clause to process the contents of your cursor in manageable chunks

The actions performed in Step 2 will be:
a) Insert/ Update records as required
b) Commit
c) Maintain a progress table

I’m a great fan of seeing working examples, so here’s one:
PROCEDURE BULK_COLLECT_EXAMPLE IS

-- Create a cursor by selecting in all cases to be considered by the data fix
-- Only select the columns you need to be able to run the data fix
-- Use an ORDER BY clause if you want to be able to restart the fix

CURSOR c_data IS
SELECT DISTINCT ROW_ID FROM CONTACT_TBL WHERE FST_NAME = 'Jhon'
ORDER BY ROW_ID;

l_counter number(15) := 0 ;

-- Use l_stop_point to automatically stop the run after processing a set number of records
-- IMPORTANT - only set this in multiples of your LIMIT value set below
-- If you want to process all records in the cursor ensure this is set to -1
l_stop_point number(15) := -1 ;

-- Update l_restart_point if restarting, leave at 0 for a full run
-- IMPORTANT - only set this in multiples of your LIMIT value set below
l_restart_point number(15) := 0 ;

-- Create local arrays for each column you have selected in the cursor above
-- Use the _Table version of the standard SQL datatypes to create these arrays
l_row_id dbms_sql.Varchar2_Table;

l_current_count number(15) := 0 ;

BEGIN
OPEN c_data;
LOOP
FETCH c_data BULK COLLECT INTO l_row_id LIMIT 1000 ; -- Collect into the local arrays declared above.
-- Set the LIMIT to the number of rows to process per COMMIT

l_counter := l_counter + l_row_id.count ;

IF l_counter >= l_restart_point AND (l_stop_point = -1 OR l_counter <= l_stop_point)
THEN
-- Apply the data fix to this chunk (the update below is only illustrative - substitute your own fix)
FORALL i IN 1 .. l_row_id.count
UPDATE CONTACT_TBL SET FST_NAME = 'John' WHERE ROW_ID = l_row_id(i) ;

-- Maintain the progress table so the run can be monitored and restarted
l_current_count := l_row_id.count ;
IF l_current_count > 0
THEN
INSERT INTO MONITORING_TBL(TABLE_NAME, DATE_STAMP, ROWS_PROCESSED, LAST_KEY) VALUES ('CONTACT_TBL', SYSDATE, l_counter, l_row_id(l_current_count)) ;
END IF ;
COMMIT ;

END IF ;

EXIT WHEN l_row_id.count = 0 ;
END LOOP ;
IF c_data%ISOPEN THEN
CLOSE c_data;
END IF ;
END BULK_COLLECT_EXAMPLE;

Performance Tuning Success with Index Organized Tables

Just had a fantastic success improving the performance of an update statement running on a table in an Oracle 10g database.

I’d originally written this update as three separate SQL statements to be run sequentially.

Script 1 created a table containing the records to be considered for update.
Script 2 created a view that ranks a set of entity relationships and produces one row, ready for the update, per row in the table created by Script 1.
Script 3 joins the view from Script 2 to the base table to run the update.

There were only 150,000 records to update, out of a table containing many millions of records.

The explain plan for the update in script 3 was horrendously long with all manner of hash and merge joins, sorts etc. The total optimiser cost of the original query was 66,406 and it had been running for 19 hours without completing.

There were no indexes on the table created in Script 1. I created a non-unique index on the two columns used in the joins within the view created in Script 2 and then ran an explain plan on Script 3 again. The total optimiser cost came down to 40,221.

I restarted the update but wasn’t 100% convinced that I’d brought the run time down enough to get my job completed on time.

As I sat and looked at my update statement I realised I only required two columns: the target table’s primary key and the value to be updated.

With such a narrow table my first thought was to create an index on both columns so that the update could run entirely from the index without having to force Oracle to read any blocks from the underlying table.

Then I remembered a colleague talking about Index Organized Tables (IOTs). These are essentially B-tree indexes where all of the table data is stored within the index. A quick read of the Oracle Data Sheet on IOTs explained that they are most useful in improving query performance when a table is accessed directly by its primary key.

So I created a new IOT and populated it by selecting the two required columns from my view in Script 2 above. I then modified my update statement to run from the new IOT. When I ran an explain plan on my latest version of the update statement I couldn’t believe my eyes… the cost was 3!!!

As a natural pessimist I thought “Yeah right, the cost might be 3, but I bet the run time is still hours.”

However I barely had a chance to view the update through TOAD’s session browser as it completed within 1 minute! 🙂
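
For anyone who wants to try the same approach, here’s a rough sketch of the pattern. All of the identifiers (fix_iot, ranked_fix_v, big_table, row_wid, new_value, target_col) are made up for illustration:

-- Index organized table holding just the key and the new value
CREATE TABLE fix_iot (
  row_wid    NUMBER NOT NULL,
  new_value  VARCHAR2(30),
  CONSTRAINT fix_iot_pk PRIMARY KEY (row_wid)
) ORGANIZATION INDEX;

-- Populate it from the ranking view (one row per record to be fixed)
INSERT INTO fix_iot (row_wid, new_value)
SELECT row_wid, new_value FROM ranked_fix_v;
COMMIT;

-- The update now only touches the rows listed in the IOT, and reads them straight from the index
UPDATE big_table t
SET    t.target_col = (SELECT i.new_value FROM fix_iot i WHERE i.row_wid = t.row_wid)
WHERE  EXISTS (SELECT 1 FROM fix_iot i WHERE i.row_wid = t.row_wid);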

How to collect statistics on a table in Teradata

This is a two step process.

First you need to understand that statistics are collected at an individual column level in Teradata, so to begin you need to determine which columns need to be included.

You can do this by analysing the SQL queries that are commonly run against your table(s), and you can even ask Teradata to tell you which columns should have stats collected.

Once you’ve come up with your list, run the following code to collect statistics on those columns:
COLLECT STATISTICS ON table_name COLUMN ( column_name1, column_name2, ... );

You only need to run the above code once. To perform subsequent stats refreshes you can now just run:
COLLECT STATISTICS ON table_name;

Note: you can also collect statistics on indexes, hash indexes and join indexes.
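
As a concrete (and entirely made up) example, suppose the table is sales_db.order_fact and the important columns turn out to be order_dt and customer_id; the workflow would look something like this:

-- Define and collect stats on the columns you identified
COLLECT STATISTICS ON sales_db.order_fact COLUMN ( order_dt );
COLLECT STATISTICS ON sales_db.order_fact COLUMN ( customer_id );

-- From now on a single statement refreshes everything defined above
COLLECT STATISTICS ON sales_db.order_fact;

-- Check what has been collected and when it was last refreshed
HELP STATISTICS sales_db.order_fact;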

Get Teradata to tell you which columns need stats collecting

If you have Teradata SQL queries that you run regularly (e.g. as part of a report or ETL process) you’ll want to ensure that those queries are performing as efficiently as possible.

Often, keeping the statistics on your tables up to date is one way to keep query times on track.

Teradata has a helpful session setting that will give you a listing of all the columns that should have stats collected.

Here’s how to use it.

1) Open a new connection in SQL Assistant
2) Run the following statement: diagnostic helpstats on for session
3) Run an explain plan on your SQL query (just put the keyword “explain” in front of your select e.g. explain select * from w_day_d)
4) If you need any stats collecting you’ll find the following message at the bottom of your explain plan in the results window:
BEGIN RECOMMENDED STATS ->
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN ISSUE_DT".
(HighConf)
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN GRATIS_INDICATOR".
(HighConf)
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN O75_TYPE".
(HighConf)
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN START_DT".
(HighConf)
<- END RECOMMENDED STATS
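
Putting it together, the whole check is just two statements run in the same SQL Assistant session (w_day_d is the example table from step 3; substitute your own query):

-- Ask the optimiser to include stats recommendations in explain output for this session
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Explain the query you care about and look for the RECOMMENDED STATS block at the end of the plan
EXPLAIN SELECT * FROM w_day_d;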