Tips and techniques for loading large volumes of data into Oracle

– Drop indexes on the target table (unless they are needed to enforce constraints during the load, though ideally that should be sorted out as part of the process that extracts the data)
– Turn off archive logging, either at the database level or, if you have the option, just at the table level (ALTER TABLE xxx_table NOLOGGING)
– If the target table already exists, use the APPEND hint to get a direct-path insert (INSERT /*+ APPEND */ INTO xxx_table SELECT …)
– Also consider adding a PARALLEL hint if you have enough available resource on the target database (INSERT /*+ APPEND PARALLEL(xxx_table, 8) */ INTO xxx_table SELECT …); note that the session needs ALTER SESSION ENABLE PARALLEL DML for the insert itself to run in parallel
– Remember to collect statistics on the target table once it is loaded. Best practice in 10g/11g is to use DBMS_STATS with the default options, e.g. exec dbms_stats.gather_table_stats(ownname=>'xxx_owner', tabname=>'xxx_table', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, degree=>DBMS_STATS.DEFAULT_DEGREE)
– Once the target table is loaded, collect statistics and then recreate the indexes; index statistics are generated automatically as part of the index creation process (the sketch below shows how these steps fit together)
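
Putting those tips together, here's a rough sketch of what a load script might look like. All of the names (xxx_table, xxx_staging_table, xxx_table_idx1, key_col, xxx_owner) are placeholders, so treat it as an outline rather than a finished script:

ALTER SESSION ENABLE PARALLEL DML; -- needed for the insert itself to run in parallel
ALTER TABLE xxx_table NOLOGGING; -- minimise redo during the direct-path load
DROP INDEX xxx_table_idx1; -- drop non-essential indexes before loading

INSERT /*+ APPEND PARALLEL(xxx_table, 8) */ INTO xxx_table
SELECT * FROM xxx_staging_table; -- direct-path, parallel insert from the staging data
COMMIT;

-- Gather statistics with the default options, then recreate the indexes
exec dbms_stats.gather_table_stats(ownname=>'xxx_owner', tabname=>'xxx_table', estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, degree=>DBMS_STATS.DEFAULT_DEGREE)

CREATE INDEX xxx_table_idx1 ON xxx_table (key_col) NOLOGGING PARALLEL 8;
ALTER TABLE xxx_table LOGGING; -- put logging back once the load is complete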

BULK COLLECT with LIMIT

Quite often these days I find myself writing update and insert statements to correct data issues we’ve found in our production dataset.

I’m working on one of the largest address databases in the UK, so performance is always at the forefront of my mind when writing these queries. Also, getting these fixes run on production requires downtime, which means both the BAU guys and management always want to know “how far along is the data fix?” and “how long does it have left to run?”.

With a “traditional” update statement with a commit at the end it is very hard, if not impossible, to answer these questions. You quickly find yourself looking at V$SESSION and associated views trying to work out the relationship between BLOCKS_CHANGED, CONSISTENT_CHANGES and so on.

Add to that the following risks with long running UPDATE and INSERT statements against millions of rows:

  • Rollback space could fill up and cause the statement to fail
  • Errors such as “ORA-30926: unable to get a stable set of rows in the source tables” can occur
  • If the job fails or has to be killed part way through, the rollback typically takes 1.5–2 times as long as the statement had been running
  • If the job is killed you lose all progress made up to that point

Fortunately we can use PL/SQL collections, together with BULK COLLECT, to get around most of these problems.

The summary process is:
1) Define a cursor that selects all records to be processed
2) Use BULK COLLECT with the LIMIT clause to process the contents of your cursor in manageable chunks

The actions performed in Step 2 will be:
a) Insert/ Update records as required
b) Commit
c) Maintain a progress table

I’m a great fan of seeing working examples, so here’s one:
PROCEDURE BULK_COLLECT_EXAMPLE IS

-- Create a cursor by selecting in all cases to be considered by the data fix
-- Only select the columns you need to be able to run the data fix
-- Use an ORDER BY clause if you want to be able to restart the fix

CURSOR c_data IS
SELECT DISTINCT ROW_ID FROM CONTACT_TBL WHERE FST_NAME = 'Jhon'
ORDER BY ROW_ID;

l_counter number(15) := 0 ;

-- Use l_stop_point to automatically stop the run after processing a set number of records
-- IMPORTANT - only set this in multiples of your LIMIT value set below
-- If you want to process all records in the cursor ensure this is set to -1
l_stop_point number(15) := -1 ;

-- Update l_restart_point if restarting, leave at 0 for a full run
-- IMPORTANT - only set this in multiples of your LIMIT value set below
l_restart_point number(15) := 0 ;

-- Create local arrays for each column you have selected in the cursor above
-- Use the _Table version of the standard SQL datatypes to create these arrays
l_row_id dbms_sql.Varchar2_Table;

l_current_count number(15) := 0 ;

BEGIN
OPEN c_data;
LOOP
FETCH c_data BULK COLLECT INTO l_row_id LIMIT 1000 ; -- Collect into the local arrays declared above.
-- Set the LIMIT to the number of rows to process per COMMIT

l_counter := l_counter + l_row_id.count ;

IF l_counter >= l_restart_point AND (l_stop_point = -1 OR l_counter <= l_stop_point)
THEN
-- Apply the data fix to this chunk (the update below is only illustrative - substitute your own fix)
FORALL i IN 1 .. l_row_id.count
UPDATE CONTACT_TBL SET FST_NAME = 'John' WHERE ROW_ID = l_row_id(i) ;

-- Maintain the progress table so the run can be monitored and restarted
l_current_count := l_row_id.count ;
IF l_current_count > 0
THEN
INSERT INTO MONITORING_TBL(TABLE_NAME, DATE_STAMP, ROWS_PROCESSED, LAST_KEY) VALUES ('CONTACT_TBL', SYSDATE, l_counter, l_row_id(l_current_count)) ;
END IF ;
COMMIT ;

END IF ;

EXIT WHEN l_row_id.count = 0 ;
END LOOP ;
IF c_data%ISOPEN THEN
CLOSE c_data;
END IF ;
END BULK_COLLECT_EXAMPLE;

Performance Tuning Success with Index Organized Tables

Just had a fantastic success improving the performance of an update statement running on a table in an Oracle 10g database.

I’d originally written this update as three separate SQL statements to be run sequentially.

Script 1 created a table containing the records to be considered for update.
Script 2 created a view that ranks a set of entity relationships and produces one row, ready for the update, per row in the table created by Script 1.
Script 3 joins the view from Script 2 to the base table to run the update.

There were only 150,000 records to update, out of a table containing many millions of records.

The explain plan for the update in script 3 was horrendously long with all manner of hash and merge joins, sorts etc. The total optimiser cost of the original query was 66,406 and it had been running for 19 hours without completing.

There were no indexes on the table created in Script 1. I created a non-unique index on the two columns used in the joins within the view created in Script 2 and then ran an explain plan on Script 3 again. The total optimiser cost came down to 40,221.

I restarted the update but wasn’t 100% convinced that I’d brought the run time down enough to get my job completed on time.

As I sat and looked at my update statement I realised I only required two columns: the target table’s primary key and the value to be updated.

With such a narrow table my first thought was to create an index on both columns so that the update could run entirely from the index without having to force Oracle to read any blocks from the underlying table.

Then I remembered a colleague talking about Index Organized Tables (IOTs). These are essentially B-tree indexes where all of the table data is stored within the index. A quick read of the Oracle Data Sheet on IOTs explained that they are most useful in improving query performance when a table is accessed directly by its primary key.

So I created a new IOT and populated it by selecting the two required columns from my view in Script 2 above. I then modified my update statement to run from the new IOT. When I ran an explain plan on my latest version of the update statement I couldn’t believe my eyes… the cost was 3!!!

As a natural pessimist I thought “Yeah right, the cost might be 3, but I bet the run time is still hours.”

However I barely had a chance to view the update through TOAD’s session browser as it completed within 1 minute! 🙂
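
For anyone who wants to try the same approach, here’s a rough sketch of the pattern. All of the identifiers (fix_iot, ranked_fix_v, big_table, row_wid, new_value, target_col) are made up for illustration:

-- Index organized table holding just the key and the new value
CREATE TABLE fix_iot (
  row_wid    NUMBER NOT NULL,
  new_value  VARCHAR2(30),
  CONSTRAINT fix_iot_pk PRIMARY KEY (row_wid)
) ORGANIZATION INDEX;

-- Populate it from the ranking view (one row per record to be fixed)
INSERT INTO fix_iot (row_wid, new_value)
SELECT row_wid, new_value FROM ranked_fix_v;
COMMIT;

-- The update now only touches the rows listed in the IOT, and reads them straight from the index
UPDATE big_table t
SET    t.target_col = (SELECT i.new_value FROM fix_iot i WHERE i.row_wid = t.row_wid)
WHERE  EXISTS (SELECT 1 FROM fix_iot i WHERE i.row_wid = t.row_wid);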

How to collect statistics on a table in Teradata

This is a two step process.

First you need to understand that statistics are collected at an individual column level in Teradata, so to begin you need to determine which columns need to be included.

You can do this by analysing the SQL queries that are commonly run against your table(s), and you can even ask Teradata to tell you which columns should have stats collected.

Once you’ve come up with your list, run the following code to collect statistics on those columns:
COLLECT STATISTICS ON table_name COLUMN ( column_name1, column_name2, ... );

You only need to run the above code once. To perform subsequent stats refreshes you can now just run:
COLLECT STATISTICS ON table_name;

Note: you can also collect statistics on indexes, hash indexes and join indexes.
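
As a concrete (and entirely made up) example, suppose the table is sales_db.order_fact and the important columns turn out to be order_dt and customer_id; the workflow would look something like this:

-- Define and collect stats on the columns you identified
COLLECT STATISTICS ON sales_db.order_fact COLUMN ( order_dt );
COLLECT STATISTICS ON sales_db.order_fact COLUMN ( customer_id );

-- From now on a single statement refreshes everything defined above
COLLECT STATISTICS ON sales_db.order_fact;

-- Check what has been collected and when it was last refreshed
HELP STATISTICS sales_db.order_fact;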

Get Teradata to tell you which columns need stats collecting

If you have Teradata SQL queries that you run regularly (e.g. as part of a report or ETL process) you’ll want to ensure that those queries are performing as efficiently as possible.

Often, keeping the statistics on your tables up to date is one way to keep query times on track.

Teradata has a helpful session setting that will give you a listing of all the columns that should have stats collected.

Here’s how to use it.

1) Open a new connection in SQL Assistant
2) Run the following statement: diagnostic helpstats on for session
3) Run an explain plan on your SQL query (just put the keyword “explain” in front of your select e.g. explain select * from w_day_d)
4) If you need any stats collecting you’ll find the following message at the bottom of your explain plan in the results window:
BEGIN RECOMMENDED STATS ->
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN ISSUE_DT".
(HighConf)
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN GRATIS_INDICATOR".
(HighConf)
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN O75_TYPE".
(HighConf)
"COLLECT STATISTICS R2PRD1.WC_ASSET_SCD COLUMN START_DT".
(HighConf)
<- END RECOMMENDED STATS
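
Putting it together, the whole check is just two statements run in the same SQL Assistant session (w_day_d is the example table from step 3; substitute your own query):

-- Ask the optimiser to include stats recommendations in explain output for this session
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Explain the query you care about and look for the RECOMMENDED STATS block at the end of the plan
EXPLAIN SELECT * FROM w_day_d;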