Setting up 64bit Oracle 11gR1 on 64bit OEL5.4 with ramfs and hugepage configuration

Jephe Wu - http://linuxtechres.blogspot.com


Environment:  HP Proliant Server DL360P6, 16G RAM, 8x300G SAS hard disk, Oracle Enterprise Linux(OEL) 5.4 x86-64, Oracle 11g 64bit
Objective: Setting up Oracle server with Oracle recommended RAID and using ramfs and hugepage to improve performance.


Part I:  RAID, Partition and Installation of OEL and Oracle


Steps:

1. Oracle suggest to use RAID1 for control file, redo log, archive log and temp tablespace. RAID10 for database files which means you can put /u01 partition on RAID1+0 (search metalink for

Doc ID: 30286.1 -  I/O Tuning with Different RAID Configurations )
2. For installation of OEL and Oracle, you can refer to otn.oracle.com tutorial part.
Note: for the installation of OEL, you should run 'mkdir -p /u01/app/oracle /u01/app/oraInventory/' after following the tutorial and before installing Oracle 11g.

note: for 64bit OEL, when putting shmmax and shmall, don't follow the tutorial if the existing default value is already more than the one you will be putting.

3. You might need to install PAE kernel on 32bit OEL to recognize more than 4G memory from OS level.

Part II: Optimize the system from OS point of view
Steps:

1. kernel boot parameter to optimize I/O
Edit the /etc/grub.conf file and add the following parameter to the kernel that's being used,
title Red Hat Enterprise Linux Server (2.6.18-8.el5)
        root (hd0,0) 
        kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/sda2 elevator=deadline
        initrd /initrd-2.6.18-8.el5.img
2. add the following mount parameter to /etc/fstab for /u01 partition
defaults,noatime.
3.  remove all unnecessary services, use 'ntsysv --level 2345' to remove them, only leave services like crond, sysstat, sendmail, syslog, sshd, irq-balance, portmap, nfslock, network.
4. install HP Support Pack for hardware monitoring.

Part III: Optimize Oracle with ramfs and hugepage

Steps:

1.  using ramfs instead of default tmpfs which comes with OEL
  • Edit the /etc/rc.local file and add the following entries to it to configure the computer to mount ramfs over the /dev/shm directory, whenever you start the computer:





    umount /dev/shm
    mount -t ramfs ramfs /dev/shm
    chown oracle:oinstall /dev/shm
    In the preceding commands, oracle is the owner of Oracle software files and oinstall is the group for Oracle owner account. So we only have Oracle installed which will be using ramfs.

    ramfs cannot be pageable, tmpfs can be pageable/swpable. There is no page replacement mechanism overhead.HugePages are universally regarded as pinned.







    Test

    tmpfs
    (previously shmfs)
    ramfs
    Fill maximum space and continue writin display error continue writing
    Fixed Size Yes No
    Uses Swap Yes No
    Volatile Storage Yes Yes



  • Run the following command to check if the /dev/shm directory is mounted with the ramfs type:





    # mount | grep shm
    ramfs on /dev/shm type ramfs (rw)
  • Run the following command to check the permissions on the /dev/shm directory:
# ls -ld /dev/shm
drwxr-xr-x  3 oracle oinstall 0 Jan 13 12:12 /dev/shm
 
note: about tmpfs (previously called shmfs) - from http://en.wikipedia.org/wiki/Tmpfs
tmpfs is supported by the Linux kernel from version 2.4 and up.[3] tmpfs (previously known as shmfs) distinguishes itself from the Linux ramdisk device by allocating memory dynamically and by allowing less-used pages to be moved onto swap space. ramfs, in contrast, does not make use of swap (which can be an advantage or disadvantage). In addition, MFS and some older versions of ramfs did not grow and shrink dynamically and instead used a fixed amount of memory at all times.

Usage of tmpfs for example is "mount -t tmpfs -o size=1G,nr_inodes=10k,mode=0700 tmpfs /space" which will allow up to 1 GiB in RAM/swap with 10240 inodes and only accessible by the owner of /space. Owner can be overridden with the uid/gid mount options, and will otherwise default to root. The filesystem's maximum size can also be changed on-the-fly, like "mount -o remount,size=2G /space".
/var/run and /var/lock can be tmpfs filesystems, to alleviate having to clean them up at each reboot.
 
 
2.  configuring memlock  
  • Increase max locked memory limit. Edit /etc/security/limits.conf and add





    oracle   soft   memlock 12582912 
    oracle   hard   memlock 12582912
    Logon to oracle then check max locked mem limit:        
    $ ulimit -l
    
    12582912
The memlock parameter specifies how much memory the oracle user can lock into its address space.  The memlock setting is specified in KB and must match the memory size of the number of Huge Pages that Oracle should be able to allocate.If memlock is too small, then no single Huge Page will be allocated when the Oracle database starts.

You can specify the maximum lockable memory for oracle user, above 12582912k (=12G*1024*1024), my total physical memory is 16G which is 3/4.

This value can be specified by calculating the number of hugepage, e.g. using the following ~/bin/hugepages_settings.sh to get 1000 hugepages, then 1000*2=2000m(=2000*1024k=2048000k), which you can put 2048000 in this memlock settings, so if you need to increase hugepage size, get the number first then increase here for memlock also.

Hugepage uses 2M continuous memory space per page.

3.  Configuring hugepage
According to Doc ID: 361468.1-HugePages on 64-bit Linux

HugePages is a feature of the Linux kernel  which allows larger pages to manage memory as the alternative to the small 4K pagesize. There is a general misconception where the HugePages is a feature specific to 32-bit Linux. The major function of 64-bit architecture and O/S with 64-bit support is the ability to address Very Large Memory (VLM) natively. With 32-bit architectures, generally speaking, VLM is accessed through a VLM window (See Note 200266.1), which is a data structure in the process address space (virtually max 4GB) that provides access to whole virtual address space from a "window" of a specific size.  On 32-bit Linux, we need to set the USE_INDIRECT_DATA_BUFFERS=TRUE and mount a shmfs/tmpfs/ramfs type of in-memory filesystem over /dev/shm.

 64-bit Linux does not need these mechanisms and can use the physical memory directly as the address space is virtually 16 EB (exabyte = 2^60 bytes) at maximum.

To check whether HugePages are supported/available on a running configuration, run:

    $ grep Huge /proc/meminfo 
and check the output:

    HugePages_Total:    
    HugePages_Free: 
    Hugepagesize: 
Regardless of the values printed, if you can see the lines above, the system supports HugePages.

 The AMM and HugePages are not compatible. One needs to disable AMM on 11g to be able to use HugePages

HugePages is a method to have larger pages where it is useful for working with very large memory. It is both useful in 32- and 64-bit configurations

HugePages can be used without indirect buffers on 64-bit systems (normal configuration will do).

Steps:
  • startup instance by runing
sqlplus / as sysdba
sql> startup nomount pfile='initorcl.ora'
sql> show parameter shared_pool_size
sql> show sga
sql> show parameter filesystemio_options
sql> show parameter disk_asynch_io
  • estimate the number of hugepages and finalize it
put the following program in $ORACLE_HOME/bin


#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`

# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk '{print $5}' | grep "[0-9][0-9]*"`
do
   MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
   if [ $MIN_PG -gt 0 ]; then
      NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
   fi
done

# Finish with results
case $KERN in
   '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
          echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
   '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

# End

----------
  • Run the following command to change the permission of the file:





    $ chmod +x hugepages_settings.sh
    

Run the hugepages_settings.sh script to compute the values for hugepages configuration:
$ ./hugepages_settings.sh
  1. Set the following kernel parameter:




    # sysctl -w vm.nr_hugepages=value_displayed_above
    
  2. To make the value of the parameter available for every time you restart the computer, edit the /etc/sysctl.conf file and add the following entry:




    vm.nr_hugepages=value_displayed_above
    
  3. Run the following command to check the available hugepages:




    $ grep Huge /proc/meminfo
    
  4. Restart the instance.
  5. Run the following command to check the available hugepages (1 or 2 pages free):




    $ grep Huge /proc/meminfo
note:
if you noticed the total hugepage number in /proc/meminfo is lower than the number you specified in /etc/sysctl.conf, you might have not enough memory or system cannot find the enough continuous 2M memory pages(in this case, hugepage cannot be used for shared memory segment although it occupied some but not enough memory area), so you might need to reboot server.

the purpose above is to startup instance only(startup nomount) then estimate how many hugepage should be specified in /etc/sysctl.conf, after that, shutdown instance, then make sure oracle user can see the new specified hugepage parameter /proc/meminfo, then startup instance and database again)

you can verify them using 'ipcs -m' and ls -l /dev/shm/ for buffer cache memory file system and shared pool.

Note:
1. hugepage and AMM
    The 11g AMM feature is enabled by the MEMORY_TARGET / MEMORY_MAX_TARGET instance initialization parameters (see Note 465048.1 for further information). That is also the case with a default database instance created using Database Configuration Assistant (DBCA).

    With AMM all SGA memory is allocated by creating files under /dev/shm. When Oracle DB does SGA allocations that way HugePages are not reserved/used. The use of AMM is absolutely incompatible with HugePages.

    Please also note that ramfs (instead of tmpfs mount over /dev/shm) is not supported for AMM at all. With AMM the Oracle database needs to grow and reduce the size of SGA dynamically. This is not possible with ramfs where it possible and supported with tmpfs (which is the default for the OS installation).

    If you want to use HugePages make sure that both MEMORY_TARGET / MEMORY_MAX_TARGET initialization parameters are disabled for the database instance.

    2.  hugepage and shared memory segment
    Huge Pages pool is not being used by the ramfs shared memory filesystems, only being used by shared memory segment which is what the command 'ipcs -m' is showing. Not ls -l /dev/shm/

     Part IV: 32bit Oracle - using VLM (no need special setting for 64bit which support VLM natively)
    On 32bit Oracle with VLM configuration, Note that huge pages will only apply to a small part of your SGA (shared, java, large pool) since your data blocks are not handled by huge pages but by indirect data buffers.

    You should use 64-bit operating system and Oracle. Using indirect block buffers has a performance overhead. If you use a 64-bit operating system all your problems are gone and you can use huge pages for the whole sga including buffer cache.
     
    for 32bit OS and Oracle, in order to configure VLM, do this:

    1. add the following into initorcl.ora (8G (=8192k*1048576) buffer cache and 2G shared pool)
    use_indirect_data_buffers=true
    db_block_buffers=1048576
    db_block_size=8192
    shared_pool_size=2831155200
    filesystemio_options='setall'
    
    note:
    change all DB_CACHE_SIZE (remove DB_xK_CACHE_SIZE parameters ) with DB_BLOCK_BUFFERS parameter.
    *.shared_pool_size has high priority than orcl.__shared_pool_size
    shared_io_poolsize=0 is reserved setting, don't change
     
    Part V: using async io and direct io (for both 32bit and 64bit Oracle) 
    
    • by default, RHEL5 and Oracle 11g




      disk_asynch_io=true (to confirm, use show parameter disk_asynch_io)

    • If you use filesystems




      filesystemio_options=setall  (to confirm, after startup database, show parameter filesystemio_opitons)

    • direct I/O
    Direct I/O is a feature of the file system whereby file reads and writes go directly from the applications to the storage device, bypassing the operating system read and write caches. Direct I/O is used by only a few applications that manage their own caches, such as databases.

    In the following example, $ORACLE_HOME/bin/oracle was relinked with async I/O:
    $ ldd $ORACLE_HOME/bin/oracle | grep libaio
    libaio.so.1 => /usr/lib/libaio.so.1 (0x0093d000)
    $ nm $ORACLE_HOME/bin/oracle | grep io_getevent
    w io_getevents@@LIBAIO_0.1
    $
    In the following example, $ORACLE_HOME/bin/oracle has NOT been relinked with async I/O: $ ldd $ORACLE_HOME/bin/oracle | grep libaio
    $ nm $ORACLE_HOME/bin/oracle | grep io_getevent
    w io_getevents
    $
    use the first number which is 515 to confirm the system is using asyncio, If it's not async
    system, the 515 will be zero:

    [oracle@jephe scripts]$ cat /proc/slabinfo | grep kioctx
    kioctx 515 540 256 15 1 : tunables 120 60 8 : slabdata 36 36 0

    so, on 32bit Oracle with VLM feature. We should use direct I/O and async filesystem I/O for Oracle, and hugepages for shared pool and ramfs for buffer cache, both hugepages and ramfs will lock the memory to prevent swapping

    Part VI: Appendix:

    1. http://www.puschitz.com/TuningLinuxForOracle.shtml
     2. my 64bit Oracle initorcl.ora file:
    -------------
    orcl.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
    *.audit_file_dest='/u01/app/oracle/admin/orcl/adump'
    *.audit_trail='db'
    *.compatible='11.1.0.0.0'
    *.control_files='/u01/app/oracle/oradata/orcl/control01.ctl','/u01/app/oracle/oradata/orcl/control02.ctl','/u01/app/orac
    le/oradata/orcl/control03.ctl'
    *.control_management_pack_access='DIAGNOSTIC+TUNING'
    *.cursor_sharing='similar'
    *.db_block_size=8192
    *.db_cache_size=6g
    *.db_domain=''
    *.db_name='orcl'
    *.db_recovery_file_dest='/u01/app/oracle/flash_recovery_area'
    *.db_recovery_file_dest_size=53687091200
    *.diagnostic_dest='/u01/app/oracle'
    *.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
    *.filesystemio_options='setall'
    *.java_pool_size=256m
    *.large_pool_size=256m
    *.log_archive_dest_1='location=/data/arch'
    *.log_archive_dest_2=''
    *.open_cursors=20000
    *.optimizer_mode='choose'
    *.pga_aggregate_target=1g
    *.processes=1000
    *.remote_login_passwordfile='EXCLUSIVE'
    *.session_cached_cursors=100
    *.shared_pool_size=2g
    *.streams_pool_size=64m
    *.sort_area_size=20971520
    *.sort_area_retained_size=2097152
    *.undo_tablespace='UNDOTBS1'
    db_writer_processes=2
    *.hash_area_size=20971520
    *.log_buffer=20971520
    *.java_jit_enabled=FALSE
    ------------------
    Summary:
    1.  using ramfs instead of tmpfs to prevent swapping whenever it's possible
    2.  using 64bit OS and Oracle whenever it's possible
    3.  on 64bit OS+Oracle, hugepage can be used for the whole SGA, while on 32bit OS +Oracle, hugepage can only be used for whole SGA except for data block buffer which is using indirect data buffer. On 32-bit Linux, we need to set the USE_INDIRECT_DATA_BUFFERS=TRUE and mount a shmfs/tmpfs/ramfs type of in-memory filesystem over /dev/shm.
    4. *.java_jit_enabled=FALSE, so that 64bit Oracle 11g can use exp to export data.

    5. to speed up oracle processing, you might consider to collect statistics for schema such as:
    exec dbms_stats.gather_schema_stats('JEPHE', cascade => true);
    exec dbms_stats.delete_table_stats('SYS','CDEF$');
    exec dbms_stats.delete_table_stats('SYS','CON$');
    exec dbms_stats.delete_table_stats('SYS','USER$');
    exec dbms_stats.gather_table_stats('SYS','CDEF$');
    exec dbms_stats.gather_table_stats('SYS','CON$');
    exec dbms_stats.gather_table_stats('SYS','USER$');