Linux SysAdmin and Oracle/MySQL DBA: Setting up 64bit Oracle 11gR1 on 64bit OEL5.4 with ramfs and hugepage configuration

Jephe Wu - http://linuxtechres.blogspot.com

Environment: HP Proliant Server DL360P6, 16G RAM, 8x300G SAS hard disk, Oracle Enterprise Linux(OEL) 5.4 x86-64, Oracle 11g 64bit
Objective: Setting up Oracle server with Oracle recommended RAID and using ramfs and hugepage to improve performance.

Part I: RAID, Partition and Installation of OEL and Oracle

Steps:

1. Oracle suggest to use RAID1 for control file, redo log, archive log and temp tablespace. RAID10 for database files which means you can put /u01 partition on RAID1+0 (search metalink for

Doc ID:

30286.1 - I/O Tuning with Different RAID Configurations )

2. For installation of OEL and Oracle, you can refer to otn.oracle.com tutorial part.
Note: for the installation of OEL, you should run 'mkdir -p /u01/app/oracle /u01/app/oraInventory/' after following the tutorial and before installing Oracle 11g.

note: for 64bit OEL, when putting shmmax and shmall, don't follow the tutorial if the existing default value is already more than the one you will be putting.

3. You might need to install PAE kernel on 32bit OEL to recognize more than 4G memory from OS level.

Part II: Optimize the system from OS point of view
Steps:

1. kernel boot parameter to optimize I/O
Edit the /etc/grub.conf file and add the following parameter to the kernel that's being used,

title Red Hat Enterprise Linux Server (2.6.18-8.el5)
        root (hd0,0) 
        kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/sda2 elevator=deadline
        initrd /initrd-2.6.18-8.el5.img

2. add the following mount parameter to /etc/fstab for /u01 partition

defaults,noatime.

3. remove all unnecessary services, use 'ntsysv --level 2345' to remove them, only leave services like crond, sysstat, sendmail, syslog, sshd, irq-balance, portmap, nfslock, network.
4. install HP Support Pack for hardware monitoring.

Part III: Optimize Oracle with ramfs and hugepage

Steps:

1. using ramfs instead of default tmpfs which comes with OEL

Edit the /etc/rc.local file and add the following entries to it to configure the computer to mount ramfs over the /dev/shm directory, whenever you start the computer:
```
umount /dev/shm
mount -t ramfs ramfs /dev/shm
chown oracle:oinstall /dev/shm
```
In the preceding commands, oracle is the owner of Oracle software files and oinstall is the group for Oracle owner account. So we only have Oracle installed which will be using ramfs.

ramfs cannot be pageable, tmpfs can be pageable/swpable. There is no page replacement mechanism overhead.HugePages are universally regarded as pinned.

Test

tmpfs
(previously shmfs) ramfs

Fill maximum space and continue writin display error continue writing

Fixed Size Yes No

Uses Swap Yes No

Volatile Storage Yes Yes
Run the following command to check if the /dev/shm directory is mounted with the ramfs type:
```
# mount | grep shm
ramfs on /dev/shm type ramfs (rw)
```
Run the following command to check the permissions on the /dev/shm directory:

Test	tmpfs (previously shmfs)	ramfs
Fill maximum space and continue writin	display error	continue writing
Fixed Size	Yes	No
Uses Swap	Yes	No
Volatile Storage	Yes	Yes

# ls -ld /dev/shm
drwxr-xr-x  3 oracle oinstall 0 Jan 13 12:12 /dev/shm

note: about tmpfs (previously called shmfs) - from http://en.wikipedia.org/wiki/Tmpfs

tmpfs is supported by the Linux kernel from version 2.4 and up.^[3] tmpfs (previously known as shmfs) distinguishes itself from the Linux ramdisk device by allocating memory dynamically and by allowing less-used pages to be moved onto swap space. ramfs, in contrast, does not make use of swap (which can be an advantage or disadvantage). In addition, MFS and some older versions of ramfs did not grow and shrink dynamically and instead used a fixed amount of memory at all times.

Usage of tmpfs for example is "mount -t tmpfs -o size=1G,nr_inodes=10k,mode=0700 tmpfs /space" which will allow up to 1 GiB in RAM/swap with 10240 inodes and only accessible by the owner of /space. Owner can be overridden with the uid/gid mount options, and will otherwise default to root. The filesystem's maximum size can also be changed on-the-fly, like "mount -o remount,size=2G /space".
/var/run and /var/lock can be tmpfs filesystems, to alleviate having to clean them up at each reboot.

2.  configuring memlock

Increase max locked memory limit. Edit /etc/security/limits.conf and add

oracle   soft   memlock 12582912

oracle   hard   memlock 12582912

Logon to oracle then check max locked mem limit:        
$ ulimit -l

12582912

The memlock parameter specifies how much memory the oracle user can lock into its address space. The memlock setting is specified in KB and must match the memory size of the number of Huge Pages that Oracle should be able to allocate.If memlock is too small, then no single Huge Page will be allocated when the Oracle database starts.

You can specify the maximum lockable memory for oracle user, above 12582912k (=12G*1024*1024), my total physical memory is 16G which is 3/4.

This value can be specified by calculating the number of hugepage, e.g. using the following ~/bin/hugepages_settings.sh to get 1000 hugepages, then 1000*2=2000m(=2000*1024k=2048000k), which you can put 2048000 in this memlock settings, so if you need to increase hugepage size, get the number first then increase here for memlock also.

Hugepage uses 2M continuous memory space per page.

3. Configuring hugepage

According to Doc ID:

361468.1-HugePages on 64-bit Linux

HugePages is a feature of the Linux kernel which allows larger pages to manage memory as the alternative to the small 4K pagesize. There is a general misconception where the HugePages is a feature specific to 32-bit Linux. The major function of 64-bit architecture and O/S with 64-bit support is the ability to address Very Large Memory (VLM) natively. With 32-bit architectures, generally speaking, VLM is accessed through a VLM window (See Note 200266.1), which is a data structure in the process address space (virtually max 4GB) that provides access to whole virtual address space from a "window" of a specific size. On 32-bit Linux, we need to set the USE_INDIRECT_DATA_BUFFERS=TRUE and mount a shmfs/tmpfs/ramfs type of in-memory filesystem over /dev/shm.

64-bit Linux does not need these mechanisms and can use the physical memory directly as the address space is virtually 16 EB (exabyte = 2^60 bytes) at maximum.

To check whether HugePages are supported/available on a running configuration, run:

    $ grep Huge /proc/meminfo

and check the output:

    HugePages_Total:    
    HugePages_Free: 
    Hugepagesize:

Regardless of the values printed, if you can see the lines above, the system supports HugePages.

The AMM and HugePages are not compatible. One needs to disable AMM on 11g to be able to use HugePages

HugePages is a method to have larger pages where it is useful for working with very large memory. It is both useful in 32- and 64-bit configurations

HugePages can be used without indirect buffers on 64-bit systems (normal configuration will do).

Steps:

startup instance by runing

sqlplus / as sysdba
sql> startup nomount pfile='initorcl.ora'
sql> show parameter shared_pool_size
sql> show sga
sql> show parameter filesystemio_options
sql> show parameter disk_asynch_io

estimate the number of hugepages and finalize it

put the following program in $ORACLE_HOME/bin

#!/bin/bash

#

# hugepages_settings.sh

#

# Linux bash script to compute values for the

# recommended  HugePages/HugeTLB configuration

#

# Note: This script does calculation for all shared memory

# segments available when  the script is run, no matter it

# is an Oracle RDBMS shared memory segment or not.



# Check for the kernel version

KERN=`uname  -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`



# Find out the HugePage size

HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`



# Start from 1 pages to be on the safe side and guarantee 1 free HugePage

NUM_PG=1



# Cumulative  number of pages required to handle the running shared memory segments

for SEG_BYTES in `ipcs -m | awk '{print $5}' | grep "[0-9][0-9]*"`

do

   MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`

   if [ $MIN_PG -gt 0 ]; then

      NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`

   fi

done



# Finish with results

case $KERN in

   '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;

          echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;

   '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;

    *) echo "Unrecognized kernel version $KERN. Exiting." ;;

esac



# End

----------

Run the following command to change the permission of the file:
```
$ chmod +x hugepages_settings.sh
```

Run the hugepages_settings.sh script to compute the values for hugepages configuration:

$ ./hugepages_settings.sh

Set the following kernel parameter:

# sysctl -w vm.nr_hugepages=value_displayed_above

To make the value of the parameter available for every time you restart the computer, edit the /etc/sysctl.conf file and add the following entry:
```
vm.nr_hugepages=value_displayed_above
```
Run the following command to check the available hugepages:
```
$ grep Huge /proc/meminfo
```
Restart the instance.
Run the following command to check the available hugepages (1 or 2 pages free):
```
$ grep Huge /proc/meminfo
```

note:
if you noticed the total hugepage number in /proc/meminfo is lower than the number you specified in /etc/sysctl.conf, you might have not enough memory or system cannot find the enough continuous 2M memory pages(in this case, hugepage cannot be used for shared memory segment although it occupied some but not enough memory area), so you might need to reboot server.

the purpose above is to startup instance only(startup nomount) then estimate how many hugepage should be specified in /etc/sysctl.conf, after that, shutdown instance, then make sure oracle user can see the new specified hugepage parameter /proc/meminfo, then startup instance and database again)

you can verify them using 'ipcs -m' and ls -l /dev/shm/ for buffer cache memory file system and shared pool.

Note:
1. hugepage and AMM

The 11g AMM feature is enabled by the MEMORY_TARGET / MEMORY_MAX_TARGET instance initialization parameters (see Note 465048.1 for further information). That is also the case with a default database instance created using Database Configuration Assistant (DBCA).

With AMM all SGA memory is allocated by creating files under /dev/shm. When Oracle DB does SGA allocations that way HugePages are not reserved/used. The use of AMM is absolutely incompatible with HugePages.

Please also note that ramfs (instead of tmpfs mount over /dev/shm) is not supported for AMM at all. With AMM the Oracle database needs to grow and reduce the size of SGA dynamically. This is not possible with ramfs where it possible and supported with tmpfs (which is the default for the OS installation).

If you want to use HugePages make sure that both MEMORY_TARGET / MEMORY_MAX_TARGET initialization parameters are disabled for the database instance.

2. hugepage and shared memory segment
Huge Pages pool is not being used by the ramfs shared memory filesystems, only being used by shared memory segment which is what the command 'ipcs -m' is showing. Not ls -l /dev/shm/

Part IV: 32bit Oracle - using VLM (no need special setting for 64bit which support VLM natively)
On 32bit Oracle with VLM configuration, Note that huge pages will only apply to a small part of your SGA (shared, java, large pool) since your data blocks are not handled by huge pages but by indirect data buffers.

You should use 64-bit operating system and Oracle. Using indirect block buffers has a performance overhead. If you use a 64-bit operating system all your problems are gone and you can use huge pages for the whole sga including buffer cache.

for 32bit OS and Oracle, in order to configure VLM, do this:

add the following into initorcl.ora (8G (=8192k*1048576) buffer cache and 2G shared pool)

use_indirect_data_buffers=true

db_block_buffers=1048576
db_block_size=8192
shared_pool_size=2831155200
filesystemio_options='setall'

note:
change all DB_CACHE_SIZE (remove DB_xK_CACHE_SIZE parameters ) with DB_BLOCK_BUFFERS parameter.
*.shared_pool_size has high priority than orcl.__shared_pool_size
shared_io_poolsize=0 is reserved setting, don't change

Part V: using async io and direct io (for both 32bit and 64bit Oracle)

by default, RHEL5 and Oracle 11g

disk_asynch_io=true (to confirm, use show parameter disk_asynch_io)

If you use filesystems

filesystemio_options=setall  (to confirm, after startup database, show parameter filesystemio_opitons)

direct I/O

Direct I/O is a feature of the file system whereby file reads and writes go directly from the applications to the storage device, bypassing the operating system read and write caches. Direct I/O is used by only a few applications that manage their own caches, such as databases.

In the following example, $ORACLE_HOME/bin/oracle was relinked with async I/O:
$ ldd $ORACLE_HOME/bin/oracle | grep libaio
libaio.so.1 => /usr/lib/libaio.so.1 (0x0093d000)
$ nm $ORACLE_HOME/bin/oracle | grep io_getevent
w io_getevents@@LIBAIO_0.1
$
In the following example, $ORACLE_HOME/bin/oracle has NOT been relinked with async I/O: $ ldd $ORACLE_HOME/bin/oracle | grep libaio
$ nm $ORACLE_HOME/bin/oracle | grep io_getevent
w io_getevents
$
use the first number which is 515 to confirm the system is using asyncio, If it's not async
system, the 515 will be zero:

[oracle@jephe scripts]$ cat /proc/slabinfo | grep kioctx
kioctx 515 540 256 15 1 : tunables 120 60 8 : slabdata 36 36 0

so, on 32bit Oracle with VLM feature. We should use direct I/O and async filesystem I/O for Oracle, and hugepages for shared pool and ramfs for buffer cache, both hugepages and ramfs will lock the memory to prevent swapping

Part VI: Appendix:

1. http://www.puschitz.com/TuningLinuxForOracle.shtml
2. my 64bit Oracle initorcl.ora file:
-------------
orcl.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
*.audit_file_dest='/u01/app/oracle/admin/orcl/adump'
*.audit_trail='db'
*.compatible='11.1.0.0.0'
*.control_files='/u01/app/oracle/oradata/orcl/control01.ctl','/u01/app/oracle/oradata/orcl/control02.ctl','/u01/app/orac
le/oradata/orcl/control03.ctl'
*.control_management_pack_access='DIAGNOSTIC+TUNING'
*.cursor_sharing='similar'
*.db_block_size=8192
*.db_cache_size=6g
*.db_domain=''
*.db_name='orcl'
*.db_recovery_file_dest='/u01/app/oracle/flash_recovery_area'
*.db_recovery_file_dest_size=53687091200
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
*.filesystemio_options='setall'
*.java_pool_size=256m
*.large_pool_size=256m
*.log_archive_dest_1='location=/data/arch'
*.log_archive_dest_2=''
*.open_cursors=20000
*.optimizer_mode='choose'
*.pga_aggregate_target=1g
*.processes=1000
*.remote_login_passwordfile='EXCLUSIVE'
*.session_cached_cursors=100
*.shared_pool_size=2g
*.streams_pool_size=64m
*.sort_area_size=20971520
*.sort_area_retained_size=2097152
*.undo_tablespace='UNDOTBS1'
db_writer_processes=2
*.hash_area_size=20971520
*.log_buffer=20971520
*.java_jit_enabled=FALSE
------------------
Summary:
1. using ramfs instead of tmpfs to prevent swapping whenever it's possible
2. using 64bit OS and Oracle whenever it's possible
3. on 64bit OS+Oracle, hugepage can be used for the whole SGA, while on 32bit OS +Oracle, hugepage can only be used for whole SGA except for data block buffer which is using indirect data buffer. On 32-bit Linux, we need to set the USE_INDIRECT_DATA_BUFFERS=TRUE and mount a shmfs/tmpfs/ramfs type of in-memory filesystem over /dev/shm.
4. *.java_jit_enabled=FALSE, so that 64bit Oracle 11g can use exp to export data.

5. to speed up oracle processing, you might consider to collect statistics for schema such as:
exec dbms_stats.gather_schema_stats('JEPHE', cascade => true);
exec dbms_stats.delete_table_stats('SYS','CDEF$');
exec dbms_stats.delete_table_stats('SYS','CON$');
exec dbms_stats.delete_table_stats('SYS','USER$');
exec dbms_stats.gather_table_stats('SYS','CDEF$');
exec dbms_stats.gather_table_stats('SYS','CON$');
exec dbms_stats.gather_table_stats('SYS','USER$');

Linux SysAdmin and Oracle/MySQL DBA

Setting up 64bit Oracle 11gR1 on 64bit OEL5.4 with ramfs and hugepage configuration

Total Pageviews

Bookmark

Search This Blog

Followers

keywords

All Posts

About me