July 31, 2012

Getting mutrace to work on zLinux

Recently I was asked how to trace mutex contention in libpthread on zLinux. There are several solutions to this:
  1. Use SystemTap with the futexes.stp sample script
  2. Use Valgrind with the drd tool (see also this article) 
  3. Use mutrace as a lightweight tool
I discovered the last tool while searching for solutions, but it wasn't clear whether it runs on zLinux. Usually a ./configure and make is all it takes to get a tool running, but this one turned out to be a little more difficult.

I started off on a standard SLES11 SP2 with some of the development tools installed. I downloaded the source from the mutrace-git, put it in a directory and called the ./bootstrap.sh script. Sure enough, it failed:
+ aclocal -I m4
configure.ac:21: error: Autoconf version 2.68 or higher is required
configure.ac:21: the top level
autom4te: /usr/bin/m4 failed with exit status: 63
aclocal: autom4te failed with exit status: 63

SLES11 SP2 has autoconf 2.63, which isn't that ancient, and SUSE has been patching and fixing it up to this second service pack. So I gave it a try and modified configure.ac to accept a minimum level of 2.63. Next run:
checking for library containing bfd_init... no
configure: error: *** libbfd not found

This means that the system is missing the devel package of binutils. After installing the binutils-devel package with
zypper install binutils-devel
the bootstrap script finished successfully. At the end I noticed that it used -O0 in the gcc options, which is really bad for performance on zLinux. So I changed that to -O2 in the Makefile.

So now only the compile had to work, and sure enough it ended with:
mutrace.c: In function setup:
mutrace.c:441: error: #pragma GCC diagnostic not allowed inside functions
mutrace.c:442: error: #pragma GCC diagnostic not allowed inside functions
mutrace.c:444: error: #pragma GCC diagnostic not allowed inside functions
make[1]: *** [libmutrace_la-mutrace.lo] Error 1

So this tool was using a gcc feature that the gcc-4.3 from SUSE doesn't have. Fortunately SUSE includes an updated version, gcc-4.6, that can be installed alongside the standard system compiler. The package name is gcc46, and instead of gcc you call gcc-4.6. After changing the Makefile once more to use that compiler, the compile went smoothly.
Finally I tried it on a small test program, and it seems to work fine:
mutrace: Showing statistics for process a.out (PID: 12050).
mutrace: 1 mutexes used.

Mutex #0 (0x0x80003088) first referenced by:
        /root/mutrace-e23dc42/.libs/libmutrace.so(pthread_mutex_lock+0x9e) [0x3fffd07c28e]
        ./a.out(functionCount1+0x20) [0x80000e34]
        /lib64/libpthread.so.0(+0x836e) [0x3fffd05436e]
        /lib64/libc.so.6(+0xef17e) [0x3fffcfb417e]

mutrace: Showing 1 mutexes in order of (write) contention count:

 Mutex #   Locked  Changed    Cont. cont.Time[ms] tot.Time[ms] avg.Time[ms] Flags
       0       83       13        7         0.155        0.089        0.001 M-.--.
     ...      ...      ...      ...           ...          ...          ... ||||||
          Object:                                      M = Mutex, W = RWLock /||||
           State:                                  x = dead, ! = inconsistent /|||
             Use:                                  R = used in realtime thread /||
      Mutex Type:                   r = RECURSIVE, e = ERRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                       i = INHERIT, p = PROTECT /

mutrace: Note that rwlocks are shown as two lines: write locks then read locks.

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: 1 condition variables used.

Condvar #0 (0x0x800030b0) first referenced by:
        /root/mutrace-e23dc42/.libs/libmutrace.so(pthread_cond_wait+0x7a) [0x3fffd07caea]
        ./a.out(functionCount1+0x32) [0x80000e46]
        /lib64/libpthread.so.0(+0x836e) [0x3fffd05436e]
        /lib64/libc.so.6(+0xef17e) [0x3fffcfb417e]

mutrace: Showing 1 condition variables in order of wait contention count:

  Cond #    Waits  Signals    Cont. tot.Time[ms] cont.Time[ms] avg.Time[ms] Flags
       0        6       67        0        0.106         0.000        0.000     -.
     ...      ...      ...      ...          ...           ...          ...     ||
           State:                                     x = dead, ! = inconsistent /
             Use:                                     R = used in realtime thread

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.319 ms.

mutrace: Results for SMP with 16 processors.

July 27, 2012

DB2 Connect high CPU utilization

DB2 Connect servers are usually a good target for consolidation. Recently we observed relatively high CPU utilization even though only a small workload was running. The oprofile and strace output showed that the system was busy doing semget() calls that failed. So a resource was missing. In the end it turned out to be a known problem in DB2, which is fixed starting with DB2 9.7 FP5 and can also be circumvented in older versions by issuing a "db2trc alloc" during startup.
This is a typical consolidation problem: on individual dedicated servers it usually isn't even noticed, but after consolidation it becomes visible. An additional positive effect of fixing the problem is improved throughput at higher transaction rates.

July 4, 2012

New Whitepaper "Using the Linux cpuplugd Daemon to manage CPU and memory resources from z/VM Linux guests"

CPU and memory resources are normally shared in a virtualized environment. Therefore multiple guests are fighting for the same resources. Before the release of SLES11 SP2 and RHEL6.2, the automatic management of the number of virtual CPUs and the memory used in a virtual guest was quite limited. Each system required special attention and tuning of cpuplugd, the daemon that does the autonomic management in Linux on System z. Even then it had to be disabled for many systems.
The newer releases have a vastly improved daemon that now allows more detailed rules for adding CPUs and memory to a guest and removing them again. The drawback of more tuning knobs is more tuning knobs.
So this whitepaper tries to develop a recommended set of parameters to get the most benefit with the least effort. Furthermore results of measurements and experiments are shown together with the used parameters for advanced tuning.
Be aware that if you want cpuplugd to control memory and you run with more than one virtual CPU, you really want the fix for APAR VM65060 installed.
Also, a bug was discovered in the base of SLES11 and RHEL 6. It has been fixed with maintweb kernel 3.0.42-0.7.3 for SLES 11 and with the base RHEL 6.4 for Red Hat.

July 3, 2012

New Redpaper "Silent Installation Experiences with Oracle Database 11gR2 Real Application Clusters on Linux on System z"

IBM has published a new Redpaper covering the so-called "silent mode" installation of Oracle. This installation method eliminates the GUI and user input during installation. Instead, all parameters are specified in a file. Note that this applies to the latest Oracle only. The paper also covers the most current Oracle documents for installing on Linux on System z. Here is the content of this paper:
  • Oracle Database general information
  • Oracle environment
  • Oracle Grid Infrastructure silent mode installation
  • Oracle RAC silent mode installation
  • Upgrading with the latest patch set update
  • Optional methods to install and clone Oracle Database on Linux on z
    • Installing and cloning a single instance
    • Installing and cloning a two-node RAC database
    • Installing and cloning a new cluster
  • Cleaning up after a failed installation to perform a fresh installation
The standard installation of Oracle RAC is covered in "Installing Oracle 11gR2 RAC on Linux on System z" which I already recommended in an article here.