Measuring CPU Usage

From EDM2

by Sergey I. Yevtushenko - Source code: cpu.zip

Foreword

This article will explain techniques which can be used to build CPU usage measurement applications.

Overview of known techniques for CPU usage measurement

Well, there are a lot of them. The first and most widely known is to use an idle priority thread. This method is easy to implement (see, for example, the sources of MemSize_4-00.zip) but has disadvantages. The first is that such threads frequently interfere with other programs which use idle priority threads for their own purposes. The second stems from the fact that when the processor is not in use, it sleeps at the HLT instruction inside the OS/2 kernel. In this state the processor consumes less power and runs cooler, which increases system stability. A measurement thread that spins at idle priority keeps the processor from ever reaching that state.

The second method is based on the DosPerfSysCall system call. This function is documented in the "SMP Programming Addendum" for Warp Server SMP. It uses the performance counters available in the Pentium and its successors, so of course it will not work on a 486. It is also not available in all versions of OS/2 Warp, but Warp 3.0 and Warp 4.0 with the latest fixpacks have it. This is a good method for cases where we need to know only the total system CPU usage, and it is widely used: at least WarpCenter and PU Monitor 1.10f use it. Another advantage of this method is the ability to measure CPU usage for more than one CPU. This example code comes from the "SMP Programming Addendum", with some modifications:

This example uses the DosPerfSysCall to obtain CPU Utilization information on a uniprocessor.

#define INCL_BASE

#include <os2.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

APIRET APIENTRY DosPerfSysCall(ULONG ulCommand, ULONG ulParm1,
                               ULONG ulParm2, ULONG ulParm3);

#define ORD_DOS32PERFSYSCALL   976
#define CMD_KI_RDCNT           (0x63)

typedef struct _CPUUTIL {
  ULONG ulTimeLow;     /* Low 32 bits of time stamp      */
  ULONG ulTimeHigh;    /* High 32 bits of time stamp     */
  ULONG ulIdleLow;     /* Low 32 bits of idle time       */
  ULONG ulIdleHigh;    /* High 32 bits of idle time      */
  ULONG ulBusyLow;     /* Low 32 bits of busy time       */
  ULONG ulBusyHigh;    /* High 32 bits of busy time      */
  ULONG ulIntrLow;     /* Low 32 bits of interrupt time  */
  ULONG ulIntrHigh;    /* High 32 bits of interrupt time */
} CPUUTIL;

typedef CPUUTIL *PCPUUTIL;

/* Convert 8-byte (low, high) time value to double */
#define LL2F(high, low) (4294967296.0*(high)+(low))

/* This is a 1 processor example */
int main (int argc, char *argv[])
{
   APIRET      rc;
   int         i, iter, sleep_sec;
   double      ts_val, ts_val_prev;
   double      idle_val, idle_val_prev;
   double      busy_val, busy_val_prev;
   double      intr_val, intr_val_prev;
   CPUUTIL     CPUUtil;

   if ((argc < 2) || (*argv[1] < '1') || (*argv[1] > '9')) {
       fprintf(stderr, "usage: %s [1-9]\n", argv[0]);
       exit(1);
   }
   sleep_sec = *argv[1] - '0';

   iter = 0;
   do {
      rc = DosPerfSysCall(CMD_KI_RDCNT, (ULONG) &CPUUtil, 0, 0);
      if (rc) {
          /* APIRET is an unsigned long, so print it with %lu */
          fprintf(stderr, "CMD_KI_RDCNT failed rc = %lu\n", rc);
          exit(1);
      }
      ts_val = LL2F(CPUUtil.ulTimeHigh, CPUUtil.ulTimeLow);
      idle_val = LL2F(CPUUtil.ulIdleHigh, CPUUtil.ulIdleLow);
      busy_val = LL2F(CPUUtil.ulBusyHigh, CPUUtil.ulBusyLow);
      intr_val = LL2F(CPUUtil.ulIntrHigh, CPUUtil.ulIntrLow);

      if (iter > 0) {
          double  ts_delta = ts_val - ts_val_prev;
          printf("idle: %4.2f%%  busy: %4.2f%%  intr: %4.2f%%\n",
                 (idle_val - idle_val_prev)/ts_delta*100.0,
                 (busy_val - busy_val_prev)/ts_delta*100.0,
                 (intr_val - intr_val_prev)/ts_delta*100.0);
      }

      ts_val_prev = ts_val;
      idle_val_prev = idle_val;
      busy_val_prev = busy_val;
      intr_val_prev = intr_val;

      iter++;
      DosSleep(1000*sleep_sec);

   } while (1);
}
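
The percentage arithmetic in the loop above can also be done on 64-bit integers instead of doubles. The following is a minimal portable sketch, not part of the original sources: `join64` and `percentBetween` are hypothetical helper names introduced for illustration, and it assumes a compiler that provides `<cstdint>`.

```cpp
#include <cassert>
#include <cstdint>

// Join the low and high 32-bit halves of a counter into one 64-bit value.
inline std::uint64_t join64(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Share of the elapsed time stamp interval taken by one counter, in percent.
// prev/cur are two readings of the same counter (idle, busy or interrupt),
// prevTime/curTime are the matching time stamp readings.
// Returns 0.0 when no time has elapsed, to avoid dividing by zero.
inline double percentBetween(std::uint64_t prev, std::uint64_t cur,
                             std::uint64_t prevTime, std::uint64_t curTime)
{
    // The counters only grow, so a later reading is never smaller.
    assert(cur >= prev && curTime >= prevTime);

    const std::uint64_t elapsed = curTime - prevTime;
    if (elapsed == 0)
        return 0.0;
    return 100.0 * static_cast<double>(cur - prev)
                 / static_cast<double>(elapsed);
}
```

With this, the idle percentage would be `percentBetween(idlePrev, idleNow, timePrev, timeNow)`, where each value comes from `join64` applied to the corresponding pair of CPUUTIL fields.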

The third method is based on the well-known (but undocumented) system calls DosQProcStat and DosQuerySysState. These system calls return interesting system information about threads, processes, modules and so on. Some of the information they return is rarely used, although it can help measure CPU usage for each process (even for each thread of a process) separately. We will look at the structures returned by DosQProcStat (I'll return to DosQuerySysState later):

typedef struct {
    ULONG       rectype;
    USHORT      threadid;
    USHORT      slotid;
    ULONG       sleepid;
    ULONG       priority;
    ULONG       systime;        /* <--- time spent in system calls */
    ULONG       usertime;       /* <--- time spent in the process itself */
    UCHAR       state;
    UCHAR       _reserved1_;    /* padding to ULONG */
    USHORT      _reserved2_;    /* padding to ULONG */
} QTHREAD, *PQTHREAD;

As you can see, there is information about how much CPU time each thread uses, both in the process itself and in system calls. But this information comes in a form which makes direct use nearly impossible: each counter is increased by the system each time the process goes through the scheduler. On the other hand this is real information, which comes directly from the operating system and is therefore the most precise information available about CPU usage by each thread. So we have to store information about each process in our program and, by periodically calling DosQProcStat, check by how much each counter has changed. This is a much more complicated way than the simple calculations needed by the two previous methods. To make life a lot easier I have used a simple C++ Collection class which has served me for about five years now:

class Collection
{
    protected:
        Ptr * ppData;
        DWord     dwLast;
        DWord     dwCount;
        DWord     dwDelta;
        Byte      bDuplicates;
    public:
        //***** New
        Collection(DWord aCount =10, DWord aDelta =5);
        ~Collection();
        Ptr Get(DWord);
        Ptr Remove(DWord);

        virtual void Add(Ptr);
        virtual void At(Ptr, DWord);
        virtual void Free(Ptr p)     { delete p;}

        void  ForEach(ForEachFunc);
        DWord Count()                {return dwLast;}
        void  RemoveAll();
};
class SortedCollection:public Collection
{
    public:
        //***** New
        SortedCollection(DWord aCount = 10, DWord aDelta = 5):
            Collection(aCount,aDelta){ bDuplicates = 1;};
        virtual int Compare(Ptr p1,Ptr p2)
                            {return *((int *)p1) - *((int *)p2);}
        virtual DWord Look(Ptr);
        //***** Replaced
        virtual void Add(Ptr);
};

And for storing information about each process I have used the following structure:

struct ProcInfo
{
    int iPid;
    int iSid;
    int iType;
    int iTouched;
    int lOldUser;
    int lDeltaUser;
    int lOldSystem;
    int lDeltaSystem;
    int iUsage;

    char cProxName[265];
};

To manipulate these structures (adding, removing, refreshing counters and so on), a new class, ProcessCollection, is derived from SortedCollection:

class ProcessCollection:public SortedCollection
{
        ProcInfo * CheckAndInsert(ProcInfo *, int&);
        ProcInfo * LocatePID(int pid);
        unsigned long ulTotalUsr;
        unsigned long ulTotalSys;

    public:
        ProcessCollection() {}
        ~ProcessCollection(){}

        void Print(int i, char* cStr);
        int  Pid(int i);
        int  CPULoad(int i);
        void CollectData();
        virtual int Compare(Ptr p1,Ptr p2)
            {return PInfo(p1)->iPid - PInfo(p2)->iPid;}
};

Most of the interesting work is done in ProcessCollection::CollectData. The first step goes through the structures returned by QuerySysInfo() (this is just a wrapper for DosQProcStat or DosQuerySysState). We look up each process in the collection and, if needed, fill in a ProcInfo structure from the relevant fields of the structures returned by QuerySysInfo(). In this step each process found is also marked as present in the collection.

In the second step we remove the processes that were not marked in the first step and calculate total counters for later use.
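
The two steps of CollectData can be sketched in portable C++. This is not the code from cpu.zip: it swaps the article's SortedCollection for `std::map`, and the `Sample`, `Entry` and `Totals` structures and the `collect` function are hypothetical names. It also assumes the counters are 32-bit values that only grow, so that unsigned subtraction gives the right difference even if a counter wraps around.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical reading for one process, as a QuerySysInfo()-like wrapper
// might deliver it.
struct Sample {
    int           pid;
    std::uint32_t user;    // accumulated user-time counter
    std::uint32_t system;  // accumulated system-time counter
};

// Hypothetical per-process record, modelled loosely on ProcInfo.
struct Entry {
    std::uint32_t oldUser = 0, oldSystem = 0;
    std::uint32_t deltaUser = 0, deltaSystem = 0;
    bool          touched = false;
};

struct Totals {
    std::uint64_t user = 0;
    std::uint64_t system = 0;
};

// Step 1: update or insert every process seen in this pass and mark it.
// Step 2: drop unmarked processes and sum the deltas of the survivors.
Totals collect(std::map<int, Entry>& table, const std::vector<Sample>& samples)
{
    for (auto& kv : table)
        kv.second.touched = false;

    for (const Sample& s : samples) {
        assert(s.pid >= 0);                    // PIDs are never negative
        auto it = table.find(s.pid);
        if (it == table.end()) {
            // New process: no earlier reading, so its delta stays at zero.
            Entry e;
            e.oldUser = s.user;
            e.oldSystem = s.system;
            e.touched = true;
            table[s.pid] = e;
        } else {
            Entry& e = it->second;
            e.deltaUser = s.user - e.oldUser;  // unsigned, so wrap-safe
            e.deltaSystem = s.system - e.oldSystem;
            e.oldUser = s.user;
            e.oldSystem = s.system;
            e.touched = true;
        }
    }

    Totals totals;
    for (auto it = table.begin(); it != table.end(); ) {
        if (!it->second.touched) {
            it = table.erase(it);              // process has ended
        } else {
            totals.user += it->second.deltaUser;
            totals.system += it->second.deltaSystem;
            ++it;
        }
    }
    return totals;
}
```

A process first seen in a pass contributes nothing to the totals of that pass, because there is no earlier reading to subtract from; its usage becomes visible from the next pass on.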

The rest is simple: ProcessCollection::Print fills the buffer with information for the specified process. This method is used later for printing, of course, but it also calculates CPU usage from the information provided by ProcessCollection::CollectData. It is mentioned here because of one problem related to VDM processes. For such processes, the system indicates that most of the time is spent inside the system. This is, of course, true, because each call to int 21h (do you remember what it is?) or int 16h or any other interrupt (for example, simulation of the clock timer, int 8h, or the keyboard, int 9h) causes the VDM to transfer execution into a virtual device driver or the kernel. To avoid losing interesting information for VDM processes, their system time is also counted as process time.
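
The VDM rule can be sketched as a small function. This is an illustration rather than the code in ProcessCollection::Print: the `usagePercent` name, its parameters, and the choice to express the load relative to the sum of all user and system deltas are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: share of the interval used by one process, in percent.
// deltaUser/deltaSystem are the counter changes of this process since the
// last pass; totalUser/totalSystem are those changes summed over all
// processes. For a VDM the system time is credited to the process as well.
inline int usagePercent(std::uint32_t deltaUser, std::uint32_t deltaSystem,
                        std::uint64_t totalUser, std::uint64_t totalSystem,
                        bool isVdm)
{
    const std::uint64_t total = totalUser + totalSystem;
    if (total == 0)
        return 0;                  // nothing ran during the interval

    std::uint64_t own = deltaUser;
    if (isVdm)
        own += deltaSystem;        // VDM: count the time spent in the system too

    assert(own <= total);          // one process cannot exceed the total
    return static_cast<int>(own * 100 / total);
}
```

For a process with 10 ticks of user time and 30 ticks of system time out of 100 ticks in total, this gives 10 for a normal process and 40 for a VDM.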

A Few Words About DosQProcStat and DosQuerySysState

The first function is a 16-bit undocumented system call, available in almost all versions of OS/2, and is therefore a reliable way to obtain system information. DosQuerySysState is a relatively new system call (also undocumented, as far as I know), available from somewhere around fixpack 17 for OS/2 Warp 3.0. It was often broken in intermediate fixpacks. Because it is a 32-bit system call it can use a data buffer longer than 64K, but there are many problems: in my experience, DosQuerySysState is very sensitive to the length and placement of the buffer and to the bufsz parameter (see the DOSQSS.H header file). The size passed in should be smaller than the real buffer. If it is not, in some circumstances you can get a trap 0E when many (or just frequent) VDM sessions are started and stopped. So be careful when using this function. For programs not targeted at monitoring big servers with thousands of processes and threads in PROTECTONLY=YES mode, I advise using DosQProcStat, because it is stable.

Closing Words

The source code presented in this article uses undocumented calls to the operating system. These calls can be dangerous to your data because, in theory, they can crash your system. So use the code at your own risk.