UsingThreads:SynchronizationTimings

From EDM2
Ak120 (talk | contribs)

Latest revision as of 02:50, 21 February 2020

All the thread-related material I've come across on the 'net mentions that the OS/2 API DosEnterCritSec() is slow and should not be used. Here I present my own timings of it, compared to DosRequestMutexSem().

The Timing Method

First, let me introduce the method I used for timing the system calls: DosTmrQueryTime(). This system call returns a snapshot of the high resolution timer as a 64-bit value, taken directly from the high resolution timer device [or so I believe]. It is returned in a QWORD structure, and I use the following method to convert a QWORD time value into OpenWatcom's 64-bit long long:

unsigned long long int timer_snapshot =
  ( static_cast< unsigned long long int >( time.ulHi ) << 32 ) | time.ulLo;
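The cast is the important part: ulHi is a 32-bit field, so shifting it left by 32 bits without widening it first does not produce the intended 64-bit value. The following is a minimal sketch of the same conversion wrapped in a helper. QwordLike and to_u64 are hypothetical names used only for illustration; QwordLike is a stand-in that mirrors the ulLo and ulHi fields of QWORD so that the sketch compiles without os2.h.

```cpp
#include <cstdint>

// Hypothetical stand-in for the OS/2 QWORD structure (assumption: two
// 32-bit fields named ulLo and ulHi, as used in the article's code).
struct QwordLike
{
  std::uint32_t ulLo;  // low 32 bits of the timer value
  std::uint32_t ulHi;  // high 32 bits of the timer value
};

// Widen the high half to 64 bits before shifting, then merge in the low half.
inline unsigned long long int to_u64( const QwordLike &time )
{
  return ( static_cast< unsigned long long int >( time.ulHi ) << 32 ) | time.ulLo;
}
```

With a real QWORD the body of the helper would be identical; only the parameter type would change.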

To time a function, a snapshot is taken just before, and just after the function call, like so:

QWORD start, end;

DosTmrQueryTime( &start );
DosEnterCritSec();
DosTmrQueryTime( &end );

The idea is to measure how long the system takes to perform DosEnterCritSec() relative to DosRequestMutexSem(), not the absolute wall-clock time of either call. To do this, I took 300 samples of each system call and compared the smallest value recorded for each. Since only the relative speed matters, the actual frequency of the high resolution timer is irrelevant.
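Taking the smallest of many samples is a common way to filter out interrupts, preemption and cache misses, since those can only make a sample larger, never smaller. Here is a rough, portable sketch of that sampling loop. It is not the code used for the timings in this article: min_elapsed is a hypothetical helper, and std::chrono::steady_clock is used as a stand-in for DosTmrQueryTime() so the sketch builds on any C++11 compiler.

```cpp
#include <chrono>
#include <limits>

// Call fn() `samples` times, timing each call separately, and return the
// smallest elapsed tick count seen. steady_clock is an assumption standing
// in for DosTmrQueryTime(); the tick unit is whatever that clock provides.
template< typename Fn >
unsigned long long int min_elapsed( Fn fn, int samples )
{
  typedef std::chrono::steady_clock clock;
  unsigned long long int best =
    std::numeric_limits< unsigned long long int >::max();

  for ( int i = 0; i < samples; i++ )
  {
    clock::time_point start = clock::now();
    fn();
    clock::time_point end = clock::now();

    unsigned long long int current =
      static_cast< unsigned long long int >( ( end - start ).count() );

    if ( current < best )
      best = current;
  }

  return best;  // stays at the maximum value if samples <= 0
}
```

Two calls such as min_elapsed( first_call, 300 ) and min_elapsed( second_call, 300 ) can then be compared directly, because both results are in the same tick unit.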

The Timings

Here is the code I used to make the measurements:

// file: perf.c++
#include <iostream>
#include <limits>

#define INCL_DOSPROFILE
#define INCL_DOSPROCESS
#define INCL_DOSSEMAPHORES
#include <os2.h>

int main( int argc, char *argv[] )
{
  unsigned long long int critical_min = std::numeric_limits< unsigned long long int >::max();
  unsigned long long int mutex_min = std::numeric_limits< unsigned long long int>::max();

  for ( int i = 0; i < 300; i++ )
  {
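    // Critical section timer: take a snapshot just before and just after
    // DosEnterCritSec(). The critical section is left only after the second
    // snapshot, so DosExitCritSec() is not part of the measurement.
    QWORD start, end;
    DosTmrQueryTime( &start );
    DosEnterCritSec();
    DosTmrQueryTime( &end );
    DosExitCritSec();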

    unsigned long long int long_start =
      ( static_cast< unsigned long long int >( start.ulHi ) << 32 ) | start.ulLo;
    unsigned long long int long_end =
      ( static_cast< unsigned long long int >( end.ulHi ) << 32 ) | end.ulLo;

    unsigned long long int current_crit = long_end - long_start;

    if ( critical_min > current_crit )
      critical_min = current_crit;

    // Mutex timer

    HMTX mutex;
    DosCreateMutexSem( NULL, &mutex, 0, FALSE );

    DosTmrQueryTime( &start );
    DosRequestMutexSem( mutex, SEM_INDEFINITE_WAIT );
    DosTmrQueryTime( &end );
    DosReleaseMutexSem( mutex );
    DosCloseMutexSem( mutex );

    long_start = ( static_cast< unsigned long long int >( start.ulHi ) << 32 ) 
                 | start.ulLo;
    long_end = ( static_cast< unsigned long long int >( end.ulHi ) << 32 )
               | end.ulLo;

    unsigned long long int current_mutex = long_end - long_start;
    if ( mutex_min > current_mutex )
      mutex_min = current_mutex;

  }

  std::cout << "DosEnterCritSec(): " << critical_min << std::endl
            << "DosRequestMutexSem(): " << mutex_min << std::endl;

  return 0;
}

This can be compiled with OpenWatcom 1.4 or later, like so:

>wcl386 -cc++ -bm "perf.c++"

Or with GCC like so:

>g++ -Zmt "perf.c++" -o perf.exe

This gives the following output, on my eComStation 1.2MR system, most of the time:

DosEnterCritSec(): 13
DosRequestMutexSem(): 12

I have also seen the following numbers:

DosEnterCritSec(): 14
DosRequestMutexSem(): 12

And:

DosEnterCritSec(): 12
DosRequestMutexSem(): 12

So my sample size of 300 is probably too small, or there may simply always be some fluctuation in the timings. Also, the number of threads in the process does not seem to matter: the timings are always the same. It is left as an exercise for the reader to test with more than one thread.
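One way to judge how much the timings fluctuate is to keep every sample instead of only the minimum, and then look at the spread. The following is a small sketch under that assumption. Spread and summarize are hypothetical names, and the perf.c++ listing above would need to store its samples in a std::vector for this to apply.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Lowest, middle and highest value of a set of timing samples.
struct Spread
{
  unsigned long long int lowest;
  unsigned long long int median;
  unsigned long long int highest;
};

// Sort a copy of the samples and pick out the three summary values.
// For an even number of samples the upper of the two middle values is used.
Spread summarize( std::vector< unsigned long long int > samples )
{
  assert( !samples.empty() );
  std::sort( samples.begin(), samples.end() );

  Spread result;
  result.lowest  = samples.front();
  result.median  = samples[ samples.size() / 2 ];
  result.highest = samples.back();
  return result;
}
```

If the median sits close to the lowest value, the minimum is a fair summary of the call; a median far above it suggests that the measurement is dominated by noise and that more samples are needed.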

Conclusion

Almost all literature that goes into the OS/2 threading API states that using DosEnterCritSec() will kill performance, implying that the system call itself is expensive. My timings show that this is not so: the call costs about the same as DosRequestMutexSem(), at least with later OS/2 kernels.

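However, it is true that using it is probably overkill, since it stops every other thread in the process from running rather than only the threads contending for the same resource. Think very carefully before using it in your project.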


UsingThreads