ously (ie. Semaphores enable processes to ensure mutual exclusivity). Most of those courses will simplify the notion of a semaphore by saying that a semaphore can hold two values: 0 or 1. A value of 0 usually indicates that a process can take the semaphore and a value of 1 usually indicates that a process already has the semaphore therefore no other processes can take the semaphore until it is released. Unix System V semaphores are much more flexible than that. First of all, the term "semaphore" is used slightly dif- ferently. The following picture illustrates the difference: Individual Semaphores __________________|__________________ | | | | | V V V V V ___ ___ ___ ___ ___ ____| |____| |____| |____| |____| |____ Semaphore | |___| |___| |___| |___| |___| | Set |_________________________________________________| A semaphore, in Unix System V, is a collection of one or more of the semaphores as taught in those courses. To distinguish which we are talking about, we will call the Unix System V semaphores "semaphore sets" and we will call each semaphore an "individual semaphore." The kernel parameter SEMMNI determines how many semaphore sets to allocate system wide. To determine the number of semaphore sets currently in use, use the ipcs(ADM) command: # ipcs -s IPC status from /dev/kmem as of Fri Oct 16 11:09:29 1992 T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME Semaphores: s 0 0x6c737372 --ra-ra-ra- root other root other 3 11:39:44 11:39:43 The ipcs output above shows that only one semaphore set is in use. By counting the number of semaphore sets listed by ipcs, you can determine the maximum number of semaphore sets that your applications use. Set SEMMNI to that number plus enough extra to allow for a margin of safety. Each semaphore set can hold one or more individual semaphores. The maximum number of individual semaphores per semaphore set is determined by SEMMSL. Determining the appropriate value for SEMMSL requires knowing how each of your appliations utilize the semaphore resources. To find out how many individual semaphores a particular application needs per semaphore set, you must con- tact the writer of the application. SEMMNS determines how many individual semaphores to allocate system wide. To determine the number of individual semaphores currently in use, we can again use the ipcs(ADM) command: # ipcs -s IPC status from /dev/kmem as of Fri Oct 16 11:09:29 1992 T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME Semaphores: s 0 0x6c737372 --ra-ra-ra- root other root other 3 11:39:44 11:39:43 Note that NSEMS column shows that this one one semaphore set uses 3 individual semaphores. By counting the number of indivi- dual semaphores listed by ipcs, you can determine the maximum number of individual semaphores that your applications use. Set SEMMNS to that number plus enough extra to allow for a margin of safety. Stop lights are a good example of semaphores. As a matter of fact, in parts of Latin America, stop lights are called "semaforos". Stop lights in the U.S. and Latin America use three different colors to signify three different conditions. You might say that stop lights can have three "values": Green, yellow and red. Just like the stop light example, each individual semaphore can hold a range of values. For example: ___ ___ ___ ___ ___ ____| 0 |____| 2 |____| 7 |____| 9 |____| 3 |____ | |___| |___| |___| |___| |___| | |_________________________________________________| You can set a maximum value on this using the SEMVMX kernel parameter. The value that this kernel parameter should be set to can only be determined by the writer(s) of your application(s). However, since setting SEMVMX to its maximum (ie. 32767) is no more costly than setting it lower, it is recommended that you simply leave SEMVMX to its default value of 32767. Determining values for the remaining semaphores (SEMMNU, SEMOPM, SEMUME, SEMAEM, and XSEMMAX) require highly detailed knowledge of how your applications utilize the semaphore resources. To determine appropriate values for these kernel parameters, you must contact the writer(s) of your application(s). B) Message Queues Another form of interprocess communication provided by the Unix kernel is messages queues. These allow multiple processes to pass data back and forth in the form of mes- sages. This section has been left blank intentionally. C) Shared Data The third form of interprocess communication provided by the Unix kernel is shared data. This IPC allows unrelated processes to share portions of the system's memory. This section has been left blank intentionally. Section VI .................... Tuning the Streams Subsystem Xsight, TCP/IP and other services use a kernel resource known as streams. Streams is a form of communication that can connect two processes, two pieces of hardware, or a combination of a process and a piece of hardware. In order for these services to run optimally, you must have enough streams resources allocated. A) Streams and Queues As the following diagram illustrates, a stream is composed of "stream heads" and the queues which connect them: _____________ | | | User | | Application | |_____________| | ^ | | User Level | | ======================================================= downstream | | | | | Kernel Level | __v_|___ v | | | Stream | | Head | |________| | ^ Queue ---> | | <--- Queue __v__|__ | | | Stream | | Head | |________| | ^ Queue ---> | | <--- Queue ^ __v__|__ | | | | | Driver | upstream |________| ^ | | v Hardware Stream heads, in short, are simply data structures which con- tain information about the stream. The NSTREAM kernel param- eter determines the number of stream heads system-wide. Under the current releases of SCO Unix, there are two ways of determining how high to set NSTREAM: 1) Either you must know your applications so intimately that you know their require- ments, or 2) you must use the available tools to determine when you don't have enough stream heads and then increase the number of stream heads. Each stream head uses up to four queues. These four queues allow 1) reading from the stream head above, 2) writing to the stream head above, 3) reading from the stream head below, and 4) writing to the stream head below. Because each stream head will use up to four queues and since each queue carries a relatively low overhead, it is recom- mended that you set NQUEUE to 4*NSTREAM. Since NQUEUE is based on NSTREAM, we return to our discussion of how to determine an appropriate value for NSTREAM. To determine if you have NQUEUE and NSTREAM set high enough, use the "strstat" option of the crash(ADM) utility (or as an alternative, use netstat -m). # crash > strstat ITEM CONFIG ALLOC FREE TOTAL MAX FAIL streams 128 73 55 2227 80 0 queues 512 326 186 4484 354 0 message blocks 2290 146 2144 966420 211 0 data block totals 1832 146 1686 775526 191 0 data block size 4 512 1 511 23900 12 0 data block size 16 512 2 510 8863 12 0 data block size 64 256 14 242 608688 43 0 data block size 128 256 96 160 69936 126 0 data block size 256 128 25 103 29588 37 0 data block size 512 72 8 64 33825 23 0 data block size 1024 32 0 32 191 8 0 data block size 2048 32 0 32 527 9 0 data block size 4096 32 0 32 8 3 0 Count of scheduled queues: 0 The way this information is used is as follows: Each entry in the "ITEMS" column corresponds to a configurable kernel parameter. If any of these entries list a failure in the "FAIL" column, then you must bump up the corresponding kernel parameter. Any such failures will result in deminished per- formance. If the "queues" entry lists any failures, then the NQUEUE kernel parameter needs to be increased. As stated earlier, a stream head utilizes up to four queues, therefore NQUEUE should be set to 4*NSTREAM. If the "streams" entry lists any failures, then the NSTREAM kernel parameter needs to be increased. The value of NSTREAM is highly application-dependent, so the only way to know what value to use is to either know your applications intimately or to rely on the strstat of crash(ADM) to show you when you don't have enough. When you don't have NSTREAM set high enough, you will receive failures, at which point you should increase NSTREAM. Setting NSTREAM between 32 and 40 usually suffices. B) Data Blocks and Message Blocks Data blocks are sent across the streams and are used by both stream heads and queues. Data blocks can be a variety of sizes: 4, 16, 64, 128, 256, 512, 1024, 2048, and 4096. The kernel parameters NBLK4, NBLK16, NBLK64, NBLK128, NBLK256, NBLK512, NBLK1024, NBLK2048, and NBLK4096, respectively, are used to determine how many blocks of each size are provided system-wide. If the "data block size 512" entry lists failures, then the NBLK512 kernel parameter needs to be increased. Similarily, if the "data block size 4" entry lists failures, then the NBLK4 kernel parameter needs to be increased. In general, if the "data block size n" entry lists failures, then the NBLKn kernel parameter needs to be increased. After increasing the NBLKn, the kernel must be relinked and then you must reboot off that new kernel. If failures con- tinue, you need increase these values higher. The "data block totals" entry, as its name suggests, is sim- ply the sum of all the data blocks. Therefore, this is not directly configurable via a kernel parameter. Messages blocks, like data blocks, are sent across the streams and are used by both stream heads and queues. The number of message blocks is not directly configurable via a kernel parameter. Instead, the kernel automatically sets the number of message blocks equal to the total number of data blocks times 1.25. If the "message blocks" entry lists failures, then increase the number of data blocks. Doing this will, as a side-effect, increase the number of message blocks. NOTE: Advanced users of SCO Unix 3.2v4 and SCO Open Desktop Releases 2.0 and 3.0 can examine the file /etc/conf/pack.d/str/space.c to see how the number of message blocks is set by the kernel. This is done on the line that begins "mblk_t mblock[NBLK4096+..." While this line can be modified to change the constant 1.25 to another appropriate value, knowing what "appropriate value" to use requires intimate knowledge of the set of applications that you are using. There- fore, SCO recommends that you do not modify this con- stant and instead increase the number of message blocks by increasing the total number of data blocks. C) Data Block Prioritization An individual data block can be assigned high, medium or low priority. STRLOFRAC is the percentage of data blocks of a given class (ie. size) at which low-priority blocks alloca- tion requests fail. For example, if STRLOFRAC is 40 and there are forty-eight 256-byte data blocks (ie. NBLK256=48), a low-priority allocation request will fail if more than nineteen 256-byte data blocks are already allocated. STRMEDFRAC, like STRLOFRAC, is the percentage of data blocks of a given class (ie. size) at which medium-priority blocks allocation requests fail. If a failure results due to the STRLOFRAC or STRMEDFRAC cut- offs, then the strstat option of crash(ADM) will list a failure in the "FAIL" column for the data block size which could not be allocated. One way of fixing this problem is to simply increase the value of NBLKn where "n" is the size of the data block class that failed. Alternatively, such a failure can be corrected by increasing either STRLOFRAC or STRMEDFRAC, but knowing which of these two kernel parameters to increase requires intimate knowledge of the applications being used. The default values for STRLOFRAC and STRMEDFRAC work well for most applications. So it is recommened that you use the defaults unless you have been told by your application writer to set them otherwise. If you do choose to modify STRLOFRAC and STRMEDFRAC, you set them such that 0 <= STRLOFRAC <= STRMEDFRAC <= 100. Note that there is no kernel parameter for setting the high prior- ity data block cutoff. This is because there is no cutoff fraction for high-priority allocation requests. It is effec- tively 100%. That is, a high-priority allocation request will only fail if all of the data blocks of a certain class (ie. size) are in use. Section VII ................... External Memory Cache Issues Insufficient external memory cache can have a detrimental impact on performance. Incorrectly implemented external memory cache, on the other hand, can have a disasterous impact on performance. While the primary focus of this article has been to discuss increasing performance by tuning the kernel, we now turn to a discussion of external memory cache due to its potential impact on performance. A) The "anti-cache" problem Most motherboards which have external memory cache require a certain amount of cache to correspond to system memory. For instance, in the most typical design, 64K of cache are required for each 16Mb of RAM. If the system has more RAM than its cache can handle, it gen- erates a signal whenever memory outside the cachable area is accessed. This signal tells the external memory cache not to pay attention to this particular memory access. On most motherboards, this same signal causes the Intel 486 or Pentium internal cache to ignore that memory access. The result is that memory below a certain address is cached by both the internal and external caches, while memory above that address is not cached at all. This causes a striking difference in performance; up to a factor of eleven in cer- tain tests. This problem is sometimes referred to as "anti- caching". If you suspect this problem, enter the following shell script (save it as /tmp/times.sh): : while : do set `timex sh -c 'echo "for (i=0 ; i<10000 ; i++)" | nice -2 bc' 2>&1` echo "`date` real=$2 user=$4 sys=$6" done | tee /tmp/times.log Run it for several hours while your system is being used for various normal tasks. If the longest "user" time is more than 1.5 times the shortest "user" time, your system probably has this problem. (It is normal for "real" time to vary a great deal). If running the script shows the problem, take the following steps: 1. Check BIOS setup. Many BIOS's can be set to cache only part of memory. Some are configured that way from the factory. The BIOS should normally be set to allow caching of all of main memory. 2. Check system or motherboard documentation, or contact the hardware manufacturer. Make sure you have enough cache for your RAM. If you are unsure and cannot find the necessary information, upgrade the motherboard to the max- imum amount of cache it supports. Having extra cache will not degrade performance. 3. While waiting for a cache upgrade to be installed, the problem may in some cases be partially alleviated by load- ing the operating system entirely within the first 16Mb of memory. To do this, add the string "mem= the end of the DEFBOOTSTR entry in /etc/default/boot. You can also do this for a single boot session by entering it at the "Boot:" prompt as follows: Boot : defbootstr mem=/L 4. Alternatively, you could temporarily restrict your system to only use memory which is cachable by the currently installed cache. For instance, if the system has 64Mb RAM and 128K cache, it is likely that only the first 32Mb of RAM is cachable. Until upgrade hardware arrives, you could set the system to use only the first 32Mb of RAM as follows: Boot : defbootstr mem=1m-32m See boot(HW) for more information about "boot" keywords. It is important to realize that the steps outlined above treat the symptoms of the problem. The cause of "anti- cache" problems is insufficient or broken external memory cache and to fix this problem, you _must_ correct the hardware problem. Extensive experience has shown that tuning the kernel has very little effect when anticaching is the problem. B) Benchmark Tests The sar(ADM) utility can be used to quickly spot anti-cache problems: scsibox scsibox 3.2 2 i386 07/27/94 00:00:01 %usr %sys %wio %idle 01:00:00 0 2 0 0 02:00:01 0 2 0 0 03:00:01 0 2 0 0 04:00:01 0 2 0 0 Average 0 2 0 0 ^ | This indicates how much time the CPU is idle. If %idle is 0%, this is an indication of one of two situa- tions. The first is that your system is CPU-bound (ie. you have simply exhausted the capacity of the CPU). The second is that the CPU is never idle because it is always waiting on memory fetches (ie. the anti-cache problem). Both of these situations indicate hardware problems. Section VIII .................. Determining the effectiveness of tun- ing SCO Unix and SCO Open Desktop provide a number of utilities to assist a system administrator in kernel performance tuning. A) sar The sar(ADM) utility has been used and discussed throughout this article. It is highly recommended that you familiarize the -u, -b, -d, -v, -m, -p and -r options. B) vmstat While the procedures outlined in this article have not used vmstat, the vmstat utility can be useful when tuning kernels for performance. The information below is provided for those who wish to understand vmstat better. The vmstat(C) utility reports statistics kept by the system on processes, paging, and CPU and trap activity. The most useful information provided by vmstat is generated by using the -s flag: # vmstat -s 76176 free swap space 8817 demand zero and demand fill pages 379 pages on swap 4649 pages in cache 5773 pages on file 22191 protection fault 1503 pages are freed 5 success in swapping out a process 0 fail in swapping in a process 2 success in swapping in a process 18 swapping out a region 9 swapping in a region 1145913 cpu context switches 7927061 system calls The line by line explanation of the above output: 76176 free swap space The terms "swap space" and "swapping" should be fam- iliar to Unix System Administrators. Swapping a mechanism by which operating systems implement vir- tual memory. When the operating system needs addi- tional space in RAM to run a program, it gets the memory needed by swapping out an entire other pro- cess. The line above is a measure of the free space available on the swap device (ie. the hard disk). 8817 demand zero and demand fill pages This line indicates the number of dynamically allo- cated pages (ie. those created with malloc(S)). 379 pages on swap CPU's such as the 386, 486 and the Pentium have built-in support for paging. Paging is a mechanism used by operating systems to increase the size of virtual memory by keeping the data being "paged" in RAM as opposed to swapping it out to the swap dev- ice. This line reports a measure of the number of of pages that the operating system could not put into RAM and which had to be put on the swap device. 4649 pages in cache The operating system maintains a cache in which it stores "pages" of data. The line above reports the total number of "pages" of data currently in the cache (ie. in RAM). 5773 pages on file When a program is run, its code is loaded into the text segment (as opposed to a program's data which is loaded into the data segment). Since self- modifying code is not allowed, paging can operate more efficiently by _not_ writing text pages to the disk. After all, why should the operating system write out data that we're guaranteed isn't going to change? Text pages needn't be written to the swap space. Rather, text pages are simply read from the filesystem when needed. This line is a measure of the number of such pages. 22191 protection fault When a "page" is not in the cache, it must be brought in from the swap device. This condition is called a "protection fault." The line above is a measure of the number of protection faults. 1503 pages are freed When a "page" of virtual memory is no longer needed by a process, then it is freed. This line reports the number of pages that are currently free. 5 success in swapping out a process If there is not sufficient RAM available for the current process, multi-user operating systems sys- tems may swap out another process during a context switch to free space for the current process. The term "context switch" refers to the process of tak- ing the CPU time away from the current process, put- ting it into a queue in which it will wait till it gets another turn (ie a "time slice"), and giving the next time slice to another process. The line above reports the number of successful attempts in swaping out processes. 0 fail in swapping in a process If a process can not be successfully swapped in, then the line above will reflect the number of times that this error condition has happened. 2 success in swapping in a process This line indicates the number of successful attempts to swap in a process have occured. 18 swapping out a region The term "region" is an AT&Tism. It does not refer to physical memory, but rather it is an abstraction just above that. The operating system maintains a table of virtual memory that is divided into regions. The operating system may swap out a region. This line indicates the number of success- ful attempts to do so. 9 swapping in a region This line indicates the number of successful attempts to swap in a region have occured. 1145913 cpu context switches This line reports the number of context switches that have occured. 7927061 system calls The operating system has a number of special func- tions that programs may call. These are called "system calls". System calls enable programs to request that the kernel perform some privileged operation such as reading from the hard disk drive. The line above reports the total number of system calls that have been made. C) pstat The portions of pstat(C) that are useful when tuning kernels have been discussed earlier in this article. D) crash The crash(ADM) utility has been discusses earlier in this article. Readers interested in more information should read in the crash(ADM) manual page about the following crash(ADM) commands: buffer, file, inode, lck, od, region, strstat, user, and var. Section IX .................... Where to Go for More Information The following manual pages provide information necessary to tune systems: - sar(ADM) - swap(ADM) - configure(ADM) - crash(ADM) - netstat(NADM) - pstat(C) - vmstat(C) - mtune(F) - stune(F) - mdevice(F) - sdevice(F) - boot(HW) - System Administrator's Guide, "Tuning system performance" There is a public domain performance monitoring tool written by Warren Tucker. SCO makes this tool available through our bul- letin board. The tool is called "u386mon" and is available on sosco in the TLS directory. u386mon is TLS012. The files in the TLS directory are offered for experimental or educational use. They are NOT supported by SCO Support depart- ment. In some cases, these are components or updates to exist- ing SCO Software; other material may be educational or supple- mentary documentation. Stallion Technologies has a performance monitoring tool avail- able called "Monitor". In addition, they also have a disk optimizer called Crocodile. Stallion can be reached at (800) 347-7979 or (408) 395-5775. There are also a number of benchmark programs available through third party vendors such as the Dhrystone 2 benchmark test suite and the AIM benchmark test suite. These can be obtained via anonymous ftp from ftp.netcom.com. PTR Prentice Hall has released a book titled "The SCO Perfor- mance Tuning Handbook" (ISBN 0-13-102690-9). This book is writ- ten by two Senior Kernel Engineers at The Santa Cruz Operation, Gina Miscovich and David Simons. O'Reiley and Associates has a book on kernel tuning called "Sys- tem Performance Tuning" (ISBN 0-937175-60-9). SCO Training offers courses in Unix internals. You can reach SCO Training at (800) SCO-UNIX (800 726-8649) or (408) 425-7222. In addition to SCO Training, many universities offer courses in Unix internals. You local universities may be an excellent source of educational opportunities and Unix gurus. Section X .................... Contacting SCO Technical Support SCO Technical Support's policy on kernel tuning is __________________________. When requesting kernel tuning assistance from SCO Technical Sup- port, we request that you send in the following information: - The output of "sar -A". Please make certain that the data reported includes the time frame in which your sys- tem was experiencing performance problems. - The files /etc/conf/cf.d/mtune and /etc/conf/cf.d/stune. - A _complete_ description of the hardware involved. In particular, detail the type of CPU, the amount of RAM installed, the amount of external memory cache installed, the type of hard disk subsystem installed (eg. SCSI, ESDI, IDE, ST-506, etc.), and a description of the SCSI subsystem if present (eg. what devices are installed on the SCSI bus and how are the SCSI target ID's set for each device on the SCSI bus?). - A _complete_ description of the software involved. In particular, list the release of SCO operating system that you are using, the release of SCO's TCP/IP that you are using, any applications installed on the system, any databases installed, and any third party device drivers installed. Failure to provide this detailed information reduces SCO's abil- ity to address your performance issues. Appendix A .................... Filesystem Internals When a user opens a file, the operating system records this information in two places. The first is a table in the operat- ing system called the "User File Descriptor Table". Simply put, this table records the files that a user has open. The other table, the "File Table", records all the files that are open on the system. The User File Descriptor Table gets its name from the fact that when you open a file with the open(S) system call, you get back a "file descriptor". A file descriptor is simply an index into the User File Descriptor Table for a particular user. We'll return to the concept of a file descriptor shortly. User File Descriptor File Table (user #1) Table _____ _____ | | | | 0 | | | | |_____| |_____| Note that there is only one File Table in | | | | the system. This table records all open 1 | | | | files, system wide. |_____| |_____| | | | | In contrast, there is a User File Descriptor 2 | | -->| | Table for each user logged into the system. |_____| / |_____| This table records the files that each user | | / | | has opened. A user's table cannot be 3 | o--|--- | | accessed by any other user. |_____| |_____| | | | | 4 | o--|------->| | |_____| |_____| | | | | 5 | | | | |_____| |_____| | | User File | | Descriptor |_____| Table (user #2) | | _____ | | | | |_____| 0 | | | | |_____| | | | | |_____| 1 | | | | |_____| | | | | |_____| 2 | | | | |_____| --->| | | | / |_____| 3 | o--|--- | | |_____| | | | | |_____| 4 | | | | |_____| --->| | <--- In the example below, we assume that this is | | / |_____| the File Table entry for the file "my_file". 5 | o--|--- | | |_____| | | |_____| Returning to our discussion of the "file descriptor" concept, consider the illustration above. Let's suppose that we are the user using the second User File Descriptor Table. If that user runs a program which opens a file, it will issue a statement such as: fd=open("my_file"); As stated earlier, the operating system will open the file and record this fact in the File Table. This is illustrated above. The open(S) system call will return a value known as the file descriptor. Simply put, the file descriptor is an index into the user's User File Descriptor Table. In our example above, the open(S) system call returned a file descriptor of 5. Now when the user's program wants to read from the file, it passes the file descriptor as an argument to the read(S) sys- tem call: count=read(fd,buffer,size); The statement above would read in the file pointed to by file descriptor number 5. The amount of data read would be lim- ited to "size". It would place the data read into "buffer" and would set "count" equal to the number of bytes success- fully read. The write(S) system call works in a similar fashion. The notion of a file, however, is only provided as a convi- ence to the user. Unix implements a layer under the notion of a file. This mechanism is called the inode. Simply put, an inode is a data structure that contains (amongst other things) the size of a file, its permissions, and pointers to the data blocks on the disk. Each time a file is opened, the system locates the file in the directory tree, finds the inode associated with that file, and opens the data blocks that are listed in the inode. So a more complete version of the diagram above follows: User File Descriptor File Inode Table (user #1) Table Table _____ _____ _____ | | | | | | 0 | | | | | | |_____| |_____| |_____| | | | | | | 1 | | | | -->| | |_____| |_____| / |_____| | | | | / | | 2 | | -->| o--|--- | | |_____| / |_____| |_____| | | / | | | | 3 | o--|--- | | | | |_____| |_____| |_____| | | | | | | 4 | o--|------->| o--|------->| | |_____| |_____| |_____| | | | | | | 5 | | | | | | |_____| |_____| |_____| | | | | User File | | | | Descriptor |_____| |_____| Table (user #2) | | | | _____ | | | | | | |_____| |_____| 0 | | | | | | |_____| | | | | | | |_____| |_____| 1 | | | | | | |_____| | | | | | | |_____| |_____| 2 | | | | | | |_____| --->| o--|------->| | | | / |_____| |_____| 3 | o--|--- | | | | |_____| | | -->| | <-- In our example, this Inode Table | | |_____| / |_____| entry would contain the inode 4 | | | | / | | information for the file "my_file". |_____| --->| o--|--- | | | | / |_____| |_____| 5 | o--|--- | | | | |_____| | | | | |_____| |_____| Appendix B .................... Data Structures A) Linked Lists Suppose we wanted to store a number of first names. For sim- plicity, we will limit this example by restricting the names to those beginning with the letters "a" through "g". A linked list is a legitimate but slow method of storing data such as names. Pictorially, a linked list can be illustrated as follows: -------- -------- -------- -------- -------- | Albert |-->| Greg |-->| Bela |-->| Chip |-->| Don | -------- -------- -------- -------- -------- The reason that this method tends to be slow is that to find a given name in the linked list, you can only look at one name at a time. That is, you can't jump from the beginning of the list (ie. "Albert") right to a name. To illustrate the slow nature of link lists, consider what happens when you try to look up a name that isn't in the table. You will have to look at every name in the list to simply determine that the name you're looking for isn't in the list! You could overcome this somewhat by alphabetizing the linked list as illustrated below: -------- -------- -------- -------- -------- | Albert |-->| Bela |-->| Chip |-->| Don |-->| Greg | -------- -------- -------- -------- -------- But on average, you will still have to look at half of the names before finding the one you are looking for. And this is assuming that you look up a given name just as often as any other name. B) Hash Tables With the limitations of linked lists, it is fairly clear why alternatives are welcome. Hash tables are a useful alterna- tive because they allow you to limit your search to a smaller subset of the data (ie. the names, in our example). One form of hash table works like this: _____ | | -------- A | o--|---->| Albert | |_____| -------- | | -------- -------- B | o--|---->| Bela |-->| Brian | |_____| -------- -------- | | -------- C | o--|---->| Chip | |_____| -------- | | -------- -------- D | o--|---->| David |-->| Don | |_____| -------- -------- | | E | o--|----> |_____| | | F | o--|----> |_____| | | -------- G | o--|---->| Greg | |_____| -------- The hash table is illustrated above as a verticle rectangle with seven slots labled "a" through "g". When you want to put a new name in this hash table, you will put it onto one of the chains, but to figure out which chain, we must use a "hash function". In this simple example, you could state the hash function as "use the first letter in the name to figure out which slot to use." When you want to look up a name, you know which slot to look in based on the hash function. Once you know the slot, you just look on that chain. If the name is already in the hash table, you know that the name will be on that chain. Since the hash function allows you to jump right to the correct chain (thereby eliminating many other names), hash tables can be much faster than looking at one long chain. But for a hash table to function well, the hash function must be fast. Another thing that is important to do is to make sure that the data in the hash table is evenly distributed. This can be accomplished, in part, by having a good hash function. But another thing that is done to make hash tables work well is to make the size of the hash table a prime number. To understand why prime numbers are well suited when choosing the size of a hash table, let us modify our example slightly. Suppose we continue using the same hash table size (ie. seven slots) but we allow any name (ie. not limited to names begin- ning with "a" through "g"). One way you could do that is illustrated below: _____ | | -------- -------- A,H,O,W | o--|---->| Albert |-->| Henry | |_____| -------- -------- | | -------- -------- B,I,P,X | o--|---->| Bela |-->| Brian | |_____| -------- -------- | | -------- C,J,Q,Y | o--|---->| Chip | |_____| -------- | | -------- -------- D,K,S,Z | o--|---->| David |-->| Don | |_____| -------- -------- | | -------- E,L,T | o--|---->| Tracie | |_____| -------- | | F,M,U | o--|----> |_____| | | -------- -------- G,N,V | o--|---->| Greg |-->| Nancy | |_____| -------- -------- Those familiar with modula arithmetic should spot this as a case where the modula operator is very handy. The symbol for the modula operator is "%". Simply put, the modula operator works like this: X % Y = the remainder of X / Y. For exam- ple: 1 % 3 = 1 2 % 3 = 2 3 % 3 = 3 4 % 3 = 1 5 % 3 = 2 6 % 3 = 0 Using the modula operator, you could state the hash function for the above illustration as follows: "hash_table_slot = first_initial % hash_table_size" Because modula arithmetic is used so frequently in hash tables and since modula arithmetic is more random when Y is prime, the size of hash tables is commonly set to a prime number. Appendix C .................... Cache Caching is often best understood when explained through an analogy. Imagine a librarian working at an enormous library. In this library there are millions of books which, while organized well, can take a while to locate due to the sheer number of books contained in the collection. Some books are more popular than others, and as a consequence the librarian receives many requests for "Gone with the Wind", "War and Peace" and other classics. Rather than run- ning all over the library to grab these popular books, the librarian keeps the books on his/her desk. This desk, like a cache, is a area where a small number of items can be stored and retrieved quickly.