why is my system slow?

June 9th, 2008

Computer users expect their systems to work well at all times, but unfortunately this isn't always the case. If your system becomes slow, there certainly is something you can do about it. This article will help you understand what's happening on the system, whether it's the computer in front of you or a system you're accessing remotely.

When we say the system is slow we mean that it isn't responding to our input in a reasonable time, or taking too long to complete a task. This can happen when there is *another* program using too many system resources, starving *your* program of resources, causing it to run slowly

There are three common ways in which this can happen, and all three of these scenarios can be equally crippling and put your system into a state where it seems to be frozen. None of these situations are harmful to your system (ie. they make it slow until the problem is resolved, but they don't damage anything).

  1. A program is monopolizing the cpu.
    A program is using all of the cpu cycles, blocking access to the cpu to other programs. This may be intentional (programs that do heavy processing) or accidental (programs get stuck repeating something over and over).
  2. You're nearly out of physical memory.
    You are either running too many programs, or programs that use too much memory. Your physical memory is almost entirely exhausted, and the running programs are using the harddisk as fallback memory, which is very slow.
  3. A program is doing heavy I/O.
    You may be copying a large file, for instance. The program that is doing the copying is requesting lots of data from the harddrive, but while it's doing this the cpu is actually waiting for this data to be read from the harddrive, blocking access to other programs.

The impact of both cpu heavy and I/O heavy programs can be mitigated by tuning the kernel to be more responsive. If you are running a kernel supplied by one of the major distributions (Ubuntu, Fedora etc), then it's already finely tuned for your system, but even so you may still run into these problems sometimes, on your own system or some other one.

1. Cpu bound programs

How it happens

The most common cycle for a program is to 1) accept some input, 2) do some work, 3) give some output. And this sequence is repeated for as long as the program is running. Typically, the work that has to be done takes a very short time compared to the time spent waiting for input, which gives all the other running programs a chance to use the cpu in the meantime.

If a program instead takes no input and gives no output and only does work all the time, then there is much less time in which the cpu is free for other programs to use. This will make the whole system very slow, because all programs have to wait a long time to get their turn.

Demonstration

It's very easy to simulate this scenario. Here is an example. This program is a loop, which checks the condition (which is always true) and then performs the action in the loop (running the command true, which does nothing). No input, no output.

$ while true; do true; done

Hit Ctrl+C when you've had enough.

How to detect it

The easiest way to check for a cpu bound program is to use top. See if there is a program that's using almost 100% of the cpu. To be sure that it doesn't just occasionally spike leave it running for a while (or hit <space> to refresh the display a few times).

$ top
Cpu0  : 98.2%us,  1.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  :  1.4%us,  0.5%sy,  0.0%ni, 98.2%id,  0.0%wa,  0.0%hi,  0.0%si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26210 alex      20   0 19428 2368 1492 R   99  0.1   1:14.27 bash
 4075 alex      20   0  606m 195m  28m S    0  6.5  23:19.58 firefox
 6337 root      20   0  505m 116m 6716 S    0  3.9  34:30.25 Xorg

Here we see that cpu0 is idle (free) 0% of the time, which means it is as busy as it possibly can be. 98.2% is due to user programs (ie. the program we just demonstrated). And when we look in the list of programs, we see that bash (the shell in which we ran our one-liner) is using 99% of the cpu.

What you can do about it

There are two cases of cpu bound programs - the intentional and the accidental.

If you're running a program that does a lot of work on purpose, for instance a video encoding task, then the more cpu it uses the quicker it will finish, so using a lot of cpu is good. But if it's making your whole system slow, then you can lower the priority at which the programs gets access to the cpu, so that it only uses as much cpu as the other programs leave available. To do this, use the renice command:

$ renice 20 26210

The first number you give to renice describes how "nice" you are being to other programs, on a scale from -20 (very selfish) to 20 (very nice). The other number is the process id (pid) of the program, which is listed by top above. Doing this will probably still make bash use almost 100% of the cpu, but not at the expense of the other programs.

On the other hand, if the program isn't supposed to be using this much cpu, then it's either a bug in the program (certain versions of firefox used to spike to 100% cpu) or it's just heavier than the cpu can handle. You can still renice the program, but this will make your system more responsive at the expense of the program (so for instance, firefox may become unusable). The last resort is to kill it:

$ kill 26210

On multi-cpu systems this is less of a problem, because most programs can only use one cpu, which leaves the other cpus to serve all the other programs and keep your system responsive.

2. Physical memory is almost full

How it happens

There are two types of memory on your system: physical (RAM) and virtual (swap). Physical memory is relatively small and very quick to access, while virtual memory is just part of your harddisk being used as extra memory (very slow to access). As long as all the running programs can store their work in physical memory, everything is fine. (This is why it's good to have a lot of it.)

But once you fill all of the physical memory, the operating system will start moving some of the work into swap (onto the harddisk) to make space for new programs. You probably won't notice that this is happening. But when you switch from one program to another, and the second program has its work in swap, this work now has to be moved back into memory, and some of the stuff currently in memory has to be moved out to swap. This will definitely be noticeable and will make your system slow until it's finished.

The effect of this situation is that your system will feel normal for some of the time (when using the same program), and then very unresponsive from time to time (when switching between programs).

Demonstration

This effect is best demonstrated with a desktop program. Start the gimp and create a canvas so large that it exceeds your available physical memory. For instance, try a canvas 10,000x10,000 pixels (gimp will tell you how much memory it needs to create it). It will probably take a while to create the canvas, so just let it finish. (In order to make room for this image in memory, other programs are being moved into swap, this is called swapping.) Then do some painting on the canvas. Now switch back to another program (firefox, for instance). You should now sense that your system is slow to respond, but this is temporary for as long as it takes to restore firefox into memory.

How to detect it

It's a good idea to know how much memory your system uses under normal conditions, that way you can keep an eye on things. The command free -m will tell you about the state of your memory:

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3015       1640       1375          0          9        124
-/+ buffers/cache:       1505       1509
Swap:         2878          0       2878

Here we see that we have 3015mb of physical memory, half of which is free. We also have almost as much swap memory, but none of that is in use.

After we create our huge canvas with the gimp we can run free again and see what has changed.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3015       2993         22          0          2        957
-/+ buffers/cache:       2033        982
Swap:         2878        819       2059

We're now using 819mb of swap memory, so clearly we've exceeded the capacity of physical memory.

While swapping takes place top will also show that there is a lot of I/O activity taking place. (Press Shift+M to sort the program listing by memory use.)

$ top
Cpu(s):  4.0%us,  1.6%sy,  0.0%ni, 51.0%id, 42.1%wa,  0.6%hi,  0.8%si
Mem:   3088224k total,  3043496k used,    44728k free,     3548k buffers
Swap:  2947888k total,   998812k used,  1949076k free,   957956k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2579 alex      20   0 1593m 1.4g  13m S    0 47.0   0:22.26 gimp

Here we see that user and system activity adds up to only 6%, the cpu is idle 51% of the time, but spending 42% of the time waiting for I/O operations (ie. harddisk activity). And among the programs, the gimp alone is using 1.4gb of memory.

What you can do about it

If you notice that your system becomes unresponsive when switching between programs, you have a pretty good idea that it's because of swapping. There is no way to make swapping faster, so what you should do is less swapping. Keep an eye on how much memory your system is using and you should also have an idea about the memory use of various programs (the big ones). When you notice heavy swapping, the easiest way to fix it is to shut down the program that's using the most memory. When you do this you free up physical memory. The data in swap will not automatically be moved into memory (because this is expensive), but you should notice that your system is performing normally again.

3. IO bound programs

How it happens

Input/Output (I/O, also written io) is an umbrella term for *everything* that happens on your system that does not involved the cpu, the memory or the video card (gpu). When talking about performance, io usually means the harddrive, because that's what your system uses most heavily (and therefore what we spend the most time waiting for), but it can also refer to your network card, your cdrom drive, your keyboard etc.

A program running on the cpu, which does a lot of io (such as reading/writing large files), will spend a lot of time waiting for this io to complete. This leaves the cpu busy and other programs have less opportunity to run. The effect is that the whole system may become consistently unresponsive until the heavy io is completed.

Demonstration

We can demonstrate the effect of heavy io by reading and writing a lot of data to the harddrive. Here we find the device that your root partition is on (probably /dev/sda1) and then read 5gb from it, writing it to a file /tmp/dummy (you may want to check that you have enough free space).

$ device=`mount | grep " / " | awk '{ print $1 }'`
$ sudo dd if=$device of=/tmp/dummy bs=5120 count=1048576

This should take around 10 minutes, so you can see how your system behaves while this is happening.

How to detect it

We can detect heavy io with top.

$ top
Cpu0  : 13.7%us,  6.5%sy,  0.0%ni,  0.0%id, 79.0%wa,  0.8%hi,  0.0%si
Cpu1  :  4.7%us,  3.9%sy,  0.0%ni, 73.2%id, 17.3%wa,  0.8%hi,  0.0%si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30956 root      20   0 10388 1812  640 D    5  0.1   0:02.76 dd

Here we see that cpu0 is spending 79% of its time waiting for io. In the list of programs we see the program dd that we ran. It's only using 5% cpu, which seems to conflict with the number 79%, but then we see it has status D, which means waiting for io. The reason for this is that while dd is only doing actual work on the cpu 5% of the time, it's still using a lot of cpu time because of all the io.

Most io is harddrive io, but we can see if this is the case with the tool atop, which is similar to top.

$ atop
CPU | sys     14% | user     21% | irq       2% | idle     39% | wait    125% |
cpu | sys     10% | user     11% | irq       2% | idle      0% | cpu000 w 78% |
cpu | sys      4% | user     10% | irq       0% | idle     38% | cpu001 w 48% |
DSK |         sda | busy     98% | read    1371 | write   1011 | avio    4 ms |

  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK  ST EXC S  CPU CMD     1/4
30956   0.60s   0.00s     0K     0K 113.4M 114.0M  **   * D   6% dd

Here we again see that the program dd has status D (io wait), and cpu0 is spending 78% waiting for io. In addition, we see that the harddrive sda (which is the one we are reading and writing to, /dev/sda) is busy 98% of the time. So we know that it's the harddrive that's responsible for using 78% of cpu0.

What you can do about it

If your system becomes unresponsive because of io, it is because the cpu is not being shared among the programs in a way that allows them all to stay responsive. So the answer is to prioritize certain programs over others. ionice is the io counterpart to nice.

$ ionice -p30956 -n7

Here we are telling ionice first the process id of the program, and then the io priority it should have, on a scale from 0 (highest priority) to 7 (lowest).

:: random entries in this category ::