The Linux Partition mini-HOWTO
by Kristian Koehntopp, kris@koehntopp.de

$Id: howto.txt,v 1.3 97/01/24 13:08:02 kris Exp Locker: kris $

$Log:	howto.txt,v $
Revision 1.3  97/01/24  13:08:02  kris
Some minor spelling corrections.

Revision 1.2  96/11/29  13:42:04  kris
Revised and edited for second public release.
Added:
- Introduction
- Introductory chapter on partitions.
- Example
Modified:
- Turned File Systems and Fragmentation into an extra chapter.

Revision 1.1  96/11/29  10:54:06  kris
Initial revision

WHAT IS THIS?
=============

This is a Linux Mini-HOWTO text. A Mini-HOWTO is a short text
explaining, in tutorial style, some task related to Linux
installation and maintenance. It's mini because either the text
or the topic it discusses is too small for a real HOWTO or even
a book. A HOWTO is not a reference: that's what manual pages
are for.

WHAT IS IN IT? AND RELATED HOWTO DOCUMENTS
==========================================

This particular Mini-HOWTO teaches you how to plan and lay out
disk space for your Linux system. It talks about disk hardware,
partitions, swap space sizing and positioning considerations,
file systems, file system types and related topics. The intent
is to teach some background knowledge, so this text mainly
discusses principles and not tools.

Ideally, this document should be read before your first
installation, but this is somewhat difficult for most people.
First timers have other problems than disk layout optimization,
too. So you are probably someone who just finished a Linux
installation and is now thinking about ways to optimize this
installation or how to avoid some nasty miscalculations in the
next one. Well, expect some desire to tear down and rebuild
your installation when you are finished with this text. :-)


This Mini-HOWTO mostly limits itself to planning and laying out
disk space. It does not discuss the usage of fdisk, LILO, mke2fs
or backup programs. There are other HOWTOs that address these
problems. Please see the Linux HOWTO Index for current
information on Linux HOWTOs. There are instructions for
obtaining HOWTO documents in the index, too.

For instructions and considerations regarding disks with more
than 1024 cylinders, see "Linux Large Disk mini-HOWTO", Andries
Brouwer <aeb@cwi.nl>.

For instructions on disk spanning and striping, see "Linux
Multiple Disks Layout mini-HOWTO", by Gjoen Stein
<gjoen@nyx.net>.

For instructions on limiting disk space usage per user
(quotas), see "Linux Quota mini-HOWTO", by Albert M.C. Tam
<bertie@scn.org>.

Currently, there is no general document on disk backup, but
there are several documents with pointers to specific backup
solutions. See "Linux ADSM Backup mini-HOWTO", by Thomas Koenig
<Thomas.Koenig@ciw.uni-karlsruhe.de> for instructions on
integrating Linux into an IBM ADSM backup environment. See
"Linux Backup with MSDOS mini-HOWTO", by Christopher Neufeld
<neufeld@physics.utoronto.ca> for information about MS-DOS
driven Linux backups.

For instructions on writing and submitting a HOWTO document, see
the Linux HOWTO Index, by Greg Hankins <gregh@sunsite.unc.edu>.

Browsing through /usr/src/linux/Documentation can be very
instructive, too. See ide.txt and scsi.txt for some background
information on the properties of your disk drivers and have a
look at the filesystems/ subdirectory.


WHAT IS A PARTITION ANYWAY?
===========================

When PC hard disks were invented people soon wanted to install
multiple operating systems, even if their system had only one
disk.  So a mechanism was needed to divide a single physical
disk into multiple logical disks. That's what a partition is:
a contiguous section of blocks on your hard disk that is
treated like a completely separate disk by most operating
systems.

It is fairly clear that partitions must not overlap: an
operating system will certainly not be pleased if another
operating system installed on the same machine overwrites
important information because of overlapping partitions. There
should be no gap between adjacent partitions, either. While such
an arrangement is not harmful, you are wasting precious disk
space by leaving space between partitions.

A disk need not be partitioned completely. You may decide to
leave some space at the end of your disk that is not assigned
to any of your installed operating systems yet. Later, when it
is clear which installation you use most of the time, you can
partition this leftover space and put a file system on it.


Partitions can be neither moved nor resized without destroying
the file system they contain. So repartitioning usually
involves backup and restore of all file systems touched during
the repartitioning. In fact it is fairly common to mess up
things completely during repartitioning, so you should back up
everything on every disk of that particular machine before even
touching things like fdisk.

Well, some partitions with certain file system types on them
actually CAN be split in two without losing any data (if you
are lucky). For example there is a program called "fips" for
splitting MS-DOS partitions into two to make room for a Linux
installation without having to reinstall MS-DOS. You are still
not going to touch these things without carefully backing up
everything on that machine, aren't you? 

Tapes are your friend for backups. They are fast, reliable and
easy to use, so you can make backups often, preferably
automatically and without hassle.

Step on soapbox: And I am talking about real tapes, not that
disk controller driven ftape crap. Consider buying SCSI: Linux
does support SCSI natively. You don't need to load ASPI
drivers, you are not losing precious HMA under Linux and once
the SCSI host adapter is installed, you just attach additional
disks, tapes and CD-ROMs to it. No more I/O addresses, IRQ
juggling or Master/Slave and PIO-level matching. 

Plus: Proper SCSI host adapters give you high I/O performance
without much CPU load. Even under heavy disk activity you will
experience good response times. If you are planning to use a
Linux system as a major USENET news feed or if you are about to
enter the ISP business, don't even think about deploying a
system without SCSI. Climb off soapbox.


The number of partitions on an Intel based system was limited
from the very beginning: The original partition table was
installed as part of the boot sector and held space for only
four partition entries. These partitions are now called primary
partitions. When it became clear that people needed more
partitions on their systems, extended partitions (today usually
called logical partitions) were invented.
The number of extended partitions is not limited: Each extended
partition contains a pointer to the next extended partition,
so you can have a potentially unlimited chain of partition
entries.

For compatibility reasons, the space occupied by all extended
partitions had to be accounted for. If you are using extended
partitions, one primary partition entry is marked as "extended
partition" and its starting and ending block mark the area
occupied by your extended partitions. This implies that the
space assigned to all extended partitions has to be contiguous.

For reasons having to do with device numbering (each disk gets
a fixed range of minor device numbers), Linux limits the number
of partitions per drive: at most 63 on an IDE disk and 15 on a
SCSI disk. The numbers 1 to 4 are reserved for primary
partitions (only 3 of them usable if you are using extended
partitions, since one entry is taken by the container).

In Linux, disks and partitions are represented by device files:
/dev/hd* for IDE disks and /dev/sd* for SCSI disks. Disks are
lettered a, b, c and so on, so /dev/hda is your first IDE disk
and /dev/sda is your first SCSI disk. Both devices represent raw
disks, starting with the very first block. Writing to these
devices with the wrong tools will destroy the master boot record
and partition table on these disks, rendering all data on this
disk unusable or making your system unbootable. Know what you
are doing and, again, back up before you do it.

Primary partitions on a disk are 1, 2, 3 and 4. So /dev/hda1 is
the first primary partition on the first IDE disk and so on.
Extended partitions have numbers 5 and up, so /dev/sdb5 is the
first extended partition on the second SCSI disk.
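The naming scheme is mechanical enough to write down in a few
lines of Python. This is a minimal sketch; the function names
are hypothetical, it only covers the /dev/hd* and /dev/sd* names
described here, and partition 0 stands for the raw whole disk.

```python
import string

def device_name(bus: str, disk_index: int, partition: int = 0) -> str:
    """Build a Linux device name such as /dev/hda or /dev/sdb5.

    bus:        "hd" for IDE disks, "sd" for SCSI disks
    disk_index: 0 for the first disk (a), 1 for the second (b), ...
    partition:  0 for the raw whole disk, 1-4 for primary partitions,
                5 and up for extended partitions
    """
    if bus not in ("hd", "sd"):
        raise ValueError("bus must be 'hd' or 'sd'")
    if partition < 0:
        raise ValueError("partition numbers start at 1 (0 = whole disk)")
    letter = string.ascii_lowercase[disk_index]
    return f"/dev/{bus}{letter}{partition if partition else ''}"

def partition_kind(partition: int) -> str:
    """Classify a partition number the way the text above does."""
    if partition == 0:
        return "whole disk"
    if partition <= 4:
        return "primary"
    return "extended"
```

For example, device_name("sd", 1, 5) gives "/dev/sdb5", the
example from the text.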

Each partition entry has a starting and an ending block address
assigned to it and a type. The type is a numerical code (a
byte) which associates a partition with a certain type of
operating system. For the benefit of computing consultants,
partition type codes are not really unique, so there is always
the possibility of two operating systems using the same type
code.

Linux reserves the type code 0x82 for swap partitions and 0x83
for "native" file systems (that's ext2 for almost all of you).
The once popular, now outdated Linux/Minix file system used the
type code 0x81 for partitions. OS/2 marks its partitions with
a 0x07 type and so does Windows NT's NTFS. MS-DOS allocates
several type codes for its various flavors of FAT file systems:
0x01, 0x04 and 0x06 are known. DR-DOS used 0x81 to indicate
protected FAT partitions, creating a type clash with Linux/Minix
at that time, but neither Linux/Minix nor DR-DOS is widely
used any more. The primary partition which is used as a
container for extended partitions has a type of 0x05, by the
way.
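To see where these type codes live, here is a minimal Python
sketch that decodes the four primary entries of a master boot
record. It assumes the classic PC layout: the table starts at
byte offset 446 of the first sector, it holds four 16-byte
entries, and the sector ends with the bytes 0x55 0xAA. Each
entry also holds cylinder/head/sector start and end addresses,
which this sketch ignores in favor of the 32-bit starting sector
and sector count. The function name and the descriptions are my
own labels.

```python
import struct

# Type codes mentioned in the text above (not a complete list).
TYPE_NAMES = {
    0x01: "MS-DOS FAT",
    0x04: "MS-DOS FAT",
    0x05: "extended partition container",
    0x06: "MS-DOS FAT",
    0x07: "OS/2 HPFS or Windows NT NTFS",
    0x81: "Linux/Minix (or DR-DOS protected FAT)",
    0x82: "Linux swap",
    0x83: "Linux native",
}

TABLE_OFFSET = 446   # partition table starts here in the boot sector
ENTRY_SIZE = 16      # four entries of 16 bytes each

def read_partition_table(sector: bytes):
    """Return (number, type code, description, first sector, sector count)
    for each used primary entry of a 512-byte master boot record."""
    if len(sector) != 512:
        raise ValueError("expected exactly one 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA boot sector signature")
    entries = []
    for i in range(4):
        offset = TABLE_OFFSET + i * ENTRY_SIZE
        ptype = sector[offset + 4]          # byte 4 of the entry: type code
        if ptype == 0:
            continue                        # 0 marks an unused entry
        # bytes 8-11: first sector, bytes 12-15: sector count (little-endian)
        start, count = struct.unpack_from("<II", sector, offset + 8)
        entries.append((i + 1, ptype, TYPE_NAMES.get(ptype, "unknown"),
                        start, count))
    return entries
```

Reading the first 512 bytes of /dev/hda (as root) and passing
them in is harmless; writing them back is exactly the kind of
thing the warning about raw disk devices is about.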

Partitions are created and deleted with the "fdisk" program.
Every self-respecting operating system comes with an fdisk, and
traditionally it is even called fdisk (or FDISK.EXE) in almost
all OSes. Some fdisks, notably the DOS one, are somewhat limited
when they have to deal with other operating systems'
partitions. Such limitations include the complete inability to
deal with anything with a foreign type code, the inability to
deal with cylinder numbers above 1024, and the inability to
create or even understand partitions that do not end on a
cylinder boundary. For example, the MS-DOS fdisk can't delete
NTFS partitions, the OS/2 fdisk has been known to silently
"correct" partitions created by the Linux fdisk that do not end
on a cylinder boundary, and both the DOS and the OS/2 fdisk
have had problems with disks with more than 1024 cylinders (see
the "large-disk" Mini-Howto for details on such disks).


WHAT PARTITIONS DO I NEED?
==========================

Okay, so what partitions do you need? Well, some operating
systems do not believe in booting from extended partitions,
for reasons that are beyond the scope of any sane mind. So you
probably want to reserve your primary partitions as boot
partitions for your MS-DOS, OS/2 and Linux or whatever you are
using. Remember that one primary partition is needed as a
container for the rest of your disk with extended partitions.

Booting operating systems is a real-mode thing involving BIOSes
and 1024 cylinder limitations. So you probably want to put all
your boot partitions into the first 1024 cylinders of your hard
disk, just to avoid problems. Again, read the "large-disk"
Mini-Howto for the gory details.
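Whether a partition lies within BIOS reach is simple arithmetic
once you know the geometry your BIOS reports. The sketch below
assumes 512-byte sectors and logical block addresses counted
from 0; the function names are hypothetical, and the geometries
used in the tests (16 or 255 heads, 63 sectors per track) are
common BIOS values, but check what your own BIOS reports.

```python
SECTOR_SIZE = 512  # bytes per sector

def cylinder_of(lba: int, heads: int, sectors_per_track: int) -> int:
    """Cylinder holding logical block `lba` (counted from 0)."""
    return lba // (heads * sectors_per_track)

def below_1024(last_lba: int, heads: int, sectors_per_track: int) -> bool:
    """True if a partition ending at `last_lba` lies entirely within
    the first 1024 cylinders (cylinders 0-1023)."""
    return cylinder_of(last_lba, heads, sectors_per_track) < 1024

def bios_reach_bytes(heads: int, sectors_per_track: int) -> int:
    """Number of bytes in the first 1024 cylinders."""
    return 1024 * heads * sectors_per_track * SECTOR_SIZE
```

With 16 heads and 63 sectors per track, the first 1024
cylinders hold 528,482,304 bytes (the well-known 504 MB limit);
with 255 heads they hold about 8.4 GB.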


To install Linux, you will need at least one partition. If the
kernel is loaded from this partition (for example by LILO),
this partition must be readable by your BIOS. If you are using
other means to load your kernel (for example a boot disk or the
LOADLIN.EXE MS-DOS based Linux loader) the partition can be
anywhere. In any case this partition will be of type 0x83
"Linux native".

Your system will need some swap space. Unless you swap to files
you will need a dedicated swap partition. Since this partition
is only accessed by the Linux kernel and the Linux kernel does
not suffer from PC BIOS deficiencies, the swap partition may be
positioned anywhere. I recommend using an extended partition
for it (/dev/?d?5 and higher). Dedicated Linux swap partitions
are of type 0x82 "Linux swap".

These are minimal partition requirements. It may be useful to
create more partitions for Linux. Read on.



HOW LARGE SHOULD MY SWAP SPACE BE?
==================================

If you have decided to use a dedicated swap partition, which is
generally a Good Idea [tm], follow these guidelines for
estimating its size:

1. In Linux, RAM and swap space add up (this is not true for
   all Unices). For example, if you have 8 MB of RAM and 12 MB
   swap space, you have a total of about 20 MB virtual memory.

2. When sizing your swap space, you should have at least 16 MB
   of total virtual memory. So for 4 MB of RAM consider at least
   12 MB of swap, for 8 MB of RAM consider at least 8 MB of
   swap.

3. In Linux, a single swap partition can not be larger than 128
   MB.  That is, the partition may be larger than 128 MB, but
   excess space is never used. If you want more than 128 MB of
   swap, you have to create multiple swap partitions.

4. When sizing swap space, keep in mind that too much swap space
   may not be useful at all.

   Every process has a "working set". This is a set of in-memory
   pages which will be referenced by the processor in the very
   near future. Linux tries to predict these memory accesses
   (assuming that recently used pages will be used again in the
   near future) and keeps these pages in RAM if possible. If the
   program has a good "locality of reference" this assumption
   will be true and the prediction algorithm will work.

   Holding a working set in main memory only works if there
   is enough main memory. If you have too many processes running
   on a machine, the kernel is forced to put pages on disk that
   it will reference again in the very near future (forcing a
   page-out of a page from another working set and then a
   page-in of the page referenced). Usually this results in a
   very heavy increase in paging activity and in a substantial
   drop in performance. A machine in this state is said to be
   "thrashing" (for you German readers: that's "thrashing"
   ("dreschen", "schlagen", "haemmern") and not trashing
   ("muellen")).

   On a thrashing machine the processes are essentially running
   from disk and not from RAM. Expect performance to drop by
   approximately the ratio between memory access speed and disk
   access speed (7 ms vs. 70 ns is a factor of 100,000).

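   A very old rule of thumb in the days of the PDP and the Vax
   was that the size of the working set of a program is about
   25% of its virtual size. If RAM has to hold the working sets
   and these make up a quarter of all virtual memory, virtual
   memory beyond four times your RAM cannot be used without
   thrashing. Thus it is probably useless to provide more swap
   than three times your RAM.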

   But keep in mind that this is just a rule of thumb. It is
   easily possible to create scenarios where programs have
   extremely large or extremely small working sets. For example,
   a simulation program with a large data set that is accessed
   in a very random fashion would have almost no noticeable
   locality of reference in its data segment, so its working set
   would be quite large.

   On the other hand, an xv with many simultaneously opened
   JPEGs, all but one iconified, would have a very large data
   segment. But since image transformations are done on one
   image at a time, most of the memory occupied by xv is never
   touched. The same is true for an editor with many editor
   windows where only one window is being modified at a time.
   These programs have (if they are designed properly) a very
   high locality of reference and large parts of them can be
   kept swapped out without too severe a performance impact.

   One could suspect that the 25% number from the age of the
   command line is no longer true for modern GUI programs
   editing multiple documents, but I know of no newer papers
   that try to verify these numbers.

So for a configuration with 16 MB RAM, no swap is needed for a
minimal configuration and more than 48 MB of swap are probably
useless. The exact amount of memory needed depends on the
application mix on the machine (what did you expect?).
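The rules above can be folded into a small calculation. This is
a sketch under the assumptions stated in the list (16 MB minimum
virtual memory, the 25% working set rule of thumb, 128 MB usable
per swap partition); the function name is hypothetical, and the
result is a range to think about, not a prescription.

```python
MIN_VIRTUAL_MB = 16        # rule 2: RAM + swap should be at least 16 MB
SWAP_PARTITION_MB = 128    # rule 3: usable size of one swap partition

def swap_plan(ram_mb: int):
    """Return (minimum swap, probably-useful maximum swap, number of
    swap partitions needed for that maximum), all sizes in MB."""
    minimum = max(0, MIN_VIRTUAL_MB - ram_mb)
    # Rule of thumb from rule 4: working sets are ~25% of virtual size,
    # so virtual memory beyond 4 x RAM (swap beyond 3 x RAM) is unlikely
    # to be used without thrashing.
    maximum = max(minimum, 3 * ram_mb)
    partitions = -(-maximum // SWAP_PARTITION_MB)  # ceiling division
    return minimum, maximum, partitions
```

swap_plan(16) returns (0, 48, 1), matching the 16 MB example
above; swap_plan(64) returns (0, 192, 2), because 192 MB of swap
no longer fits into a single 128 MB partition.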


WHERE SHOULD I PUT MY SWAP SPACE?
=================================

1. Mechanics are slow, electronics are fast.

   Modern hard disks have many heads. Switching between heads of
   the same track is fast, since it is purely electronic.
   Switching between tracks is slow, since it involves moving
   real world matter.

   So if you have a disk with many heads and one with fewer
   heads and both are identical in other parameters, the disk
   with many heads will be faster.

   Splitting swap and putting it on both disks will be even
   faster, though.

2. Older disks have the same number of sectors on all tracks.
   With these disks it will be fastest to put your swap in the
   middle of the disk, assuming that your disk head will move
   from a random track towards the swap area.

3. Newer disks use ZBR (zone bit recording). They have more
   sectors on the outer tracks. With a constant number of rpms,
   this yields a far greater performance on the outer tracks
   than on the inner ones. Put your swap on the fast tracks.

4. Of course your disk head will not move randomly. If you have
   swap space in the middle of a disk between a constantly busy
   home partition and an almost unused archive partition, you
   would be better off if your swap were in the middle of the
   home partition for even shorter head movements. You would be
   even better off, if you had your swap on another otherwise
   unused disk, though.
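The argument of item 2 can be checked numerically. The sketch
assumes that the head starts on a uniformly random track and
that the cost of a seek is proportional to the number of tracks
crossed. That is a simplification, since real seek times are
not linear in distance. Function and parameter names are
hypothetical.

```python
def mean_seek_distance(swap_track: int, tracks: int) -> float:
    """Average number of tracks the head crosses to reach swap_track,
    starting from a uniformly random track in 0..tracks-1."""
    return sum(abs(t - swap_track) for t in range(tracks)) / tracks

def best_swap_track(tracks: int) -> int:
    """Track position that minimizes the average head travel."""
    return min(range(tracks), key=lambda s: mean_seek_distance(s, tracks))
```

For a disk of 1001 tracks, swap on the outermost track costs an
average of 500 tracks of head travel, swap in the middle about
250.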


Summary: Put your swap on a fast disk with many heads that is
not busy doing other things. If you have multiple disks: Split
swap and scatter it over all your disks or even different
controllers. Even better: Buy more RAM.


SOME FACTS ABOUT FILE SYSTEMS AND FRAGMENTATION
===============================================

Disk space is administered by the operating system in units of
blocks and fragments of blocks. In ext2, fragments and blocks
have to be of the same size, so we can limit our discussion to
blocks.

Files come in any size. They don't end on block boundaries, so
part of the last block of every file is wasted. Assuming that
file sizes are random, there is approximately half a block of
waste for each file on your disk. Tanenbaum calls this
"internal fragmentation" in "Operating Systems".

You can estimate the number of files on a file system from the
number of allocated inodes. On my disk

# df -i
Filesystem           Inodes   IUsed   IFree  %IUsed Mounted on
/dev/hda3              64256   12234   52022    19%  /
/dev/hda5              96000   43058   52942    45%  /var

there are about 12000 files on / and about 43000 files on /var,
which holds the news spool. At a block size of 1 KB, about
6 + 22 = 28 MB of disk space are lost in the tail blocks of
files. Had I chosen a block size of 4 KB, I would have lost
4 times this space.
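The estimate above is easy to redo for your own disks. This is
a minimal Python sketch assuming a Unix system: os.statvfs
supplies the same inode counts df -i shows (f_files minus
f_ffree is the IUsed column), and the half-block-per-file figure
is the random file size assumption from the previous paragraph.
The function names are hypothetical.

```python
import os

def tail_waste_bytes(files: int, block_size: int) -> float:
    """Expected space lost in partly filled last blocks: about half
    a block per file, assuming random file sizes."""
    return files * block_size / 2

def used_inodes(path: str) -> int:
    """Allocated inodes on the file system containing `path`
    (the IUsed column of df -i)."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree
```

tail_waste_bytes(12234, 1024) gives 6,263,808 bytes, the 6 MB
for / above; tail_waste_bytes(used_inodes("/var"), 1024) / 2**20
gives the estimate in MB for your own /var. Inodes are also used
by directories, symbolic links and device files, so the result
is only a rough estimate.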


Data transfer is faster for large contiguous chunks of data,
though. That's why ext2 tries to preallocate space in units of
8 contiguous blocks for growing files. Unused preallocation is
released when the file is closed, so no space is wasted.

Noncontiguous placement of blocks in a file is bad for
performance, since files are often accessed in a sequential
manner. It forces the operating system to split a disk access
and the disk to move the head. This is called "external
fragmentation" or simply "fragmentation" and is a common
problem with DOS file systems. 
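To make external fragmentation concrete: a file is contiguous
if its block numbers form one unbroken run, and each additional
run costs the disk an extra head movement on a sequential read.
The sketch takes a hypothetical list of block numbers as input;
it does not read real block maps from a file system.

```python
def count_fragments(blocks):
    """Number of contiguous runs in a file's list of block numbers.
    1 means the file is stored contiguously; every extra run forces
    an additional seek when the file is read sequentially."""
    runs = 0
    previous = None
    for block in blocks:
        if previous is None or block != previous + 1:
            runs += 1          # a new run starts here
        previous = block
    return runs
```

A file stored in blocks 10, 11, 40, 41, 42 and 7 consists of
three runs, so reading it sequentially needs two extra seeks.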

ext2 has several strategies to avoid external fragmentation.
Normally fragmentation is not a large problem in ext2, not even
on heavily used partitions such as a USENET news spool. While
there is a tool for defragmentation of ext2 file systems,
hardly anyone uses it and it is not up to date with the current
release of ext2. If you use it, do so at your own risk.

The MS-DOS file system is well known for its pathological
management of disk space. In conjunction with the abysmal buffer
cache used by MS-DOS the effects of file fragmentation on
performance are very noticeable. DOS users are accustomed to
defragging their disks every few weeks and some have even
developed some ritualistic beliefs regarding defragmentation.
None of these habits should be carried over to Linux and ext2.
Linux native file systems do not need defragmentation under
normal use and this includes any condition with at least 5% of
free space on a disk.


The MS-DOS file system is also known to lose large amounts of
disk space due to internal fragmentation. For partitions larger
than 256 MB, DOS block sizes (clusters) grow to 8 KB and more,
so the space lost in partly filled clusters becomes substantial.
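Why DOS block sizes grow with partition size: FAT16 can address
only about 65,000 clusters, so the cluster size has to double
whenever the partition size doubles. This sketch assumes a
FAT16 limit of 65,524 clusters, a 2 KB minimum and a 32 KB
maximum cluster size, and it ignores the space taken by the FAT
and the root directory, so its answers are approximate. The
function names are hypothetical.

```python
MB = 2 ** 20
FAT16_MAX_CLUSTERS = 65524   # assumed FAT16 cluster count limit
MIN_CLUSTER = 2048           # assumed DOS default for FAT16 partitions
MAX_CLUSTER = 32768          # assumed largest cluster size under DOS

def fat16_cluster_size(partition_bytes: int) -> int:
    """Smallest cluster size (bytes) that keeps the cluster count
    within the FAT16 limit; FAT and root directory overhead ignored."""
    cluster = MIN_CLUSTER
    while partition_bytes // cluster > FAT16_MAX_CLUSTERS:
        cluster *= 2
        if cluster > MAX_CLUSTER:
            raise ValueError("too large for FAT16 (about 2 GB maximum)")
    return cluster

def cluster_waste_bytes(files: int, cluster: int) -> float:
    """About half a cluster lost per file, as with ext2 blocks."""
    return files * cluster / 2
```

For a hypothetical 300 MB partition holding 10,000 files, the
8 KB clusters lose about 39 MB to half-filled clusters, compared
with about 5 MB for ext2 with 1 KB blocks.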

ext2 does not force you to choose large blocks for large file
systems, except for very large file systems in the 0.5 TB range
and above, where small block sizes become inefficient. So
unlike DOS there is no need to split up large disks into
multiple partitions to keep block size down. Use the 1 KB
default block size if possible. You may want to experiment with
a block size of 2 KB for some partitions, but expect to meet
some seldom exercised bugs: Most people use the default.


FILE LIFETIMES AND BACKUP CYCLES AS PARTITIONING CRITERIA
=========================================================

With ext2, partitioning decisions should be governed by backup
considerations and by the need to avoid external fragmentation
caused by files with different lifetimes.

Files have lifetimes. After a file has been created, it will
remain on the system for some time and then be removed. File
lifetime varies greatly throughout the system and is partly
dependent on the pathname of the file. For example, files in
/bin, /sbin, /usr/sbin, /usr/bin and similar directories are
likely to have a very long lifetime: many months and above.
Files in /home are likely to have a medium lifetime: several
weeks or so. Files in /var are usually short-lived: almost no
file in /var/spool/news will remain longer than a few days, and
files in /var/spool/lpd measure their lifetime in minutes or
less.


For backup it is useful if the amount of daily backup is
smaller than the capacity of a single backup medium. A daily
backup can be a complete backup or an incremental backup.

You can decide to keep your partition sizes small enough that
they fit completely onto one backup medium (choose daily full
backups). In any case a partition should be small enough that
its daily delta (all modified files) fits onto one backup
medium (choose incremental backup and expect to change backup
media for the weekly/monthly full dump - no unattended
operation possible).

Your backup strategy depends on that decision.
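The two conditions above amount to a small decision procedure.
The sketch below is one way to write it down; the function name,
the parameter names and the wording of the results are
hypothetical, and all sizes must be given in the same unit
(MB here).

```python
def backup_strategy(partition_mb: float, daily_delta_mb: float,
                    medium_mb: float) -> str:
    """Pick a backup scheme from the sizes discussed above."""
    if partition_mb <= medium_mb:
        # The whole partition fits on one medium: full backups every
        # day, which can run unattended.
        return "daily full backup"
    if daily_delta_mb <= medium_mb:
        # Only the daily changes fit: incremental backups, with someone
        # changing media for the periodic full dump.
        return "daily incremental backup"
    # Not even one day's changes fit: the partition is too large for
    # this medium.
    return "split the partition or use larger media"
```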

When planning and buying disk space, remember to set aside a
sufficient amount of money for backup! Data that is not backed
up is worthless! Data reproduction costs are much higher than
backup costs for virtually everyone!


For performance it is useful to keep files of different
lifetimes on different partitions. This way, the short-lived
files on the news partition may become very heavily fragmented
without any impact on the performance of the / or /home
partitions.

A common model creates /, /home and /var partitions as
discussed above. This is simple to install and maintain and
differentiates well enough to avoid adverse effects from
different lifetimes. It fits well into a backup model, too:
almost no one bothers to back up USENET news spools and only
some files in /var are worth backing up (/var/spool/mail comes
to mind). On the other hand, / changes infrequently, can be
backed up on demand (after configuration changes) and is small
enough to fit on most modern backup media as a full backup
(plan 250 to 500 MB depending on the amount of installed
software). /home contains valuable user data and should be
backed up daily. Some installations have very large /homes and
must use incremental backups.

Some systems put /tmp onto a separate partition as well. Others
symlink it to /var/tmp to achieve the same effect (note that
this can affect single user mode, where /var will be
unavailable and the system will have no /tmp until you create
one or mount /var manually) or put it onto a RAM disk (Solaris
does this, for example). Either way, /tmp is kept out of /,
which is a good idea.

This model is convenient for upgrades or reinstallations as
well: Save your configuration files (or the entire /etc) to
some /home directory, scrap your /, reinstall and fetch the old
configurations from the save directory on /home.
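The save step can be as simple as packing /etc into an archive
on /home. This is a minimal sketch using Python's standard
tarfile module; the function name and the archive path in the
usage line are hypothetical, and it must run as root so that
all files in /etc are readable.

```python
import tarfile
from pathlib import Path

def save_config(source: str, archive: str) -> int:
    """Pack the directory `source` (e.g. /etc) into a gzip-compressed
    tar archive and return the number of entries stored in it."""
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name ("etc/..."),
        # not as absolute paths.
        tar.add(source, arcname=Path(source).name)
    with tarfile.open(archive, "r:gz") as tar:
        return len(tar.getmembers())
```

For example: save_config("/etc", "/home/save/etc.tgz") before
scrapping /, then unpack the archive somewhere under /home after
the reinstallation and fetch the files you need from there.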


AN EXAMPLE
==========

There is this old ISA bus 386/40 sitting on your shelf that you
abandoned two years ago because it no longer cut it. Now you are
planning to turn it into a small X-less server for your
household LAN.

Here is how you do it: Take that 386 and put 16 MB RAM into it.
Add a cheapo EIDE disk, the smallest you can get (800 MB?) and
an ethernet card. Add any graphics card, even an old Hercules
will do, provided you still have a monitor for it. Install
Linux on it and there you have your local NFS, SMB, HTTP,
LPD/LPR and NNTP server as well as your mail router and POP3
server. With an additional modem or ISDN card the machine
becomes your TCP/IP router, too.

Most of the disk space on this machine will go into the /var
directories, /var/spool/mail, /var/spool/news and
/var/httpd/html. We put /var on a separate partition and make
this one large. There will be almost no users on this machine,
so we create only a smallish /home or even better, mount /home
from some other workstation via NFS.

Linux without X plus several locally installed utilities will
be fine with a 250 MB partition as /. The machine has 16 MB of
RAM, but it will be running many servers. 16 MB swap should be
in order, 32 MB should be plenty. We are not short on disk
space, so the machine will get 32 MB. Out of sentimentality, an
MS-DOS partition of some 25 MB is kept on it. You decided to
import /home from another machine, so the remaining 500+ MB
will end up as /var. This is more than sufficient for a
household USENET news feed.

We get

Device     Mounted on                      Size
/dev/hda1  /dos_c                           25 MB
/dev/hda2  - (Swapspace)                    32 MB
/dev/hda3  /                               250 MB
/dev/hda4  - (Extended Container)          500 MB
/dev/hda5  /var                            500 MB

homeserver:/home /home                     1.6 GB
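The local part of this layout can be checked against the rules
from the partition chapter: primary partitions numbered 1 to 4,
extended partitions numbered 5 and up, and all extended
partitions fitting inside their container entry. The sketch
below encodes the table as partition numbers and sizes in MB;
the data structure and the function name are hypothetical. The
container's space is counted only once, since /dev/hda5 lies
inside /dev/hda4.

```python
# Partition number -> size in MB, copied from the table above.
LAYOUT = {1: 25, 2: 32, 3: 250, 4: 500, 5: 500}
CONTAINER = 4  # /dev/hda4 is the extended partition container

def check_layout(layout: dict, container: int) -> int:
    """Check the partition rules and return the disk space (MB) needed."""
    if any(n < 1 for n in layout):
        raise ValueError("partition numbers start at 1")
    primaries = {n: s for n, s in layout.items() if n <= 4}
    extended = {n: s for n, s in layout.items() if n >= 5}
    if extended:
        if container not in primaries:
            raise ValueError("extended partitions need a container entry")
        if sum(extended.values()) > primaries[container]:
            raise ValueError("extended partitions overflow their container")
    # Extended partitions live inside the container, so only the
    # primary entries take up space on the disk.
    return sum(primaries.values())
```

check_layout(LAYOUT, CONTAINER) returns 807, the number of MB
the local disk has to provide at least.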

You are backing up this machine via the network using the tape
in homeserver. Since everything on this machine has been
installed from CD-ROM all you have to save are some
configuration files from /etc, your customized locally
installed *.tgz files from /root/Source/Installed and
/var/spool/mail as well as /var/httpd/html.  You copy these
files into a dedicated directory /home/backmeup on homeserver
every night, where the regular homeserver backup picks them up.

This is basically what I have been using at home for the last
few years and it works very well.