BSD Newsletter.com

This is the BSDA Study Guide Book written via a wiki collaboration. This is a work in progress. You may contribute to or discuss this specific page at http://bsdwiki.reedmedia.net/wiki/Determine_disk_capacity_and_which_files_are_consuming_the_most_disk_space.html.

Determine disk capacity and which files are consuming the most disk space

Concept

  • Be able to combine common Unix command line utilities to quickly determine which files are consuming the most disk space.

Introduction

As disk sizes have increased over the years, so has the amount of data that we seem to want to keep on them. At one time or another, you may be faced with the "too much data/not enough space" problem. How can you quickly find the "disk hogs"?

Use the tools!

The BSD systems are full of tools that can assist with this problem, including:

  • df(1) - "disk free"
  • du(1) - "disk usage"
  • find(1) - "walk a file hierarchy"

If you're using NetBSD, you can also get a df type reading from systat(1). And with any BSD variant, using common "Unix-fu" (in particular, find and shell pipes), these commands can quickly produce useful information about disk usage.

df and du

For a quick summary of disk space, simply call df. Using "-c" with df provides an "overall total" (FreeBSD supports this; check df(1) on other systems); using "-h" with either df or du produces "human readable" output: that is, sizes scaled to K, M, G (kilobytes, megabytes, gigabytes), etc., instead of counts of "blocks" whose size is set by the environment variable BLOCKSIZE.

Unlike df, you probably don't want to simply call du. Without arguments, du prints the size of every subdirectory (and its subdirectories, ad infinitum) under your current working directory, in the order it walks the hierarchy, and with "-a" it prints every file as well; if you happen to be in "/", you'd be a long time reading the output of du. Usually it's better to use du with "-s", possibly with a specific file or file "glob" argument, or with "-h" and maybe "-c", and pipe the output through sort(1); look for a rather convoluted (yet effective) example below.
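The advice about "-s" and sort(1) can be sketched as a small shell function. This is only an illustration: the name biggest and its defaults are made up here, and it uses "-k" rather than "-h" so that sort(1) compares plain kilobyte counts.

```shell
#!/bin/sh
# biggest DIR [N]: show the N largest entries directly inside DIR.
# "biggest" is a hypothetical helper name, not a standard utility.
# The body runs in a subshell, so the caller's working directory and
# variables are left alone.
biggest() (
    dir=${1:-.}
    count=${2:-10}
    cd "$dir" || exit 1
    # -s prints one summary line per argument; -k reports 1024-byte blocks.
    # Names starting with a dot are not matched by the * glob.
    du -sk -- * 2>/dev/null | sort -n | tail -n "$count"
)
```

For example, "biggest /usr/src 5" would list the five largest entries in /usr/src, largest last.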

du takes the files to measure as arguments rather than reading their names from standard input, so find makes a fairly useful "frontend" to du when combined with xargs(1) or with find's own "-exec" primary (but see the section on find below before you scratch your head too hard on this).
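A minimal sketch of find acting as a frontend to du, assuming the names are handed to du as arguments through find's "-exec ... {} +" form. The function name du_matching is hypothetical.

```shell
#!/bin/sh
# du_matching DIR PATTERN: sizes (in kilobytes) of the regular files under
# DIR whose names match PATTERN, smallest first.
# "du_matching" is a hypothetical helper name, not a standard utility.
du_matching() {
    # "-exec ... {} +" appends as many file names as fit to each du
    # command line, so du receives the names as ordinary arguments.
    find "$1" -type f -name "$2" -exec du -k {} + | sort -n
}
```

For example, du_matching "$HOME" '*.mp3' would list every mp3 file below the home directory with its size. Quote the pattern so that the shell does not expand it before find sees it.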

Note: under certain conditions, df and du may disagree somewhat about the amount of free space on a filesystem. Generally, this occurs when a program is holding an open file descriptor to a file that has been unlinked; in such a case, du can't count the file's size because the file no longer has a name, but its blocks are still unavailable as "free blocks" (df="disk free", remember?). In such cases, you can use fstat(1) to see currently open files.
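The situation in the note can be reproduced with a few lines of shell. This is a sketch for illustration only: the temporary file name and the choice of descriptor 3 are arbitrary.

```shell
#!/bin/sh
# Demonstrate a file that is unlinked while a descriptor to it is still open.
tmp=$(mktemp "${TMPDIR:-/tmp}/held.XXXXXX") || exit 1
exec 3>"$tmp"     # open the file for writing on descriptor 3
rm "$tmp"         # unlink it: the name is gone, so du can no longer find it
# The descriptor is still valid, so writes succeed and the file's blocks
# stay allocated (and therefore stay out of df's "free" count).
if echo "still writable" >&3; then
    held_open=yes
else
    held_open=no
fi
exec 3>&-         # closing the last descriptor finally releases the blocks
if [ -e "$tmp" ]; then name_exists=yes; else name_exists=no; fi
echo "name exists: $name_exists, write after unlink: $held_open"
```

A long-running daemon that keeps writing to a deleted log file is the usual real-world version of this.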

find and the "size" primary

The complete use of find is beyond the scope of this section; please see Find a file with a given set of attributes for complete information. However, using the "size" primary and an expression representing a given filesize, you can quickly produce a list of "disk hogs". See the Examples below.

Examples

Are any partitions nearing "full"?

$ df
    /dev/ad0s1a      1978    977   842    54%    /
    /dev/ad0s1e     67765  49502 12841    79%    /usr
    /dev/ad0s1d      3962   2182  1463    60%    /var
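Reading the capacity column by eye works for a few partitions; the check can also be scripted. The sketch below assumes the six-column layout that "df -P" prints (filesystem, blocks, used, available, capacity, mount point) and mount points without spaces. The name flag_full and the 75% threshold are made up for illustration.

```shell
#!/bin/sh
# flag_full [THRESHOLD]: read "df -P" style output on standard input and
# print the mount point and capacity of each filesystem at or above
# THRESHOLD percent (default 80).
# "flag_full" is a hypothetical helper name, not a standard utility.
flag_full() {
    awk -v t="${1:-80}" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)              # strip the trailing percent sign
        if (pct + 0 >= t) print $6, $5  # mount point, then capacity
    }'
}

# List the filesystems that are at least 75% full.
df -kP | flag_full 75
```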

Display all the *.mp3 files in my homedir, and their sizes with a total:

$ du -sc $HOME/*.mp3

List every directory under the current directory, in order of size (almost):

$ du -h | sort -n | more
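The "almost" is there because "sort -n" compares only the leading number and ignores the unit suffix that "-h" appends. The sketch below shows the effect on two made-up lines (the sizes and names are invented for illustration) and contrasts it with plain kilobyte counts such as "du -k" prints.

```shell
#!/bin/sh
# Two sample lines in "du -h" style: size, a tab, then a name.
sample_h=$(printf '2.0M\tmusic\n10K\tnotes\n')
# sort -n sees 2.0 and 10, so the 2.0M entry is placed before the 10K one.
misordered=$(printf '%s\n' "$sample_h" | sort -n)

# The same two entries as plain kilobyte counts, in "du -k" style.
sample_k=$(printf '2048\tmusic\n10\tnotes\n')
# With no suffixes to ignore, numeric order matches the real sizes.
ordered=$(printf '%s\n' "$sample_k" | sort -n)

printf 'sorted -h style:\n%s\nsorted -k style:\n%s\n' "$misordered" "$ordered"
```

When a true ordering matters, "du -k | sort -n" is the safer pipeline; "-h" is best kept for output that will only be read by a person.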

Here's a pretty wild set of pipes for "du", showing the largest disk hogs (unless files are >999MB; if so, change "M" to "G" in the regular expression); to see the smallest files, use "head" rather than "tail", or for a complete listing pipe it to $PAGER instead of either. The "-n" option to sort(1) puts the sizes in numeric rather than alphabetical order, and the grep keeps only the lines reported in megabytes, so the numbers that remain share a unit:

[root@server][/usr/src]
# du -hc * | sort -n | grep "[0-9]M" | tail
    26M    crypto
    27M    contrib/binutils
    28M    release
    40M    sys/dev
    47M    contrib/gcc
    105M   sys
    204M   contrib
    458M   total

But this brings us to the relative power of find(1). A similar report could be produced like this ("find all files in the cwd greater than approximately 900MB in size"):

# find . -size +940000000c

The main differences between this statement's output and that of the "piped arrangement" above are that find doesn't report the actual sizes and the list isn't "sorted". Note that if you're using FreeBSD, you can use a "[kMGTP]" suffix with the size designation, thus: "find . -size +900M".
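Both gaps can be closed by letting find select the files and handing them to du and sort. The following is a minimal sketch that sticks to the portable "c" (bytes) suffix; the name big_files and the 1 MB default are assumptions made for illustration.

```shell
#!/bin/sh
# big_files [DIR] [MIN_BYTES]: list regular files under DIR that are larger
# than MIN_BYTES, with their sizes in kilobytes, largest first.
# "big_files" is a hypothetical helper name, not a standard utility.
big_files() {
    # "-size +Nc" matches files of more than N bytes; du -k then reports
    # the size of each match, and sort -rn puts the biggest at the top.
    find "${1:-.}" -type f -size +"${2:-1048576}c" -exec du -k {} + | sort -rn
}
```

For example, "big_files /home 940000000" would report roughly the same files as the find command above, this time with sizes and in descending order.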

Practice Exercises

  1. Use df to see if your hard drives are nearing "full".
  2. Use find to find out who in /home/ is the biggest "disk hog". (Optional: Use grep to see if any of these files are "mp3"s).
  3. Use du along with sort(1) and grep(1) to produce lists of files by size.

More information

du(1), df(1), find(1), sort(1), and, for NetBSD, systat(1)


