BSD Newsletter.com

This is the BSDA Study Guide Book written via a wiki collaboration. This is a work in progress. You may contribute to or discuss this specific page at http://bsdwiki.reedmedia.net/wiki/Determine_if_a_file_is_a_binary__44___text__44___or_data_file.html.

Determine if a file is a binary, text, or data file

Author: Ivan Voras (FreeBSD)

Concept

While BSD systems use naming conventions to help determine the type of a file, an administrator should be aware that these are conventions only, and that a "magic" database exists to help determine a file's actual type.

Introduction

File types are not rigidly defined in Unix. Setting aside special file-like system objects, the system itself recognizes only three kinds of files:

  1. Executable files, distinguished by having the execute ("x") bit set
  2. Directories, marked by a leading "d" in the output of ls -l
  3. Everything else
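
The three categories above can be checked from the shell with the test(1) operators -d, -f and -x. The following is only a minimal sketch: the function name classify and the scratch file names are made up for illustration.

```shell
#!/bin/sh
# classify PATH: hypothetical helper that prints which of the three
# categories described above a path falls into.
classify() {
    if [ -d "$1" ]; then
        # Test for a directory first: directories normally carry the
        # "x" bit too, where it means "may be searched".
        echo "directory"
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable"
    else
        echo "everything else"
    fi
}

# Demo in a scratch directory so nothing on the system is touched.
demo=$(mktemp -d)
mkdir "$demo/subdir"
printf '#!/bin/sh\necho hi\n' > "$demo/script"
chmod +x "$demo/script"
printf 'plain text\n' > "$demo/notes.sh"   # misleading name, no "x" bit

for f in "$demo/subdir" "$demo/script" "$demo/notes.sh"; do
    printf '%s: %s\n' "${f##*/}" "$(classify "$f")"
done
rm -rf "$demo"
```

Note that notes.sh lands in "everything else" despite its name, because only the mode bits are consulted.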

The third category can encompass anything: regular text files, images, multimedia, archives, etc. Files in all three categories are not distinguished at the system level by their names (this differs from other systems, for example Microsoft(r) Windows(tm)), but there are conventions that help users keep their bearings in directory listings. The most common convention is to add a file extension, a sequence of characters prefixed with a dot ("."), to the file name. Thus most shell scripts have filenames ending with .sh, readable text files end with .txt, JPEG images with .jpg or .jpeg, etc. This convention can fail for various reasons, the most common being that a file was copied from a system that does not support the appropriate file attributes or that limits filenames.

To help recover file type information, there is a database of detection strings (/usr/share/misc/magic) and a utility (file(1)); used together, they inspect a file and produce a human-readable description of its contents. Because the number of possible file types is unbounded, this method cannot always work, but it identifies nearly all commonly used file formats.
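
To give a feel for what such a lookup involves, here is a rough sketch that compares the first four bytes of a file against a handful of well-known signatures. It is not how file(1) is implemented, and it does not read the magic database; the function name sniff and the tiny signature table are assumptions made for this illustration.

```shell
#!/bin/sh
# sniff FILE: hypothetical helper that mimics, very roughly, a magic lookup.
sniff() {
    # Hex-encode the first four bytes, e.g. "7f454c46" for an ELF binary.
    sig=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    case "$sig" in
        7f454c46) echo "ELF binary" ;;
        89504e47) echo "PNG image" ;;
        ffd8ff*)  echo "JPEG image" ;;
        1f8b*)    echo "gzip compressed data" ;;
        4d5a*)    echo "DOS/Windows executable" ;;
        2321*)    echo "script with #! line" ;;
        "")       echo "empty" ;;
        *)        echo "unknown" ;;
    esac
}

# Demo: two scratch files whose names contradict their contents.
demo=$(mktemp -d)
printf '\377\330\377\340' > "$demo/report.txt"    # starts like a JPEG
printf '\177ELF'          > "$demo/picture.jpeg"  # starts like an ELF binary

for f in "$demo/report.txt" "$demo/picture.jpeg"; do
    printf '%s: %s\n' "${f##*/}" "$(sniff "$f")"
done
rm -rf "$demo"
```

The real database holds far more entries and can test bytes at arbitrary offsets, so treat this only as an illustration of the idea that content, not the name, decides the answer.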

If you're familiar with several widely used formats you may inspect the file yourself, for example by converting it to a hex dump (with hexdump(1)) and looking at the first few lines.

Examples

To verify that the "magic" database is indeed what it's supposed to be, use:

> file /usr/share/misc/magic
/usr/share/misc/magic: magic text file for file(1) cmd

To verify the format of an executable, use:

> file `which cat`
/bin/cat: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), stripped

> file `which acroread`
/usr/X11R6/bin/acroread: a /compat/linux/bin/sh script text executable

> file /compat/linux/bin/bash
/compat/linux/bin/bash: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped

To inspect the format of an arbitrary file:

> file a_file_i_found
a_file_i_found: JPEG image data, JFIF standard 1.02

To see for yourself what the header of a file looks like, pipe the output of hexdump -C to head:

> hexdump -C zlib1.dll | head
00000000  4d 5a 90 00 03 00 00 00  04 00 00 00 ff ff 00 00  |MZ..............|
00000010  b8 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 f8 00 00 00  |................|
00000040  0e 1f ba 0e 00 b4 09 cd  21 b8 01 4c cd 21 54 68  |........!..L.!Th|
00000050  69 73 20 70 72 6f 67 72  61 6d 20 63 61 6e 6e 6f  |is program canno|
00000060  74 20 62 65 20 72 75 6e  20 69 6e 20 44 4f 53 20  |t be run in DOS |
00000070  6d 6f 64 65 2e 0d 0d 0a  24 00 00 00 00 00 00 00  |mode....$.......|
00000080  bb 22 a6 bc ff 43 c8 ef  ff 43 c8 ef ff 43 c8 ef  |."...C...C...C..|
00000090  7c 4b 95 ef fd 43 c8 ef  ff 43 c9 ef e7 43 c8 ef  ||K...C...C...C..|

Practice Exercises

  1. Find out the file type of your kernel (in FreeBSD, it's /boot/kernel/kernel)
  2. Find a Unicode (UTF-16) text file and use hexdump to examine it. Compare the results with an ASCII text file.
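
If no UTF-16 file is at hand for the second exercise, a few bytes can be written by hand as a stand-in. The sketch below assumes a little-endian encoding with a byte order mark, and uses od(1) rather than hexdump only because od is more widely available; the helper name hexbytes is made up.

```shell
#!/bin/sh
# hexbytes: hypothetical helper that prints stdin as one run of hex digits.
hexbytes() {
    od -An -tx1 | tr -d '[:space:]'
}

# The text "BSD\n" as plain ASCII: one byte per character.
ascii=$(printf 'BSD\n' | hexbytes)

# The same text hand-encoded as UTF-16LE with a byte order mark (ff fe):
# every character now takes two bytes, the second being 00.
utf16=$(printf '\377\376B\000S\000D\000\n\000' | hexbytes)

echo "ASCII:  $ascii"
echo "UTF-16: $utf16"
```

Redirecting either printf to a file instead gives something to run through hexdump -C and file for the exercise itself.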

More information

file(1), magic(5)


