file(1p) — Linux manual page
FILE(1P) POSIX Programmer's Manual FILE(1P)
PROLOG
This manual page is part of the POSIX Programmer's Manual. The
Linux implementation of this interface may differ (consult the
corresponding Linux manual page for details of Linux behavior),
or the interface may not be implemented on Linux.
NAME
file — determine file type
SYNOPSIS
file [-dh] [-M file] [-m file] file...
file -i [-h] file...
DESCRIPTION
The file utility shall perform a series of tests in sequence on
each specified file in an attempt to classify it:
1. If file does not exist, cannot be read, or its file status
could not be determined, the output shall indicate that the
file was processed, but that its type could not be
determined.
2. If the file is not a regular file, its file type shall be
identified. The file types directory, FIFO, socket, block
special, and character special shall be identified as such.
Other implementation-defined file types may also be
identified. If file is a symbolic link, by default the link
shall be resolved and file shall test the type of file
referenced by the symbolic link. (See the -h and -i options
below.)
3. If the length of file is zero, it shall be identified as an
empty file.
4. The file utility shall examine an initial segment of file and
shall make a guess at identifying its contents based on
position-sensitive tests. (The answer is not guaranteed to be
correct; see the -d, -M, and -m options below.)
5. The file utility shall examine file and make a guess at
identifying its contents based on context-sensitive default
system tests. (The answer is not guaranteed to be correct.)
6. The file shall be identified as a data file.
If file does not exist, cannot be read, or its file status could
not be determined, the output shall indicate that the file was
processed, but that its type could not be determined.
If file is a symbolic link, by default the link shall be resolved
and file shall test the type of file referenced by the symbolic
link.
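The first three steps depend only on file status, so they can be approximated with the test utility. The following is a minimal sketch, not a description of how any implementation works. classify_sketch is a hypothetical name. Symbolic links are always followed (there is no -h handling, and a dangling link is simply reported as unreadable here), and steps 4 and 5 are skipped, so every other file falls through to step 6:

```shell
# classify_sketch FILE: hypothetical helper covering steps 1-3 and 6 only.
classify_sketch() {
    if [ ! -e "$1" ] || [ ! -r "$1" ]; then
        type='cannot open'            # step 1: missing or unreadable
    elif [ -d "$1" ]; then
        type='directory'              # step 2: non-regular file types
    elif [ -p "$1" ]; then
        type='fifo'
    elif [ -S "$1" ]; then
        type='socket'
    elif [ -b "$1" ]; then
        type='block special'
    elif [ -c "$1" ]; then
        type='character special'
    elif [ ! -s "$1" ]; then
        type='empty'                  # step 3: zero length
    else
        type='data'                   # step 6 (steps 4 and 5 omitted)
    fi
    printf '%s: %s\n' "$1" "$type"
}

classify_sketch /                     # prints "/: directory"
```

The strings used are the ones the STDOUT section requires in the POSIX locale; a real implementation may print additional text around them.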
OPTIONS
The file utility shall conform to the Base Definitions volume of
POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines, except
that the order of the -m, -d, and -M options shall be
significant.
The following options shall be supported by the implementation:
-d Apply any position-sensitive default system tests and
context-sensitive default system tests to the file.
This is the default if no -M or -m option is specified.
-h When a symbolic link is encountered, identify the file
as a symbolic link. If -h is not specified and file is
a symbolic link that refers to a nonexistent file, file
shall identify the file as a symbolic link, as if -h
had been specified.
-i If a file is a regular file, do not attempt to classify
the type of the file further, but identify the file as
specified in the STDOUT section.
-M file Specify the name of a file containing position-
sensitive tests that shall be applied to a file in
order to classify it (see the EXTENDED DESCRIPTION). No
position-sensitive default system tests nor context-
sensitive default system tests shall be applied unless
the -d option is also specified.
-m file Specify the name of a file containing position-
sensitive tests that shall be applied to a file in
order to classify it (see the EXTENDED DESCRIPTION).
If the -m option is specified without specifying the -d option or
the -M option, position-sensitive default system tests shall be
applied after the position-sensitive tests specified by the -m
option. If the -M option is specified with the -d option, the -m
option, or both, or the -m option is specified with the -d
option, the concatenation of the position-sensitive tests
specified by these options shall be applied in the order
specified by the appearance of these options. If a -M or -m file
option-argument is -, the results are unspecified.
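The ordering rules above can be summarized by a small sketch. test_order is a hypothetical helper, and default-position and default-context are placeholder labels (not names used by this standard) for the two kinds of default system tests. It only prints the order in which the sets of tests would be tried, as read from the text above, and does not run file:

```shell
# test_order [-d] [-M file] [-m file]...: print the order of the test sets.
test_order() {
    OPTIND=1
    order='' has_d=0 has_M=0
    while getopts dM:m: opt; do
        case $opt in
            d) has_d=1; order="$order default-position" ;;
            M) has_M=1; order="$order $OPTARG" ;;
            m) order="$order $OPTARG" ;;
            *) return 2 ;;
        esac
    done
    if [ "$has_d" -eq 0 ] && [ "$has_M" -eq 0 ]; then
        # Neither -d nor -M: the defaults follow any -m tests.
        order="$order default-position default-context"
    elif [ "$has_d" -eq 1 ]; then
        # -d given: context-sensitive defaults still come last.
        order="$order default-context"
    fi
    set -- $order                 # word splitting is intended in this sketch
    printf '%s\n' "$*"
}

test_order -m mymagic             # prints "mymagic default-position default-context"
test_order -d -m mymagic          # prints "default-position mymagic default-context"
test_order -M mymagic             # prints "mymagic"
```

The sketch assumes magic file names without white space and at most one -d option.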
OPERANDS
The following operand shall be supported:
file A pathname of a file to be tested.
STDIN
The standard input shall be used if a file operand is '-' and the
implementation treats the '-' as meaning standard input.
Otherwise, the standard input shall not be used.
INPUT FILES
The file can be any file type.
ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of
file:
LANG Provide a default value for the internationalization
variables that are unset or null. (See the Base
Definitions volume of POSIX.1‐2017, Section 8.2,
Internationalization Variables for the precedence of
internationalization variables used to determine the
values of locale categories.)
LC_ALL If set to a non-empty string value, override the values
of all the other internationalization variables.
LC_CTYPE Determine the locale for the interpretation of
sequences of bytes of text data as characters (for
example, single-byte as opposed to multi-byte
characters in arguments and input files).
LC_MESSAGES
Determine the locale that should be used to affect the
format and contents of diagnostic messages written to
standard error and informative messages written to
standard output.
NLSPATH Determine the location of message catalogs for the
processing of LC_MESSAGES.
ASYNCHRONOUS EVENTS
Default.
STDOUT
In the POSIX locale, the following format shall be used to
identify each operand, file specified:
"%s: %s\n", <file>, <type>
The values for <type> are unspecified, except that in the POSIX
locale, if file is identified as one of the types listed in the
following table, <type> shall contain (but is not limited to) the
corresponding string, unless the file is identified by a
position-sensitive test specified by a -M or -m option. Each
<space> shown in the strings shall be exactly one <space>.
Table 4-9: File Utility Output Strings
┌──────────────────────────────────────────────┬──────────────────────────────────┬───────┐
│ If file is:                                  │ <type> shall contain the string: │ Notes │
├──────────────────────────────────────────────┼──────────────────────────────────┼───────┤
│ Nonexistent                                  │ cannot open                      │       │
│                                              │                                  │       │
│ Block special                                │ block special                    │ 1     │
│ Character special                            │ character special                │ 1     │
│ Directory                                    │ directory                        │ 1     │
│ FIFO                                         │ fifo                             │ 1     │
│ Socket                                       │ socket                           │ 1     │
│ Symbolic link                                │ symbolic link to                 │ 1     │
│ Regular file                                 │ regular file                     │ 1,2   │
│ Empty regular file                           │ empty                            │ 3     │
│ Regular file that cannot be read             │ cannot open                      │ 3     │
│                                              │                                  │       │
│ Executable binary                            │ executable                       │ 3,4,6 │
│ ar archive library (see ar)                  │ archive                          │ 3,4,6 │
│ Extended cpio format (see pax)               │ cpio archive                     │ 3,4,6 │
│ Extended tar format (see ustar in pax)       │ tar archive                      │ 3,4,6 │
│                                              │                                  │       │
│ Shell script                                 │ commands text                    │ 3,5,6 │
│ C-language source                            │ c program text                   │ 3,5,6 │
│ FORTRAN source                               │ fortran program text             │ 3,5,6 │
│                                              │                                  │       │
│ Regular file whose type cannot be determined │ data                             │ 3     │
└──────────────────────────────────────────────┴──────────────────────────────────┴───────┘
Notes:
1. This is a file type test.
2. This test is applied only if the -i option is
specified.
3. This test is applied only if the -i option is not
specified.
4. This is a position-sensitive default system test.
5. This is a context-sensitive default system test.
6. Position-sensitive default system tests and
context-sensitive default system tests are not
applied if the -M option is specified unless the -d
option is also specified.
In the POSIX locale, if file is identified as a symbolic link
(see the -h option), the following alternative output format
shall be used:
"%s: %s %s\n", <file>, <type>, <contents of link>
If the file named by the file operand does not exist, cannot be
read, or the type of the file named by the file operand cannot be
determined, this shall not be considered an error that affects
the exit status.
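A script that consumes this output usually wants only the <type> part. The following is a minimal sketch under two assumptions: the implementation separates the two fields with exactly ": " as in the POSIX locale format above, and the pathname contains no <newline>. type_field and posix_type are hypothetical helper names:

```shell
# type_field NAME LINE: strip the leading "NAME: " from one output line.
# The operand may itself contain ": ", so the known name is removed as a
# prefix instead of splitting the line at the first colon.
type_field() {
    printf '%s\n' "${2#"$1: "}"
}

# posix_type FILE: run file in the POSIX locale and print only <type>.
posix_type() {
    type_field "$1" "$(LC_ALL=C file -- "$1")"
}

type_field 'a: b' 'a: b: directory'    # prints "directory"
```

Because <type> is only required to contain the strings from the table, a caller should match with a pattern such as *directory* instead of comparing for equality.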
STDERR
The standard error shall be used only for diagnostic messages.
OUTPUT FILES
None.
EXTENDED DESCRIPTION
A file specified as an option-argument to the -m or -M options
shall contain one position-sensitive test per line, which shall
be applied to the file. If the test succeeds, the message field
of the line shall be printed and no further tests shall be
applied, with the exception that tests on immediately following
lines beginning with a single '>' character shall be applied.
Each line shall be composed of the following four <tab>-separated
fields. (Implementations may allow any combination of one or more
white-space characters other than <newline> to act as field
separators.)
offset An unsigned number (optionally preceded by a single '>'
character) specifying the offset, in bytes, of the
value in the file that is to be compared against the
value field of the line. If the file is shorter than
the specified offset, the test shall fail.
If the offset begins with the character '>', the test
contained in the line shall not be applied to the file
unless the test on the last line for which the offset
did not begin with a '>' was successful. By default,
the offset shall be interpreted as an unsigned decimal
number. With a leading 0x or 0X, the offset shall be
interpreted as a hexadecimal number; otherwise, with a
leading 0, the offset shall be interpreted as an octal
number.
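Shell arithmetic expansion accepts the same 0x, 0X, and leading-0 prefixes, so it can be used to check how an offset field would be read. This is only an illustration; offset_value is a hypothetical helper, not something file provides:

```shell
# offset_value OFFSET: print an offset field as a decimal byte count.
offset_value() {
    off=${1#'>'}                  # drop the optional leading '>'
    printf '%d\n' "$(( $off ))"   # 0x/0X means hexadecimal, leading 0 octal
}

offset_value 0x10                 # prints 16
offset_value 010                  # prints 8
offset_value '>2'                 # prints 2
```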
type The type of the value in the file to be tested. The
type shall consist of the type specification characters
d, s, and u, specifying signed decimal, string, and
unsigned decimal, respectively.
The type string shall be interpreted as the bytes from
the file starting at the specified offset and including
the same number of bytes specified by the value field.
If insufficient bytes remain in the file past the
offset to match the value field, the test shall fail.
The type specification characters d and u can be
followed by an optional unsigned decimal integer that
specifies the number of bytes represented by the type.
The type specification characters d and u can be
followed by an optional C, S, I, or L, indicating that
the value is of type char, short, int, or long,
respectively.
The default number of bytes represented by the type
specifiers d, f, and u shall correspond to their
respective C-language types as follows. If the system
claims conformance to the C-Language Development
Utilities option, those specifiers shall correspond to
the default sizes used in the c99 utility. Otherwise,
the default sizes shall be implementation-defined.
For the type specifier characters d and u, the default
number of bytes shall correspond to the size of a basic
integer type of the implementation. For these specifier
characters, the implementation shall support values of
the optional number of bytes to be converted
corresponding to the number of bytes in the C-language
types char, short, int, or long. These numbers can
also be specified by an application as the characters
C, S, I, and L, respectively. The byte order used when
interpreting numeric values is implementation-defined,
but shall correspond to the order in which a constant
of the corresponding type is stored in memory on the
system.
All type specifiers, except for s, can be followed by a
mask specifier of the form &number. The mask value
shall be AND'ed with the value of the input file before
the comparison with the value field of the line is
made. By default, the mask shall be interpreted as an
unsigned decimal number. With a leading 0x or 0X, the
mask shall be interpreted as an unsigned hexadecimal
number; otherwise, with a leading 0, the mask shall be
interpreted as an unsigned octal number.
The strings byte, short, long, and string shall also be
supported as type fields, being interpreted as dC, dS,
dL, and s, respectively.
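As an illustration of the aliases and the mask syntax, the following sketch splits a type field into its specifier and optional mask. normalize_type is a hypothetical helper and performs no validation of either part:

```shell
# normalize_type TYPE: print the canonical specifier, then the mask if any.
normalize_type() {
    spec=${1%%'&'*}               # text before the first '&'
    case $1 in
        *'&'*) mask=${1#*'&'} ;;  # text after the first '&'
        *)     mask= ;;
    esac
    case $spec in
        byte)   spec=dC ;;
        short)  spec=dS ;;
        long)   spec=dL ;;
        string) spec=s ;;
    esac
    printf '%s\n' "$spec${mask:+ $mask}"
}

normalize_type 'byte&0x80'        # prints "dC 0x80"
normalize_type string             # prints "s"
```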
value The value to be compared with the value from the file.
If the specifier from the type field is s or string,
then interpret the value as a string. Otherwise,
interpret it as a number. If the value is a string,
then the test shall succeed only when a string value
exactly matches the bytes from the file.
If the value is a string, it can contain the following
sequences:
\character The <backslash>-escape sequences as
specified in the Base Definitions volume of
POSIX.1‐2017, Table 5-1, Escape Sequences
and Associated Actions ('\\', '\a', '\b',
'\f', '\n', '\r', '\t', '\v'). In
addition, the escape sequence '\ ' (the
<backslash> character followed by a <space>
character) shall be recognized to represent
a <space> character. The results of using
any other character, other than an octal
digit, following the <backslash> are
unspecified.
\octal Octal sequences that can be used to
represent characters with specific coded
values. An octal sequence shall consist of
a <backslash> followed by the longest
sequence of one, two, or three octal-digit
characters (01234567).
By default, any value that is not a string shall be
interpreted as a signed decimal number. Any such value,
with a leading 0x or 0X, shall be interpreted as an
unsigned hexadecimal number; otherwise, with a leading
zero, the value shall be interpreted as an unsigned
octal number.
If the value is not a string, it can be preceded by a
character indicating the comparison to be performed.
Permissible characters and the comparisons they specify
are as follows:
= The test shall succeed if the value from the file
equals the value field.
< The test shall succeed if the value from the file
is less than the value field.
> The test shall succeed if the value from the file
is greater than the value field.
& The test shall succeed if all of the set bits in
the value field are set in the value from the
file.
^ The test shall succeed if at least one of the set
bits in the value field is not set in the value
from the file.
x The test shall succeed if the file is large
enough to contain a value of the type specified
starting at the offset specified.
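The numeric comparisons can be expressed with shell arithmetic. The following is a sketch only: magic_compare is a hypothetical helper, it takes the (already masked) value from the file as its first argument, and it leaves out x, which depends on the file size and not on the value:

```shell
# magic_compare FILEVAL OP FIELDVAL: exit status 0 if the test succeeds.
magic_compare() {
    v=$(( $1 )) f=$(( $3 ))       # accepts decimal, 0x..., and 0... forms
    case $2 in
        '=') [ "$v" -eq "$f" ] ;;
        '<') [ "$v" -lt "$f" ] ;;
        '>') [ "$v" -gt "$f" ] ;;
        '&') [ $(( $v & $f )) -eq "$f" ] ;;   # all set bits of f set in v
        '^') [ $(( $v & $f )) -ne "$f" ] ;;   # some set bit of f clear in v
        *)   return 2 ;;
    esac
}

magic_compare 0x90 '&' 0x80 && echo 'bit 7 is set'
```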
message The message to be printed if the test succeeds. The
message shall be interpreted using the notation for the
printf formatting specification; see printf. In either case the
value from the file shall be the argument for the printf
formatting specification: as a string if the value field was a
string, otherwise as a number.
EXIT STATUS
The following exit values shall be returned:
0 Successful completion.
>0 An error occurred.
CONSEQUENCES OF ERRORS
Default.
The following sections are informative.
APPLICATION USAGE
The file utility can only be required to guess at many of the
file types because only exhaustive testing can determine some
types with certainty. For example, binary data on some
implementations might match the initial segment of an executable
or a tar archive.
Note that the table indicates that the output contains the stated
string. Systems may add text before or after the string. For
executables, as an example, the machine architecture and various
facts about how the file was link-edited may be included. Note
also that on systems that recognize shell script files starting
with "#!" as executable files, these may be identified as
executable binary files rather than as shell scripts.
EXAMPLES
Determine whether an argument is a binary executable file:
file -- "$1" | grep -q ':.*executable' &&
printf "%s is executable.\n" "$1"
RATIONALE
The -f option was omitted because the same effect can (and
should) be obtained using the xargs utility.
Historical versions of the file utility attempt to identify the
following types of files: symbolic link, directory, character
special, block special, socket, tar archive, cpio archive, SCCS
archive, archive library, empty, compress output, pack output,
binary data, C source, FORTRAN source, assembler source,
nroff/troff/eqn/tbl source, troff output, shell script, C shell
script, English text, ASCII text, various executables, APL
workspace, compiled terminfo entries, and CURSES screen images.
Only those types that are reasonably well specified in POSIX or
are directly related to POSIX utilities are listed in the table.
Historical systems have used a ``magic file'' named /etc/magic to
help identify file types. Because it is generally useful for
users and scripts to be able to identify special file types, the
-m flag and a portable format for user-created magic files have
been specified. No requirement is made that an implementation of
file use this method of identifying files, only that users be
permitted to add their own classifying tests.
In addition, three options have been added to historical
practice. The -d flag has been added to permit users to cause
their tests to follow any default system tests. The -i flag has
been added to permit users to test portably for regular files in
shell scripts. The -M flag has been added to permit users to
ignore any default system tests.
The POSIX.1‐2008 description of default system tests and the
interaction between the -d, -M, and -m options did not clearly
indicate that there were two types of ``default system tests''.
The ``position-sensitive tests'' determine file types by looking
for certain string or binary values at specific offsets in the
file being examined. These position-sensitive tests were
implemented in historical systems using the magic file described
above. Some of these tests are now built into the file utility
itself on some implementations so the output can provide more
detail than can be provided by magic files. For example, a magic
file can easily identify a core file on most implementations, but
cannot name the program file that dropped the core. A magic file
could produce output such as:
/home/dwc/core: ELF 32-bit MSB core file SPARC Version 1
but by building the test into the file utility, you could get
output such as:
/home/dwc/core: ELF 32-bit MSB core file SPARC Version 1, from 'testprog'
These extended built-in tests are still to be treated as
position-sensitive default system tests even if they are not
listed in /etc/magic or any other magic file.
The context-sensitive default system tests were always built into
the file utility. These tests looked for language constructs in
text files trying to identify shell scripts, C, FORTRAN, and
other computer language source files, and even plain text files.
With the addition of the -m and -M options the distinction
between position-sensitive and context-sensitive default system
tests became important because the order of testing is important.
The context-sensitive system default tests should never be
applied before any position-sensitive tests even if the -d option
is specified before a -m option or -M option due to the high
probability that the context-sensitive system default tests will
incorrectly identify arbitrary text files as text files before
position-sensitive tests specified by the -m or -M option would
be applied to give a more accurate identification.
Leaving the meaning of -M - and -m - unspecified allows an
existing prototype of these options to continue to work in a
backwards-compatible manner. (In that implementation, -M - was
roughly equivalent to -d in POSIX.1‐2008.)
The historical -c option was omitted as not particularly useful
to users or portable shell scripts. In addition, a reasonable
implementation of the file utility would report any errors found
each time the magic file is read.
The historical format of the magic file was the same as that
specified by the Rationale in the ISO POSIX‐2:1993 standard for
the offset, value, and message fields; however, it used less
precise type fields than the format specified by the current
normative text. The new type field values are a superset of the
historical ones.
The following is an example magic file:
0   short         070707               cpio archive
0   short         0143561              Byte-swapped cpio archive
0   string        070707               ASCII cpio archive
0   long          0177555              Very old archive
0   short         0177545              Old archive
0   short         017437               Old packed data
0   string        \037\036             Packed data
0   string        \377\037             Compacted data
0   string        \037\235             Compressed data
>2  byte&0x80     >0                   Block compressed
>2  byte&0x1f     x                    %d bits
0   string        \032\001             Compiled Terminfo Entry
0   short         0433                 Curses screen image
0   short         0434                 Curses screen image
0   string        <ar>                 System V Release 1 archive
0   string        !<arch>\n__.SYMDEF   Archive random library
0   string        !<arch>              Archive
0   string        ARF_BEGARF           PHIGS clear text archive
0   long          0x137A2950           Scalable OpenFont binary
0   long          0x137A2951           Encrypted scalable OpenFont binary
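To make the continuation lines concrete, the following sketch evaluates the three "Compressed data" lines of the example by hand, using od to read single bytes. It is an illustration under stated assumptions, not how file is implemented: byte_at and describe_sketch are hypothetical helpers, bytes are read as unsigned (for these two masks the result is the same as for the signed dC reading), and joining the messages with a single <space> is a choice made here.

```shell
# byte_at FILE OFFSET: print the unsigned value of one byte, or nothing
# if the file is too short.
byte_at() {
    od -An -tu1 -j "$2" -N 1 "$1" 2>/dev/null | tr -d ' \n'
}

# describe_sketch FILE: apply "0 string \037\235" and its two '>' lines.
describe_sketch() {
    # Bytes 0 and 1 must be octal 037 (31) and 0235 (157).
    [ "$(byte_at "$1" 0)" = 31 ] && [ "$(byte_at "$1" 1)" = 157 ] || return 1
    msg='Compressed data'
    flags=$(byte_at "$1" 2)
    if [ -n "$flags" ]; then
        # ">2 byte&0x80 >0": succeeds if the masked value is greater than 0.
        [ $(( $flags & 0x80 )) -gt 0 ] && msg="$msg Block compressed"
        # ">2 byte&0x1f x": succeeds whenever the byte exists.
        msg="$msg $(( $flags & 0x1f )) bits"
    fi
    printf '%s: %s\n' "$1" "$msg"
}
```

For a file holding the three bytes \037 \235 \220, describe_sketch prints the file name followed by "Compressed data Block compressed 16 bits". For a file that does not start with \037 \235, it prints nothing and returns a non-zero status, which corresponds to moving on to the next line that does not begin with '>'.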
The use of a basic integer data type is intended to allow the
implementation to choose a word size commonly used by
applications on that architecture.
Earlier versions of this standard allowed for implementations
with bytes other than eight bits, but this has been modified in
this version.
FUTURE DIRECTIONS
None.
SEE ALSO
ar(1p), ls(1p), pax(1p), printf(1p)
The Base Definitions volume of POSIX.1‐2017, Table 5-1, Escape
Sequences and Associated Actions, Chapter 8, Environment
Variables, Section 12.2, Utility Syntax Guidelines
COPYRIGHT
Portions of this text are reprinted and reproduced in electronic
form from IEEE Std 1003.1-2017, Standard for Information
Technology -- Portable Operating System Interface (POSIX), The
Open Group Base Specifications Issue 7, 2018 Edition, Copyright
(C) 2018 by the Institute of Electrical and Electronics
Engineers, Inc and The Open Group. In the event of any
discrepancy between this version and the original IEEE and The
Open Group Standard, the original IEEE and The Open Group
Standard is the referee document. The original Standard can be
obtained online at http://www.opengroup.org/unix/online.html .
Any typographical or formatting errors that appear in this page
are most likely to have been introduced during the conversion of
the source files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .