Operating Systems Lecture Notes
26 March 2012 • Files
Outline
Files, content and access.
File implementation.
Metadata
Files
A
file
is a sequence of items.
What kind of sequence?
What kind of items?
Files are named, external, independent, persistent, and potentially gargantuan.
External: files can exist outside a process's resource space.
Independent: file lifetime is independent of associated processes' life times.
File Items
At base, a file is a sequence of 2
i
bytes for small
i
.
In
Unix
,
i
= 0.
Bytes can be grouped into larger structures called
records
.
Fixed-length records
are all the same in a file.
Variable-length records
may vary in structure within a file.
Implementing Items
Most operating systems implement byte-item files.
Applications can turn byte-item files into more structured files.
Databases, spread-sheets, HTML-XML files.
Most operating systems also support record-based files in some form.
Usually specialized, such as directories in Linux.
Occasionally available as a service.
File Sequences
The file sequence is indexed and variable length.
Indexes are array-like, although occasionally content based.
Usually only for record-based files.
Sequence length changes easiest at the end (largest index).
Size changes at other sequence locations are possible but expensive.
File Access
Access patterns
characterize the way items in a file are read or written.
Sequential access
: item by item in one direction.
Direct
(or
random
or
indexed
) access: jumping from item to item.
Reliable access pattern preditions can be exploited to improve efficiency.
Files On Disk
Many file properties (size, persistence) come from being implemented on disk.
The file-disk mapping (
file allocation
) is similar to the process-virtual-store mapping.
Although the trade-offs are worked differently in each case.
Efficiency is primary influence in resolving trade-offs.
Efficient file use, as well as efficient disk use.
Allocation Structure
File items are grouped into a
file block
.
An item sequence is turned into a file-block sequence.
A file block is a small multiple of the disk-block size.
2
i
K-bytes, 0 ≤
i
≤ 4.
Whatever treats the disk most kindly.
The juncture between adjacent file blocks can get awkward for record-based files.
File Blocks
File blocks encourage efficient disk use.
Amortize expensive disk operations over more data.
Caches are lurking around someplace.
But file blocks may waste disk space.
This is
internal fragmentation
.
Contiguous Allocation
Contiguous allocation
is the simplest file-allocation policy.
The whole file-block sequence is plopped into an available contiguous disk-block sequence.
Contiguous Analysis
Contiguously allocated files are not cheaply dynamic.
Shrinking is easy but wasteful.
Growing is limited and expensive.
Good allocation requires maximum size declarations.
Wasting space no matter how accurate the declarations.
Over time, external fragmentation eats into allocatable space.
Linked-List Allocation
Linked-list allocation
is the antipodes to contiguous allocation.
Allocate the file as a linked-list of file blocks.
Linked-List Analysis
Link-allocated files are dynamic at the ends.
No external fragmentation.
No arduous maximum-size estimates required.
On the other hand:
File traversal is more complex to implement.
File transfer efficiency can easily plummet.
Link Information
Where are the links in a linked-list allocation?
They can be part of the file block.
In-block links introduces a host of irksome special cases.
File traversal is performance hostile.
External Links
Separate concerns: use file-blocks for data and other blocks for links.
File navigation is now buffer friendly.
Big Files
External links still incur travel costs linear in file size.
An
N-way tree
has sub-linear traversal costs.
N
is usually the file- (or disk-) block size divided by pointer size (4 to 8 bytes).
Moving sequentially from file-block to file-block (leaf to leaf) in a tree is not simple.
A node points to a file block (a leaf) or another tree node (an
indirection block
).
Access Trees
Access trees can be tilted to put file blocks closer to the root on one side.
Small files have file blocks one hop away from the root.
As the file grows, new file blocks move further down the tree.
Three hops away is the usual stopping point.
Access Tree Illustrated
B+ Access Trees
B+ trees
are an alternative access tree.
All pages are the same distance away from the root.
B+ trees are
self balancing
to maintain uniform distance.
Hybrid Allocation
Sequential allocation is inflexible but disk friendly.
Linked-list allocation is flexible but disk hostile.
How to combine sequential efficiency with linked-list flexibility?
Allocate file-block runs for efficiency and link them for flexibility.
Extent-based allocation
.
Metadata
External link blocks are an example of
file metadata
: data about files.
Link blocks are the first of many instances of file metadata.
Metadata is packed along with the file, but usually not in the file.
Metadata is a good place to keep statistics about file use to support adaptive performance adjustments.
File Protection
File protection information is common metadata.
Basic protections are read-only, write-only and execute-only.
Execute-only implies read-only, most times.
There are elaborate variations on the basic protections.
But these require more metadata and higher-level structures.
Summary
Files are persistent, large item sequences.
Tree-based extent file implementations strike a balance between flexible file manipulation and efficient disk use.
Metadata is data about files.
This page last modified on 2012 March 26.