Lecture notes for Operating Systems

Lecture Notes for Operating Systems

Implementing File Systems, 27 November 2000

implementing files
1. the disk abstraction - an array of fixed-sized blocks
2. disk-block allocation
  1. sequential -
    1. good - fast and simple to do and use
    2. bad - static maximum file sizes required; external fragmentation
  2. linked list - each block contains a pointer to the next block;
    1. good - reasonably fast allocation, no external fragmentation;
    2. bad - slow random access, data and meta-data mixed.
  3. indexed list - an primary memory representation of the disk array
    1. good - fast; data and meta-data separate
    2. bad - could be a huge list
  4. index nodes - disk blocks of indexed lists; unix
    1. good - fast; huge
    2. bad - complex
3. disk block size
  1. depends on disk geometry (sector, track, and cylinder), system considerations (page size)
  2. disk space utilization (drops as page size grows) vs data transfer rates (increases as page size grows)
4. free block management
  1. bit maps vs. linked lists using disk blocks
  2. one block primary memory cache
implementing directories
1. cp/m directories - flat
  1. user code (1), file name (8), extension (3), extent (1) block count (1) disk block numbers (15)
2. ms-dos directories - hierarchical
  1. file name (8), extension (3), attributes (1), time (2), date (2), first disk block number (2), size (4)
  2. an index into a file-allocation table
3. unix directories - hierarchical
  1. i-node number (2), file name (14) - i-node leads to the next directory level, or the file itself
performance
1. millisecond access times - caching
  1. a block of buffers between the disk and the rest of the os
  2. should be invisible except for peformance gains
  3. full caches need to be flushed - like paging
  4. file system consistency - writes to the disk go to the disk
  5. modified lru - important immediately, infrequently first, frequently last
  6. putting writes to disk - periodic flushes to disk or write-through
  7. removable media - synchronizing with non-write-through caches caches
2. disk clustering - putting associated things close together
  1. data and directories within the same cylinder
  2. interleaving sectors within a cylinder
  3. centerally locating meta-data within the disk or cylinder groups
log-structured file systems
1. everything's getting better except for disk seek times
2. large, fast primary caches reduce disk accesses to writes - small writes (data and meta-data)
3. writes go to a segment; when full the segment gets written to disk
4. reads are as before
5. i-node maps keep track of the most recent i nodes
6. a scavenger thread cleans up old segments, freeing them for further use
raid - redundant array of inexpensive (or independent disks)
1. lots of reliable, fast disk storage cheap
2. basic idea - parallel reads and writes to multiple disks
3. reliability through redundancy, ecc, or both
4. levels of raid performance - originally five, now who knows
  1. raid 0 - parallel disks storing blocks; no reliability
  2. raid 1 - parallel disks with mirroring; reliability through redundancy
  3. raid 2 - parallel disks; byte interleaved; data and ecc disks
  4. raid 3 - data are striped across parallel data disks; stripe parity is stored on separate disk
  5. raid 4 - non-striped data written to parallel disks; block parity stored on separate disk
  6. raid 5 - parity and data disks combined; block writes

This page last modified on 29 November 2000.