File Systems

File:

A file is a named abstract resource to which a stream of bytes can be written for later access, or from which a stream of bytes can be read to obtain data. In other words, a file is a named object that comes into existence by explicit creation, is immune to temporary failures in the system, and persists until explicitly destroyed. Files are created by software and usually conform to a particular file format. They are almost always given names within the file system on which they are stored, so that they can be referred to at a later time. The operating system usually organizes files hierarchically, placing them in folders (directories).

Files can be classified in several ways. From a structural point of view, files are of two types: Unstructured and Structured. By the modifiability criterion, a file may be Mutable or Immutable.

File System:

A file system is the subsystem of an OS that performs file management (the part of the operating system that creates file abstractions and provides mechanisms for manipulating and controlling them): the organization, storage, retrieval, naming, sharing, and protection of files. In a single-processor system, the file system provides the advantages of permanent storage and sharing of information. File systems are presented to users either textually, by shells, or graphically, by file browsers. Graphical presentations often use the metaphor of folders containing documents, other files, and nested folders.

A file system is an integral part of any modern operating system. The only real task of early microcomputer operating systems was file management, a fact reflected in their names (DOS, for example, stands for Disk Operating System). File systems typically have directories that associate file names with files, usually by connecting each file name to an index into a file allocation table of some sort, such as the FAT in an MS-DOS file system, or to an inode number in a UNIX-like file system. Directory structures may be flat, or may allow hierarchies in which directories contain subdirectories. In some file systems, file names are structured, with special syntax for filename extensions and version numbers. In others, file names are simple strings, and per-file metadata is stored elsewhere.
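On a Unix-like system, Python's os.scandir exposes both halves of a directory entry, the name and the inode number it refers to, so the name-to-inode association described above can be observed directly. This is a minimal sketch; the function name directory_entries is hypothetical, and on Windows DirEntry.inode() returns a file index rather than a Unix inode number.

```python
import os

def directory_entries(path):
    """Map each name in directory `path` to the inode number its entry refers to.

    On Unix-like systems a directory is essentially a table of
    (name, inode number) pairs; os.scandir reads those entries and
    DirEntry.inode() returns the number without a separate stat() call.
    """
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}
```

Two names that map to the same inode number are hard links to the same file: the directory stores only the association, not the file itself.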

Traditional file systems offer facilities to create, move, and delete both files and directories. They lack facilities to create additional links to a directory (hard links, in Unix terms), to rename parent links ("..", in Unix-like systems), and to create bidirectional links to files.
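The asymmetry between files and directories can be seen directly on Linux: the link system call gives a regular file a second name without complaint, but refuses to hard-link a directory (it fails with EPERM). A minimal sketch using Python's os.link; the function name and the file names are hypothetical.

```python
import os

def try_hard_links(workdir):
    """Hard-link a regular file and then a directory inside `workdir`.

    Returns (link count of the file afterwards, whether the directory link worked).
    """
    target = os.path.join(workdir, "data.txt")
    with open(target, "w") as f:
        f.write("hello")
    os.link(target, os.path.join(workdir, "alias.txt"))  # second name, same inode
    file_links = os.stat(target).st_nlink                # 2 after the link

    subdir = os.path.join(workdir, "sub")
    os.mkdir(subdir)
    try:
        os.link(subdir, os.path.join(workdir, "sub-alias"))
        dir_linked = True
    except OSError:                                       # EPERM on Linux
        dir_linked = False
    return file_links, dir_linked
```

Run in an empty scratch directory on Linux, it returns (2, False).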

Linux assigns a device name to each device, but this is not how the files on that device are accessed. There are no drive letters in Linux. Instead, Linux creates a Virtual File System for us, which makes all the files on all the devices appear to exist on one global device. In Linux, there is one root directory, and every file you have access to is located under it somewhere. Furthermore, the Linux root directory does not have to be in any particular place. It might not be on your first hard drive. It might not even be on your computer. Linux can use a network shared resource as its root directory.

To gain access to files on another device in Linux, you must first tell it where in the directory tree you would like those files to appear. This process is called mounting a file system. For example, you will frequently need to access files from a CD-ROM. To do this, you must tell Linux, "Take the file system from this CD-ROM and make it appear under the directory /mnt." The directory given to Linux is called the mount point; in this case it is /mnt. The /mnt directory exists on all Linux systems, and it is intended specifically for use as a mount point for temporary media like floppy disks or CDs. It may be empty, or it may contain subdirectories for mounting individual devices. Generally, only the administrator (i.e., the root user) may authorize the mounting of file systems.
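Because every mounted file system appears as a subtree of the single root, any path can be traced back to the mount point that contains it. A minimal sketch in Python, assuming a Unix-like system; mount_point_of is a hypothetical helper built on os.path.ismount, which reports whether a directory is a mount point.

```python
import os

def mount_point_of(path):
    """Walk up from `path` to the directory where its file system is mounted."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:      # reached "/", which is always a mount point
            break
        path = parent
    return path
```

If a CD-ROM were mounted on /mnt, mount_point_of("/mnt/docs/readme.txt") would return /mnt; with nothing mounted there, it would return the mount point of the enclosing file system, such as /.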

At least one and perhaps many file systems are automatically mounted (automounting) by Linux at boot time. The system administrator can control which file systems are mounted at boot time, and can pre-determine the mount points for specific file systems. The sysadmin can also designate some file systems that may be mounted by normal users, and can specify when mounted file systems are checked for errors and backed up. All this information is stored in the file /etc/fstab, which anyone can read to discover what file systems are available and mountable by users.
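Each non-comment line of /etc/fstab has six whitespace-separated fields: device, mount point, file system type, comma-separated mount options, dump (backup) frequency, and fsck pass number. The options noauto (do not mount at boot) and user (let ordinary users mount it) correspond to the controls described above. This is a minimal parsing sketch; the class and function names are hypothetical, and real tools handle more cases, such as spaces in paths escaped as \040.

```python
from typing import List, NamedTuple

class FstabEntry(NamedTuple):
    device: str           # field 1: block device, UUID=..., or remote share
    mount_point: str      # field 2: where the file system appears in the tree
    fs_type: str          # field 3: e.g. ext4, iso9660, nfs
    options: List[str]    # field 4: comma-separated mount options
    dump: int             # field 5: backup frequency for dump; 0 = never
    pass_no: int          # field 6: fsck order at boot; 0 = never checked

    @property
    def mounted_at_boot(self) -> bool:
        return "noauto" not in self.options        # noauto: skip at boot

    @property
    def user_mountable(self) -> bool:
        return "user" in self.options or "users" in self.options

def parse_fstab(text: str) -> List[FstabEntry]:
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):       # blank line or comment
            continue
        fields = line.split()
        options = fields[3] if len(fields) > 3 else "defaults"
        dump = int(fields[4]) if len(fields) > 4 else 0
        pass_no = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(fields[0], fields[1], fields[2],
                                  options.split(","), dump, pass_no))
    return entries

SAMPLE = """\
# device     mount point  type     options         dump  pass
/dev/sda1    /            ext4     defaults        1     1
/dev/cdrom   /mnt/cdrom   iso9660  ro,user,noauto  0     0
"""
```

parse_fstab(SAMPLE) yields two entries: the root file system is mounted, backed up, and checked at boot, while the CD-ROM entry is left unmounted at boot and may be mounted by ordinary users.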

Traditional file systems also offer facilities to truncate, append to, create, move, delete, and modify files in place. They do not offer facilities to prepend to a file or to truncate it from the beginning, let alone to insert or delete data at arbitrary positions. The operations provided are highly asymmetric and lack the generality to be useful in unexpected contexts. For example, interprocess pipes in Linux have to be implemented outside of the file system because it does not offer truncation from the beginning of files.
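The asymmetry shows up as soon as the operations are written out. In the sketch below (Python, hypothetical helper names), appending and truncating at the end map onto single operating-system primitives (O_APPEND writes and truncate), whereas prepending or cutting bytes from the front can only be emulated by rewriting the entire file, which costs time proportional to the file size and is not atomic.

```python
import os

def append(path, data: bytes):
    with open(path, "ab") as f:         # O_APPEND: the OS writes at the end
        f.write(data)

def truncate_end(path, new_size):
    os.truncate(path, new_size)         # drops every byte after new_size

def prepend(path, data: bytes):
    # No primitive inserts at the front, so the whole file is rewritten.
    with open(path, "rb") as f:
        old = f.read()
    with open(path, "wb") as f:
        f.write(data + old)

def truncate_front(path, n):
    # Likewise, removing the first n bytes means copying the rest down.
    with open(path, "rb") as f:
        f.seek(n)
        rest = f.read()
    with open(path, "wb") as f:
        f.write(rest)
```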

Secure access to basic file system operations can be based on either access control lists or capabilities. Access control lists were argued to be insecure several decades ago (for example, because of the confused deputy problem), which is why research operating systems tend to use capabilities. Commercial file systems still use access control lists.
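The two schemes differ in where authority is recorded: an access control list is attached to the object and lists which subjects may do what, while a capability is an unforgeable token held by the subject that itself conveys the right. The toy model below (Python; all names and in-memory tables are hypothetical) only illustrates that difference. Real capability systems make capabilities unforgeable through the kernel or cryptographic protection, not merely through hard-to-guess strings.

```python
import secrets

# Access control list: stored with the object, maps subject -> rights.
ACL = {"report.txt": {"alice": {"read", "write"}, "bob": {"read"}}}

def acl_allows(user, obj, right):
    return right in ACL.get(obj, {}).get(user, set())

# Capabilities: the system keeps a table of tokens; holding a token is the right.
_CAPABILITIES = {}

def grant_capability(obj, rights):
    token = secrets.token_hex(16)           # 128 random bits: impractical to guess
    _CAPABILITIES[token] = (obj, frozenset(rights))
    return token

def capability_allows(token, obj, right):
    entry = _CAPABILITIES.get(token)
    return entry is not None and entry[0] == obj and right in entry[1]
```

With the ACL, the check asks "who are you?"; with the capability, it asks only "what do you hold?", so the holder can pass the token on without any change to the object.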

File system types:

File system types can be classified into three groups:

· Disk File Systems

· Network File Systems

· Special Purpose File Systems

Disk file systems are designed for the storage of files on a disk drive, which might be directly or indirectly connected to the computer. Examples of disk file systems include ext3 (Linux), FAT (DOS and Microsoft Windows; 12-, 16-, and 32-bit variants), and HFS (Mac OS).

Network file systems are file systems in which files are accessed over a network, potentially by several computers simultaneously. Examples of network file systems include AFS (Andrew File System), CIFS (sometimes also called SMB or Samba file systems), and NFS.

A special-purpose file system is any file system that is neither a disk file system nor a network file system. This includes systems in which files are arranged dynamically by software, for purposes such as communication between processes or temporary file space. Examples include acme (Plan 9 text windows), archfs (archives), and cdfs (reading and writing CDs).

File Sharing:

File sharing is the direct or indirect transfer of files from one computer to another, over the Internet, over a smaller intranet, or across multiple networks following the peer-to-peer model. A shared file may be accessed by multiple users simultaneously. In such a situation, an important design issue for any file system is to define clearly when modifications of file data made by one user become observable by other users.

File Sharing in a Local Area Network (LAN)

The four commonly used file-sharing semantics are:

· Unix Semantics

· Session Semantics

· Immutable Shared-Files Semantics

· Transaction-like Semantics

Unix semantics enforces an absolute time ordering on all operations and ensures that every read operation on a file sees the effects of all previous write operations performed on that file. In particular, writes to an open file by one user immediately become visible to other users who have the same file open. Unix semantics is the most desirable semantics, and it is easy to implement in a single-processor system because all read/write requests can be serialized there; in a distributed file system, however, it is difficult to implement.
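On a single Unix machine these semantics hold for local files, because the kernel serializes all reads and writes on a file: once one process's write system call returns, any other open file for that file sees the new data. A minimal sketch in Python, with two independent open file objects on the same temporary file; flush() is needed only because Python buffers writes in user space before issuing the write system call.

```python
import os
import tempfile

def unix_semantics_demo():
    """Show that a write through one open file is immediately visible through another."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as writer, open(path, "rb") as reader:
            writer.write(b"v1")
            writer.flush()              # issue the write() system call now
            first = reader.read()       # the other open file already sees it
            writer.write(b"+v2")
            writer.flush()
            second = reader.read()      # continues from the reader's own offset
        return first, second
    finally:
        os.remove(path)
```

The function returns (b"v1", b"+v2"): each read observes every write completed before it.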

In session semantics, all changes made to a file during a session (a series of file accesses made between the open and close operations) are initially visible only to the client process (or possibly to all processes on the client node) that opened the session, and are invisible to remote processes that have the same file open simultaneously. Once the session is closed, the changes are made visible to remote processes, but only in sessions that start later; already open instances of the file do not reflect these changes. In effect, each client maintains its own image of the file. Session semantics therefore raises the question of what the final image should be when multiple sessions, each with a different file image, are closed one after another. Session semantics should be used only with file systems that use the file-level transfer model.
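The behavior can be modelled with an in-memory server and client sessions. In this hypothetical sketch each session copies the file image at open, edits its copy, and writes it back at close. The question of the final image is answered here by the simple, lossy rule that the last session to close wins, which is one possible policy rather than the only one.

```python
class SessionServer:
    """Hypothetical server holding the master copy of each file."""
    def __init__(self):
        self.files = {}

class Session:
    """One client's open..close session under session semantics."""
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.image = bytearray(server.files.get(name, b""))  # private copy at open

    def read(self):
        return bytes(self.image)

    def write(self, data):
        self.image += data          # visible only inside this session

    def close(self):
        # Changes reach the server only now; the last session to close wins.
        self.server.files[self.name] = bytes(self.image)
```

If sessions A and B both open a file, A appends and closes, and then B appends and closes, B's image replaces A's on the server and A's update is lost.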

Immutable shared-files semantics is based on the immutable file model (a file cannot be modified once it has been created). Under this semantics, once the creator of a file declares it to be sharable, the file is treated as immutable and cannot be modified any more. Changes to the file are handled by creating a new, updated version of the file, and each version is treated as an entirely new file. Therefore, this semantics allows files to be shared only in read-only mode; that is, the shared files themselves are never modified.
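A versioned store captures the idea: an "update" never touches existing bytes, it appends a new version. This is a minimal in-memory sketch; the class and method names are hypothetical, and version numbers are assumed to start at 1.

```python
class ImmutableStore:
    """Hypothetical store in which a file's contents never change after creation."""

    def __init__(self):
        self._versions = {}                   # name -> [bytes of v1, v2, ...]

    def create(self, name, data):
        if name in self._versions:
            raise FileExistsError(name)
        self._versions[name] = [bytes(data)]
        return 1

    def update(self, name, data):
        # "Modifying" a shared file really creates a new, separate version.
        versions = self._versions[name]
        versions.append(bytes(data))
        return len(versions)

    def read(self, name, version=None):
        versions = self._versions[name]
        if version is None:
            return versions[-1]               # latest version
        if not 1 <= version <= len(versions):
            raise ValueError(f"no version {version} of {name}")
        return versions[version - 1]
```

Readers holding an older version number keep seeing exactly the bytes they started with, which is why no coordination between sharers is needed.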

Transaction-like semantics is based on the transaction mechanism, a high-level mechanism for controlling concurrent access to shared mutable data. A transaction is a set of operations enclosed between a pair of begin_transaction and end_transaction operations. The transaction mechanism ensures that partial modifications made to the shared data by a transaction are not visible to other concurrently executing transactions until the transaction ends. Therefore, when multiple concurrent transactions operate on a file, the final file content is the same as if all the transactions had been run in some sequential order.
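The simplest way to obtain both properties is to hold one lock for the whole transaction and to work on a private copy that is published only at end_transaction. The sketch below (Python; the class is hypothetical) does exactly that, so transactions run one at a time and are trivially serializable. Real systems usually use finer-grained locking or optimistic concurrency control to allow more parallelism.

```python
import threading

class TransactionalFile:
    """Hypothetical shared file offering transaction-like semantics."""

    def __init__(self, data=b""):
        self._data = bytes(data)
        self._lock = threading.Lock()

    def begin_transaction(self):
        self._lock.acquire()              # held until end_transaction
        return bytearray(self._data)      # private working copy

    def end_transaction(self, working_copy):
        self._data = bytes(working_copy)  # publish all changes at once
        self._lock.release()

    def snapshot(self):
        return self._data                 # only ever shows committed data
```

If 50 threads each read a counter stored in the file, add one, and write it back inside a transaction, the final content is b"50", the same as running them one after another.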

Remote File Accessing:

The two complementary models for accessing remote files are:

· Remote Service Model

· Data-Caching Model

In file systems that follow the data-caching model, an important design issue is to decide the unit of data transfer. The unit of data transfer is the fraction of a file's data that is transferred to and from clients as a result of a single read or write operation. The four commonly used units for this purpose are:

· File-Level Transfer Model

· Block-Level Transfer Model

· Byte-Level Transfer Model

· Record-Level Transfer Model

In the file-level transfer model, whenever an operation requires file data to be transferred across the network in either direction between a client and a server, the whole file is moved. The advantages of this model are its conceptual simplicity, fewer requests and responses, better scalability, immunity to network failures once the file has been copied to the client, the ability to use optimized disk access routines, and simpler support for heterogeneous workstations. The main drawback of this model is that it requires sufficient storage space on the client side.
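The request-saving property can be seen in a small simulation. In this hypothetical sketch the server only ever ships whole files: the client fetches the file once at open, serves every read and write from its local copy, and sends the whole file back at close, so any number of accesses in between costs exactly two requests.

```python
class WholeFileServer:
    """Hypothetical server that only ever transfers complete files."""

    def __init__(self, files):
        self.files = dict(files)
        self.requests = 0                 # network round trips served

    def fetch(self, name):
        self.requests += 1
        return self.files[name]

    def store(self, name, data):
        self.requests += 1
        self.files[name] = data

class WholeFileClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}                   # local copies, one per open file

    def open(self, name):
        self.cache[name] = bytearray(self.server.fetch(name))   # whole file in

    def read(self, name, offset, length):
        return bytes(self.cache[name][offset:offset + length])  # no network

    def write(self, name, offset, data):
        self.cache[name][offset:offset + len(data)] = data      # no network

    def close(self, name):
        self.server.store(name, bytes(self.cache.pop(name)))    # whole file out
```

The price is visible in open: the entire file must fit in the client's cache, which is the storage drawback noted above.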

In the block-level transfer model, file data transfers across the network between a client and a server take place in units of file blocks. A file block is a contiguous portion of a file and is usually fixed in length. For file systems in which the block size equals the virtual memory page size, this model is also called the page-level transfer model. The advantage of this model is that it does not require client nodes to have large storage space. It also eliminates the need to copy an entire file when only a small portion of its data is needed, so this model can be used in systems with diskless workstations. However, the model performs poorly compared with the file-level transfer model when access patterns are such that most files have to be transferred in their entirety.
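The client-side arithmetic is simple: a read of length bytes at offset needs blocks offset // B through (offset + length - 1) // B, where B is the block size. A minimal sketch; the 4096-byte block size and the fetch_block callback standing in for one network request are assumptions.

```python
BLOCK_SIZE = 4096  # assumed; often chosen equal to the virtual memory page size

def blocks_for(offset, length, block_size=BLOCK_SIZE):
    """Numbers of the blocks that cover bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)

def read_range(fetch_block, offset, length, block_size=BLOCK_SIZE):
    """Read a byte range by fetching only the blocks that contain it.

    fetch_block(n) stands in for one network request returning block n.
    """
    blocks = blocks_for(offset, length, block_size)
    if not blocks:
        return b""
    data = b"".join(fetch_block(n) for n in blocks)
    start = offset - blocks.start * block_size   # position inside first block
    return data[start:start + length]
```

Reading 20 bytes at offset 4090 therefore touches only blocks 0 and 1, however large the file is.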

In the byte-level transfer model, file data transfers across the network between a client and a server take place in units of bytes. This model provides maximum flexibility because it allows storage and retrieval of an arbitrary sequential subrange of a file, specified by an offset within the file and a length. The main drawback of this model is the difficulty of cache management caused by the variable length of the data for different access requests.
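On the server side, a byte-level request reduces to an (offset, length) pair. A minimal sketch in Python; serve_byte_range is a hypothetical handler, and the network layer that would carry the request and reply is omitted.

```python
def serve_byte_range(path, offset, length):
    """Return `length` bytes of the file at `path`, starting at `offset`.

    Fewer bytes are returned if the range runs past the end of the file.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Because each reply can have any length and start at any offset, a client cache cannot simply be indexed by block number, which is the cache-management difficulty mentioned above.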

The record-level transfer model is suitable for use with file models in which file contents are structured in the form of records. In this model, file data transfers across the network between a client and a server take place in units of records.
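With fixed-length records, the server can locate record i by multiplication alone and return exactly one record per request. A minimal sketch using Python's struct module; the record layout (4-byte id, 20-byte name, 8-byte balance) and the function names are hypothetical.

```python
import struct

# Hypothetical fixed-length record: 4-byte id, 20-byte name, 8-byte balance.
RECORD = struct.Struct("<i20sd")   # little-endian, no padding: 32 bytes

def pack_records(rows):
    """Build file contents from (id, name, balance) tuples."""
    return b"".join(RECORD.pack(rid, name.encode(), balance)
                    for rid, name, balance in rows)

def fetch_record(file_bytes, index):
    """Server side: return record `index`, the unit of transfer in this model."""
    start = index * RECORD.size
    rid, name, balance = RECORD.unpack(file_bytes[start:start + RECORD.size])
    return rid, name.rstrip(b"\0").decode(), balance
```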

Existing Systems:

A variety of file-sharing programs is available on several different networks. Availability depends partly on the operating system, and different networks have different features. The most commonly used systems are NFS on Linux and Samba.

Samba Service: Samba runs on most UNIX and Unix-like systems, such as GNU/Linux, the Solaris operating environment, and the BSD variants, including Apple's OS X Server. The name Samba comes from inserting two vowels into the name of the standard protocol used by the Microsoft Windows network file system, Server Message Block (SMB). Samba was originally called smbserver.

Network File System (NFS) is a protocol, originally developed by Sun Microsystems, that allows a computer to access files over a network as if they were on its local disks. NFS is strongly associated with UNIX systems, though it can be used on other platforms such as Macintosh and Microsoft Windows. Server Message Block (SMB) is a similar protocol and is the equivalent network file system implementation under Microsoft Windows.