[cs631apue] Several Questions

Jan Schaumann jschauma at stevens.edu
Thu Sep 15 22:27:36 EDT 2016


kthompso <kthompso at stevens.edu> wrote:

> > bzhang41 at smurf:~/ReadAndTest$ cat file.hole > file.nohole
> > bzhang41 at smurf:~/ReadAndTest$ ls -ls file.*
> > 259 -rw-------+ 1 bzhang41 student 10240020 Sep 15 11:15 file.hole
> >   1 -rw-r--r--+ 1 bzhang41 student 10240020 Sep 15 11:18 file.nohole
> >
> > why the file.nohole has large block size than the file.hole ? When
> > Prof shows the result, it also happened. It's the problem of different
> > OS ?

This one's interesting.  ("Interesting" is the word programmers use when
they mean "this makes no sense, but I can't explain it right now".)

Let's first remember that your home directory is on an NFS share.  That
is, the filesystem in question is not a local file system and may behave
differently from a regular file system.  With that in mind, let's first
try to run this experiment on a local file system:

$ hostname -f
nemo.srcit.stevens-tech.edu
$ pwd
/var/tmp
$ cc -Wall hole.c
$ ./a.out
$ ls -ls file.hole
16 -rw------- 1 jschauma professor 10240020 Sep 15 21:10 file.hole
$ cat file.hole >/dev/null
$ ls -ls file.hole
16 -rw------- 1 jschauma professor 10240020 Sep 15 21:10 file.hole
$ cat file.hole >file.nohole
$ ls -ls file.*
   16 -rw------- 1 jschauma professor 10240020 Sep 15 21:10 file.hole
10020 -rw------- 1 jschauma professor 10240020 Sep 15 21:11 file.nohole
$

This is the result we expected and what I should have shown in class:

The sparse file uses only a small amount of disk space ('ls -s' reports
the number of blocks used by the file; 16 in this case), but once we
read the data (via cat(1)), the kernel supplies all the null bytes in
the hole, which are then written to disk and the file 'file.nohole' ends
up using a lot more disk space (10020 blocks).
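
For reference, the hole.c compiled above does roughly the following.
This is a minimal sketch (the exact buffers and offsets in the version
shown in class may differ), but these values reproduce the
10240020-byte file from the output:

/*
 * hole.c: create a sparse file.  Sketch only; offsets chosen to match
 * the 10240020-byte file.hole seen above.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void) {
    int fd;
    const char data[] = "0123456789";   /* 10 bytes of real data */

    if ((fd = open("file.hole", O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* 10 bytes of data at the very beginning of the file */
    if (write(fd, data, 10) != 10) {
        perror("write");
        exit(EXIT_FAILURE);
    }

    /* jump far past the end; nothing is written for this range */
    if (lseek(fd, 10240010, SEEK_SET) == -1) {
        perror("lseek");
        exit(EXIT_FAILURE);
    }

    /* 10 more bytes; the file is now 10240020 bytes long, but only
     * 20 of those bytes were ever actually written */
    if (write(fd, data, 10) != 10) {
        perror("write");
        exit(EXIT_FAILURE);
    }

    (void)close(fd);
    return EXIT_SUCCESS;
}

The file system only has to store the 20 bytes we wrote (plus
metadata); everything in between is a hole.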

Side note: ls(1) on Linux has a bug: according to the manual page, 'ls
-s' is supposed to report disk usage in 512-byte blocks, but it actually
reports it in 1024-byte blocks.  To have the usage reported in 512-byte
blocks, you can run

$ env BLOCKSIZE=512 ls -ls file*
   32 -rw------- 1 jschauma professor 10240020 Sep 15 21:10 file.hole
20040 -rw------- 1 jschauma professor 10240020 Sep 15 21:11 file.nohole

That is, the actual amount of disk space used up by each of these two
files is:

file.hole  : 32 * 512    =    16384 bytes
file.nohole: 20040 * 512 = 10260480 bytes
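
As a cross-check, we can get the same numbers from stat(2) directly
instead of going through ls(1): st_blocks is specified in units of 512
bytes, and st_blksize is the preferred I/O block size.  A minimal
sketch:

/*
 * blocks.c: print size, 512-byte blocks, and preferred block size for
 * each file named on the command line.
 */
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv) {
    struct stat st;
    int i;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &st) == -1) {
            perror(argv[i]);
            continue;
        }
        printf("%s: %lld bytes, %lld 512-byte blocks (%lld bytes on disk), blksize %ld\n",
            argv[i],
            (long long)st.st_size,
            (long long)st.st_blocks,
            (long long)st.st_blocks * 512,
            (long)st.st_blksize);
    }
    return EXIT_SUCCESS;
}

Running this against file.hole and file.nohole should agree with the
numbers computed above.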

Why does a file with 20 actual bytes (file.hole) take up 16384 bytes?
First, let's see if the file without the hole uses the right disk space.
We know (from the size reported by 'ls -l') that the file contains
10240020 bytes.  How do we store 10240020 bytes?  We split them into a
number of blocks.  What sizes are the blocks used by the file system?
Let's ask it:

$ stat -c "%o" file.nohole
4096

Ok, so the file system uses 4K blocks.  So we can ask ls(1) to report to
us the number of 4K blocks used by the file:

$ env BLOCKSIZE=4096 ls -ls file.nohole
2505 -rw------- 1 jschauma professor 10240020 Sep 15 21:11 file.nohole

Which looks about right:

10240020 bytes is not evenly divisible by 4096, so we need more than
2500 blocks of 4K size.  2501 should suffice, but the file system likely
allocates additional blocks as size increases; let's call it overhead.

Similarly, the sparse file uses a little bit more space than it needs
to:  16384 bytes, when it could get away with a single block (since it
only uses 20 bytes, which would fit into a single 4K block).  Again, the
file system likely allocates a little bit of overhead, which is
negligible here.
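
To make explicit why reading the sparse file and writing the data back
out fills in the holes: cat(1), or any program doing a plain read/write
loop, simply sees zero bytes where the hole is and writes them out as
ordinary data.  A minimal sketch of such a copy loop (not the actual
cat(1) implementation):

/*
 * naivecopy.c: copy a file with a plain read/write loop.  Holes in the
 * input come back from read(2) as zero bytes and are written to the
 * output as real, allocated data.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE 65536

int
main(int argc, char **argv) {
    char buf[BUFSIZE];
    ssize_t n;
    int in, out;

    if (argc != 3) {
        fprintf(stderr, "usage: %s input output\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    if ((in = open(argv[1], O_RDONLY)) == -1) {
        perror(argv[1]);
        exit(EXIT_FAILURE);
    }
    if ((out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        perror(argv[2]);
        exit(EXIT_FAILURE);
    }

    /* read(2) across a hole returns '\0' bytes; writing those bytes
     * forces the output file system to allocate blocks for them */
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            perror("write");
            exit(EXIT_FAILURE);
        }
    }
    if (n == -1) {
        perror("read");
        exit(EXIT_FAILURE);
    }
    (void)close(in);
    (void)close(out);
    return EXIT_SUCCESS;
}

Copying file.hole with this program should give you a fully allocated
copy, just like the 'cat file.hole > file.nohole' above.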


Ooookay.  So now back to the NFS file system.  If you run the same
commands in your home directory, then you are not actually creating
files on the local disk.  Instead, you are creating a file on a network
file system that stores the data on a separate server somewhere.  How
sparse files are supported then depends on (a) the NFS implementation
and protocol version as well as (b) the file system actually used on
the remote server.

Every time you read a file that's stored on NFS, you are effectively
making a call to another server saying "give me the data that's stored
on your file system as 'filename'".

So let's give this a try.  /home/jschauma is on an NFS share:

$ pwd
/home/jschauma/tmp
$ ./a.out
$ ls -ls file.hole
1 -rw------- 1 jschauma professor 10240020 Sep 15 21:56 file.hole

So far, so good.  The sparse file was created, taking only a single
block.  What's the block size for this file system?

$ stat -c "%o" file.hole
1048576

This is a bit bigger (and may in fact be different for your NFS share
than mine).  For performance reasons, it is beneficial to set your NFS
to use a large blocksize, since the network is likely to be the
bottleneck, and not the file system.
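
Note that stat(1)'s '%o' shows the per-file preferred I/O size
(st_blksize); to see what the file system itself advertises, you can
ask statvfs(3).  A minimal sketch (on an NFS mount the numbers reflect
whatever the client negotiated with the server, so yours may well
differ from mine):

/*
 * fsbsize.c: print the block sizes the file system reports for the
 * given path.
 */
#include <sys/statvfs.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv) {
    struct statvfs vfs;

    if (argc != 2) {
        fprintf(stderr, "usage: %s path\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    if (statvfs(argv[1], &vfs) == -1) {
        perror("statvfs");
        exit(EXIT_FAILURE);
    }

    /* f_bsize is the file system block size, f_frsize the fragment
     * size used for allocation */
    printf("f_bsize:  %lu\nf_frsize: %lu\n",
        (unsigned long)vfs.f_bsize, (unsigned long)vfs.f_frsize);
    return EXIT_SUCCESS;
}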

Ok, so we use one large block.  Let's read the file:

$ cat file.hole >/dev/null
$ ls -ls file.hole
260 -rw------- 1 jschauma professor 10240020 Sep 15 21:56 file.hole
$

Now this is weird.  Note that the number of blocks the file uses
changed, even though we did not modify the file at all.

We asked NFS: "Hey, give me all the data in the file 'file.hole'."

NFS provided some data, and apparently changed its idea of how many
blocks that file occupies.  But remember that ls(1) is reporting 1K
blocks for '-s'.  So we can ask it again to tell us how many actual
blocks are used by using the file system's blocksize:

$ env BLOCKSIZE=1048576 ls -ls file.hole
1 -rw------- 1 jschauma professor 10240020 Sep 15 21:56 file.hole
$

Now let's read that file and write the data to a second file:

$ cat file.hole > file.nohole
$ env BLOCKSIZE=1048576 ls -ls file.*
1 -rw------- 1 jschauma professor 10240020 Sep 15 21:56 file.hole
1 -rw------- 1 jschauma professor 10240020 Sep 15 22:01 file.nohole

This seems reasonable so far.  But now things get weird:

$ cat file.nohole > file.nohole2
$ env BLOCKSIZE=1048576 ls -ls file.*
 1 -rw------- 1 jschauma professor 10240020 Sep 15 21:56 file.hole
10 -rw------- 1 jschauma professor 10240020 Sep 15 22:01 file.nohole
 1 -rw------- 1 jschauma professor 10240020 Sep 15 22:02 file.nohole2

Reading 'file.nohole' changed its used size.

$ cat file.nohole > file.nohole3
$ env BLOCKSIZE=1048576 ls -ls file.*
 1 -rw------- 1 jschauma professor 10240020 Sep 15 21:56 file.hole
10 -rw------- 1 jschauma professor 10240020 Sep 15 22:01 file.nohole
10 -rw------- 1 jschauma professor 10240020 Sep 15 22:02 file.nohole2
 1 -rw------- 1 jschauma professor 10240020 Sep 15 22:02 file.nohole3

Even weirder: the used size of 'file.nohole2' changed even though we
didn't do anything with it at all.

This is the part where I'm going to use the word "interesting".  I can
only speculate that NFS reports blocks used in some way related to the
contents of the directory, since the reported block count only changes
when we create additional files in the same directory.  I'm afraid I
don't have a full explanation, as my knowledge of how NFSv4 is
implemented on Linux is not sufficient.

Suffice it to say that support for sparse files heavily depends on the
underlying file system.
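
If you want to check whether a given file system actually kept a file
sparse, one option (where supported) is lseek(2) with SEEK_HOLE, which
Linux and the BSDs provide for file systems that can report holes.  A
hedged sketch:

/*
 * seekhole.c: report whether a file has a hole before its end.
 * Requires SEEK_HOLE support in the kernel and the file system.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv) {
#ifndef SEEK_HOLE
    (void)argc;
    (void)argv;
    fprintf(stderr, "SEEK_HOLE is not available on this system\n");
    return EXIT_FAILURE;
#else
    int fd;
    off_t end, hole;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    if ((fd = open(argv[1], O_RDONLY)) == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    end = lseek(fd, 0, SEEK_END);
    hole = lseek(fd, 0, SEEK_HOLE);
    if (hole == -1) {
        perror("lseek(SEEK_HOLE)");
        exit(EXIT_FAILURE);
    }
    /* if the first hole starts before the end of the file, the file
     * system is storing this file sparsely */
    printf("%s: %s\n", argv[1],
        (hole < end) ? "has a hole" : "no hole (or holes not reported)");
    (void)close(fd);
    return EXIT_SUCCESS;
#endif
}

On a file system (or NFS version) that cannot report holes, this will
simply claim there is no hole, which is another way of saying: it
depends on the underlying file system.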

-Jan

P.S.: I'll answer the other questions in a separate email.

