def disk_usage(paths):
cmd = ['du', '-B1', '-s', '-D'] + [os.path.expanduser(p) for p in paths]
P = subprocess.run(cmd, capture_output=True, encoding='utf-8')
out = {}
for line in P.stdout.split('\n'):
s = line.split('\t')
if len(s) == 2:
out[s[1]] = int(s[0])
return out
But don’t (physical on-disk) disk block sizes vary over time as drives get bigger? I remember when a HDD block size was 32K, now it might be 512K or more.
How would one find the block size for an individual drive so this routine would work for 30 years?
Are block sizes managed differently for HDD and SDD?
I was looking for a way to estimate the disk space in use by a file,
Somehow on my Windows system I have du from https://www.sysinternals.com. Hm, but that’s just for a directory size only, not for a single file size on the physical disk.
Or maybe take a look at the gnu utils du source code. Start somewhere around here: Coreutils - GNU core utilities
Chuck R.:
But don’t disk block sizes vary over time as drives get bigger? I remember when a HDD block size was 32K, now it might be 512K or more.
In these cases, there should be a separate .blksize attribute to tell you the block size.
I have to say, it’s quite strange not to see an obvious, high-level interface for this in the standard library.
Chuck R.:
But don’t disk block sizes vary over time as drives get bigger? I remember when a HDD block size was 32K, now it might be 512K or more.
How would one find the block size for an individual drive so this routine would work for 30 years?
Karl Knechtel:
In these cases, there should be a separate .blksize attribute to tell you the block size.
To be clear, st_blocks isn’t measured in units of filesystem blocksizes or in units of st_blksize (all three are uncorrelated).
The units of st_blocks aren’t POSIX-standardized, but it’s often blocks of 512-bytes. Often enough that the Python docs simply document it as “Number of 512-byte blocks allocated for file.”
The actual filesystem blocksize is found in statvfs.f_bsize, which has the Python analog os.statvfs("...").f_bsize.
Per sys/stat.h:
st_blksize - A file system-specific preferred I/O block size for this object. In some file system types, this may vary from file to file.
st_blocks - Number of blocks allocated for this object.
The unit for the st_blocks member of the stat structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members.
The value that st_blocks reports has no direct relation to the underlying filesytem block size. Per the coreutils source code linked in my post above, du assumes that st_blocks is in units of 512 bytes unless you redefine it to some other value with a macro at compile time.