mp_hashdirectory.sas File Reference

Returns a unique hash for each file in a directory.


Detailed Description

Hashes each file in each directory, then hashes those hashes to create a hash for each directory as well.

This makes use of the HASHING_FILE() and HASHING() functions, available since SAS 9.4M6. These functions can also be called in pure macro code via %sysfunc, e.g.:

%put %sysfunc(hashing_file(md5,/path/to/file.blob,0));
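The HASHING() function can be called from macro code in the same way, taking a string instead of a file. A minimal illustration, assuming SAS 9.4M6 or later; `mystring` is a hypothetical macro variable used only for this example:

```sas
/* Hash a literal string in open code (no DATA step needed) */
%put %sysfunc(hashing(md5,hello));

/* Hash the contents of a (hypothetical) macro variable with another algorithm */
%let mystring=some text to hash;
%put %sysfunc(hashing(sha256,&mystring));
```

Both calls write a hexadecimal digest to the log. Values containing commas would need macro quoting, since %sysfunc treats commas as argument separators.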

Actual usage:

%let fpath=/some/directory;

%mp_hashdirectory(&fpath,outds=myhash,maxdepth=2)

data _null_;
  set work.myhash;
  put (_all_)(=);
run;

Whilst files are hashed in their entirety, the logic for creating a folder hash is as follows:

  • Sort the files by filename (case sensitive: uppercase before lowercase)
  • Take the first 100 file hashes, concatenate them and hash the result
  • Concatenate this hash with the next 100 hashes and hash again
  • Continue until the end of the folder. The final hash is the folder hash
  • If a folder contains other folders, start from the bottom of the tree. The folder hashes cascade upwards, so a change in any sub-subdirectory is immediately visible in every folder above it
  • If a subfolder has no content (is empty), it is ignored and no hash is created
  • If a file is empty, it is also ignored and no hash is created
  • If the target directory (&inloc) is empty, &outds will also be empty
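The batching steps above can be sketched for a single folder as follows. This is an illustration of the documented logic, not the macro's own code: the input table `work.folder_files` is hypothetical and filled with hashes derived from synthetic file names (standing in for per-file HASHING_FILE() results), MD5 is assumed, and details such as trimming or separators may differ in the real implementation.

```sas
/* Hypothetical demo input: 250 "files" whose hashes are derived from their
   names, standing in for the HASHING_FILE() results of one folder */
data work.folder_files;
  length file_path $64 file_hash $32;
  do i=1 to 250;
    file_path=cats('/demo/folder/file',put(i,z3.),'.txt');
    file_hash=hashing('md5',trim(file_path));
    output;
  end;
  drop i;
run;

/* Sort by name; PROC SORT's default order is case sensitive */
proc sort data=work.folder_files;
  by file_path;
run;

/* Fold the file hashes into one folder hash, 100 at a time: each batch is
   appended to the running hash (blank for the first batch) and re-hashed */
data work.folder_hash(keep=folder_hash);
  length folder_hash $32 buffer $3300;
  retain folder_hash '' buffer '';
  set work.folder_files end=last;
  buffer=cats(buffer,file_hash);
  if mod(_n_,100)=0 or last then do;
    folder_hash=hashing('md5',cats(folder_hash,buffer));
    buffer='';
  end;
  if last then output;
run;
```

With 250 input rows the hash is recomputed three times (after rows 100, 200 and 250). Because each step consumes the previous result, changing any single file hash changes the final folder hash.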

Parameters
[in]inloc Full path of the directory to be hashed (unquoted)
[in]iftrue=(1=1) A condition under which the macro should be executed
[in]maxdepth=(0) Set to a positive integer to indicate the level of subdirectory scan recursion, e.g. 3 to go ./3/levels/deep. For unlimited recursion, set to MAX.
[in]method=(MD5) The hashing method to use. Available options:
  • MD5
  • SHA1
  • SHA256
  • SHA384
  • SHA512
  • CRC32
[out]outds=(work.mp_hashdirectory) The output dataset. Contains:
  • directory - the parent folder
  • file_hash - the hash output
  • hash_duration - how long the hash took (first hash always takes longer)
  • file_path - /full/path/to/each/file.ext
  • file_or_folder - contains either "file" or "folder"
  • level - the depth of the directory (top level is 0)
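The documented output columns make it possible to detect what changed between two runs. A minimal sketch, assuming the macro is already compiled, that folder rows carry their own path in file_path, and that `work.before`, `work.after` and `work.changes` are hypothetical dataset names:

```sas
%let fpath=/some/directory;

/* First snapshot: SHA256, unlimited recursion */
%mp_hashdirectory(&fpath,outds=work.before,method=SHA256,maxdepth=MAX)

/* ... files under &fpath are modified here ... */

/* Second snapshot, only taken if no error has occurred so far */
%mp_hashdirectory(&fpath,outds=work.after,method=SHA256,maxdepth=MAX
  ,iftrue=(&syscc=0)
)

/* List every file or folder that was added, removed or changed.
   A missing side has a blank hash, so "ne" also catches additions/removals */
proc sql;
  create table work.changes as
    select coalesce(a.file_path,b.file_path) as file_path
      ,coalesce(a.file_or_folder,b.file_or_folder) as file_or_folder
      ,a.file_hash as old_hash
      ,b.file_hash as new_hash
    from work.before a
    full join work.after b
      on a.file_path=b.file_path
    where a.file_hash ne b.file_hash
    order by 1;
quit;
```

Because folder hashes cascade upwards, one changed file also surfaces every folder above it in the result; filter on file_or_folder='file' to list only the files.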
Version
9.4m6
Author
Allan Bowe

Definition in file mp_hashdirectory.sas.