tarsieve - filter, then list or split, a tar file or tar data stream
tarsieve selects a set of records from a tar file or tar data stream and then either lists them or emits them to one or more tar files, or as a single tar data stream to stdout. tarsieve is primarily designed for working with collections of data files that have been stored in a tar archive, where only some of the archive’s records are needed for a particular task. This subset may be selected by name, modification time, size, and so forth. There are options to eliminate path information from names and to eliminate the data payload from files. tarsieve does not itself extract files from a tar file or tar data stream; tar must be employed for that purpose.
tarsieve may be obtained as part of the drm_tools package from: http://sourceforge.net/projects/drmtools/
On many operating systems tarsieve must be compiled with large file support if it is to read or write files above a few gigabytes in size. When compiling with gcc, add the command line switches -D_LARGE_FILE_SOURCE -D_FILE_OFFSET_BITS=64 to include large file support. The -i option shows whether large files are supported.
When tarsieve is used to split a tar file or tar data stream, the default output file name template is frag%.3d.tar. The value of the environment variable TARSIEVETEMPLATE overrides this default.
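The template appears to be a printf-style format in which the stream number is substituted. The following Python sketch illustrates that reading; it is not tarsieve's own code. The function name output_name and the environ parameter are hypothetical, and numbering from 1 is assumed from the frag001.tar name shown in the examples.

```python
import os

DEFAULT_TEMPLATE = "frag%.3d.tar"  # default template named in this manual page


def output_name(stream_number, environ=os.environ):
    """Return the output file name for a stream (hypothetical helper).

    Assumes the template is a printf-style format taking one integer,
    and that TARSIEVETEMPLATE, when set, replaces the default.
    """
    template = environ.get("TARSIEVETEMPLATE", DEFAULT_TEMPLATE)
    return template % stream_number


if __name__ == "__main__":
    for n in range(1, 5):
        print(output_name(n))
```

Passing a dictionary as environ makes it possible to try a different template without changing the real environment.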
GNU long name records are processed differently from other records. They are always read and stored, and each is emitted immediately before the record that follows it only when that following record has been selected.
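The hold-then-emit behavior can be sketched in Python as follows. This is only an illustration of the rule as described here, not tarsieve's implementation. Records are modeled as (typeflag, name) tuples, and the names filter_records and selected are hypothetical. The typeflag "L" is the one GNU tar uses for long name records.

```python
GNU_LONGNAME = "L"  # typeflag GNU tar uses for a long name record


def filter_records(records, selected):
    """Return the records to emit, in order (hypothetical helper).

    records  : iterable of (typeflag, name) tuples
    selected : function deciding whether an ordinary record is kept
    """
    pending = None  # most recent long name record, held until the next record
    out = []
    for rec in records:
        typeflag, _name = rec
        if typeflag == GNU_LONGNAME:
            pending = rec  # always read and stored, never tested itself
            continue
        if selected(rec):
            if pending is not None:
                out.append(pending)  # emitted just before its selected record
            out.append(rec)
        pending = None  # a held long name applies only to the next record
    return out


if __name__ == "__main__":
    stream = [("L", "very/long/name"), ("0", "very/long/na"), ("0", "short")]
    print(filter_records(stream, lambda rec: rec[1] != "short"))
```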
Tar files end with two or more null blocks. These are always ignored on input, and new ones are written at the end of output tar files or tar data streams. This allows tarsieve to accept concatenated tar files as input. (See EXAMPLES.)
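The effect of skipping null blocks can be seen with Python's standard tarfile module, whose ignore_zeros option serves a similar purpose. The sketch below builds two small archives in memory, concatenates them, and lists the members of the result. The helper names make_tar and list_concatenated are hypothetical, and nothing here is tarsieve's own code.

```python
import io
import tarfile


def make_tar(name, payload):
    """Build a one-member tar archive in memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_concatenated(data):
    """List member names, skipping the null blocks between archives."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:", ignore_zeros=True) as tf:
        return [member.name for member in tf.getmembers()]


if __name__ == "__main__":
    combined = make_tar("a.txt", b"alpha") + make_tar("b.txt", b"beta")
    print(list_concatenated(combined))
```

Without ignore_zeros the reader stops at the first pair of null blocks and reports only the members of the first archive.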
All tests are optional and, if present, are applied in the order shown below. Records that fail any test are dropped; all others are retained.
-fnge,-fnle <RECNUM> Record number [>= , <=] RECNUM. Records are numbered 1 to N.
-fdge,-fdle <DATE> Records with date [>= , <=] DATE (YYYY-MM-DD hh:mm:ss).
-fsge,-fsle <SIZE> Records with size [>= , <=] SIZE (decimal size in bytes).
-fgeq <GID> Records with gid == GID (octal value).
-fueq <UID> Records with uid == UID (octal value).
-fgrp <GROUP> Records with gname == GROUP (a string like users).
-fown <OWNER> Records with uname == OWNER (a string like John).
-frec <TYPE> Records with record type == TYPE. TYPE is a comma delimited list of tar record types:
all All of the following
reg Regular file
lnk Hard link
sym Symbolic link
chr Character special
blk Block special
fifo Named pipe
cont Contiguous file
other All other types
-ffre <REGEXP> Records whose file name matches the PCRE regular expression.
-fdre <REGEXP> Records whose directory name matches the PCRE regular expression. The tar record name field actually holds both the file name and the directory name. Two filters are provided so that each part of the name may be targeted separately.
-n <STREAMS> Set the number of tar streams to produce, STREAMS must be >0. (Default = 1).
-p <PHASE> If PHASE = 0 (the default), emit STREAMS streams to STREAMS files.
If 1 <= PHASE <= STREAMS, emit only the contents of the stream specified by PHASE to stdout.
-c <COUNT> Emit COUNT consecutive tar records to each stream in turn (Default = 1).
-l Emit listing information to stderr.
-lx Like -l, but also block all other output.
-lf <FIELDS> FIELDS is a comma delimited list of fields to show in a -l or -lx listing. Fields are:
all all fields
nam file name
mod file protection mask
uid user id
gid group id
siz file size
typ record type
The default FIELDS value is: siz,typ,nam
-in <FILE> Read input from FILE. (By default, or if FILE is -, input is read from stdin.)
-z Zero record size. Selected files are emitted but their contents are dropped. (Default, retain file contents.)
-np No Path. Remove the path from the record name. If nothing remains, drop the record. (Default, retain paths in file names.)
-h -help --help -? --?? Print the help message. (Default - do not print help message.)
-i Emit version, copyright, license and contact information. (Default - do not emit information.)
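The way -ffre, -fdre and -np treat the two parts of a record name can be sketched in Python. This is a rough illustration under stated assumptions: the name is split at its last "/", the directory part excludes that trailing "/", Python's re module stands in for PCRE, and a pattern may match anywhere in the part it is applied to. The helper names split_name, strip_path and name_matches are hypothetical.

```python
import re


def split_name(record_name):
    """Split a tar record name into (directory, file name) at the last '/'."""
    directory, _, filename = record_name.rpartition("/")
    return directory, filename


def strip_path(record_name):
    """Mimic -np: keep only the file name, or None if nothing remains."""
    _, filename = split_name(record_name)
    return filename or None


def name_matches(record_name, file_re=None, dir_re=None):
    """Apply optional file name and directory name patterns (as -ffre, -fdre)."""
    directory, filename = split_name(record_name)
    if file_re is not None and not re.search(file_re, filename):
        return False
    if dir_re is not None and not re.search(dir_re, directory):
        return False
    return True


if __name__ == "__main__":
    for name in ["data/run1/a.dat", "data/run1/", "b.dat"]:
        print(name, split_name(name), strip_path(name))
```

Under these assumptions a directory record such as data/run1/ has an empty file name part, so it would be dropped when the path is removed.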
% tarsieve -h List the command line options.
% tarsieve -in a.tar -n 4 -p 0 -c 5 (a.tar holds 20 tar records.) Split the input into four output files. Records 1-5 go to frag001.tar, 6-10 to frag002.tar, 11-15 to frag003.tar, and 16-20 to frag004.tar.
% tarsieve -in a.tar -n 4 -p 2 -c 3 (a.tar holds 20 tar records.) Write records 4-6 and 16-18 to stdout.
% cat a.tar b.tar c.tar | tarsieve -lx -lf typ,nam List the concatenated contents of the three tar files, showing only the record type and name fields.
% tarsieve -in a.tar -fueq 0 -fgeq 1 -frec reg -np >b.tar Select from a.tar only those records with uid 0, gid 1, and containing a regular file. Remove the path part of the name field from these records and store the results in b.tar.
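The splitting examples above suggest that records are dealt out to the streams in blocks of COUNT, cycling back to the first stream after the last. The Python sketch below models that reading. It is inferred from the examples rather than taken from tarsieve's source, and the names stream_for_record and records_for_phase are hypothetical. It also assumes the numbering runs over the records that reach the splitting stage.

```python
def stream_for_record(recnum, streams, count):
    """Return the 1-based stream that receives 1-based record recnum.

    Assumes blocks of `count` consecutive records are assigned to
    streams 1..streams in rotation.
    """
    return ((recnum - 1) // count) % streams + 1


def records_for_phase(total, streams, count, phase):
    """List the record numbers that land in the stream numbered `phase`."""
    return [
        r for r in range(1, total + 1)
        if stream_for_record(r, streams, count) == phase
    ]


if __name__ == "__main__":
    # Models "-n 4 -c 5" applied to an input holding 20 records.
    for phase in range(1, 5):
        print(phase, records_for_phase(20, 4, 5, phase))
```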
GNU General Public License 2
Copyright (C) 2013 David Mathog and Caltech.
David Mathog, Biology Division, Caltech <firstname.lastname@example.org>
drm_tools    tarsieve (1)    1.0.0 JUN 24 2013