NCCOPY(1) UNIDATA UTILITIES NCCOPY(1)
NAME
nccopy - Copy a netCDF file to specified variant of netCDF format,
optionally compressing or chunking data in the output copy.
SYNOPSIS
nccopy [-k kind ] [-d n ] [-s] [-c chunkspec ] [-u] [-w] [-[v|V]
var1,...] [-[g|G] grp1,...] [-m bufsize ] [-h chunk_cache ]
[-e cache_elems ] [-r] infile outfile
DESCRIPTION
The nccopy utility copies an input netCDF file in any supported format
variant to an output netCDF file, optionally converting the output to
any compatible netCDF format variant, compressing the data, or rechunk‐
ing the data. For example, if built with the netCDF-3 library, a
netCDF classic file may be copied to a netCDF 64-bit offset file, per‐
mitting larger variables. If built with the netCDF-4 library, a netCDF
classic file may be copied to a netCDF-4 file or to a netCDF-4 classic
model file as well, permitting data compression, efficient schema
changes, larger variable sizes, and use of other netCDF-4 features.
nccopy also serves as an example of a generic netCDF-4 program, with
its ability to read any valid netCDF file and handle nested groups,
strings, and user-defined types, including arbitrarily nested compound
types, variable-length types, and data of any valid netCDF-4 type.
If DAP support was enabled when nccopy was built, the file name may
specify a DAP URL. This may be used to convert data on DAP servers to
local netCDF files.
OPTIONS
-k kind
Specifies the kind of file to be created (that is, the format
variant) and, by inference, the data model (i.e. netcdf-3 (clas‐
sic) versus netcdf-4 (enhanced)). The possible arguments are as
follows.
'1' or 'classic' => netcdf classic format
'2', '64-bit-offset', or '64-bit offset' => netCDF 64-bit
offset format
'3', 'hdf5', 'netCDF-4', or 'enhanced' => netCDF-4 format
(enhanced data model)
'4', 'hdf5-nc3', 'netCDF-4 classic model', or 'enhanced-
nc3' => netCDF-4 classic model format
If no value for -k is specified, then the output will use the
same format as the input, except if the input is classic or
64-bit offset and either chunking or compression is specified,
in which case the output will be netCDF-4 classic model format.
Note that attempting some kinds of format conversion will result
in an error, if the conversion is not possible. For example, an
attempt to copy a netCDF-4 file that uses features of the en‐
hanced model, such as groups or variable-length strings, to any
of the other kinds of netCDF formats that use the classic model
will result in an error.
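The rules above for -k and for the default output format can be summarized in a short Python sketch. It only models the behavior described in this section; it is not nccopy's actual implementation, and the names KIND_ALIASES and output_kind are hypothetical.

```python
# Aliases accepted by -k, as listed in the option description above.
KIND_ALIASES = {
    "1": "classic",
    "classic": "classic",
    "2": "64-bit offset",
    "64-bit-offset": "64-bit offset",
    "64-bit offset": "64-bit offset",
    "3": "netCDF-4",
    "hdf5": "netCDF-4",
    "netCDF-4": "netCDF-4",
    "enhanced": "netCDF-4",
    "4": "netCDF-4 classic model",
    "hdf5-nc3": "netCDF-4 classic model",
    "netCDF-4 classic model": "netCDF-4 classic model",
    "enhanced-nc3": "netCDF-4 classic model",
}


def output_kind(input_kind, k=None, chunking=False, deflate=False):
    """Return the output format name (hypothetical helper, not nccopy code)."""
    if k is not None:
        # An explicit -k always decides the output format.
        return KIND_ALIASES[k]
    if input_kind in ("classic", "64-bit offset") and (chunking or deflate):
        # Classic variants cannot hold chunked or compressed data.
        return "netCDF-4 classic model"
    # Otherwise the output uses the same format as the input.
    return input_kind


print(output_kind("classic", deflate=True))
```

The sketch does not model the error case: whether a given conversion is possible depends on the features the input file actually uses.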
-d n
For netCDF-4 output, including netCDF-4 classic model, specify
deflation level (level of compression) for variable data output.
0 corresponds to no compression and 9 to maximum compression,
with higher levels of compression requiring marginally more time
to compress or uncompress than lower levels. Compression
achieved may also depend on output chunking parameters. If this
option is specified for a classic format or 64-bit offset format
input file, it is not necessary to also specify that the output
should be netCDF-4 classic model, as that will be the default.
If this option is not specified and the input file has com‐
pressed variables, the compression will still be preserved in
the output, using the same chunking as in the input by default.
Note that specifying output deflation level with nccopy results
in all output variables compressed using the same compression
level, but the API has no such restriction. With a program you
can customize compression for each variable independently.
-s For netCDF-4 output, including netCDF-4 classic model, specify
shuffling of variable data bytes before compression or after de‐
compression. This option is ignored unless a non-zero deflation
level is specified. Turning shuffling on sometimes improves
compression.
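To see why shuffling can help, the following Python sketch applies a byte shuffle to an array of 4-byte integers and compares zlib-compressed sizes at deflation level 1. It is an illustration only: the helper names are hypothetical, the sample data is artificial, and the library's own filter implementation may differ in detail.

```python
import struct
import zlib


def shuffle(data, width):
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(data[i::width] for i in range(width))


def unshuffle(data, width):
    """Inverse of shuffle(): interleave the byte planes again."""
    n = len(data) // width
    out = bytearray(len(data))
    for i in range(width):
        out[i::width] = data[i * n:(i + 1) * n]
    return bytes(out)


# Artificial sample: 10000 slowly increasing 4-byte little-endian integers.
values = list(range(10000))
raw = struct.pack("<%dI" % len(values), *values)

plain = zlib.compress(raw, 1)                 # like -d 1 without -s
shuffled = zlib.compress(shuffle(raw, 4), 1)  # like -d 1 with -s

print(len(raw), len(plain), len(shuffled))
```

For data like this, where neighboring values differ only in their low-order bytes, the shuffled stream compresses much better. For other data the gain may be small or absent, which is why the option is described as helping only sometimes.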
-u Convert any unlimited size dimensions in the input to fixed size
dimensions in the output. This can speed up variable-at-a-time
access, but slow down record-at-a-time access to multiple vari‐
ables along an unlimited dimension.
-w Keep output in memory (as a diskless netCDF file) until output
is closed, at which time the output file is written to disk. This
can greatly speed up operations such as converting unlimited
dimensions to fixed size (-u option), chunking, rechunking, or
compressing the input. It requires that available memory is large
enough to hold the output file. This option may provide a larger
speedup than careful tuning of the -m, -h, or -e options, and
it is certainly a lot simpler.
-c chunkspec
For netCDF-4 output, including netCDF-4 classic model, specify
chunking (multidimensional tiling) for variable data in the out‐
put. This is useful to specify the units of disk access, com‐
pression, or other filters such as checksums. Changing the
chunking in a netCDF file can also greatly speed up access, by
choosing chunk shapes that are appropriate for the most common
access patterns.
The chunkspec argument is a string of comma-separated associa‐
tions, each specifying a dimension name, a '/' character, and
optionally the corresponding chunk length for that dimension.
No blanks should appear in the chunkspec string, except possibly
escaped blanks that are part of a dimension name. A chunkspec
must name at least one dimension, and may omit dimensions which
are not to be chunked or for which the default chunk length is
desired. If a dimension name is followed by a '/' character but
no subsequent chunk length, the actual dimension length is as‐
sumed. If copying a classic model file to a netCDF-4 output
file and not naming all dimensions in the chunkspec, unnamed di‐
mensions will also use the actual dimension length for the chunk
length. An example of a chunkspec for variables that use 'm'
and 'n' dimensions might be 'm/100,n/200' to specify 100 by 200
chunks. To see the chunking resulting from copying with a
chunkspec, use the '-s' option of ncdump on the output file.
Note that nccopy requires variables that share a dimension to
also share the chunk size associated with that dimension, but
the programming interface has no such restriction. If you need
to customize chunking for variables independently, you will need
to use the library API in a custom utility program.
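The chunkspec syntax described above can be illustrated with a small Python parser. This is a sketch under simplifying assumptions: parse_chunkspec is a hypothetical name, escaped blanks and other special characters in dimension names are not handled, and a missing chunk length is reported as None to stand for "use the actual dimension length".

```python
def parse_chunkspec(spec):
    """Map each named dimension to a chunk length (None = full length)."""
    if not spec:
        raise ValueError("a chunkspec must name at least one dimension")
    chunks = {}
    for item in spec.split(","):
        # Split on the last '/', so 'm/100' gives ('m', '/', '100').
        name, sep, length = item.rpartition("/")
        if not sep or not name:
            raise ValueError("expected 'dim/length' or 'dim/', got %r" % item)
        chunks[name] = int(length) if length else None
    return chunks


print(parse_chunkspec("m/100,n/200"))
print(parse_chunkspec("time/"))
```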
-v var1,...
The output will include data values for the specified variables,
in addition to the declarations of all dimensions, variables,
and attributes. One or more variables must be specified by name
in the comma-delimited list following this option. The list must
be a single argument to the command, hence cannot contain
unescaped blanks or other white space characters. The named
variables must be valid netCDF variables in the input-file. A
variable within a group in a netCDF-4 file may be specified with an
absolute path name, such as '/GroupA/GroupA2/var'. Use of a
relative path name such as 'var' or 'grp/var' specifies all
matching variable names in the file. The default, without this
option, is to include data values for all variables in the output.
-V var1,...
The output will include the specified variables only but all di‐
mensions and global or group attributes. One or more variables
must be specified by name in the comma-delimited list following
this option. The list must be a single argument to the command,
hence cannot contain unescaped blanks or other white space char‐
acters. The named variables must be valid netCDF variables in
the input-file. A variable within a group in a netCDF-4 file may
be specified with an absolute path name, such as
'/GroupA/GroupA2/var'. Use of a relative path name such as
'var' or 'grp/var' specifies all matching variable names in the
file. The default, without this option, is to include all
variables in the output.
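The difference between absolute and relative path names can be sketched in Python as follows. This assumes that a relative name matches any variable whose full path ends with that name, which is one reading of "all matching variable names"; the function select_variables and the example paths other than '/GroupA/GroupA2/var' are hypothetical.

```python
def select_variables(all_paths, names):
    """Return the absolute variable paths selected by a comma-delimited list."""
    selected = []
    for path in all_paths:
        for name in names.split(","):
            if name.startswith("/"):
                # Absolute path name: must match one variable exactly.
                hit = path == name
            else:
                # Relative path name: assumed to match in any group.
                hit = path.endswith("/" + name)
            if hit:
                selected.append(path)
                break
    return selected


paths = ["/var", "/GroupA/var", "/GroupA/GroupA2/var", "/grp/var", "/GroupB/grp/var"]
print(select_variables(paths, "/GroupA/GroupA2/var"))
print(select_variables(paths, "grp/var"))
```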
-g grp1,...
The output will include data values only for the specified
groups. One or more groups must be specified by name in the
comma-delimited list following this option. The list must be a
single argument to the command. The named groups must be valid
netCDF groups in the input-file. The default, without this op‐
tion, is to include data values for all groups in the output.
-G grp1,...
The output will include only the specified groups. One or more
groups must be specified by name in the comma-delimited list
following this option. The list must be a single argument to the
command. The named groups must be valid netCDF groups in the in‐
put-file. The default, without this option, is to include all
groups in the output.
-m bufsize
An integer or floating-point number that specifies the size, in
bytes, of the copy buffer used to copy large variables. A suf‐
fix of K, M, G, or T multiplies the copy buffer size by one
thousand, million, billion, or trillion, respectively. The de‐
fault is 5 Mbytes, but will be increased if necessary to hold at
least one chunk of netCDF-4 chunked variables in the input file.
You may want to specify a value larger than the default for
copying large files over high latency networks. Using the '-w'
option may provide better performance, if the output fits in
memory.
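The suffix rule for sizes can be made concrete with a short Python sketch. It follows the description above, in which suffixes are decimal multipliers (K is one thousand, not 1024). The name parse_size is hypothetical, only the upper-case suffixes listed above are handled, and nccopy's own parsing may differ in edge cases.

```python
from decimal import Decimal

# Decimal multipliers, as described for the -m, -h, and -e options.
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(text):
    """Convert a string such as '5M' or '1.5K' to an integer count."""
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    # Decimal avoids binary floating-point rounding for values like 4.194304.
    return int(Decimal(text) * multiplier)


print(parse_size("5M"))
print(parse_size("4.194304M"))
```

Under this rule, the 4.194304 Mbytes quoted as the default chunk cache size works out to exactly 4194304 bytes.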
-h chunk_cache
For netCDF-4 output, including netCDF-4 classic model, an inte‐
ger or floating-point number that specifies the size in bytes of
chunk cache for chunked variables. This is not a property of
the file, but merely a performance tuning parameter for avoiding
compressing or decompressing the same data multiple times while
copying and changing chunk shapes. A suffix of K, M, G, or T
multiplies the chunk cache size by one thousand, million, bil‐
lion, or trillion, respectively. The default is 4.194304 Mbytes
(or whatever was specified for the configure-time constant
CHUNK_CACHE_SIZE when the netCDF library was built). Ideally,
the nccopy utility should accept only one memory buffer size and
divide it optimally between a copy buffer and chunk cache, but
no general algorithm for computing the optimum chunk cache size
has been implemented yet. Using the '-w' option may provide
better performance, if the output fits in memory.
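When choosing a value for -h, it can help to compare the size of one chunk with the size of the cache. The Python sketch below does that arithmetic for a 1000 by 40 by 40 chunk shape, the one used in the EXAMPLES section. The 4-byte element size is an assumption (the examples do not state a data type), and chunk_bytes is a hypothetical helper.

```python
from math import prod


def chunk_bytes(chunk_shape, elem_size):
    """Uncompressed size in bytes of one chunk of the given shape."""
    return prod(chunk_shape) * elem_size


DEFAULT_CHUNK_CACHE = 4194304  # 4.194304 Mbytes, the documented default

# Assumption: 4-byte elements, such as 32-bit floats.
size = chunk_bytes((1000, 40, 40), 4)

print(size)                         # bytes in one chunk
print(DEFAULT_CHUNK_CACHE // size)  # whole chunks that fit in the default cache
```

Under these assumptions a single chunk is larger than the default cache, so a larger -h value may be worth trying for such a chunk shape.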
-e cache_elems
For netCDF-4 output, including netCDF-4 classic model, specifies
the number of elements that the chunk cache can hold. A suffix of K,
M, G, or T multiplies the number of elements by one thousand,
million, billion, or trillion, respectively. This is not a property
of the file, but merely a performance tuning parameter for
avoiding compressing or decompressing the same data multiple
times while copying and changing chunk shapes. The default is
1009 (or whatever was specified for the configure-time constant
CHUNK_CACHE_NELEMS when the netCDF library was built). Ideally,
the nccopy utility should determine an optimum value for this
parameter, but no general algorithm for computing the optimum
number of chunk cache elements has been implemented yet.
-r Read netCDF classic or 64-bit offset input file into a diskless
netCDF file in memory before copying. Requires that input file
be small enough to fit into memory. For nccopy, this doesn't
seem to provide any significant speedup, so may not be a useful
option.
EXAMPLES
Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a netCDF
file of the same type:
nccopy foo1.nc foo2.nc
Note that the above copy will not be as fast as use of cp or other sim‐
ple copy utility, because the file is copied using only the netCDF API.
If the input file has extra bytes after the end of the netCDF data,
those will not be copied, because they are not accessible through the
netCDF interface. If the original file was generated in "No fill" mode,
so that fill values are not stored for padding for data alignment, the
output file may have different padding bytes.
Convert a netCDF-4 classic model file, compressed.nc, that uses com‐
pression, to a netCDF-3 file classic.nc:
nccopy -k classic compressed.nc classic.nc
Note that '1' could be used instead of 'classic'.
Download the variable 'time_bnds' and its associated attributes from an
OPeNDAP server and copy the result to a netCDF file named 'tb.nc':
nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc
Note that URLs that name specific variables as command-line arguments
should generally be quoted, to avoid the shell interpreting special
characters such as '?'.
Compress all the variables in the input file foo.nc, a netCDF file of
any type, to the output file bar.nc:
nccopy -d1 foo.nc bar.nc
If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be a
netCDF-4 classic model netCDF file, because the classic and 64-bit off‐
set format variants don't support compression. If foo.nc was a
netCDF-4 file with some variables compressed using various deflation
levels, the output will also be a netCDF-4 file of the same type, but
all the variables, including any uncompressed variables in the input,
will now use deflation level 1.
Assume the input data includes gridded variables that use time, lat,
lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes,
and that the time dimension varies most slowly. Also assume that users
want quick access to data at all times for a small set of lat-lon
points. Accessing data for 1000 times would typically require access‐
ing 1000 disk blocks, which may be slow.
Reorganizing the data into chunks on disk that have all the time in
each chunk for a few lat and lon coordinates would greatly speed up
such access. To chunk the data in the input file slow.nc, a netCDF
file of any type, to the output file fast.nc, you could use:
nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc
to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.
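The effect of this rechunking on access cost can be estimated by counting how many chunks a read touches. The Python sketch below does this for the 1000 by 1000 by 1000 grid described above. It is a rough model only: chunks_touched is a hypothetical helper, and a chunk shape of (1, 1000, 1000) is used as a stand-in for the original layout in which each time step is stored separately.

```python
def chunks_touched(start, count, chunk_shape):
    """Number of chunks overlapped by a read of `count` elements from `start`."""
    total = 1
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k            # index of the first chunk along this dimension
        last = (s + c - 1) // k   # index of the last chunk along this dimension
        total *= last - first + 1
    return total


# All 1000 times at a single (lat, lon) point.
series = ((0, 500, 500), (1000, 1, 1))
# All lats and lons at a single time.
time_slice = ((0, 0, 0), (1, 1000, 1000))

print(chunks_touched(*series, (1, 1000, 1000)))      # one time step per chunk
print(chunks_touched(*series, (1000, 40, 40)))       # rechunked as above
print(chunks_touched(*time_slice, (1000, 40, 40)))   # the cost of the trade-off
```

In this model the time series drops from 1000 chunks to 1, while reading a full lat-lon map at one time rises from 1 chunk to 625, so the chunk shape should follow the most common access pattern.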
If you had enough memory to contain the output file, you could speed up
the rechunking operation significantly by creating the output in memory
before writing it to disk on close:
nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc
SEE ALSO
ncdump(1), ncgen(1), netcdf(3)
Release 4.2 2012-03-08 NCCOPY(1)