World Ocean Circulation Experiment Global Data Resource

Appendix 2: Example matlab script

function [data] = read_woce_netcdf(file_name)
% read_woce_netcdf.m
%
% General-purpose function to load all the variables in a netCDF file
% using getnc. It supplements the help provided in the netCDF primer
% on the WOCE V3 DVDs, and calls the Matlab netCDF libraries described
% in the primer, which must be set up first.
%
% Example:
%
% data = read_woce_netcdf('my_file.nc');
%
% will return a structure array named data.
%
% It is necessary to include the trailing .nc in the filename.
%
%
% If the data have been read into a structure array named data
% as in the example above
% then this can be unpacked to a set of single variables with
% the following code:
%
% sname = 'data' % or whatever the structure array is called.
% eval(['struct_var_names = fieldnames(' sname ') ']);
% for i = 1:length(struct_var_names)
% eval([struct_var_names{i} ' = ' sname '.' struct_var_names{i} ' ;']);
% end
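%
% A sketch of an alternative that avoids eval. It assumes a Matlab
% release that supports dynamic field names (data.(name)) and has not
% been tested on the WOCE files. It creates the variables in the base
% (command line) workspace:
%
% fn = fieldnames(data);
% for i = 1:length(fn)
% assignin('base', fn{i}, data.(fn{i})); % one variable per field
% end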
%
%
% This script has some extra code to catch a few special cases.
% If this code is not required, it could be commented out to increase
% speed. The extra code is only required for
% sss, pfloat, woce-ssf, sat_sl, sat_sst, sat_mwf
%
% Brian King, SOC, 23 September 2002.
%

% There has not been time for anyone else to review these notes
% between BAK drafting them and their inclusion on the WOCE DVDs.
% The author therefore apologises for any inaccuracies. I hope
% they are more of a help than a hindrance. As a netCDF novice,
% it took me several days to figure out why some things
% weren't working as I expected.

% There are two sets of optional code to catch cases where
% the basic script doesn't work correctly. By default, these are
% left active, and with them the full script works on all the files
% on which I have tested it. The optional code can be commented out
% for better speed, but there could be unexpected combinations
% of variables and attributes which don't work. Proceed with
% caution when first loading new data types.

% Apart from the optional code, there are very few
% executable lines. These are reproduced here in a block
% to make it easy to see what is going on. The error traps
% have also been left out of this summary.
%%%%%
%%%%% function [data] = br(file_name)
%%%%% ncdf_obj = netcdf(file_name);
%%%%% var_names = ncnames(var(ncdf_obj));
%%%%% close(ncdf_obj);
%%%%% num_vars = length(var_names);
%%%%% disp(['file_name = ''' file_name ''''])
%%%%% for ivar = 1:num_vars
%%%%% name = var_names{ivar};
%%%%% command = ['data.' name ' = getnc(file_name,''' name ''');'];
%%%%% disp(command)
%%%%% eval(command);
%%%%% end
%%%%%
%
% General notes on use of the script:
%
% The variable file_name is the argument of the function.
%
% File_name should be of the form 'example.nc' and include
% the trailing .nc
%
% This script worked OK for me in Matlab 6 on
% a Sun unix workstation. I'm afraid there hasn't been time
% to test it on any other platforms.
%
% One way to read the data from a file is to use the
% ncload command. This is illustrated below. But ncload
% doesn't take account of the variable attributes, such
% as valid_min/max/range, _FillValue or missing_value.
%
%
% Once we have the list of variable names, use getnc to load
% them properly; getnc is supposed to take account of the variable
% attributes, and substitute NaN where data match a fill value
% or missing value, or are outside a valid range.
%
% -------------------------------------------------
% See end of script for some important comments about
% getnc, which concern the following directories
%
% sss ) Problems swapping missing values in char arrays to NaN.
% pfloat )
% woce-ssf )
%
% sat_sl ) Issues concerning use of scale factor in some variables.
% sat_sst )
% sat_mwf1 )
% sat_mwf2 )
% -------------------------------------------------
%
%
%
% If getnc is causing problems that can't be solved,
% for example by incorrectly changing out-of-range
% or missing values to NaN, then an alternative strategy
% is to use ncload to read in the 'raw' values, and write
% further simple code to fix particular variables.
%
% If speed, efficiency, or memory is important, then you could find
% out which variables you need and insert the corresponding getnc
% commands, rather than loading all variables.
%
% The relevant getnc commands are displayed on the screen when the
% program is run. These could be used for building new scripts.
%


%---------------------------------------------------------
% script starts here with some error traps
%---------------------------------------------------------

if nargin ~= 1
%file_name was not supplied. Ask for it.
file_name = input('Type the file name : ','s');
if isempty(file_name)
disp('You must supply precisely one argument')
disp('in the function call or in response to the prompt.')
disp('Type')
disp(' ')
disp(['help ' mfilename])
disp(' ')
return
end
end
if exist(file_name,'file') ~= 2
disp(['The file named ' file_name ' does not exist'])
return
end

%---------------------------------------------------------
% First step is to obtain variable names from the file.
% One option would be
%var_names = ncload(file_name);
% This would load all the variables from the file, and put the
% list of variable names into var_names. But this would make no
% use of the fill values or valid ranges.
% Instead, use the netcdf utility.
%---------------------------------------------------------

% var_names = ncnames(var(netcdf(file_name)));
%
% The above construct works fine to get var_names, but it seems to leave the
% netcdf file open at the end of the function.
% If the script is used on a large number of files,
% (218 in my case) matlab eventually complains. Therefore we must
% use a netcdf close command.

ncdf_obj = netcdf(file_name);
var_names = ncnames(var(ncdf_obj));
close(ncdf_obj);

% var_names is a cell array with a list of variable names
% as character strings.

num_vars = length(var_names);

% number of variables

disp(['file_name = ''' file_name ''''])

for ivar = 1:num_vars
name = var_names{ivar};

% take each name in turn

command = ['data.' name ' = getnc(file_name,''' name ''');'];

% Use getnc to load the data into a structure
% array called data. The structure array makes it easy
% to pass data out of the function.

% A simpler construct would be
% command = ['data.' name ' = getnc(file_name,name);'];
% but the echo of the command to the screen wouldn't be so helpful

% The basic simple form of the getnc command would be
% depth = getnc('example.nc','depth');
% to read a variable called 'depth' from a
% file called 'example.nc'

% There are two sets of conditions for which the above command
% doesn't produce the correct results. These are discussed extensively
% in comments at the end of this script. To handle them without
% error, some non-default options are required on getnc. In this
% default version of the script, these options are included. When not
% reading the affected datasets, the extra lines can be commented out,
% which will increase speed and efficiency.

%---------------------------------------------------------
%
% Special case 1.
%
% Some data files define an attribute scale_factor for one
% or more variables.
% getnc recognises this attribute, and then has to decide whether
% to scale the data, and possibly other attributes, such as
% fill value or valid range. The default action is
% to scale data AND attributes. See help getnc for details.
% For all the WOCE V3 datasets identified by BAK that define
% both scale_factor and valid range attributes, the valid range attributes
% are always in scaled units rather than unscaled units.
% These attributes must therefore NOT be scaled again by getnc,
% so a non-default action must be set for getnc.
% The datasets concerned are: SAT_SL, SAT_SST and SAT_MWF.
% Note that woceflux uses scale_factor, but not valid range,
% so the default values of getnc are OK.
% Where needed, change rescale_opts in getnc from [1 1] to [1 0].

% command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1 0]);'];

%---------------------------------------------------------
% The next segment of code can be commented out for some data files.

[att_vals att_names] = attnc(file_name,name);

att_num = length(att_names);

clear i_scale i_min i_max i_range

for i_att = 1:att_num
if strcmp(att_names{i_att},'scale_factor')
i_scale = 1;
end
if strcmp(att_names{i_att},'valid_min')
i_min = 1;
end
if strcmp(att_names{i_att},'valid_max')
i_max = 1;
end
if strcmp(att_names{i_att},'valid_range')
i_range = 1;
end
end

if exist('i_scale','var') & (exist('i_min','var') | exist('i_max','var') | exist('i_range','var'))
command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1 0]);'];
end
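
% A more compact form of the loop above, which may be faster. It assumes
% att_names is a cell array of strings (or empty), as returned by attnc.
% It sets the same command but avoids the flag variables:
%
% names = att_names(:);
% has_scale = any(strcmp(names,'scale_factor'));
% has_valid = any(strcmp(names,'valid_min') | strcmp(names,'valid_max') | strcmp(names,'valid_range'));
% if has_scale & has_valid
% command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1 0]);'];
% end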

clear i_scale i_min i_max i_range

%end of code fragment
%---------------------------------------------------------
%
% Special case 2.
%
% getnc searches for attributes called _FillValue or missing_value.
%
% Where these attributes occur, the data values are compared to the
% attribute value. When a match is found, the data value is set to NaN.
% If the variable or attribute is of class char (ie character), then the
% logic is flawed. The result could be an error, or a warning followed by
% unpredictable Matlab behaviour. In some cases tested by BAK, the
% character variable was set to a blank, which was not always
% the intention.
% The solution in these cases (_FillValue or missing_value of type char)
% is to suspend checking against fill or missing values.

% command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);'];
%
% Directories requiring the fix:
%
% sss
% pfloat (trajectory files)
% woce-ssf

%---------------------------------------------------------
% The next segment of code can be commented out for some data files.

[att_vals att_names] = attnc(file_name,name);

att_num = length(att_names);

clear i_fill_miss

for i_att = 1:att_num
if strcmp(att_names{i_att},'_FillValue') & strcmp(class(att_vals{i_att}),'char')
i_fill_miss = 1;
end
if strcmp(att_names{i_att},'missing_value') & strcmp(class(att_vals{i_att}),'char')
i_fill_miss = 1;
end
end

if exist('i_fill_miss','var')
command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);'];
end
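
% Similarly, a compact form of this check. It assumes attnc returns
% att_names and att_vals as cell arrays with one element per attribute
% (empty cell arrays when a variable has no attributes):
%
% names = att_names(:);
% is_fill = strcmp(names,'_FillValue') | strcmp(names,'missing_value');
% is_char = cellfun('isclass', att_vals(:), 'char');
% if any(is_fill & is_char)
% command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);'];
% end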

clear i_fill_miss


%end of code fragment
%---------------------------------------------------------

%---------------------------------------------------------
% end of special cases
%---------------------------------------------------------

disp(command)
% echo to the screen the commands used

eval(command);
% execute command to load this variable

end


%---------------------------------------------------------
% end of script. The remainder is comment and explanation.
%---------------------------------------------------------

%------------------------------------------------------
% The script above was tested for at least one file from every
% WOCE DAC. Where a Data Centre has created files of several
% different types for their various data sets, I have tried
% to test one file of each type. But I cannot claim
% that testing has been exhaustive.
% In particular, the above script read files from
% the following directories. As far as I can tell the results were
% correct. Reading from the other directories will require small
% modifications, described below.
%
% bathymetry
% cmdac
% pfloat (profiles & trajectories file)
% sadcp
% sat_mwf1
% sat_mwf2
% sat_sl
% sat_sst
% sl_fast
% slevel_dm
% sss
% svp
% whp (ctd & bottle file)
% woce-ssf
% woce-uot
% woceflux (several file types)
% wocemet
%
% There is extra code to catch two special cases where the
% default action of getnc was not satisfactory. These were:
%
% 1) Correct handling of scaling
% 2) Correct handling of 'fill values' on character strings
%
%
% ------------------------------------
% Correct handling of scaling
% ------------------------------------
%
% Directories involved
%
% sat_mwf1
% sat_mwf2
% sat_sl
% sat_sst
% woceflux
%
% ***** The background:
%
% Some datasets are stored as integers, with a scaling factor.
% The scaling factor is stored as an attribute of the variable.
% For example, sat_sl stores sea_level as an integer
% in the range +/- 32767, with a scaling factor of 0.001
% This saves on storage.
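%
% A small worked example of the unpacking, using hypothetical stored
% values. The first two correspond to the data_min and data_max of the
% sat_sl example further below, and 32767 is the largest short value.
% Converting to double first matters: in later Matlab releases,
% int16 times double returns a rounded int16 result.
%
% raw = int16([-259 370 32767]); % integers as stored in the file
% scale_factor = 0.001;
% sea_level = double(raw) * scale_factor; % values -0.259, 0.370 and 32.767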
%
% ***** The problem:
%
% getnc must take the correct action when reading in the data.
%
% When getnc reads data, it has separate switches that tell
% it whether or not to scale the data, and whether or not to scale
% the attributes. The default action is to scale both data and attributes.
% This is not always correct.
%
% Variable attributes may include missing_value, _FillValue, valid_min,
% valid_max and valid_range.
%
% After the data and attributes have been scaled (or not, depending
% on the switch settings) the data are compared with the attributes
% and set to NaN if they match the fill/missing value or are outside
% the valid range.
%
% Different DACs have used different conventions about whether or not
% attributes are scaled, so special action is required.
%
% ***** The solution:
%
% getnc controls scaling with an argument rescale_opts. This
% is an array of length 2. The first element controls
% scaling of data. The second element controls scaling of
% attributes. The default is [1 1] = yes and yes.
%
% Unfortunately, rescale_opts is the 10th argument of getnc.
% Fortunately, the unused options can be defaulted to -1.
%
% Thus the default action is equivalent to
%
% rescale_opts = [1 1]
% sea_level = getnc('example.nc','sea_level',-1,-1,-1,-1,-1,-1,-1,rescale_opts);
%
% Here is the convention used by DACs for scaling
% and the required setting of rescale_opts:
%
% *** sat_mwf: use rescale_opts = [1 0]
% Many variables use scaling
% Valid_range in attributes is in scaled units
% Fill and missing values in attributes are in unscaled units
% Example
% short zonal_wind_speed(latitude, longitude) ;
% zonal_wind_speed:data_min = -9.75 ;
% zonal_wind_speed:data_max = 11.3 ;
% zonal_wind_speed:scale_factor = 0.01 ;
% zonal_wind_speed:valid_min = -60. ;
% zonal_wind_speed:valid_max = 60. ;
% zonal_wind_speed:missing_value = 32767s ;
% zonal_wind_speed:_FillValue = 32767s ;
%
%
% The use of a scaled valid_range means that the attributes
% MUST NOT be scaled again when reading in by getnc.
% Otherwise the valid range becomes +/- 0.60,
% and nearly all data fall outside the
% new valid_range and are replaced by NaNs.
%
% getnc therefore compares scaled data (eg 327.67)
% with unscaled fill and missing values,
% and fails to match them. However, the data that are filled
% or missing values (read in and converted to 327.67)
% lie outside the valid range (+/- 60), so they are caught that way.
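%
% The effect of the two settings can be sketched by applying the range
% test by hand. This imitates the check described above and is not
% getnc's own code. The stored values are hypothetical, chosen to give
% the data_min, data_max and fill value of the example:
%
% scale_factor = 0.01;
% valid_min = -60; valid_max = 60; % scaled units, as in the file
% u = double(int16([-975 1130 32767])) * scale_factor; % -9.75, 11.3, 327.67
% u_ok = u; % rescale_opts = [1 0]
% u_ok(u_ok < valid_min | u_ok > valid_max) = NaN; % only the fill becomes NaN
% u_bad = u; % rescale_opts = [1 1]
% u_bad(u_bad < valid_min*scale_factor | u_bad > valid_max*scale_factor) = NaN; % all NaN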
%
%
% *** sat_sl and sat_sst: use rescale_opts = [1 0]
% One variable in each dataset uses scaling
% Valid range in attributes is in scaled units
% Fill and missing values in attributes are in scaled units
% Example
% short sea_level(latitude, longitude) ;
% sea_level:data_min = -0.259f ;
% sea_level:data_max = 0.37f ;
% sea_level:valid_min = -1.5f ;
% sea_level:valid_max = 1.5f ;
% sea_level:_FillValue = 32.767f ;
% sea_level:missing_value = 32.766f ;
% sea_level:scale_factor = 0.001f ;
%
%
% The use of a scaled valid_range means that the attributes
% MUST NOT be scaled again when reading in by getnc.
% Otherwise the valid range becomes +/- 0.0015,
% and nearly all data fall outside the
% new valid_range and are replaced by NaNs.
%
% getnc compares scaled data with fill and missing values of 32.767
% and 32.766. Unfortunately, as far as I can tell, the match does not
% work properly in matlab. The data are read in as integers (type 'short'
% in netCDF) and then scaled by 0.001. The attribute is read in as
% type 'float', and results in a slightly different value due to machine
% error. However, the data that are filled or missing
% values (read in and converted to 32.767 and 32.766)
% lie outside the valid range, so they are caught that way.
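%
% A minimal illustration of the precision effect. It assumes the data
% are scaled in double precision while the fill attribute is held as a
% single-precision float; getnc's internal steps may differ:
%
% scaled = double(int16(32767)) * 0.001; % stored fill value after scaling
% fill = double(single(32.767)); % _FillValue read as a netCDF float
% scaled == fill % false: they differ by about 2e-6
% abs(scaled - fill) < 1e-4 % true: a tolerance test would match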
%
%
% *** woceflux: use rescale_opts = [1 1], so default is OK.
% In some datasets, two variables use scaling
% Valid_range is not used
% Fill and missing values in attributes are in unscaled units
% Example
% short u(lat, lon) ;
% u:scale_factor = 0.1f ;
% u:missing_value = 32767s ;
%
% Since there is no valid_range to be checked, the attributes
% MUST be scaled when they are read in by getnc.
%
% If a data value is read in as an integer 'short' value
% of 32767, this is scaled by 0.1 and compared with
% the missing_value of 32767, which must also be scaled by 0.1.
%
% Therefore the default value of rescale_opts is correct for these
% files.
%
%
% -----------------------------------
% Correct handling of fill values on character strings
% ------------------------------------
% The solution, given in detail after the description of the problems, is to switch
% off missing value checking in getnc for these variables. This solution
% works for all the problems of this nature I have discovered.
%
% Directories affected
%
% sss
% pfloat (trajectory files)
% woce-ssf
%
% *** woce-ssf
%
% The problem here concerns attributes of type character, where
% a DAC has defined a missing value or a fill value.
%
% First, we must understand how character variables are stored in matlab
% and (I believe) netCDF, though I'm not a netCDF expert.
%
% For example, in a file from woce-ssf
%
% dimensions:
% time = UNLIMITED ; // (424 currently)
% kstatus = 6 ;
% variables:
% char cstatus(time, kstatus) ;
% cstatus:long_name = "status/method values" ;
% cstatus:missing_value = " " ;
%
%
% So variable cstatus is a 2-D array. Each element
% of the 2-D array is a SINGLE CHARACTER.
% One dimension is time. The intention is that the other
% dimension allows character strings of length 6 to
% provide a status variable.
%
% When getnc attempts to read the attribute cstatus:missing_value,
% it crashes. It complains that the missing value must be a single
% character. I suggest that the problem here is in the netCDF file:
% if the dimension is 424x6, the missing value should be a single blank,
% not a string of 6 blanks.
%
%
% *** sss and pfloat trajectory files
%
%
% There is a more general problem with how getnc handles character variables and
% missing/fill values.
%
% sss and pfloat trajectory files both attempt to
% define a _FillValue for character variables.
%
% Example
%
% dimensions:
% n_measurement = UNLIMITED ; // (1628 currently)
% string4 = 4 ;
% n_history = 1 ;
% variables:
% char history_software_release(n_measurement, n_history, string4) ;
% history_software_release:_FillValue = " " ;
%
%
% The problem here is that the variable is read into a matlab char array
% of dimension 1628 x 1 x 4. These values are then tested against the
% fill value of " ". If any values match, getnc attempts to set them
% to NaN, which is an error for a single element of a char array.
% My version of matlab gives a warning, and sets the element in the
% char array to blank. If the _FillValue was blank anyway, this is not
% a problem. However, if the _FillValue was a character zero or nine, it becomes
% character blank, which is clearly not what the DAC intends.
%
% *** Solution:
% Read in these variables using ncload, or
% using getnc with the option to substitute
% missing values switched off. See help getnc for details. Note the use
% of 1, rather than the default, in the seventh argument.
% Example for pfloat trajectory file:
%
% temp_corrected_qc = getnc(file_name,'temp_corrected_qc',-1,-1,-1,-1,1);
%
% Some code has been inserted in the script which catches all
% the cases I know about. It can be commented out if not required.
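%
% If fill values are then wanted as NaN, one option for single-character
% flags such as the *_qc variables is to convert them to numbers first,
% because a numeric array can hold NaN. A sketch with hypothetical QC
% values and the _FillValue "0":
%
% qc = ['1'; '0'; '4']; % char flags as read with missing value checks off
% flags = qc - '0'; % char arithmetic gives double: 1, 0, 4
% flags(qc == '0') = NaN; % 1, NaN, 4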
%
% Here is a list of attributes that showed up in my test files that
% would cause a problem if any elements of the char arrays matched the FillValue.
% I therefore recommend users of these files to substitute appropriate getnc commands
% of the form illustrated above.
%
% pfloat/trajectory file
% trajectory_parameters:_FillValue = " " ;
% inst_reference:_FillValue = " " ;
% positioning_system:_FillValue = " " ;
% juld_qc:_FillValue = "0" ;
% position_accuracy:_FillValue = "9" ;
% position_qc:_FillValue = "0" ;
% temp_corrected_qc:_FillValue = "0" ;
% psal_corrected_qc:_FillValue = "0" ;
% cndc_corrected_qc:_FillValue = "0" ;
% pres_corrected_qc:_FillValue = "0" ;
% history_software_release:_FillValue = " " ;
% sss
% woce_date:_FillValue = " " ;
% woce_time:_FillValue = " " ;
% position_qc:_FillValue = "0" ;
% sst_qc:_FillValue = "0" ;
% sss_qc:_FillValue = "0" ;
% woce-ssf
% cstatus:missing_value = " " ;
%
%
% ------------------------------------------------------