National Centers for Environmental Information

NOAA Satellite and Information Service

World Ocean Database ragged array netCDF format

The officially archived version of the World Ocean Database (WOD) is provided in a ragged array netCDF format that follows the Climate and Forecast (CF) conventions.

Ragged array format is well suited to ocean profile data collections such as the WOD, which aggregate oceanographic casts. A cast is a collection of profiles taken at the same date, time, and location, and a profile is a set of measurements of one ocean variable versus depth or pressure. Casts differ widely: in the WOD a profile may contain anywhere from 2 to 24,000 depth/variable pairs, and a cast may contain from 1 to 26 variables, each with its own profile. A standard array representation (max_depth_count x max_variable_count x number_of_casts) is therefore inefficient for oceanographic casts, because every cast would be padded out to the largest depth count and the largest variable count.

In ragged array form, each profile variable is stored as a single one-dimensional array holding all measurements of that variable across every cast (see the CF convention description of ragged arrays specific to profile data). A second, counting array, called VAR_row_size (where VAR is the variable name), gives the number of measurements of that variable in each cast. To reach the measurements for cast N, sum the first N-1 VAR_row_size counts and skip that many elements in array VAR; the next VAR_row_size(N) elements of VAR are the measurements for cast N. Note that the variable z (depth) is always present, and the indexed measurements of any variable in a particular cast are always associated with the same index in depth.
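The offset arithmetic described above can be sketched in Python. This is a minimal illustration, not NCEI code: it assumes VAR and VAR_row_size have already been read into plain Python lists (for example with a netCDF library) and that missing row sizes have been replaced by 0. The function names `cast_values` and `cast_offsets` and the three-cast data are hypothetical.

```python
from itertools import accumulate


def cast_values(var, row_size, n):
    """Return the measurements of one variable for cast n (counting from 1).

    var: flat ragged array with every measurement of the variable
    row_size: per-cast counts (the VAR_row_size array), missing entries as 0
    """
    # Summing the first n-1 counts gives the number of elements to skip,
    # which is also the 0-based start position of cast n in var.
    start = sum(row_size[: n - 1])
    # The next row_size[n-1] elements belong to cast n.
    return var[start : start + row_size[n - 1]]


def cast_offsets(row_size):
    """0-based start position of every cast at once (exclusive running sum)."""
    return list(accumulate(row_size, initial=0))[:-1]


# Hypothetical three-cast layout; the second cast has no values for this variable.
depths = [0.0, 10.0, 20.0, 0.0, 5.0, 10.0, 30.0]
counts = [3, 0, 4]
print(cast_values(depths, counts, 3))  # [0.0, 5.0, 10.0, 30.0]
print(cast_offsets(counts))            # [0, 3, 3]
```

Computing all offsets once with a running sum avoids re-summing the counts for every cast when a whole file is unpacked.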

A trivial example: a file contains five oceanographic casts, each of which has profiles of depth/temperature and depth/salinity; only the fourth cast also contains a profile of depth/oxygen. The file has the following structure:

netcdf wod_example {
dimensions:
        casts = 5 ;
        z_obs = 25 ;
        Temperature_obs = 25 ;
        Salinity_obs = 25 ;
        Oxygen_obs = 5 ;
        // ... other dimensions omitted
variables:
        float lat(casts) ;
        float lon(casts) ;
        double time(casts) ;
        float z(z_obs) ;
        int z_row_size(casts) ;
        float Temperature(Temperature_obs) ;
        int Temperature_row_size(casts) ;
        float Salinity(Salinity_obs) ;
        int Salinity_row_size(casts) ;
        float Oxygen(Oxygen_obs) ;
        int Oxygen_row_size(casts) ;
        // ... other variables and attributes omitted
data:
z_row_size =  5, 5, 5, 5, 5 ;
Temperature_row_size = 5, 5, 5, 5, 5;
Salinity_row_size = 5, 5, 5, 5, 5;
Oxygen_row_size = _, _, _, 5, _ ;
}

Note that "_" in a VAR_row_size array denotes a missing value; the fill value is set to zero (0). To read the fourth cast (N=4), skip the first 15 elements of z, Temperature, and Salinity: with N-1 = 3, VAR_row_size(1) + VAR_row_size(2) + VAR_row_size(3) = 5 + 5 + 5 = 15 for VAR = z, Temperature, Salinity. For Oxygen, Oxygen_row_size(1) = Oxygen_row_size(2) = Oxygen_row_size(3) = 0, so reading starts at the first value of array Oxygen (position 0 in 0-based indexing).
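The arithmetic for the fourth cast can be reproduced with a short Python sketch. It hard-codes the row-size arrays from the example file, writing the "_" entries as None and then treating them as the fill value 0, and reports the skip count and the 1-based elements to read for each variable. It also checks the rule that VAR_row_size(N) is either z_row_size(N) or 0. The helper name `read_window` is hypothetical.

```python
# Row-size arrays copied from the example file. None stands in for the
# "_" (missing) entries, which are filled with zero.
row_sizes = {
    "z": [5, 5, 5, 5, 5],
    "Temperature": [5, 5, 5, 5, 5],
    "Salinity": [5, 5, 5, 5, 5],
    "Oxygen": [None, None, None, 5, None],
}


def read_window(counts, n):
    """Return (elements to skip, elements to read) for cast n (counting from 1)."""
    counts = [c or 0 for c in counts]  # treat missing row sizes as 0
    skip = sum(counts[: n - 1])        # sum of the first n-1 counts
    return skip, counts[n - 1]


N = 4  # the fourth cast
z_count = read_window(row_sizes["z"], N)[1]
windows = {}
for name, counts in row_sizes.items():
    skip, count = read_window(counts, N)
    # A variable's count must equal the cast's depth count, or be 0 if not measured.
    if count not in (0, z_count):
        raise ValueError(f"{name}: row size {count} does not match z_row_size {z_count}")
    windows[name] = (skip, count)
    # Report 1-based element numbers, as the text does.
    print(f"{name}: skip {skip}, read elements {skip + 1}-{skip + count}")
```

Running it prints `skip 15, read elements 16-20` for z, Temperature, and Salinity, and `skip 0, read elements 1-5` for Oxygen, so z(16) pairs with Temperature(16), Salinity(16), and Oxygen(1).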

For all variables, VAR_row_size(4) = 5, so the next 5 values are read from each VAR array (elements 16-20 in arrays z, Temperature, and Salinity; elements 1-5 in array Oxygen). VAR_row_size(N) is always either equal to z_row_size(N) or equal to 0, the latter only when the variable was not measured in cast N. Every variable present in a cast has a one-to-one correspondence with the depths (z) of that cast: z(cast=4, element=1) corresponds to Temperature(cast=4, element=1), Salinity(cast=4, element=1), and Oxygen(cast=4, element=1). In the ragged array representation, then, z(16) corresponds to Temperature(16), Salinity(16), and Oxygen(1), so the separate VAR_row_size arrays must be accounted for.

Oceanographic casts are complex. Describing the ocean environment requires multiple profile variables associated with depth (z), but every profile variable element must also be associated with cast-specific variables such as latitude, longitude, and date/time. Other cast-specific measurements, such as bottom depth, wave height, and wind speed, help to put the ocean profile variables in context. Still other information, such as ship name, primary investigator, and cruise identifier, is important for identifying and assessing the ocean profile data. All of this information must be kept together for each cast, and therefore in the aggregate oceanographic cast file provided to users. Even with today's system capacities, it is also important to minimize file size where possible; this is the reason for using a ragged array format. Finally, it is important to use accepted standards so that the data are widely accessible, which is why the CF standard has been followed. However, two points of the CF standard for contiguous ragged array netCDF conflict with the efficient arrangement of oceanographic casts, small file size, and the inclusion of all necessary variables together, and these two points are not followed.
The first is that not all ocean profile variables have the same array size: each ocean profile variable (VAR) has an array size (VAR_obs) equal to the total number of measurements of that variable. All variables are still associated with the cast depths through the VAR_row_size counter. The second is that a single file holds arrays with different axes, both for ocean profile variables and for other ocean environment descriptors. For instance, the ocean profile variables are arranged along the depth axis (and the cast axis), while ocean state variables are arranged only along the cast axis.
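To see the effect of the first deviation on the example file, the Python sketch below counts profile-variable array elements under the WOD layout (one VAR_obs dimension per variable) and under a layout in which every profile variable shares the z_obs sample dimension, as in the CF contiguous ragged array representation. It is an illustration under stated assumptions: it counts array elements only, ignoring metadata and compression, and writes the missing Oxygen row sizes as the fill value 0.

```python
# Row-size arrays from the example file, with the missing "_" entries
# for Oxygen written as the fill value 0.
row_sizes = {
    "z": [5, 5, 5, 5, 5],
    "Temperature": [5, 5, 5, 5, 5],
    "Salinity": [5, 5, 5, 5, 5],
    "Oxygen": [0, 0, 0, 5, 0],
}
z_obs = sum(row_sizes["z"])  # 25 depth values in total

# WOD layout: each profile variable has its own VAR_obs dimension,
# sized to that variable's own measurements.
per_variable = {name: sum(counts) for name, counts in row_sizes.items()}

# Shared-dimension layout: every profile variable spans all z_obs positions,
# so casts where a variable was not measured are padded with fill values.
shared = {name: z_obs for name in row_sizes}

print(per_variable)  # {'z': 25, 'Temperature': 25, 'Salinity': 25, 'Oxygen': 5}
print(sum(per_variable.values()), "vs", sum(shared.values()))  # 80 vs 100
```

In this tiny example only Oxygen benefits, saving 20 elements; the saving grows with the number of variables that are measured in only some casts.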