WOCE DATA ASSIMILATION

V. WOCE DATA ASSIMILATION

V.1. Introduction

Modelling of the ocean has progressed in less than a decade from single basin simulations to global eddy-resolving models, using varying formulations. Much development remains, as described in Chapter IV, including thorough model testing and comparison, but a variety of models of differing complexity are available and being used now for data assimilation. Data assimilation in turn can provide model evaluation tests.

"Data assimilation" in the meteorological context refers principally to combining data with a numerical model to produce first a better estimate of the fluid state, and then a prediction of later behaviour. In the large-scale oceanographic context, the emphasis is on estimating the state of the circulation during WOCE in a dynamically consistent manner. However, the phrase "data assimilation" is used widely for the techniques being developed for oceanography and so it is used here.

Ocean data assimilation is in an early developmental stage, but its eventual success will be important to WOCE and to the field in general. The practitioners are testing advanced methods on models of increasing complexity, and are taking advantage of new computer technology - both hardware and software. The biggest problems include: lack of skill in the underlying model (the lack arising from both poor initial data and dynamical deficiencies), a poor knowledge of the statistics of forcing errors and parameterisation errors, lack of resolution and a paucity of computer power, and lack of manpower. Progress must be made as a matter of urgency since assimilation reveals model deficiencies, and also provides for assessment of observing systems.

WOCE data assimilation will produce regional and global maps of oceanic fields, using a rational combination of observations, dynamics and other information. It is hoped that these mappings will lead to ocean model improvement (WOCE Goal 1): the mapping formalism can indicate model shortcomings automatically, while subjective analysis of the maps should also suggest model improvements.

A special and serious challenge for WOCE is the assimilation of data which represent the longest time scales of the ocean circulation. These are mainly the large-scale smoothed hydrography, time-averaged upper ocean data, and time-averaged deep float velocities, as opposed to the time-dependent data sets which sample the seasonal and interannual variability in the upper ocean. One of the WOCE goals is to assess the extent to which such smoothed and averaged data can be assumed to represent a long-term mean, and another of the WOCE objectives is to estimate the variability in the long-term mean. The smoothed or averaged data sets should most likely be assimilated into steady models, since the computational challenge of running an ocean model to near-equilibrium is likely to be insurmountable in the immediate future. This approach is described in section V.4.7.d.

The following sections describe the objectives of WOCE data assimilation, assess current methods of assimilation of oceanic data into models, and assess resource requirements and needed developments over the next few years.

V.2. Objectives

WOCE data assimilation will have three strands:

1. production of an ocean atlas as a rigorous application of inverse methods with simplified dynamics;

2. crude assimilation of data into high-resolution dynamically-complex models, in order to map the variability ignored in (1);

3. statistically-defensible assimilation into coarse-resolution dynamically-complex models, for model testing, model improvement and array design.

The three strands are listed in order of increasing complexity and resource needs, and are also in order of temporal priority. An ultimate goal of these efforts is assimilation with accurate estimation methods into high resolution models, but this is unlikely to be realised until methods are further developed through the three stated directions, and until computer power increases.

The scientific objectives of WOCE data assimilation are:

(1) the simultaneous and dynamically consistent estimation of all fields (including model parameters), observed and unobserved;

(2) the reassessment of errors in forcing, parameters, initial conditions, boundary conditions and data, as a result of the finding of misfits or the making of corrections to all these inputs;

(3) the estimation of errors in the corrections to all inputs;

(4) the testing of ocean models as formal statistical hypotheses;

(5) the assessment of the efficiency of the WOCE observing system, and

(6) the design of future ocean observing systems.

It would seem prudent to prioritise these objectives, but in fact sophisticated assimilation methods such as Kalman filtering and geophysical inverse theory perform tasks (1)-(5) concurrently. Thus only (6) can be postponed, and postponing it seems appropriate at present as far as WOCE is concerned. CLIVAR is in immediate need of design studies for observing systems, but our responsibility is to meet the original WOCE Goals with the available resources.

How close are we to being able to accomplish the first five tasks? Let us begin with (4): the formal testing of models. We are very confident, for example, that in the middle of subtropical gyres the Boussinesq approximation and the hydrostatic balance are highly accurate on scales of interest, while geostrophy is a quite accurate approximation to the momentum balance. Indeed, assimilation of pre-WOCE hydrography, current meter data, altimetry data and preliminary WOCE data has not lessened our confidence in these relations. The data themselves are generally of very high quality; the accuracy of the circulation estimates is limited far more by gaps in the data, in both space and time. Other sources of error include surface fluxes of heat, freshwater and momentum, tracer diffusivities, and bottom stresses. Nevertheless, it seems plausible that the WOCE data may be used to greatly improve our picture of the global, broad-scale ocean circulation. Thus the first strand of WOCE assimilation is:

S-1. The construction of a dynamically-consistent update to the expected data-based WOCE atlases of the general circulation of the world ocean as described in Chapter III. These updates would include consistent estimates of surface fluxes.

The existence of a consistent description of a long-term mean circulation is not a foregone conclusion, but if it can be constructed it would represent a tremendously useful basic description of the world ocean circulation. This task may well be within the capabilities of existing, small groups of scientists, using relatively simple dynamics: geostrophy, the hydrostatic balance, conservation of mass and tracers. It is essential that as many groups as possible be encouraged to perform the tasks, using varieties of assimilation methods and ranges of estimates of uncertainties in inputs. It is likely that each group will take a different approach to the asynoptic nature of the WOCE data. The results should be compared and critiqued by the entire WOCE community, leading to a consensus on the best one or few. The SMWG can take numerous steps to facilitate this basic assimilation activity (including choosing the criteria for comparisons); these will be listed below. The production of this new type of atlas is the very least that WOCE must accomplish; the task should have highest priority.

The assimilation-based atlas will not represent the great temporal variability already revealed by the current meters, the floats and drifters, the XBTs and especially the altimetry. This variability is, of course, especially pronounced in the Antarctic Circumpolar Current, in the tropics, and in the boundary currents and their extension regions. The evolution of these currents is not determined by geostrophy; the full primitive-equation dynamics of our finest-resolution models are required. Initialising these evolving fields at all depths will be impossible, forcing errors will be significant, and the subsequent data will not compensate for all the initialisation and forcing errors. In other words, aside from the tropics (where WOCE data are supplemented with TAO data, for one thing), the detailed circulation estimates are bound to be significantly wrong, as are the estimates of their errors and the significance tests for dynamical residuals (the formal model tests).

Nevertheless, it is hoped that assimilation will lead to reliable estimates of mesoscale variability in space and time, and of mean fluxes owing to mesoscale eddies. The latter are only coarsely resolved by the finest of present ocean models. Only a few groups world-wide are running global high-resolution models, and this activity is already consuming all their resources. Yet we must make progress in high-resolution assimilation, if we are to comprehend the complexity of ocean circulation. The enormity of the task of high-resolution assimilation could lead to consumption of major resources; finding the right balance will be very difficult. Relatively crude assimilation techniques, such as optimal interpolation, may be the most sensible for the first, tentative combining of sparse data and high-resolution models.

The second strand is thus:

S-2. The crude assimilation of data into high-resolution, dynamically-complex models, in order to map schematically the variability ignored in the first strand.

The third strand of WOCE data assimilation is:

S-3. Sophisticated assimilation into coarse-resolution, dynamically-complex models, for model testing, model improvement and array design.

Few groups world-wide have the resources and experience to do this work, which will become the most technically sophisticated aspect of WOCE assimilation. Yet it must be carried out, if the development of ocean circulation models is ever to advance beyond empiricism. This setting may be the one preferred for observing system design. Solutions from these efforts would also constitute the initial conditions for long-term climate evolution scenarios consistent with oceanographic data.

The coarse-resolution primitive-equation models that we would like to provide for climate prediction themselves possess significant instabilities and other problems. The resulting variability probably cannot be controlled with WOCE data, and of course climate prediction cannot involve data assimilation beyond initialisation. Indeed, it is to be hoped that climate predictions are much more sensitive to gross parameters, such as the total CO₂ content, than to initial conditions. Even so, it is hoped that data assimilation will correctly reveal the greatest deficiencies in the coarse-resolution models, be they in particular conservation laws or over particular oceanic regions. These revelations will come from residual maps and sensitivity studies, for which sophisticated assimilation techniques will be essential.

V.3. Methods of data assimilation

Data assimilation refers to a loosely related set of methods for combining data and theory (usually expressed as a model). There are two imprecisely defined but useful categories: sequential estimation and inverse methods. This section reviews the technical definitions and the commonly used methods of ocean data assimilation.

V.3.1. Sequential estimation

Sequential estimation fits naturally into genuine forecasting: as data become available, they are used to prepare initial fields for subsequent forecasts.

All oceanographers are familiar with "objective analysis", or optimal interpolation in space using the method of least squares. This is usually univariate, that is, each field such as velocity, temperature or salinity is interpolated independently of the others. Meteorologists, on the other hand, commonly make a multivariate "analysis" when "initialising" numerical weather forecasts. Optimal interpolation requires estimates of the mean field, the spatial covariance of the field, and the data error covariance. These are constructed a priori either from archives, model output, or theoretical considerations. The prior estimate of the mean field is often a model forecast for that time. Theoretical estimates of the errors in optimal interpolation may be constructed using these "priors".
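As a minimal sketch of univariate optimal interpolation (Gauss-Markov estimation), the following maps three hypothetical observations onto a one-dimensional grid. It assumes a known prior mean, a homogeneous Gaussian field covariance with an assumed length scale, and uncorrelated data errors; the function names and all numbers are illustrative only. The theoretical error variance returned alongside the estimate is the one constructed from the "priors".

```python
import numpy as np


def gaussian_covariance(xa, xb, variance, length):
    """Assumed homogeneous, isotropic field covariance C(r) = variance * exp(-(r / length)**2)."""
    r = xa[:, None] - xb[None, :]
    return variance * np.exp(-(r / length) ** 2)


def optimal_interpolation(x_obs, y_obs, x_grid, prior_mean, variance, length, noise_var):
    """Univariate optimal interpolation (Gauss-Markov estimation) onto x_grid.

    estimate  = mean + C_go (C_oo + R)^-1 (y - mean)
    error var = variance - diag(C_go (C_oo + R)^-1 C_og)
    with R = noise_var * I, i.e. uncorrelated data errors.
    """
    C_oo = gaussian_covariance(x_obs, x_obs, variance, length) + noise_var * np.eye(len(x_obs))
    C_go = gaussian_covariance(x_grid, x_obs, variance, length)
    W = np.linalg.solve(C_oo, C_go.T).T          # interpolation weights, shape (n_grid, n_obs)
    estimate = prior_mean + W @ (y_obs - prior_mean)
    error_var = variance - np.sum(W * C_go, axis=1)
    return estimate, error_var


# Three hypothetical observations (positions in km) mapped onto a 50 km grid.
x_obs = np.array([0.0, 100.0, 250.0])
y_obs = np.array([1.2, 0.4, -0.3])
x_grid = np.linspace(0.0, 300.0, 7)
est, err_var = optimal_interpolation(x_obs, y_obs, x_grid, 0.0, 1.0, 80.0, 0.1)
```

The error variance is smallest at grid points that coincide with data and relaxes towards the prior variance away from them; in a multivariate analysis the same formulas apply with cross-covariances between fields.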

The model may also be used to forecast the field covariance. The error covariance of the new data may then be used to modify the field covariance further, and the final, modified field covariance becomes the initial condition for the next covariance forecast. Known as the distributed-parameter Kalman filter, this algorithm is very popular in meteorology and oceanography. It uses dynamics and data to construct inhomogeneous, anisotropic and time-evolving field covariances, and it has the convenience of a time-sequential method. However, the huge task of forecasting the error covariance in realistic situations remains a problem. Coarse-resolution approximations are reasonable in weakly inhomogeneous regions; time-asymptotic approximations seem promising, but need detailed investigation of possible unphysical behaviour near data points. There is a substantial world-wide effort to advance the Kalman filter in meteorology and oceanography, both for operational forecasting and for research.
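The forecast/analysis cycle just described can be sketched for a linear model as follows. This is a minimal illustration, assuming a hypothetical two-variable "ocean" (a slowly damped rotation) in which only the first variable is observed; the matrices M, Q, H and R and the function name are illustrative, not drawn from any WOCE system.

```python
import numpy as np


def kalman_step(x, P, M, Q, H, R, y):
    """One forecast/analysis cycle of a linear Kalman filter."""
    # Forecast: propagate the state and its error covariance with the model M;
    # Q is the covariance of model (forcing) errors.
    x_f = M @ x
    P_f = M @ P @ M.T + Q
    # Analysis: the gain K weighs the forecast covariance against the data error covariance R.
    S = H @ P_f @ H.T + R
    K = np.linalg.solve(S, H @ P_f).T           # K = P_f H^T S^-1 (P_f and S are symmetric)
    x_a = x_f + K @ (y - H @ x_f)
    P_a = (np.eye(len(x)) - K @ H) @ P_f
    return x_a, P_a


# Hypothetical two-variable model: a slowly damped rotation, only the first variable observed.
theta = 0.1
M = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.05]])
rng = np.random.default_rng(0)
truth = np.array([1.0, 0.0])
x, P = np.zeros(2), np.eye(2)
for _ in range(50):
    truth = M @ truth
    y = H @ truth + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    x, P = kalman_step(x, P, M, Q, H, R, y)
```

The unobserved variable is still constrained, because the model dynamics build cross-covariances into P_f. The difficulty noted above is visible in the line `M @ P @ M.T`: P has one entry per pair of state variables, so a model with a million variables would need 10^12 covariance entries at every step.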

'Nudging' is a very economical alternative to optimal interpolation and the Kalman filter. Although usually formulated by adding relaxation terms to the dynamics, nudging in practice amounts to initialising with optimal interpolation based on crude guesses for field covariances. In an operational application, this easily implemented method can only be assessed for utility, that is, for whether it yields better forecasts. Its scientific value is vague, especially as it is difficult to estimate the reliability of products resulting directly or indirectly from "nudged" models.
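The relaxation-term formulation of nudging can be sketched for a hypothetical one-variable linear model whose forcing is wrong by a factor of two, with a perfect observation assumed to be available at every time step. The function name, the relaxation time tau and all numbers are illustrative assumptions.

```python
def nudge(x0, f_model, k, obs, dt, tau, n_steps):
    """Forward-Euler integration of dx/dt = -k*x + f_model + (obs(t) - x) / tau.

    obs(t) returns the observed value at time t (assumed available at every
    step); tau is the relaxation time scale, and tau=None switches nudging off.
    """
    x, t = x0, 0.0
    for _ in range(n_steps):
        tendency = -k * x + f_model
        if tau is not None:
            tendency += (obs(t) - x) / tau      # the added relaxation term
        x += dt * tendency
        t += dt
    return x


def observations(t):
    return 10.0   # hypothetical: the true forcing is 1.0, so the true steady state is 1.0 / 0.1 = 10


# The model forcing is wrong (0.5), so the free-running model settles at 0.5 / 0.1 = 5.
free = nudge(0.0, 0.5, 0.1, observations, dt=0.1, tau=None, n_steps=2000)
nudged = nudge(0.0, 0.5, 0.1, observations, dt=0.1, tau=1.0, n_steps=2000)
```

The nudged model settles at (0.5 + 10/1)/(0.1 + 1), about 9.55 rather than 10: the relaxation term and the model's forcing error compete, and the residual bias depends on the essentially arbitrary choice of tau. This is one way of seeing why the reliability of nudged products is hard to assess.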

V.3.2. Inverse methods

Most models involve inputs and outputs. We are said to be using the model in the "forward" mode when we deduce the latter from the former. The errors in model output arise from modelling errors and input errors. If we use the output errors to deduce these modelling and input errors, we are using the model in its "inverse" mode.

A short account by Wunsch of an inverse problem, published in Science in 1977, was a well-spring for WOCE. Observations of conserved tracers were used to infer the reference values for the steady, advecting geostrophic velocities. The reference values were over-determined by the data. Then errors in both were allowed, and the reference values became under-determined. Wunsch sought the smallest corrections to the reference values that yielded the best fits to the data. (Wunsch originally described the method in somewhat different terms, but it amounts to much the same thing.) All "sizes" were measured with quadratic or Euclidean norms; that is, Wunsch used a least-squares inverse formulation. An essential criterion for the quality of a least-squares fit to data is the conditioning of the problem. As pointed out by Bretherton, Davis and Fandry in their influential 1976 paper in Deep-Sea Research, the conditioning is a quantitative assessment of the efficiency (or redundancy) of the observing system. They also indicated that exact or approximate dynamical models may be imposed as constraints upon optimal interpolation, and showed that error estimates for inputs and data were also available.
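The idea of seeking the smallest corrections consistent with the data can be sketched with the singular value decomposition, one standard way of computing a minimum-norm least-squares solution (not necessarily the computation used in 1977). The example assumes a hypothetical problem of 3 conservation statements in 5 unknown reference velocities, with random stand-ins for the coefficient matrix E and data y; data-error weighting and tapering, which a full treatment needs, are omitted.

```python
import numpy as np


def minimum_norm_inverse(E, y, rel_tol=1e-10):
    """Smallest-norm b minimising ||E b - y|| (truncated-SVD pseudo-inverse).

    Singular values below rel_tol * s_max are treated as zero. Returns the
    solution, the retained rank, and the condition number s_max / s_min of
    the retained part of E.
    """
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    keep = s > rel_tol * s[0]
    b = Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])
    return b, int(keep.sum()), s[keep][0] / s[keep][-1]


# Hypothetical problem: 3 conservation statements, 5 unknown reference velocities.
rng = np.random.default_rng(2)
E = rng.standard_normal((3, 5))
y = rng.standard_normal(3)
b, rank, cond = minimum_norm_inverse(E, y)
```

Any vector in the null space of E can be added to b without changing the fit, but only at the cost of a larger correction; the minimum-norm solution excludes it. Small singular values (a large condition number) flag combinations of unknowns that the observing system barely constrains.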

We would like to use the time-dependent WOCE data to infer the input errors and model errors in our most complex model of the world ocean circulation. These include errors in the initial conditions, the surface forcing and the various parameterisation schemes, as well as bathymetric and numerical errors. Such an inverse calculation can be performed now using current technology, subject to two compromises. First, the model would have to be of the type intended for climate prediction: it would have a horizontal resolution of 1 to 2 degrees, and between 10 and 20 vertical levels. Second, the WOCE data, in particular the altimetry, would have to be compressed. The extent of compression would vary from "somewhat" (such as moderate binning) to "heavy" (such as analysis of variance or empirical orthogonal functions, EOFs). It is not yet computationally feasible to invert the full WOCE data set using our finest-resolution models.
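"Heavy" compression by EOFs can be sketched as a truncated singular value decomposition of a (time x space) data matrix. The example assumes a synthetic stand-in for gridded altimetry (two space-time patterns plus small noise); the function names and the choice of two modes are illustrative assumptions.

```python
import numpy as np


def eof_compress(data, n_modes):
    """Compress a (time x space) data matrix to its leading empirical orthogonal functions."""
    mean = data.mean(axis=0)                      # time mean at each location
    U, s, Vt = np.linalg.svd(data - mean, full_matrices=False)
    pcs = U[:, :n_modes] * s[:n_modes]            # amplitude time series, (n_time, n_modes)
    eofs = Vt[:n_modes]                           # spatial patterns, (n_modes, n_space)
    retained = np.sum(s[:n_modes] ** 2) / np.sum(s ** 2)   # fraction of variance kept
    return pcs, eofs, mean, retained


def eof_reconstruct(pcs, eofs, mean):
    return mean + pcs @ eofs


# Synthetic stand-in for gridded altimetry: 100 times x 50 points, two patterns plus noise.
rng = np.random.default_rng(1)
t = np.arange(100)[:, None]
xpos = np.linspace(0.0, 2.0 * np.pi, 50)[None, :]
field = np.sin(xpos) * np.cos(0.2 * t) + 0.5 * np.cos(2.0 * xpos) * np.sin(0.05 * t)
field = field + 0.01 * rng.standard_normal(field.shape)
pcs, eofs, mean, retained = eof_compress(field, n_modes=2)
```

Here 5000 data values are replaced by 350 numbers (200 amplitudes, 100 pattern values and 50 means). In an inversion, the model would then be fitted to the amplitudes rather than to every datum, and the discarded variance would have to be counted as additional data error.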

V.4. Resources and requirements

V.4.1. Knowledge

Most oceanographers are comfortable with objective analysis, singular value decomposition, and the essentials of numerical modelling. They are often highly practised in time series analysis. Only a small number are comfortable with all three elements necessary for assimilation: real data, realistic general circulation models, and estimation methods. The estimation methods are closely related to the more familiar analysis methods - they are really just a combination of modelling with statistical estimation theory. Several monographs and a collection of essays have appeared recently, or will do so shortly.

Atmospheric Data Analysis, by Roger Daley (Cambridge U.P., 1991)

Inverse Methods in Physical Oceanography, by Andrew Bennett (Cambridge U.P., 1992)

Modern Developments in Oceanic Data Assimilation, edited by Paola Malanotte-Rizzoli (Elsevier, 1996)

The Ocean Inverse Problem, by Carl Wunsch (Cambridge U.P., 1996)

In the last few years, there have been several international symposia and summer schools on atmospheric and oceanic data assimilation. These have usually consisted of loosely-coupled curricula taught by a number of specialists with varying emphasis. Data assimilation is in its infancy, and the few specialists have rather widely differing perspectives. Tightly co-ordinated short courses with a limited number of lecturers per course are recommended. This will lead to course biases, but this should be tolerable so long as these biases are made clear. At this early stage of ocean data assimilation, many coherent presentations with a wide range of approaches are to be preferred.

A central purpose of these short courses should be to familiarise faculty and students with the practices of estimation theory, with the aim that this become part of the standard oceanographic curriculum.

V.4.2. Training

A rapid expansion of the number of practising data assimilators is required. WOCE provides the only opportunity for gaining experience in global ocean data assimilation, outside of naval operations. This passing opportunity must be exploited to the full through: opening junior faculty positions to increase training, hiring of highly proficient research assistants with training in statistics, computational fluid dynamics and control theory, provision of post-doctoral fellowships, and recruitment of graduate students.

Immediate expansion of assimilation effort in oceanography is recommended, via research assistantships, fellowships, and junior faculty positions: WOCE National committees are called on to assist in achieving this aim.

V.4.3. Computing

Global Kalman filtering and four-dimensional inversion in a basin or global model require great computer speed and memory. Dedicated, true supercomputers are therefore required for strand 3 (sophisticated assimilation into coarse-resolution models). Pursuing this strand is the principal reason for an assimilation centre (section V.4.4). Shared supercomputers or dedicated expanded workstations will be adequate for strands 1 and 2: atlas production and crude insertion into high-resolution models. This would include global inverses with reduced dynamics, or nudging a high-resolution basin model for a year or so of model ocean time. The facilities for these tasks might result from the activities of groups such as the CCOM described in section IV.4.2.

V.4.4. Data assimilation centres

Data assimilation centres are required if there is to be access to the major (dedicated) supercomputer resources needed for sophisticated assimilation into coarse resolution models. In a centre the required expertise in modelling, statistics and data analysis would be juxtaposed. Formation of a centre requires an institutional commitment which would last beyond a normal academic project. A second reason for supporting centres would be for efficiency in data stream accession, especially where it includes temporal data sets such as altimetry, XBTs, or surface drifters, which will be collected for many years to come. Centres would also provide the archiving, production and distribution facilities.

A French project, MERCATOR, has been proposed which will initially concentrate on WOCE-type assimilation in the North Atlantic, although funding at this time (1997) is not secured. On a longer time scale (5-7 years) the project will strive to develop a high-resolution primitive-equation model of the global ocean, to assimilate altimetric and in-situ data, and to provide near-real-time conditions for coupled climate models; it is also expected to have commercial and military benefits. In the US a workshop (Denver, 1997) brought together all US parties interested in ocean assimilation, involving several projects, agencies and individuals, in order to consider a common requirement. It was agreed that a US national centre would adopt an interdisciplinary outlook, and be capable of running global, coastal and biogeochemical models. Interagency support was proposed, with NSF taking the lead. At the time of writing further developments are awaited.

At least three major institutional groups are moving now into operational ocean data assimilation, separate from WOCE-oriented planning. In the US, NCEP (NOAA) is currently producing real-time analyses for the tropical Pacific and Atlantic, assimilating temperature and sea-level data using variational objective analysis. In the UK, the UKMO is producing real-time analyses for the North Atlantic using a similar technique. In Europe, ECMWF is beginning a tropical data assimilation project.

Projects such as MERCATOR and initiatives from US agencies that support WOCE-type assimilation are endorsed. The activity of other major institutions in ocean data assimilation, though regional and utilising only specific data types, is welcomed, and expansion into global WOCE-type assimilation in the future is encouraged.

V.4.5. Models

For data assimilation a range of models is desirable, from planetary geostrophic models to primitive-equation or perhaps even non-hydrostatic models. Good parameterisations and accurate, stable and fast numerics are all important, as are surface fluxes and bathymetry. The relative merits of pressure, density or sigma coordinates will probably be debated well after 2002. It may be noted that some of the world's leading computational fluid dynamicists are now becoming interested in ocean modelling. WOCE should join forces with them.

V.4.6. Data

The form of WOCE data used for assimilation depends on the data type and the choices made for each specific assimilation project. In general some version of the Level 1 data will be assimilated, and information about the source and errors for each individual data point (for instance temperature from differing devices) will be needed. Some assimilators may prefer highly decimated data sets, or specific time averages of Lagrangian observations, or particular gridding choices for flux products, for instance. These choices are likely to vary from one project to another, and hence easy access to the Level 1 version of each data type is essential. Close communication between the assimilation investigators and the data providers as represented by the DACs is essential.

In addition to the WOCE data themselves, assimilators need the statistics of data errors. These include:

1. calibration, navigation and timing errors;

2. the statistics of discarded outliers;

3. sampling errors, such as likely internal wave and mesoscale noise in each hydrographic section, or correlated mooring motions, and

4. the variability within bins, where reported values are averages.

While the first and second error types are independent of the version of each data set being used for assimilation (e.g., time averages, decimation choices), the third and fourth depend on the choices made in each assimilation project. Each WOCE DAC should distribute interim technical reports on the first and second types of data errors as part of its regular data distributions, and redistribute final reports at the completion of the WOCE field programme. Whenever possible, data error variances and covariances should be provided; since these depend on the specific data versions being assimilated, they would most likely be provided in the form of software which operates on the Level 1 data. Significant non-normality should be reported, along with at least the skewness and kurtosis.

The SMWG recommends that with the co-ordination of the DPC the WOCE DACs provide estimates of error variances, covariances, skewness and kurtosis for all data types in their charge.
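Error statistics of this kind could be distributed as small routines operating on the data. The sketch below is a hypothetical illustration, not a DAC product: it computes the variance, skewness and excess kurtosis of a set of residuals, and the lagged covariance of evenly spaced residuals (for example along a section). A synthetic heavy-tailed (Laplace) sample stands in for a data type with frequent outliers.

```python
import numpy as np


def error_moments(residuals):
    """Variance, skewness and excess kurtosis of data errors.

    Simple moment estimators without small-sample corrections; skewness and
    excess kurtosis are both zero for Gaussian errors.
    """
    a = np.asarray(residuals, dtype=float)
    a = a - a.mean()
    var = np.mean(a ** 2)
    skewness = np.mean(a ** 3) / var ** 1.5
    excess_kurtosis = np.mean(a ** 4) / var ** 2 - 3.0
    return var, skewness, excess_kurtosis


def lagged_covariance(residuals, max_lag):
    """Autocovariance of evenly spaced errors (e.g. along a section), lags 0..max_lag."""
    a = np.asarray(residuals, dtype=float)
    a = a - a.mean()
    n = len(a)
    return np.array([np.sum(a[: n - k] * a[k:]) / n for k in range(max_lag + 1)])


# Heavy-tailed (Laplace) errors, standing in for a data type with frequent outliers.
rng = np.random.default_rng(4)
var, skew, kurt = error_moments(rng.laplace(0.0, 1.0, size=100_000))
```

For Laplace errors the excess kurtosis is 3, so a sample like this one would be flagged as significantly non-normal, which is exactly the information an assimilator needs when choosing an estimator.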

V.4.7. Assimilation method developments

Developments in approaches used for WOCE assimilation are required. The following four subsections contain detailed discussion of what is needed.

V.4.7.a Dynamical error statistics

WOCE Goal 1 is the development of ocean circulation models. Thus it has always been understood in WOCE that models are imperfect. The usual evidence for these imperfections is in the form of solutions that differ structurally from the observed ocean. However, there is equally significant evidence in the form of term imbalances in the equations of motion. The imbalances typically arise from differences between the real eddy fluxes and the model's parameterisation of the fluxes. Statistical information about these imbalances is essential for data assimilation, but the needed statistics are largely unavailable. We shall have to make crude guesses. We must also carefully distinguish between random and systematic errors.

V.4.7.b Estimators

The ostensible purpose of ocean data assimilation is the estimation of a circulation that nearly fits all the data and dynamical information. The choice of fitting criterion, or 'estimator', is crucial. The very popular weighted least-squares estimator is the simplest, and is amenable to very efficient mathematical algorithms. However, it may be seriously inappropriate in certain circumstances. Weighted least squares coincides with maximum likelihood only when the errors are Gaussian; when errors in the data or dynamical information are non-Gaussian, the maximum-likelihood estimator may be a much more reasonable choice.
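A toy illustration of how the estimator matters, assuming the "circulation" reduces to a single constant observed six times with one gross error (hypothetical numbers). Minimising the sum of squared misfits (least squares, the maximum-likelihood estimator for Gaussian errors) gives the mean; minimising the sum of absolute misfits (the maximum-likelihood estimator for double-exponential, or Laplace, errors) gives the median.

```python
import statistics


def l2_cost(c, values):
    """Sum of squared misfits: the least-squares criterion with unit weights."""
    return sum((v - c) ** 2 for v in values)


def l1_cost(c, values):
    """Sum of absolute misfits."""
    return sum(abs(v - c) for v in values)


# Hypothetical repeated measurements of one quantity, with one gross error.
obs = [1.02, 0.97, 1.01, 0.99, 1.00, 5.0]
least_squares = statistics.fmean(obs)     # minimises l2_cost: ML estimate for Gaussian errors
laplace_ml = statistics.median(obs)       # minimises l1_cost: ML estimate for Laplace errors
```

The least-squares estimate is 1.665, dragged far from the cluster of good values by the single outlier, while the Laplace maximum-likelihood estimate is 1.005. In a real assimilation the same trade-off applies, but the non-quadratic criterion also sacrifices the efficient linear algebra that makes least squares attractive.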

V.4.7.c Optimisation algorithms

WOCE data sets are large, while WOCE ocean models have many degrees of freedom, are nonlinear and may involve non-smooth parameterisations of processes such as convection. Thus, estimating the circulation that best fits this information is a major mathematical challenge. Much progress has been made, but much more experience is needed, as are expertise and computational resources.
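The core of gradient-based optimisation can be sketched in the simplest setting: a linear model x_{k+1} = M x_k, the initial state as the only control, and a quadratic (least-squares) cost. The gradient comes from one forward model run plus one backward sweep of the adjoint model (here just the transpose of M), which keeps the cost of a gradient roughly independent of the number of control variables. All matrices and function names are hypothetical; real WOCE models are nonlinear and possibly non-smooth, so neither the exact line search nor the finite termination of conjugate gradients used here would carry over.

```python
import numpy as np


def make_cost(M, H, B_inv, R_inv, xb, obs):
    """Quadratic cost J(x0) for x_{k+1} = M x_k, observed as y_k ~ H x_k at k = 0..N-1.

    J = 1/2 (x0-xb)^T B^-1 (x0-xb) + 1/2 sum_k (H x_k - y_k)^T R^-1 (H x_k - y_k)
    Returns a function giving (J, dJ/dx0); the gradient uses an adjoint sweep.
    """
    def cost_and_grad(x0):
        # Forward sweep: run the model, accumulate J, store adjoint forcings H^T R^-1 d_k.
        J = 0.5 * (x0 - xb) @ B_inv @ (x0 - xb)
        x, forcings = x0, []
        for y in obs:
            d = H @ x - y
            J += 0.5 * d @ R_inv @ d
            forcings.append(H.T @ R_inv @ d)
            x = M @ x
        # Backward sweep with the adjoint (here M^T): lam_k = M^T lam_{k+1} + H^T R^-1 d_k.
        lam = np.zeros_like(x0)
        for f in reversed(forcings):
            lam = M.T @ lam + f
        return J, B_inv @ (x0 - xb) + lam
    return cost_and_grad


def minimise_cg(cost_and_grad, x_start, n_iter=50, tol=1e-12):
    """Conjugate gradients for a quadratic J; the Hessian-vector product is
    obtained exactly as grad(x + p) - grad(x), so no Hessian matrix is formed."""
    x = x_start.copy()
    g = cost_and_grad(x)[1]
    p = -g
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        Ap = cost_and_grad(x + p)[1] - g
        alpha = (g @ g) / (p @ Ap)
        x = x + alpha * p
        g_new = cost_and_grad(x)[1]
        p = -g_new + (g_new @ g_new) / (g @ g) * p
        g = g_new
    return x


# Hypothetical 3-variable model, observing only the first variable at 5 times.
M = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
H = np.array([[1.0, 0.0, 0.0]])
B_inv, R_inv = np.eye(3), np.array([[10.0]])
xb = np.zeros(3)
obs = [np.array([v]) for v in (1.0, 0.8, 0.7, 0.5, 0.45)]
cost_and_grad = make_cost(M, H, B_inv, R_inv, xb, obs)
x_opt = minimise_cg(cost_and_grad, xb)
```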

V.4.7.d Assimilation into steady ocean models

The slowly evolving nature of the deep circulation greatly reduces the advantage of 4-D model-assisted assimilation over 3-D optimal interpolation. Assimilation into steady models should be advantageous, but will be technically difficult.

The baroclinic adjustment time of the deep ocean is much longer than the duration of the WOCE field programme, so it would be sensible in the first place to interpolate (more correctly, smooth) the deep data using synoptic objective analysis. That is, the data should be optimally interpolated in space, without regard to the times at which the data were collected. The method requires mean fields at one time, and also the spatial covariance of the fields. Specifying these moments is bound to be suspect at present - that is the reason for carrying out WOCE.

The data in the upper thermocline show seasonal variability, and so a more complex analysis scheme is called for. Objective analysis is conceptually no more difficult in four dimensions than in three, but specifying means and covariances is substantially more suspect. The covariances should be multivariate, and non-stationary and inhomogeneous on long scales in space and time.

The function of least-squares data assimilation is the dynamically-consistent generation of such 4-D means and covariances using a model, together with specified moments for all the model inputs. The assimilation scheme then performs a 4-D objective analysis of the data. The inputs include interior forcing (representing errors in eddy-flux parameterisations), surface fluxes, and initial conditions. Again, means or prior estimates are needed for these, as are covariances. However the model-generated priors and covariances for the deep circulation will not differ significantly from their specified initial values, and so the 4-D analysis of the deep data is effectively 3-D analysis without regard to the timing of the data. This 3-D analysis may as well have been performed as such, using the initial means and covariances. These are acting as proxies for the ancient surface fluxes that are responsible for the present deep circulation.
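The argument that the model leaves the deep priors essentially unchanged can be illustrated with a two-variable covariance forecast, P_{k+1} = M P_k M^T + Q. The example assumes a hypothetical upper-ocean variable with a three-month adjustment time, a deep variable with a 1000-year adjustment time, forcing errors acting only on the upper variable, and a hypothetical seven-year window of monthly steps; none of these numbers is taken from a WOCE model.

```python
import numpy as np


def propagate_covariance(P0, M, Q, n_steps):
    """Forecast an error covariance with a linear model: P_{k+1} = M P_k M^T + Q."""
    P = P0.copy()
    for _ in range(n_steps):
        P = M @ P @ M.T + Q
    return P


dt = 1.0 / 12.0                                     # monthly steps, in years
T_upper, T_deep = 0.25, 1000.0                      # hypothetical adjustment times (years)
M = np.diag([np.exp(-dt / T_upper), np.exp(-dt / T_deep)])
Q = np.diag([0.5, 0.0])                             # forcing errors enter the upper variable only
P0 = np.diag([4.0, 1.0])                            # specified prior variances
P = propagate_covariance(P0, M, Q, n_steps=7 * 12)  # a hypothetical seven-year window
```

The upper-ocean variance forgets its prior within months and settles at a value set by the forcing errors, whereas the deep variance changes by less than 2% from its specified prior. The 4-D machinery therefore returns, for the deep ocean, essentially the covariance that was put in.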

If models are to be of any value in assimilation of deep data, they must be steady models that, of course, do not require initial values. They would be driven by steady surface fluxes and interior forcing. The model would, in effect, generate highly structured yet dynamically consistent spatial covariances. These would be used by the assimilation to make a spatial objective analysis of deep data, or could be used as initial covariances in a time-dependent assimilation of data throughout the full water column.

The challenge is that global, nonlinear, steady ocean models are very difficult to solve. Yet they must be solved many times in an assimilation calculation, either for the Monte Carlo generation of circulation covariances, or in gradient-search for the optimal circulation.
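The Monte Carlo generation of covariances from a steady model can be sketched with a hypothetical linear steady model, (-d²/dx² + κ²) x = f on a short one-dimensional grid, driven by random, spatially uncorrelated forcing errors. Each sample costs one steady solve; for this linear toy the exact covariance A^-1 Q A^-T is also available for comparison, whereas for a global nonlinear steady model every sample would require an expensive iterative solution. The operator, names and numbers are illustrative assumptions.

```python
import numpy as np


def steady_operator(n, dx, kappa):
    """Hypothetical steady linear model (-d2/dx2 + kappa**2) x = f, with x = 0 beyond both ends."""
    main = np.full(n, 2.0 / dx**2 + kappa**2)
    off = np.full(n - 1, -1.0 / dx**2)
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)


def monte_carlo_covariance(A, forcing_std, n_samples, rng):
    """Sample covariance of steady solutions driven by random (white) forcing errors."""
    F = forcing_std * rng.standard_normal((A.shape[0], n_samples))
    X = np.linalg.solve(A, F)                      # one steady solve per forcing sample
    X = X - X.mean(axis=1, keepdims=True)
    return X @ X.T / (n_samples - 1)


A = steady_operator(n=20, dx=1.0, kappa=0.3)
C_mc = monte_carlo_covariance(A, forcing_std=1.0, n_samples=20000, rng=np.random.default_rng(3))
A_inv = np.linalg.inv(A)
C_exact = A_inv @ A_inv.T                          # forcing_std**2 * A^-1 A^-T
```

Although the forcing errors are uncorrelated from point to point, the steady dynamics produce a smooth, strongly correlated and spatially inhomogeneous covariance (it is reduced near the boundaries): the "highly structured yet dynamically consistent" covariances referred to above.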

V.5. Data assimilation workshops, training and intercomparison

The highest priorities for assimilation are the development of methods, the provision of resources, and the entrainment of trained investigators. Several ocean assimilation workshops in the past few years and the small size of the WOCE data assimilation community suggest that there is not a great need for a new workshop in the short term. However, there is a need for intensive training, which could be accomplished in concentrated courses (section V.4.1).

In the short term, WOCE data assimilators are encouraged to participate fully in the WOCE basin and modelling workshops, to take full advantage of emerging data sets and knowledge about them, and of developments in models. They should bring their preliminary results of estimated fields, suggestions for sampling strategies, and tests of models to the attention of both WOCE modellers and observationalists.

Within several years, a variety of approaches to WOCE data assimilation will have been carried out, including attacks both on the climatological aspect of some WOCE data sets and on the use of WOCE data sets with temporal coverage in more standard, predictive-type assimilation. A workshop should then be dedicated to WOCE data assimilation, including a careful intercomparison of different approaches. The workshop would probably focus on mid-latitude oceans, the assimilation of sparse data sets with long time scales, the treatment of eddies and of persistent flows of small spatial scales, and technical methods such as Kalman filtering, nudging, optimal interpolation and various inverse-method approaches. A date of 1999 or 2000 is suggested.

In the longer term a thorough intercomparison of different approaches would be valuable, testing data assimilation methods on a common model and common data set; testing models with a common data set and assimilation scheme; and testing data sampling schemes with a common model and assimilation scheme.

The following course of action is therefore recommended: in the near future, short courses in assimilation; in 1999 or 2000, a WOCE data assimilation workshop; and eventually, a comparison study of various assimilation approaches.
