Automated Probabilistic Analysis and Parametric Modelling of the Seasonal-Diurnal Wind Vector

A refined and extended version of the Offset Elliptical Normal mixture model has been developed to parameterise the seasonal diurnal wind vector automatically. Automated using R scripts, the method eliminates any potential risk of confirmation bias posed by the manual supervision in the original method. Refinements to the method include the latest algorithms for clustering of Gaussian mixtures, with Bayesian regularisation to set the number of components and to limit the predisposition to overfit. A new extension uses fuzzy logic to evaluate the probability distributions, autocovariances and spectra of the random perturbations around the mean seasonal-diurnal variations for each component of the mixture. These additional parameters allow the predictions of the OEN model to be validated and its automated application demonstrated using the hourly METAR reports of mean wind speeds at Adelaide, South Australia, showing significant improvements over the previously published analysis. The OEN mixture model is directly applicable to a wide range of wind engineering applications where seasonal and diurnal variation is of importance. variation of speed is strongest in the austral summer afternoons. The direction shows a diagonal pattern that indicates an overall net anti-clockwise rotation consistent with Coriolis effects in the southern hemisphere but proceeds in jumps between three preferred directions: Ɵ = 230°, 150° and 20°. The corresponding chart synthesised from the final OEN model is shown as (b) to enable later direct comparison.


Introduction
The seasonal and diurnal variations of the wind speed vector is an important consideration in many engineering applications, including the safety and comfort of pedestrians around tall buildings; heating, ventilation, and air-conditioning (HVAC) loads; structural design to resist windstorms; and the availability of wind power generation. "Now-casting" methods are used when the application requires prediction of the wind vector some minutes into the future, eg. for the safety of crane operations or the operational control of wind turbines. Daily forecasts are sufficient for controlling the safety of transient operations, including temporary structures such as grandstands for events. However, the planning and design of engineering projects is almost universally done of the basis of probability of occurrence, as in assessing wind energy potential, or risk of exceedance, as in codes of practice for structural design.
Assessment of the joint probability density function (jPDF) of the wind vector can be done in several ways: 1. As a "raw" or "observed" jPDF by binning occurrences of wind speed and direction; 2. Fitting the observations to a non-parametric model, eg. using kernel density estimation (KDE) methods, which is essentially a smoothing exercise.
3. Fitting the marginal density functions of wind speed and direction observations, separately, to parametric distributions, accepting the resulting loss of information on their joint action; 4. Fitting the observations to a parametric jPDF model, preferably one which is justified by meteorological theory. The OEN mixture model falls into the fourth category. It is founded on the physical theory proposed by Brooks and Carruthers [1] and developed later by Crutcher et al [2][3][4][5], which predicts each wind generating mechanism produces a bivariate Gaussian jPDF that forms elliptical contours on zonal (W) and meridional (S) axes. The wind climate in any location around the World is recognised to be "mixed", this is: produced by a mixture of mechanisms, each acting exclusively, that result in a jPDF which is the disjoint sum of the individual component jPDFs. The OEN mixture model was introduced in 2014 [6] as a novel way of demonstrating why simple meteorological and statistical theory leads to a bivariate Gaussian model for the PDF of the surface wind vector and, from that, the ubiquitous Weibull model for the marginal cumulative distribution function (CDF) of speed irrespective of direction. It was also recognised that OEN had a useful role in separating and parameterising the individual wind components of the mixture that is produced by various meteorological wind-producing mechanisms acting disjointly. Notably, it avoids all the circularity issues that plague other approaches operating on speed and direction e.g. [7], is based on physical meteorological and statistical theory and is globally applicable [8].
The ability of OEN to parameterise seasonal-diurnal variation was first demonstrated in 2015 on the well-studied, mixed-mechanism wind climate of Adelaide [9] where it was demonstrated that each disjoint model component corresponded to the known physical mechanisms. It was then applied to the wind climate of Rome [10], using observations from the Fiumicino and Ciampino airports, and again showed correspondence between model components and mechanisms, and with the differences between sites matching the differences in topography. The method has since been refined and its application has been extended with additional analyses that were previously thought impractical [11], but which now allow the underlying predictions of the model to be validated. Implementation has been moved from Excel TM into the R statistical computing language to take advantage of R's superior analysis and graphics facilities and to automate the fitting process, as far as is possible, eliminating any risk of confirmation bias posed by manually supervised fitting. Also, in the interim period, an additional six years of METAR reports at hourly intervals have been accrued at Adelaide, so the time is ripe to revisit the previous analysis of the Adelaide wind climate to validate this automated method and to extend the original findings. The principal purpose of this paper is to make the R analysis scripts freely available, allowing the OEN mixture model to be fitted with minimal manual intervention to any other suitable dataset, to parameterise the jPDF of the wind vector at any season and time of day.

Meteorological Data
The ability of the OEN mixture model to represent the known physics of the Adelaide mixedmechanism wind climate is demonstrated by applying it to the hourly METAR reports from Adelaide International Airport from January 1991 to October 2020, extracted from the TD3505 database on the US National Centers for Environmental Information (NCEI) server. The earlier [9] data at 6-and 3-hour intervals, with its reported uncertainty between UTC and local times [12], have been discarded. Figure 1(a) shows the seasonal-diurnal variation of the marginal probability density functions (PDF) of wind speed, p(V), and direction, p(Ɵ), obtained by binning the observations, in a format [13] that presents the whole annual cycle in a single chart. Here, the horizontal axis represents the cycle of each hour through the day in twelve panels that represent the cycle of each month during the year. The vertical axes represent the marginal PDF of wind speed (top) and wind direction (bottom) averaged over every day in each month of the 30-year record. This is equivalent to Figure2 in [9] but now is clearer and reveals more detail in the strong and coherent pattern. The diurnal variation of speed is strongest in the austral summer afternoons. The direction shows a diagonal pattern that indicates an overall net anti-clockwise rotation consistent with Coriolis effects in the southern hemisphere but proceeds in jumps between three preferred directions: Ɵ = 230°, 150° and 20°. The corresponding chart synthesised from the final OEN model is shown as (b) to enable later direct comparison.   shows the whole-year joint PDF of the observations, p(W,S), excluding calms and plotted as westerly, W, and southerly, S, components on the zonal-meridional plane. This is equivalent to Figure 4(b) in [9] but is again clearer and more detailed. The ripple in the outer contours is the residual effect of the 10° increments of wind direction after the transformation from polar to cartesian coordinates. Again, (b) and (c) are results, included here for later direct comparison and discussion.  Most of the data in this study are four-dimensional, depending on the wind vector components, W and S, in the zonal-meridional plane (or their meteorological counterparts, V and Ɵ), the local time of day, H, in Australian Central Time (ACT = UTC+0930) and the month, M. The most dimensions that can be represented on any two-dimensional chart is three, i.e., as contours or colour-maps, hence compromises must be made in their presentation which result in some loss of information. In Figure 1 the compromise is to show the marginal distributions of V and Ɵ separately, so omitting their joint characteristics. Figure 3 shows whole-year data in which the seasonal-diurnal variations are averaged out. Revealing these variations in p(W,S) in hourly data requires 288 separate charts which are best laid out in the form of an H by M matrix, as provided by the supplementary online Figure S1. Figure 2 is an extract which shows that the establishment of the sea breeze between 0930 ACT and 1130 ACT from April to June (austral autumn) occurs slightly later in each successive month.

Basis of the OEN Method
The original OEN mixture model is fully described in [9] and the later refinements and extensions in [10,11] and its application to whole-year data in [8]. Here, for the convenience of the reader, the representation of the model on the zonal-meridional plane is shown in Figure 5 and the defining equations are reproduced in Appendix A. The model joint PDF of each component of the mixture forms elliptical contours around the mean vector, so it is convenient in Figure 5 to represent each component of the mixture by its one standard deviation contour and refer to this as the "ellipse".
The theory underpinning the OEN mixture model predicts that:

Figure 5
Definitions of the Offset Elliptical Normal model parameters. Note that all wind speeds values in this paper are expressed in the original observed units of knots (1kn = 0.514ms -1 ) to avoid unit conversion bias when binning to probability density function (PDF) estimates.
1. The ellipses of the mixed climate are exclusive (disjoint), but not necessarily independent. This means that only one mechanism occurs at a given time (exclusive) so that the individual jPDFs are additive in proportion to their frequency (disjoint) in accordance with Eqn. A.1. However, they may not be independent in the sense that they interact, for example: a land breeze generally follows a sea breeze; a sea-breeze cell is strengthened by an overlying synoptic off-shore breeze; and land/sea breezes are inhibited by synoptic winds in excess of 15 knots.
2. All the non-stationary seasonal-diurnal variation is expressed by the parameters of Equations A.3 and A.4. Hence the location and scale parameters, representing the ellipse centres, sizes, and orientations, vary deterministically with season and time of day.
3. That, by rotating the axes such that = 0 in Figure 5(b), all the random variation is expressed by stationary, orthogonal, Normally distributed random perturbations u/σ u and v/σ v with zero mean and unity standard deviations.
The extension of the fuzzy model in [11] to derive u/σ u and v/σ v and to estimate their PDFs, autocovariances and spectra now allows these predictions to be tested.

Implementation of the OEN Method in R
Implementation of the refined and extended OEN analysis method is broken down into eight R scripts, as indicated in Figure 4: o Steps 1 and 2 to extract the observations, correct/remove artefacts [14], then form the joint PDFs p(W,S) for each M and H, operate identically to the previous implementation [9].
o Step 3 is new and applies a (0.25,0.5,0.25) smoothing filter between the joint PDFs p(W,S) for preceding, current, and succeeding hours, H, and months, M, to reduce ephemeral peaks in the data field and facilitate the automated "threading" process: the pairing of ellipses between successive observation times, H, in step 5. This smoothing has the unwanted effect of reducing the gradients and peak values in the data field which is corrected later by the fuzzy model. o Step 4 generates the first OEN fit without any manual intervention. It takes advantage of the latest algorithms in the package "mclust" [15] for clustering of Gaussian mixtures [16], including Bayesian regularisation to set the number of ellipses and limit any predisposition to overfit the parameters, as well as R's standard multiparameter optimisation functions that are equivalent to the Excel TM Solver function used previously. Nominally, the result has the optimal number of ellipses to adequately describe p(W,S) for each M and H independently of the others. In practice, this Bayesian regularisation step is often too aggressive, merging ellipses that are clearly separate in the adjacent H, resulting in a loss of serial continuity between adjacent hours and in values of the coefficient of determination, R 2 , that are unacceptably low. Therefore, whenever R 2 is below the target minimum value of 0.99, additional ellipses are added and re-fitted until the target is reached. Figure 6 compares the R 2 values for the first fit (black circles) with the improved final fit (red crosses). Those few instances where the first fit appears to be better are over-parameterisations, where an extra ellipse appears which is not found in the preceding or succeeding observations and has been removed.

Figure 6
Coefficient of determination, R 2 , for first fit (black circles) and final fit (red crosses.
o In step 5, a simple algorithm replaces the previous visual "threading" process. Each ellipse is paired with the closest-matching ellipse of the previous H and assigned the corresponding index. Ellipses with no counterpart in the previous H are assigned a new ellipse index. Previously found ellipses with no counterpart in the current H are "remembered" so that they can be matched if they reappear later. When two similar ellipses move past each other, the algorithm sometimes erroneously swaps their indices. A manual check and any necessary corrections are made after threading each month and the algorithm learns the pattern of ellipses, so that fewer errors require correction when threading the later months. o Step 6 permits manual inspection, adjustment and refitting of the first fit. Now that the ellipses are serially "threaded", refits can be initialised using the previous/next H or M parameters to avoid introducing any potential confirmation bias. o Step 7 assigns the ellipse probabilities (membership), F, calculated from the final fit and computes the fuzzy OEN model parameters (see §A.2). This stage does not use the pre-smoothed p(W,S) fields in Step 3, so eliminates the associated errors. o Step 8 implements the new fuzzy OEN procedures described in [10,11] (see §A.3) to remove the deterministic seasonal/diurnal variation of the location and scale parameters by a process of demodulation, so revealing the random perturbations and estimating their PDFs, autocovariances and spectra.
The fuzzy demodulation process in step 8 to obtain u and v for each observation is approximate and has some obvious flaws. For each ellipse, the mean ̅ ( , ) and ̅ ( , )is subtracted from every observation to shift the vector origin to the ellipse centre, the axes are rotated by α(M,H) to align with the ellipse axes to give the u and v perturbations, which are then normalised by the standard deviations σ u and σ v , respectively. This demodulation is exact only for those observations that belong to the target ellipse, when the resulting perturbations will be small but their weighted contribution from F i will be large. In observations which belong to the other ellipses, a reverse modulation is induced, generating spurious seasonal/diurnal perturbations, but the weight from F i will be small. Hence the accuracy of this fuzzy demodulation process is a trade-off between the large probability of only a small error and the small probability of a large error. Some "leakage" from other ellipses is therefore inevitable. Three indicators of the quality of the normalised perturbations, u/σ u and v/σ v , are that the means should be zero, the standard deviations should be unity and the mean cross-products should be zero.
The principal difference between this implementation and the original in [9] is that the ellipses attributed to large scale weather (LSW) systems are no longer forced to be non-diurnal by averaging over all H. The improved fitting process no longer needs this precaution to be able to resolve these ellipses. In any case, [10] recognised that the relative frequencies of the LSW ellipses must vary diurnally to maintain Σf i = 1 in response to the varying diurnal components.

Results of Applying the Method to Adelaide
The first fit, final fit and fuzzy OEN parameters are provided in the Additional Materials as .CSV files, allowing the model to be applied and the figures in this paper to be reproduced by the R scripts without having to repeat the analysis and fitting steps. However any figure that includes the raw TD3505 observations will require steps 1 -3 of Figure 4 to be run to extract them from the NCEI TD3505 database to comply with the NCEI terms of use.

Overall
The seasonal-diurnal chart for the marginal PDFs, Figure 1(b), and the whole-year joint PDF, Figure 3(b), synthesised from the fuzzy OEN mixture model, are both faithful, smoothed, reproductions of the corresponding observed charts. Figure 3(c) shows the mean locations and sizes of the eight ellipses deduced by the method, weighted by their probability of occurrence, and the p(W,S) field generated from them. This is included as a useful aide in interpreting the later results. But the probability field in (c) is not the same as in (b) because the mean of the 288 fields and the field from the mean parameters are not concomitant due to the non-linear nature of the model. In comparison with the five ellipses of the earlier analysis [9]: the sea breeze ellipse now separates into the local, E1, and continental sea breeze, E4; the downslope ellipse separates into a diurnal downslope, E3, and a summer SE ellipse, E5; and the LSW2 ellipse separates into a NE ellipse and a winter NW ellipse. The ellipses in each new pair show markedly different characters, supporting their separation and suggesting that the previous model was under-parameterised. Figure 7(a) shows the contributions from each ellipse to the overall marginal PDF of wind speed, p(V) by the thin curves labelled with the ellipse index, the overall PDF for all ellipses by the thick (blue) curve and the observed PDF by the circle symbols. It reveals that E8, the SW component, controls the eventual upper tail and therefore the extreme wind speeds. Figure 7(b) shows the corresponding overall CDF plotted on standard axes that transform the Weibull distribution ( ) = 1 − exp(− ⁄ ) into a straight line, where the observations are shown by the circular symbols and the thick (blue) curve is the fuzzy OEN model. The OEN model prediction that u and v are Normally distributed leads directly to the expectation that P(V) converges onto a straight line of slope k = 2 [6] in the upper tail. The straight (red) line is the best fit with slope k = 2 through the upper tail of the data. Figure 7(c) shows the contributions of each ellipse to the overall marginal PDF of wind direction, using the same key as 7(a). Figures 7(b) and (c) are equivalent to Figure 19 (a) and (b) of [9] except that here P(V) includes calms so that the data converges onto the Weibull line from above.  Figure 8 shows the autocovariance and spectrum of wind speed for the whole record. It is not possible to obtain the power spectral density function (PSDF) directly using the fast Fourier transform (FFT) because this does not tolerate any missing observations which are inevitably found in practice. Instead, the classical Blackman and Tukey [17] method has been adopted, whereby autocovariance is first computed from the lagged products, discounting missing observations, then Fourier transformed to obtain the PSDF. A maximum lag of two years is the minimum length that will give a whole number of daily and annual ½ cycles, such that all their harmonics and their sidebands [11,18] each occupy only a single frequency bin. Figure 8 1. The diurnal cycle does not decay because it is phase locked. 2. The mean of the diurnal cycle at these short lags is above the origin, representing the contribution of the annual cycle (≈2.5 kn 2 pk-pk). Figure 8(b) presents the composite PSDF obtained by the following procedure: 1. The raw PSDF was obtained as the Fourier transform of Figure 8(a). In this, the daily and annual harmonics occupy single frequency bins and present as sharp spikes. Small sidebands representing the annual modulation of the daily cycle, first noted by Oort and Taylor [18], present as smaller peaks spaced at 1/365.25 day-1 intervals either side of them.
2. These spikes and sidebands were removed by heterodyning to leave the raw PSDF of only the random component.
3. The random component was smoothed using a filter bandwidth proportional to frequency. 4. The harmonics and sidebands were reinstated to form the composite spectrum shown. Filtering the PSDF was essential because the bandwidth-time product [17] BT = 14.13 for these data is insufficient for defining the random component but filtering the raw PSDF directly would also artificially widen the bandwidth of the daily and annual harmonics. The single peak to the left is the annual cycle, the peak values towards the right are the harmonics of the daily cycle, while the lower values on these harmonic spikes are the annual sidebands. The underlying random component takes the characteristic shape of the bandwidth-limited pink noise (BLPN) spectrum of Equation A.10.  Figure 9 shows the relative frequency, f, for each ellipse at each H through the year. This is equivalent to Figure 14 in [9] but now reveals how the LSW ellipses have been split and vary diurnally. Note that the previous arbitrary split between "SL" and "DS", marked "A" in Figure 14(a) in [9] was automatically resolved into f 1 and f 3 by the refined method.  a. Calms, f 0 , are seen to occur in the early hours throughout the year. b. The local diurnal sea-breeze, f 1 , peak occurrence is early afternoon. c. The land-breeze, f 2 , is nocturnal with peak occurrence in winter. d. The downslope winds, f 3 , are nocturnal with peak occurrence in summer. e. The continental diurnal sea-breeze, f 4 , peak occurrence is early afternoon in summer. f. Southeast winds, f 5 , occur throughout the day in summer with peak occurrence in the evening.

Relative Frequencies
g. Northeast winds, f 6 , occur throughout the year with peak occurrence at midday in winter. h. Northwest winds, f 7 , occur through the day in winter and around noon in summer. i. Southwest winds, f 8 , are nominally non-diurnal throughout the year. As anticipated, now that the LSW ellipses have not been forced to be non-diurnal, the patterns for (g) E6, (h) E7 and (i) E8 are less coherent than the compact diurnal ellipses but are no longer arbitrarily constrained.

Magnitude of the Mean Vector
The magnitude of the mean vector, ̅ = ( ̅ 2 + ̅ 2 ) 0.5 , for each ellipse is plotted in Figure 12 as contour maps, where the contour labels are in knots and the colour scale is the same for all charts. The R function specially developed to display these data replaces all missing values with the average value around the periphery of the valid data field to mitigate the Gibbs [19] phenomenon: the "ringing" effect that would otherwise appear next to a jump discontinuity after Fourier smoothing, but this has no impact in the OEN model since f = 0.) These deterministic seasonal-diurnal cycles of the mean vectors represent the regular periodic movement of the ellipse centres around which the random components act. Studies of the sea/land breeze cycle, e.g., those cited in [9], particularly [20], focus strongly on the mean diurnal hodographs formed by these movements, and they speculate on why a particular hodograph does, or does not, show the sunwise rotation predicted by Haurwitz [21] theory -citing the "delicate balance" between Coriolis force, surface and synoptic pressure gradients, topography, and advection effects. The corresponding whole-year diurnal hodographs for Adelaide, compiled by averaging the Figure  12 fields across all months, are plotted in Figure 13. The individual graphs for each ellipse are labelled by the ellipse index and the small labels indicate the time at 6h intervals. The (black) set labelled "All" is the overall hodograph for all the ellipses combined and corresponds in format to the hodographs in [20]. Here, it describes a sunwise (anti-clockwise) circuit which, as it contains the origin, results in the continuous net rotation of the wind vector noted earlier (Figure 1). However, this net rotation is achieved in jumps between the sea-breeze, downslope, and land-breeze ellipses, repeating in sequence. The hodographs for the local (E1) and continental (E2) sea breezes contradict the observation in [22] that the former's vector at Adelaide is normal to the coast and the latter's is southerly: both show a sunwise (anticlockwise) rotation with H consistent with Coriolis forcing, veering from SW towards S, but the continental sea breeze establishes later and is stronger. The largest LSW ellipses, E7 and E8, describe a clockwise circuit which must be driven by the shape of the corresponding geostrophic systems as they are advected past the site. The other ellipses form almost closed loops indicating no Coriolis forcing.  Figure 14 indicates of the scale of each ellipse by = ( 2 + 2 ) 1/2 which, as u and v are predicted by the OEN model to be orthogonal and uncorrelated, represents an overall standard deviation of the random perturbations. The contour labels are in knots and the colour scale is the same as Figure 12. This shows that the diurnal ellipses E1 to E4 are consistently small, while LSW ellipses E6 to E8 are large, and E5 is intermediate. The westerly-dominated LSW ellipses are largest during the transition from summer to winter, indicating the most variability in the strongest winds.

The Random Perturbation Intensities, u/σu and v/σv
The OEN model is founded on theory that predicts random, orthogonal, Gaussian perturbations modulated by the deterministic mean seasonal-diurnal cycles. The previous analysis [9] gave no insights into how well the characteristics of the random perturbations matched these predictions. The developments of the fuzzy OEN model in [11], defined in §A.3, enable the predictions and the quality of the fuzzy demodulation process to be tested, which is important if the fuzzy OEN mixture model is to be useful in practice. The extended analysis also provides the parameters needed to synthesise time series of the wind vector with the correct serial correlation as well as the seasonaldiurnal variation [11].

Location, Scale, and Orthogonality
If the fuzzy demodulation process was perfect, then the mean would be zero and the standard deviation would be unity for u/σ u and v/σ v of each ellipse. Additionally, if the perturbations were orthogonal, then their mean cross-product = / × / ̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅ would be zero. As noted earlier, the demodulation is a compromise between the large probability of a small error in the target ellipse and the small probability of a large error from the other ellipses. Even if an ellipse is orthogonal, leakage from the other ellipses will contribute non-zero vales to ρ uv . Table 1 shows the u/σ u and v/σ v values for each ellipse -deviations from the expected values are small but, as expected, they are not zero.

Fuzzy Probability Densities
As the fuzzy PDFs do not depend on the order of the observations in time, any deviations from the model prediction are due solely to imperfect demodulation. Figure 10 presents the fuzzy PDFs of the random perturbations for each ellipse, plotted on logarithmic probability axes to emphasise the tails. On these axes the Normal distribution presents as a parabola, represented here by the dashed (blue) curves.
While the distributions are reasonable fits over 2 orders of magnitude, they all some show exponential deviations in the far tails. This is a common phenomenon found in Gaussian processes and is caused by non-linearities in the perturbations. It is found in nature at all scales from the distribution of stars in elliptical galaxies down to electrical signals in biological cells [23]. The parabola is the form of conic section that lies on the boundary between the hyperbola and the ellipse. A model for the PDF corresponding to the best fitting hyperbola on these axes is given by: where and are the parameters of the canonical hyperbola equation  Figure 10 show that this model gives a good fit to the exponential deviations from Normal parabola.
Note that the anomaly around the origin of the PDFs of the smaller ellipses, most evident in (c), is a resolution bias artefact which is introduced by the 10° increments of direction [11] and can therefore be discounted.

Autocovariances
The fuzzy autocovariances functions (ACFs) are dependent on the time lag between observations so additional error from "leakage" through the F i values of the other ellipses must be expected in addition to that from imperfect demodulation. The autocovariances, R (τ), of u/σ u and v/σ v for each ellipse are presented in Figure 15. With perfect demodulation and orthogonality, the standard deviations in Table 1 would be unity, so the autocovariance and autocorrelation would be identical. Autocovariance is presented here to reveal the imperfections in the process. Here, (d) shows the most scatter because E4 is active for the least time and has the smallest population. Some residual diurnal variation is evident in most of the cases but decays with lag, τ, indicating that this is likely to be a "leakage" error since any real diurnal cycle must show a constant amplitude. The fitted curves are the bandwidth limited pink noise (BLPN) model introduced in [10] and defined by Equation A.10.

Figure 15
Fuzzy ACFs of the scaled random perturbations for each ellipse. Figure 16 compares the RMS error between observed p(W,S) and final OEN model from the refined method of this paper with the corresponding error in the original analysis in [9]. The range of error in the new method is smaller, indicating that it is more consistent. The larger RMS errors of the original method are reduced. However, some smaller RMS errors of the original method are exceeded -this is the small price that is paid for removing over-parameterisation to achieve serially consistent threads of ellipses.

Discussion
The earlier analysis [9] presented the OEN model as a simplification of reality, a "black-box" method which met the principal aim of characterising and simulating the Adelaide wind climate sufficiently well for engineering design. The later suggestion that the OEN ellipses represented actual physical mechanisms can now be viewed with more confidence since this extended analysis, and the intervening analyses of the two Rome stations in [10], confirm that the ellipses present characteristics consistent with the wind mechanisms attributed to the two well-studied sites. It is now proposed that the OEN mixture model is a useful tool to identify, separate and characterise the physical meteorology. The extension of the OEN method to estimate the PDFs and ACFs/spectra was introduced specifically to verify the predictions of the OEN model theory; but is also a necessary first step in the development of indefinitely long timeseries simulations of the wind vector which combine the observed seasonal-diurnal variation with the serial correlation of the random components. As noted in [11], this can now be done for each individual ellipse component, but not for the complete mixture until a model for the disjoint transitions between mixture components has been developed.
The extensive review in [8] of previously proposed methods for assessing wind energy potential exposed the futility of seeking a universal single-model distribution for the marginal PDF of wind speed, p(V), based on goodness-of-fit alone. On this basis, the Weibull distribution is inevitably rejected in favour of esoteric distributions with multiple adjustable parameters. The majority of these methods operate on the marginal distribution of p(V), or of p(V) and p(Ɵ) independently, with the unfortunate consequence of losing all information on their joint action. Further, this approach is essentially an empirical curve-fitting exercise in which the more multiple adjustable parameters there are, the better the apparent fit will always be. It is important to recognise the essential difference between empirical and theoretical models: the former, obtained by curve-fit, is valid only within the range of the fitted data, so cannot be extrapolated, only interpolated 1 ; whereas the latter is a representation of the known physics from which reliable inferences can be drawn, including extrapolation outside the range of the data.
Only a few studies, so far, recognise that all wind climates are generated by a mixture of mechanisms, so is always best represented by a mixture of distributions [8]. Modelling the marginal PDF of direction has the additional complexity of circularity as the direction passes through North: Carta et al [7] and others [8] address this issue using a mixture of von Mises distributions. Even fewer studies attempt to represent the joint characteristics of the wind vector, whether in meteorological polar coordinates (V, Ɵ) [7] or in zonal-meridional coordinates (W,S) as here.
Consider the marginal distributions in Figure 7: • In (a), the whole-year p(V) evaluated from the OEN model, which fits the observations well, has a flattened region after the mode so is not a good fit in the body to a single Weibull distribution, although converging in (b) to Weibull, k = 2, in the upper tail. This overall shape is dictated by the shapes and relative frequencies of the individual ellipse distributions, each of which would be expected to be reasonably well represented by a single Weibull distribution, so leading to a Weibull mixture model as found by others [8].
• In (c), the whole-year p(Ɵ) evaluated from the OEN model, which fits the observations well, forms three lobes corresponding to NE, SE and SW winds. The NE lobe is formed by ellipses 2 and 6, the SE lobe by 3 and 5, and the SW lobe by 1, 4 and 8, each of which occupy different speed ranges in (a).
Returning to verify the principal predictions of the OEN model: 1.
The assumption, that all the seasonal-diurnal variation is encapsulated by the five OEN parameters for each ellipse, is verified in the u and v perturbations by the lack of any organised, non-decaying, diurnal cycles in their ACFs, and any residual variation about the fitted BLPN model is ascribed to "leakage" errors.
2. The values of ρ in Table 1 test the prediction that u and v are orthogonal and uncorrelated but can never actually be zero due to the imperfect demodulation. However, the values are sufficiently low for this to be a good working assumption.
3. The PDFs show that the prediction that u and v are Normally distributed holds well for about ±2 standard deviations, outside of which the tails tend to become exponential. It is not at all clear whether this is due to non-linear errors or to non-linear behaviour. The former can be discounted but the latter must be addressed, especially if the interest is in the extremes. The hyperbolic distribution, Equation 1, is a good fit to the tails but is a less good fit near the origin, suggesting that a mixture model may be better.
The unique characteristic of the OEN method is that it builds a bivariate Normal mixture model of p(V, Ɵ) from the observations without incurring any loss of information. The conventional marginal distributions of each V and Ɵ may be subsequently evaluated from the OEN model by marginalising.

Conclusions
The merits of the refined, extended, and automated OEN methodology are: 1) That previous model constraints have been removed or mitigated. 2) That automation of the implementation removes any possibility of unconscious confirmation bias.
3) Introduction of Bayesian regularisation to set the initial number of ellipses limits any predisposition to overfit the parameters but can be too aggressive.
4) The overall analysis time has been considerably reduced. The automated scripts run unsupervised and manual supervision is confined to: a) Checking/correcting the "threading" process which indexes the ellipses for successive observation times, and b) Checking/correcting the final fit for serial continuity. The open-source R scripts to implement the OEN mixture model are offered a useful tool to identify, separate and characterise the physical meteorology at any site with a reasonably long observational record at hourly, or shorter, intervals. and is sometimes called the "membership ratio". It represents the proportion of observations for any wind vector value that belong to the i-th ellipse. Without additional information it is not known to which ellipse any single individual observation belongs.
The fuzzy model parameters in (A.3) are the moments of W and S weighted by the fuzzy probability, F. For example, the mean W parameter for the i-th ellipse at a given month, M, and hour, H, is: where ∑ t represents summation of the observations matching M and H.

A.3 The u and v components in [10] and [11]
The u and v components are found from W and S at any given M and H by: where τ is the time lag and ∑ t represents summation over the whole record, excluding missing observations. Analogous to (A.5), the overall autocovariance of a disjoint mixture is: The fuzzy spectra must be obtained from the Fourier transform of the autocovariances because conventional Fast Fourier Transform methods do not tolerate missing observations. The generalised spectral model for bandwidth-limited "pink" noise (BLPN) introduced in [11] is: where T is the integral time scale, β is a decay constant and t is the averaging time of the observations: here, t = 600s for SYNOP/METAR mean wind speeds. A is a normalising constant that ensures ∫ ( ) = 2 ∞ =0 , the value of which must be obtained by optimisation. The sin 2 term is the Ogura [24] spectral window representing block-averaging over time t.