Consolidation forecast archive.    

U.S. Temperature and precipitation probability of exceedence forecasts.

Forecast description.
   These forecasts are produced by a linear regression technique called 'Ensemble Regression'.  Ensemble
   regression treats every member of a forecast ensemble as a potential solution to the problem.  In a
   single-model ensemble, all members are considered equally likely to be the 'best'.  In a
   multi-model ensemble, the members from the more skillful models are assumed to be more likely to
   occur.  The ensemble regression procedure assumes that the conditional error distribution for the
   best member (that is, the expected errors between the closest solution and the observation) is about
   the same regardless of which model produced it.  The procedure derives a least
   squares solution for the entire ensemble set by minimizing the expected errors
   between the best member and the observation.

File names and variable ids.

File names.

   There are two types of archive files: hindcasts and operational forecasts.  A hindcast file contains a
   'clean' set of retrospective forecasts.  These forecasts are generally made in research mode from
   historical data.  The “operational forecast” files contain the data issued in real time, and
   may include variations in procedures or even errors, if those forecasts were officially 'issued'.

Hindcast datasets are identified by a data identifier (ID).  Operational forecast datasets are identified 
by the element followed by a “.operational” suffix in the dataset name.

File format.

  Files for the 102 climate divisions are ASCII text in a simple spreadsheet-style layout.  Data are
  grouped in the order in which the forecasts were issued.  Forecasts are typically for 102 forecast
  divisions based on the NCDC climate division data, and are for three-month seasons.
Column 1 = Year and month of the center month of the three-month season, in YYYYMM format.
Column 2 = Year that the forecast was issued
Column 3 = Month that the forecast was issued.  A forecast is typically issued around the 2nd week of 
          the month, so a forecast labeled 1982   1 would have been issued around mid-January, 1982.   
Column 4 = Lead time, in months, between the latest data used for this forecast and the START of the
           valid time.  So a 1 in this column indicates a 1-month lead time.  A forecast issued in
           January, 1982 would typically be based on data through the end of December, 1981, so a
           1-month lead would refer to the 3-month period starting on February 1, 1982 (Jan + 1 = Feb)
           and extending through the end of April (FMA).  This period is labeled by its center month
           (M = March, 1982), so Column 1 for this example would read 198203.
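
The lead-time arithmetic in the Column 4 example can be sketched in Python as follows.  This is only
an illustration: the function names are hypothetical, and the rule "start month = issue month + lead"
is an assumption generalized from the single January 1982 example above.

```python
def valid_season(issue_year, issue_month, lead_months):
    """Return ((y, m) start, (y, m) center, (y, m) end) of the 3-month valid season.

    ASSUMPTION: the season starts `lead_months` months after the issue month,
    as in the example (issued January, lead 1 -> season starts in February).
    """
    # Count months from year 0 so that divmod handles the year rollover.
    start_index = issue_year * 12 + (issue_month - 1) + lead_months

    def to_year_month(index):
        year, month_zero_based = divmod(index, 12)
        return year, month_zero_based + 1

    return (to_year_month(start_index),
            to_year_month(start_index + 1),
            to_year_month(start_index + 2))


def center_label(issue_year, issue_month, lead_months):
    """Return the Column 1 style YYYYMM label of the season's center month."""
    _, (year, month), _ = valid_season(issue_year, issue_month, lead_months)
    return f"{year:04d}{month:02d}"
```

For the example in the text, center_label(1982, 1, 1) gives "198203", the center month of the
February-March-April season.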

Column 5 = Forecast division for which this data is valid. (See CPC website)

Columns 6-18   Probability of exceedence (PoE) values.  Column 6 gives the value expected to be exceeded 98% of the time.
Column 7   95% PoE
Column 8   90% PoE
Column 9   80% PoE
Column 10  70% PoE
Column 11  60% PoE
Column 12  50% PoE
Column 13  40% PoE
Column 14  30% PoE
Column 15  20% PoE
Column 16  10% PoE
Column 17   5% PoE
Column 18   2% PoE

Column 19   Gives the expected value (mean) of the distribution of observations expected for this forecast.
Column 20   P(N+A).  Gives the probability that the observation will be in the Normal or Above normal class.
  Here 'Normal' refers to the middle third of the climatological distribution (not necessarily near the
  expected value).  Below normal is the lower third (0-33.3%) of the climatological distribution of
  observations, near normal is the middle third (33.3-66.7%), and above normal is the upper third
  (66.7-100%).  Climatology is always defined by the observations of the last 3 complete decades
  (e.g. 1961-1990, 1971-2000).
Column 21   P(A).  Gives the probability of Above normal.
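
Because the three classes cover the whole climatological distribution, the Below and Near normal
probabilities can be recovered from Columns 20 and 21.  The sketch below assumes the probabilities
are stored as fractions that sum to 1; if the files store percentages, pass total=100.0.  The
function name and the `total` parameter are illustrative only.

```python
def tercile_probabilities(p_normal_or_above, p_above, total=1.0):
    """Split P(N+A) and P(A) into (P(B), P(N), P(A)).

    ASSUMPTION: the three class probabilities sum to `total`
    (1.0 for fractions, 100.0 for percentages).
    """
    p_below = total - p_normal_or_above   # whatever is not Normal or Above
    p_normal = p_normal_or_above - p_above  # remove the Above part of P(N+A)
    return p_below, p_normal, p_above
```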

Column 22   Gives the effective skill of the relationship.  The value in this column is defined as:
    R = SQRT(1 - Vf/Vb)
  where Vf is the forecast error variance, the expected value of (Forecast - Observation)^2,
  and Vb is the climatological variance of the observations, the expected value of
  (Observation - Observation mean)^2.

  For positive values, this produces a skill estimate that is similar to the correlation coefficient between the forecast and observations.  Negative values signify that the models are predicting a greater variance in the expected observations than the climatological variance.
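
As a rough illustration of the formula, the sketch below estimates Vf and Vb from paired samples of
forecasts and observations using simple sample means in place of the expected values.  The text
does not say how negative values are obtained from a square root; the signed square root used here
is an assumption made for this sketch, not the archive's documented method, and the function name
is hypothetical.

```python
import math


def effective_skill(forecasts, observations):
    """Estimate R = SQRT(1 - Vf/Vb) from paired forecasts and observations."""
    n = len(observations)
    obs_mean = sum(observations) / n
    # Vf: mean squared forecast error.
    vf = sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n
    # Vb: variance of the observations about their own mean.
    vb = sum((o - obs_mean) ** 2 for o in observations) / n
    ratio = 1.0 - vf / vb
    # ASSUMPTION: when Vf > Vb the ratio is negative, so take the square
    # root of its magnitude and keep the sign.
    return math.copysign(math.sqrt(abs(ratio)), ratio)
```

A perfect forecast gives R = 1, a forecast that always equals the observed mean gives R = 0, and a
forecast whose errors exceed the climatological variance gives a negative value.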

Column 23  Forecast ID.  
    Because models may change over time, each forecast is given an ID to help identify how it was made.
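
The column layout above can be read with a few lines of Python.  This is a minimal sketch that
assumes each record is one line of 23 whitespace-separated fields; the delimiter, the absence of
header lines, and the dictionary key names are assumptions, not something the archive documents.

```python
# PoE levels (percent) for Columns 6-18, in file order.
POE_LEVELS = (98, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 2)


def parse_row(line):
    """Parse one record into a dictionary keyed by hypothetical field names."""
    fields = line.split()  # ASSUMPTION: whitespace-separated columns
    if len(fields) != 23:
        raise ValueError(f"expected 23 columns, found {len(fields)}")
    return {
        "center_yyyymm": fields[0],            # Column 1
        "issue_year": int(fields[1]),          # Column 2
        "issue_month": int(fields[2]),         # Column 3
        "lead_months": int(fields[3]),         # Column 4
        "division": fields[4],                 # Column 5, kept as text
        "poe": dict(zip(POE_LEVELS, map(float, fields[5:18]))),  # Columns 6-18
        "mean": float(fields[18]),             # Column 19
        "p_normal_or_above": float(fields[19]),  # Column 20
        "p_above": float(fields[20]),          # Column 21
        "skill": float(fields[21]),            # Column 22
        "forecast_id": fields[22],             # Column 23, kept as text
    }
```

The division and forecast ID are kept as text so that any leading zeros present in the file are
preserved.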

Forecast ID.
CCCVVV
CCC = decimal equivalent of binary model inclu  
VVV = a version number.
Key to CCC = Each of the four current input model forecast tools is given a position in a binary field.
  ECCA = Ensemble CCA
  CFS  = CFS model ensembles
  CCA  = Canonical Correlation Analysis (Barnston)
  SMLR = Screening Multiple Linear Regression

       Binary field                  Decimal
       ECCA   CFS   CCA   SMLR       CCC
         0     0     0     1       = 001
         0     1     0     0       = 004
         0     1     1     1       = 007
         1     1     1     1       = 015
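
The CCCVVV encoding can be unpacked with a bit mask.  The sketch below follows the bit order in the
table above (ECCA = 8, CFS = 4, CCA = 2, SMLR = 1).  The function name is hypothetical, and padding
short IDs with leading zeros is an assumption for the case where the ID was read as an integer.

```python
# Bit value of each input tool, in the order shown in the table above.
MODEL_BITS = (("ECCA", 8), ("CFS", 4), ("CCA", 2), ("SMLR", 1))


def decode_forecast_id(forecast_id):
    """Split a CCCVVV forecast ID into (list of model names, version number)."""
    # ASSUMPTION: an ID read as an integer has lost its leading zeros.
    text = str(forecast_id).zfill(6)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"forecast ID must be six digits: {forecast_id!r}")
    ccc, version = int(text[:3]), int(text[3:])
    models = [name for name, bit in MODEL_BITS if ccc & bit]
    return models, version
```

For example, an ID beginning with 007 decodes to CFS, CCA and SMLR, and one beginning with 015 decodes
to all four tools.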