The Canadian Weather Energy and Engineering Data Sets (CWEEDS) are computer-generated data sets for 143 locations in Canada. They contain hourly meteorological data and hourly solar radiation data for use in designing wind and solar energy systems and energy-efficient buildings [CWEEDS documentation; Canadian Weather for Energy Calculations (CWEC) documentation]. The earliest data files start in 1953, and the records for most locations run until 2005. Thirty-five stations have some measured solar data during the period of record. At 21 of these stations, the solar monitoring site was collocated with the hourly meteorological observation site for at least part of the record. The other 14 solar monitoring sites are usually located within 40 km of the meteorological observation station. The remaining 108 stations contain modeled solar radiation data, derived from cloud cover data and other meteorological data.
The solar irradiance measurements were recorded in solar time and adjusted to local standard time using an algorithm developed by Perez (Morris et al., 1992; Perez et al., 1990). Solar noon is the moment when the sun crosses the local meridian and reaches its highest position in the sky. Outside the tropics, the sun is due south at solar noon in the northern hemisphere and due north in the southern hemisphere. Local standard time is the time defined for the local time zone; it is uniform across the zone and is not adjusted for daylight saving time.
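The relationship between solar time and local standard time can be illustrated with a minimal sketch. It is not the Perez algorithm cited above, which is not reproduced here; it only shows the standard clock offset, assuming Spencer's approximation for the equation of time and east-positive longitudes. The function names and the Ottawa-like coordinates are hypothetical.

```python
import math


def equation_of_time_minutes(day_of_year):
    """Equation of time in minutes (Spencer's approximation, an assumption here)."""
    b = 2.0 * math.pi * (day_of_year - 1) / 365.0
    return 229.18 * (
        0.000075
        + 0.001868 * math.cos(b)
        - 0.032077 * math.sin(b)
        - 0.014615 * math.cos(2.0 * b)
        - 0.040849 * math.sin(2.0 * b)
    )


def solar_to_local_standard_hours(solar_hour, day_of_year,
                                  longitude_deg_east, utc_offset_hours):
    """Convert apparent solar time (hours) to local standard time (hours)."""
    # Standard meridian of the time zone, in degrees east of Greenwich.
    standard_meridian = 15.0 * utc_offset_hours
    # Solar time minus standard time, in minutes: 4 minutes per degree of
    # longitude east of the standard meridian, plus the equation of time.
    offset_minutes = (4.0 * (longitude_deg_east - standard_meridian)
                      + equation_of_time_minutes(day_of_year))
    return solar_hour - offset_minutes / 60.0


# Hypothetical example: solar noon on 1 January at roughly Ottawa's longitude.
lst = solar_to_local_standard_hours(12.0, 1, -75.7, -5)
print(f"Solar noon falls at {lst:.3f} h local standard time")
```

Because daylight saving time is never applied, the result stays on the same clock all year, which is the convention the data set uses.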
According to the CWEEDS user manual, GHI values were estimated for the 108 modeled sites, and for the other 35 stations whenever ground-based measurements were unavailable (Davies et al., 1984; Canada, 1985). For periods when meteorological data, especially cloud observations, were missing, GHI was estimated with the Won statistical model or by linear interpolation. Data flags associated with each location indicate whether the solar irradiance data were observed or modeled. For any given hour, the estimated root mean square error (RMSE) of hourly GHI is generally about 30% (Morris and Skinner, 1990). The RMSE of long-term averages, however, is estimated to be 5% or less.
The goal of the METSTAT model is to generate a solar radiation data set that has the statistical characteristics of actual values, rather than to match the GHI of any specific day and hour exactly. Where the station with solar irradiance measurements is at a different location from the station with hourly meteorological observations, note that, in any given hour, the measured GHI may reflect cloud cover or opacity that differs from the conditions reported at the meteorological station.
When ground-based GHI measurements are not available, DNI is estimated with the MAC3 model; when hourly ground-based GHI measurements are available, DNI is estimated from GHI using the algorithm of Perez (Morris et al., 1992; Perez et al., 1990; Perez et al., 1991). When evaluating a solar data set, it is useful to characterize the solar components in terms of the clearness index (Kt).
The clearness index is GHI divided by the corresponding extraterrestrial irradiance. This normalization makes it easy to compare data across the whole day or the whole year. Figure 1 shows CWEEDS data for Ottawa in 1998, with the hourly diffuse fraction (DHI/GHI) plotted against Kt. It is a typical diffuse fraction plot, with considerable scatter in the data points. Apart from a small amount of modeled data, the 1998 Ottawa data are almost all measured. When there is no direct sunlight, the diffuse irradiance equals GHI and the diffuse fraction is 1. During the sunniest periods, DHI is 10% to 20% of GHI, so the diffuse fraction is relatively low. Clear-sky periods cluster in the lower part of the plot, with Kt between 0.6 and 0.8 and DHI/GHI between 0.1 and 0.3. Note that there are few data points to the left of the main distribution.
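The two ratios used in these plots can be computed as in the following minimal sketch. The solar constant of 1361 W/m^2, the simple cosine correction for Earth-sun distance, and the sample hour are assumptions for illustration and are not taken from the CWEEDS documentation.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2, assumed value


def extraterrestrial_horizontal(day_of_year, zenith_deg):
    """Extraterrestrial irradiance on a horizontal surface, in W/m^2."""
    # Simple correction for the varying Earth-sun distance (assumed form).
    distance_factor = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    cos_zenith = math.cos(math.radians(zenith_deg))
    # Clamp to zero when the sun is below the horizon.
    return max(0.0, SOLAR_CONSTANT * distance_factor * cos_zenith)


def clearness_index(ghi, day_of_year, zenith_deg):
    """Kt = GHI / extraterrestrial horizontal irradiance (None at night)."""
    etr = extraterrestrial_horizontal(day_of_year, zenith_deg)
    if etr <= 0.0:
        return None
    return ghi / etr


def diffuse_fraction(dhi, ghi):
    """DHI / GHI (None when GHI is zero or negative)."""
    if ghi <= 0.0:
        return None
    return dhi / ghi


# Hypothetical clear summer hour: GHI 850 W/m^2, DHI 120 W/m^2, zenith 30 degrees.
kt = clearness_index(850.0, 172, 30.0)
fraction = diffuse_fraction(120.0, 850.0)
print(f"Kt = {kt:.2f}, DHI/GHI = {fraction:.2f}")
```

With no direct sunlight, DHI equals GHI and `diffuse_fraction` returns 1, matching the overcast limit described above.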

Figure 2 shows a similar plot for the 2005 Ottawa data, all of which were modeled. The figure reveals two problems. For many hours, Kt is greater than 1, which should not occur. These excessively high hourly Kt values may result from snow on the ground and an overestimation of GHI by the multiple-scattering treatment.
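A check for such hours is straightforward. The sketch below assumes each hour is available as a pair of GHI and extraterrestrial horizontal irradiance; the function name and the sample values are hypothetical.

```python
def flag_unphysical_hours(hours, kt_limit=1.0):
    """Return the indices of daytime hours whose clearness index exceeds kt_limit.

    `hours` is a sequence of (ghi, extraterrestrial_horizontal) pairs in W/m^2.
    """
    flagged = []
    for index, (ghi, etr) in enumerate(hours):
        if etr <= 0.0:
            continue  # night-time hour, Kt is undefined
        if ghi / etr > kt_limit:
            flagged.append(index)
    return flagged


# Hypothetical hourly values, not taken from the CWEEDS files.
sample_hours = [(0.0, 0.0), (310.0, 520.0), (455.0, 430.0), (95.0, 88.0)]
print(flag_unphysical_hours(sample_hours))
```

Flagged hours can then be inspected for snow cover or low sun angles before deciding whether to keep them.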
The spread of data points in Figure 2 shows that the DNI and DHI values produced by the MAC3 model cover a wider range than those derived from ground-based observations. Modeling DNI and DHI from GHI is often difficult (Vignola et al., 2012). The diffuse fractions are generally reasonable, but their distribution differs from that of the measured data in Figure 1. Because many of the modeled combinations of diffuse fraction and GHI do not occur naturally, using them may produce anomalous results or distort performance predictions. As a simple example, the optimal design of a PV system (from string sizing to inverter specification) depends on the maximum irradiance it receives; if the maximum irradiance is overestimated, the design will not be optimal. It is therefore best practice to check the distribution of the data before using them to predict system performance.
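One simple way to inspect the distribution is to summarize the diffuse fraction within bins of the clearness index and compare a modeled year against a measured one. The following is a minimal sketch of that idea; the bin width, function name, and sample pairs are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def diffuse_fraction_by_kt_bin(pairs, bins_per_unit=10):
    """Summarize DHI/GHI within clearness-index bins.

    `pairs` is an iterable of (kt, diffuse_fraction) values for daytime hours.
    Returns {bin_lower_edge: {"count", "min", "median", "max"}}.
    """
    binned = defaultdict(list)
    for kt, fraction in pairs:
        # Integer bin index, e.g. Kt = 0.72 falls in bin 7 for a 0.1 bin width.
        binned[int(kt * bins_per_unit)].append(fraction)
    summary = {}
    for index in sorted(binned):
        values = binned[index]
        summary[index / bins_per_unit] = {
            "count": len(values),
            "min": min(values),
            "median": median(values),
            "max": max(values),
        }
    return summary


# Hypothetical (Kt, DHI/GHI) pairs, not taken from the CWEEDS files.
sample_pairs = [(0.72, 0.15), (0.75, 0.12), (0.15, 0.95), (0.12, 1.0)]
for lower_edge, stats in diffuse_fraction_by_kt_bin(sample_pairs).items():
    print(lower_edge, stats)
```

Bins whose range is much wider in the modeled year than in the measured year, or bins that are populated only in the modeled year, point to hours that deserve a closer look before the data are used for design.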
