VoxEU Column Environment Financial Markets

Measuring uncertainty in ESG ratings

Environmental, social, and governance ratings have become increasingly important inputs to investment decisions. This column investigates the sources of uncertainty in ESG ratings of 20 globally important stock indices and their underlying stock components. Simulations reveal that uncertainty is primarily related to the choice of ESG rating provider, which is a further consideration for the planned EU regulation of rating activities.

With the provisional agreement of the Council and European Parliament reached on 5 February 2024, the proposal for an EU regulation on environmental, social and governance (ESG) ratings moved one step closer to adoption. The ultimate goal is to boost investor confidence in sustainable products (European Council 2024). A similar legislative proposal related to credit ratings was adopted a decade ago after the European debt crisis. The fundamental rationale behind the two regulatory actions is identical for credit and for ESG ratings: the need to restore market confidence and increase investor protection. The provisional political agreement is subject to approval by the Council and the Parliament before going through the formal adoption procedure. The regulation will start applying 18 months after its entry into force.

Earlier papers argued that investors and regulators should reopen the discussion about the concepts and practice of ESG ratings to support the sustainable finance community in reaching its self-imposed objectives for ESG measurement (Dorfleitner et al. 2015, Drempetic et al. 2020). Berg et al. (2019) highlighted the importance of three factors in ratings – scope, measurement and weight – and concluded that measurement divergence explained more than 50% of the overall divergence.

Simulation setup for ESG ratings

In a related paper (Erhart 2022), I investigated the sustainability characteristics of 20 global stock exchange benchmarks, mostly from Europe (see note 1), by analysing and aggregating the ESG ratings of the stocks in the benchmarks. The key objective was to understand empirically the uncertainty of ratings and to broaden the scope of the ESG discussion from ESG benchmarks to general stock benchmarks, as these constitute the majority of the benchmark universe.

The development of ESG ratings, like any measurement, entails assumptions and subjective decisions. Hence, one of the key research objectives was to test whether and to what extent some of the assumptions influence the ESG ratings, within a range of plausible alternatives. A Monte Carlo experiment was performed and the aggregated ESG rating was rebuilt for each stock 4,000 times. In each simulation run, we randomly selected combinations of three assumptions on (i) the aggregation formula, (ii) the weighting scheme and (iii) the data provider.

Assumptions tested in the Monte Carlo simulations:

  • Aggregation formula: In practice, rating providers aggregate the ‘E’, ‘S’ and ‘G’ scores into a single ESG score by using the weighted arithmetic mean. The geometric mean was chosen as an alternative approach, which is a non-compensatory aggregation method. In this way, high scores in one component of the ESG rating do not compensate for low scores in another.

While many investors today rely solely on the aggregated headline ESG ratings when screening investment target companies, the substitution of E, S and G raises important theoretical and practical questions for investors and for regulators. Can environmental degradation be perfectly substituted, for example, by social benefits in terms of fewer workplace injuries (curve A in Figure 1)? Or only imperfectly substituted (curve C)? Or are they complementary, with both needed in a sustainable business (curve B)?

To investigate the questions of substitution and aggregation, the aggregation formula was randomly varied and scores were either aggregated by the arithmetic or the geometric mean.
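The difference between the two aggregation rules can be illustrated with a minimal sketch. The pillar scores below are made up for illustration and are not taken from the dataset; equal weights of 1/3 are assumed. Both hypothetical companies have the same arithmetic mean, but the geometric mean penalises the one with a very weak pillar.

```python
import math

def arithmetic_mean(scores, weights):
    """Weighted arithmetic mean: a low pillar can be fully offset by a high one."""
    return sum(w * s for s, w in zip(scores, weights))

def geometric_mean(scores, weights):
    """Weighted geometric mean: a low pillar drags the aggregate down."""
    return math.prod(s ** w for s, w in zip(scores, weights))

# Hypothetical (E, S, G) scores on a 0-100 scale, equal weights assumed.
weights = [1 / 3, 1 / 3, 1 / 3]
balanced = [60.0, 60.0, 60.0]
unbalanced = [90.0, 80.0, 10.0]

for name, scores in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name,
          round(arithmetic_mean(scores, weights), 1),
          round(geometric_mean(scores, weights), 1))
```

In this illustration the arithmetic mean cannot distinguish the two companies, while the geometric mean ranks the unbalanced one clearly lower.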

Figure 1 Example of an ESG indifference curve

  • Weights. The second assumption tested was the weighting scheme. Nominal weights assigned at the disaggregated level are all equal (=1/3) in the Sustainalytics methodology and sector-specific in the Refinitiv ESG rating methodology. Therefore, the effect of randomly varying weights by +/−25% around the equal weights is tested, to investigate the effect of minor variations in the importance of different ESG components.
  • Data provider. The third assumption we tested was the data provider. ESG investors are free to select their preferred data provider. However, in contrast to credit ratings, the precision and efficiency of ESG ratings cannot be judged on the basis of ex-post back-testing: for ESG ratings there are no observations on outcome variables comparable to default events in the case of credit ratings. To test the uncertainty faced by an uninformed ESG investor from data provider selection, we randomly varied the data provider of the ESG scores between Sustainalytics and Refinitiv.
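The three randomised assumptions can be sketched as a simple simulation loop. This is an illustrative sketch rather than the code used in Erhart (2022): the function names and the example scores are hypothetical, the provider scores are assumed to be already on a common 0-100 scale, and the perturbed weights are renormalised to sum to one, which is an assumption of this sketch.

```python
import math
import random

PROVIDERS = ("Sustainalytics", "Refinitiv")
METHODS = ("arithmetic", "geometric")

def draw_weights(rng, spread=0.25):
    """Perturb equal weights (1/3 each) by up to +/-25%, then renormalise."""
    raw = [(1 / 3) * (1 + rng.uniform(-spread, spread)) for _ in range(3)]
    total = sum(raw)
    return [w / total for w in raw]

def aggregate(scores, weights, method):
    """Combine (E, S, G) scores with the chosen aggregation formula."""
    if method == "arithmetic":
        return sum(w * s for s, w in zip(scores, weights))
    return math.prod(s ** w for s, w in zip(scores, weights))

def simulate_stock(pillar_scores, n_runs=4000, seed=0):
    """Rebuild the aggregated ESG score n_runs times under random assumptions."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        provider = rng.choice(PROVIDERS)   # (iii) data provider
        method = rng.choice(METHODS)       # (i) aggregation formula
        weights = draw_weights(rng)        # (ii) weighting scheme
        results.append(aggregate(pillar_scores[provider], weights, method))
    return results

# Hypothetical (E, S, G) scores for one stock, made up for illustration.
example_scores = {
    "Sustainalytics": (55.0, 62.0, 70.0),
    "Refinitiv": (80.0, 45.0, 66.0),
}
simulated = simulate_stock(example_scores)
print(len(simulated), round(min(simulated), 1), round(max(simulated), 1))
```

The spread of the simulated scores for a stock then serves as a rough measure of how sensitive its rating is to the three choices.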

The rating methodology matters

A common obstacle to the use of ESG ratings is that the ratings of different providers are often not directly comparable. The ratings of Sustainalytics and Refinitiv in our sample are no exception: the Sustainalytics ESG score is a risk score, while the Refinitiv score measures good performance. We statistically transformed the Sustainalytics scores onto the industry-specific normalised scale of Refinitiv to make them comparable, though a substantial discrepancy remained between the two providers' ratings of the same stocks.

In the ideal case, there should also be positive, significant correlations between aggregated ratings and underlying scores (OECD-JRC 2008). Both the Refinitiv and Sustainalytics subscores comply with this requirement, as the correlations of the environmental, social and governance scores with the ESG headline rating are balanced and vary within the recommended range of 0.4–0.8 for meaningful aggregates. However, the pairwise correlation between Refinitiv and Sustainalytics scores at the E, S and G pillar level is weak (0.1–0.3). Furthermore, the association of the Sustainalytics environmental scores with the social and governance scores is not very strong, and this may limit opportunities in sustainable finance (see Table 1). Specifically, this implies that one cannot build an investment portfolio from the stocks in the analysed benchmark indices without trade-offs between environmental, social and governance goals.
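A correlation check of this kind can be sketched as follows. The pillar scores are made up for illustration, the headline rating is assumed to be the equal-weighted mean of the three pillars, and the Pearson coefficient is used as the correlation measure, which is an assumption of this sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def in_recommended_range(r, low=0.4, high=0.8):
    """True if a pillar-to-headline correlation lies in the 0.4-0.8 range."""
    return low <= r <= high

# Hypothetical pillar scores for six stocks (not taken from the dataset).
pillars = {
    "E": [72.0, 35.0, 58.0, 90.0, 41.0, 66.0],
    "S": [55.0, 60.0, 38.0, 70.0, 82.0, 47.0],
    "G": [40.0, 77.0, 65.0, 52.0, 49.0, 88.0],
}
headline = [sum(vals) / 3 for vals in zip(*pillars.values())]

for name, scores in pillars.items():
    r = pearson(scores, headline)
    print(name, round(r, 2), in_recommended_range(r))
```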

Table 1 Cross correlation table

Simulation results

In general, the aggregated ESG ratings are not very robust, and users should take them with a pinch of salt. The choice of the ESG data provider has a major impact on the overall ESG evaluation of stocks (Figure 2, panel A), whereas variations in aggregation rules and weights have only minor impacts (Figure 2, panels B and C).

For scores below 80, the difference in ESG assessment based on Sustainalytics and Refinitiv becomes very wide. The lowest ESG scores are particularly dependent on methodological choice. This finding can guide the conclusions that may be drawn from ESG scores. For example, differences of 5–10 points between issuer companies on the standard 0–100 scale used by Refinitiv cannot be deemed highly significant, whereas differences of 30 points upwards or downwards can indicate a meaningful difference. ESG investment strategies are often built upon specific rules. For example, some investors apply (i) best-in-class or (ii) exclusion rules. Our simulation analysis reveals that only the best-in-class rule can be effectively based on the ESG scores of stocks in the analysed stock indices.

Figure 2 Monte Carlo simulation results: ESG ratings


Finally, the Monte Carlo simulation results were aggregated to the stock benchmark level (Figure 3). On average, European stock exchanges provide better ESG investment opportunities than other exchanges. The stock indices of Hong Kong–China (Hang Seng), Japan (Nikkei225) and Moscow (RTS) are at the other end of the distribution, probably due to the higher share of industries with more ESG risks and controversies. It should be noted that the uncertainty analysis changed the position of some exchanges significantly. For instance, the Helsinki Stock Exchange Index (HELOMX) is ranked relatively lower in the uncertainty analysis than in the simple rankings based on ESG score averages of benchmarks. This finding confirms the challenges stemming from the substitution of E, S and G issues and is a reminder for investors to always take ESG scores with a pinch of salt.

Figure 3 Box plot of ESG scores by exchanges, Monte Carlo simulation results

Note: 5%, 30%, 70%, 95% percentiles and median values.
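The box-plot statistics named in the note can be computed from a set of simulated scores as in the sketch below. The helper names are hypothetical, linear interpolation between order statistics is assumed, and how stock-level simulations are combined into an index-level distribution is not specified here.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = p * (len(ordered) - 1) / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

def box_summary(values):
    """5%, 30%, 50% (median), 70% and 95% percentiles of simulated scores."""
    return {p: percentile(values, p) for p in (5, 30, 50, 70, 95)}

# Hypothetical simulated index-level scores: the integers 0..100.
summary = box_summary(list(range(101)))
print(summary)
```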

Final remarks on remaining challenges of ESG ratings

In sum, uncertainty in ESG ratings of stock investments is considerable. This is partly because rating providers take different methodological approaches, reflecting the complexity and multifaceted nature of the underlying ESG concept. Once the ESG rating framework is set out in a possible regulation, convergence of methodologies may help to improve the association of ESG ratings from different providers, which in turn may restore market confidence and increase investor protection. Regulators and rating providers would need to agree on the details: the concepts, the underlying indicators and the units of measurement. However, this may take some time.


Berg, F, J Kölbel and R Rigobon (2019), “Aggregate confusion: The divergence of ESG ratings”, SSRN Electronic Journal.

Drempetic, S, C Klein and B Zwergel (2020), “The influence of firm size on the ESG score: Corporate sustainability ratings under review”, Journal of Business Ethics 167.

Dorfleitner, G, G Halbritter and M Nguyen (2015), “Measuring the level and risk of corporate responsibility–An empirical comparison of different ESG rating approaches”, Journal of Asset Management 16.

Erhart, S and K Erhart (2023), “Environmental ranking of European industrial facilities”, 13 April.

Erhart, S (2022), “Take it with a pinch of salt—ESG rating of stocks and stock indices”, International Review of Financial Analysis 83.

European Council (2024), “Environmental, social and governance (ESG) ratings: Council and Parliament reach agreement”, 5 February.

OECD-JRC (2008), “Handbook on constructing composite indicators: methodology and user guide”, European Union Joint Research Centre.

Reichelstein, S, I Kadach, G Ormazabal and S Cohen (2022), “Executive compensation tied to ESG performance”, 2 August.

Koskinen, Y, R Santioni and R Albuquerque (2021), “Mutual funds’ loyalty helped to stabilise ESG stocks during the COVID-19 market crash”, 23 November.


  1. There were 15 European stock indices in the sample (Austria - ATX, Belgium - BEL, Denmark - COPOMX, Finland - HELOMX, France - CAC40, Germany - DAX30, Hungary - BUX, Italy - MIB, the Netherlands - AEX, Norway - OBX, Russia - RTS, Spain - IBEX, Sweden - OMXSTO, Switzerland - SMI, United Kingdom - FTSE100), two from North America (United States - SNP500, Canada - TSX), one from Australia (ASX) and two from Asia (Hong Kong, China - Hang Seng and Japan - NIKKEI225). The two sources of our ESG rating dataset were Sustainalytics, as published by Yahoo Finance, and Refinitiv, in November 2020 and in April 2022.