Leveraging machine learning techniques to predict fundamental data


Europe-based asset manager with significant AuM managed using quantitative strategies

Business situation

The client, a quantitative asset manager, wanted to apply ML techniques to predict fundamental values based on the linkages across fundamental data items.

Benefits and outcomes of our engagement

  • Achieved a high level of accuracy in predicting missing values for the period 2016-17
  • We are currently assessing how time series models (potentially ARIMA, Holt-Winters, and structural time series models) can be used to predict future fiscal periods based only on past data
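
To make the time series idea concrete, the following is a minimal sketch of Holt's linear trend method, the non-seasonal member of the Holt-Winters family. It is an illustration only and not a model from the engagement: the function name `holt_forecast`, the smoothing parameters `alpha` and `beta`, and the sample figures are all assumptions.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` future periods using only past observations.

    alpha smooths the level, beta smooths the trend (both assumed values).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Initialise the level with the first value and the trend with the first change.
    level = float(series[0])
    trend = float(series[1] - series[0])

    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # Extrapolate the final level and trend h periods ahead.
    return [level + h * trend for h in range(1, horizon + 1)]


# Made-up annual values for one data item (toy data only).
history = [10, 12, 14, 16]
forecast = holt_forecast(history, horizon=2)
print(forecast)
```

A production setup would more likely rely on an established library and add a seasonal component for quarterly fundamentals; the sketch only shows the "past data in, future periods out" shape of the problem.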

SGA approach

Our client hypothesized that, because corporate fundamentals include data items with known or learned relationships between them, ML techniques can be applied to predict fundamental values. SGA was asked to apply ML techniques to validate the client's hypothesis. The client received fundamental data from multiple vendors.
  • SGA deployed Bayesian Networks as its preferred algorithm
  • SGA’s model established specific predictor relationships across all variables, each defined by an equation for the variable to be predicted
  • As part of the backtesting process, SGA used fundamental data for select US corporates from 2010-16. These algorithms were then applied to fundamental data beyond 2016, using data sets in which values had been deleted at random
  • The challenge SGA faced was to build a final graph structure in which the relationships among the variables were well established and which validated well against the test data
  • To expand the model to cover companies globally, SGA built generalized models to account for similar data points across multiple companies. Where generalized models were found to be appropriate, geography- and sector-specific company information was grouped to form the empirical data
  • As part of the hypothesis testing, we also worked on alternative problem statements, including extending the algorithm to data from multiple vendors and to non-fundamental inputs such as analyst estimates and leading economic indicators
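
The approach above (one equation per predicted variable, fitted on an earlier window and then tested on later data with randomly deleted values) can be sketched as follows. This is a heavily simplified illustration, not SGA's model: it assumes a hand-specified graph in which each item has a single parent and a linear relationship, whereas the engagement's graph structure was itself part of the modelling challenge. The names `PARENTS`, `fit_network`, and `impute`, the data items, and all figures are hypothetical, and the data is synthetic.

```python
import random

# Hypothetical structure (an assumption, not the engagement's graph):
# each child data item is predicted from a single parent item.
# Listed in topological order: parents appear before their children.
PARENTS = {"gross_profit": "revenue", "operating_income": "gross_profit"}


def fit_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_network(rows):
    """Learn one equation per child item from fully observed rows."""
    return {
        child: fit_linear([r[parent] for r in rows], [r[child] for r in rows])
        for child, parent in PARENTS.items()
    }


def impute(row, equations):
    """Fill missing (None) child items from their parent's value."""
    filled = dict(row)
    for child, parent in PARENTS.items():
        if filled[child] is None and filled[parent] is not None:
            intercept, slope = equations[child]
            filled[child] = intercept + slope * filled[parent]
    return filled


def synthetic_row(rng):
    """Made-up fundamentals with noisy linear linkages (toy data only)."""
    revenue = rng.uniform(100.0, 1000.0)
    gross_profit = 0.4 * revenue + rng.gauss(0.0, 2.0)
    operating_income = 0.5 * gross_profit - 10.0 + rng.gauss(0.0, 2.0)
    return {"revenue": revenue, "gross_profit": gross_profit,
            "operating_income": operating_income}


rng = random.Random(0)
train = [synthetic_row(rng) for _ in range(200)]  # stands in for the fitting window
test = [synthetic_row(rng) for _ in range(50)]    # stands in for the later period

equations = fit_network(train)

# Backtest: delete one child item at random per test row, then predict it.
errors = []
for row in test:
    target = rng.choice(list(PARENTS))
    masked = dict(row, **{target: None})
    predicted = impute(masked, equations)[target]
    errors.append(abs(predicted - row[target]))

mean_abs_error = sum(errors) / len(errors)
print(f"mean absolute error on deleted values: {mean_abs_error:.2f}")
```

A fuller Bayesian network would allow several parents per item, learn the graph from data, and return a distribution rather than a point estimate for each missing value; the sketch keeps only the fit-then-delete-then-predict loop.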