Bayesian Federated Inference for estimating statistical models from non-shared multicenter datasets

To reliably identify predictive factors for an outcome via multivariable regression analysis, the data set must be large enough relative to the number of candidate factors. In practice, sufficient data are often lacking. Fitting a model to a small data set can lead to "overfitting", which results in inaccurate estimates of the model parameters and unreliable predictions of the outcome for new patients. Combining data from different centers into one larger database would alleviate this problem, but is challenging in practice because of regulatory and logistical issues.

In this article, the authors describe a Bayesian Federated Inference (BFI) framework for multicenter data. The goal is to construct, from the local inference results in the individual centers, what would have been inferred if the data sets had been pooled. The framework thus attempts to harness the statistical power of larger data sets without actually creating them. The BFI framework is designed to deal with small data sets by locally inferring not only the optimal parameter values but also additional features of the posterior parameter distribution. Importantly, a single inference cycle across centers is sufficient for the BFI method, whereas most Federated Learning strategies require multiple cycles across centers. The performance of the proposed method appears to be excellent. An R package has been developed to perform all the calculations, and a user-friendly manual is available.
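
The idea of combining local posterior summaries in a single cycle can be illustrated with a minimal sketch. It is not the authors' R package and does not reproduce its interface. It assumes a linear regression model with known noise variance and zero-mean Gaussian priors, a special case in which the local posteriors are exactly Gaussian, so the combined result coincides with the pooled analysis. The function names `local_fit` and `combine`, the simulated data, and the choice of NumPy are all assumptions made for this illustration. Each center shares only its estimate and the curvature (precision) of its posterior, never patient-level data.

```python
import numpy as np


def local_fit(X, y, prior_precision, noise_var=1.0):
    """Run inside one center: return the MAP estimate and posterior precision.

    Assumes linear regression with known noise variance and a zero-mean
    Gaussian prior, so the local posterior is exactly Gaussian.
    """
    A = X.T @ X / noise_var + prior_precision  # curvature at the MAP
    theta = np.linalg.solve(A, X.T @ y / noise_var)
    return theta, A


def combine(local_results, local_prior_precisions, pooled_prior_precision):
    """Run centrally: merge local summaries into one combined posterior.

    Each local prior was counted once per center, so the local prior
    precisions are subtracted and the prior for the pooled model is added.
    """
    A_comb = (sum(A for _, A in local_results)
              - sum(local_prior_precisions)
              + pooled_prior_precision)
    rhs = sum(A @ theta for theta, A in local_results)
    theta_comb = np.linalg.solve(A_comb, rhs)
    return theta_comb, A_comb


# Simulated example (hypothetical data): 3 centers, 15 patients each,
# 4 covariates, so every center alone has a small data set.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -0.5, 0.0, 2.0])
prior = 0.1 * np.eye(4)  # same prior precision in every center

Xs, ys = [], []
for _ in range(3):
    X = rng.normal(size=(15, 4))
    y = X @ true_theta + rng.normal(size=15)
    Xs.append(X)
    ys.append(y)

# One inference cycle: each center sends (theta, A) to the central server.
local_results = [local_fit(X, y, prior) for X, y in zip(Xs, ys)]
theta_bfi, A_bfi = combine(local_results, [prior] * 3, prior)

# Reference: what a pooled analysis would give if data could be shared.
theta_pooled, A_pooled = local_fit(np.vstack(Xs), np.concatenate(ys), prior)

print("combined:", theta_bfi)
print("pooled:  ", theta_pooled)
```

For models such as logistic or survival regression the local posteriors are not Gaussian, so a combination of this kind is an approximation rather than an exact reconstruction of the pooled result.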

Read the article here: Bayesian federated inference for estimating statistical models based on non-shared multicenter data sets (wiley.com)