MLReal: Bridging the gap between training on synthetic data and real data applications in ML (Alkhalifah et al., 2021)

One of the biggest challenges we face in machine learning applications is getting our trained neural network (NN) models to work on real data. If we train the NN on real data, we have to somehow extract the labels (if available, such as first-arrival picks), either manually or with hand-crafted algorithms. The trained model's predictions will then be, at best, as accurate as these labels. Synthetic data offer, in most cases, instant labels; however, they suffer from the curse of domain shift: it is hard to make synthetic data statistically imitate real data, which is a requirement for the NN to generalize accurately. So, under the banner of domain adaptation, we suggest direct linear transformations of the input seismic data that narrow the gap between synthetic training data and real application data.
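
As a rough illustration of what such an input transformation could look like, the sketch below assumes the two operations are (1) cross-correlating every trace with a fixed reference trace from the same dataset and (2) convolving the result with the mean autocorrelation of the data from the other domain. This is our reading of the approach described in the arXiv preprint, not a reproduction of the authors' code; the function names, the array layout (traces by time samples), and the random stand-in data are all hypothetical.

```python
import numpy as np


def crosscorrelate_with_reference(data, ref_index=0):
    """Cross-correlate each trace with one reference trace of the same gather.

    data: array of shape (n_traces, n_samples).
    Returns an array of shape (n_traces, 2 * n_samples - 1).
    """
    ref = data[ref_index]
    return np.stack([np.correlate(trace, ref, mode="full") for trace in data])


def mean_autocorrelation(data):
    """Average the autocorrelations of all traces in a gather."""
    return np.mean([np.correlate(t, t, mode="full") for t in data], axis=0)


def to_shared_domain(data, other_domain_data, ref_index=0):
    """Assumed transform: cross-correlation with a reference trace, then
    convolution with the mean autocorrelation of the other domain."""
    xcorr = crosscorrelate_with_reference(data, ref_index)
    kernel = mean_autocorrelation(other_domain_data)
    return np.stack([np.convolve(row, kernel, mode="same") for row in xcorr])


# Random stand-ins for a synthetic and a real gather (8 traces, 64 samples).
rng = np.random.default_rng(0)
synthetic = rng.standard_normal((8, 64))
real = rng.standard_normal((8, 64))

# Training inputs borrow statistics from the real data, and vice versa.
synthetic_t = to_shared_domain(synthetic, real)
real_t = to_shared_domain(real, synthetic)
```

Under these assumptions, the NN would be trained on `synthetic_t` (with the synthetic labels) and applied to `real_t`, so that both inputs carry spectral information from both domains. Cross-correlating with a reference trace also removes a time shift common to all traces, which is one reason such a step is attractive for data with unknown source origin times.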

Applications to microseismic event location directly from waveforms and to low-frequency extrapolation demonstrate that there is life after training your NN model on synthetic data.

References

Alkhalifah T., Wang H. and Ovcharenko O., 2021, “MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning”, EAGE Technical Program Expanded Abstracts.

Alkhalifah T., Wang H. and Ovcharenko O., 2021, “MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning”, arXiv preprint arXiv:2109.05294.
