Physics-constrained symbolic regression for discovering closed-form equations of multimodal water retention curves from experimental data

Researchers developed a physics-constrained machine learning framework that automatically discovers closed-form mathematical models for multimodal water retention curves in porous materials. The method uses genetic programming with embedded physical constraints to ensure discovered equations are physically consistent and mathematically robust, addressing limitations of traditional superposition approaches. The full implementation has been made publicly available in an open-source repository.


A team of researchers has introduced a novel physics-constrained machine learning framework that automatically discovers closed-form mathematical models for the complex water retention behavior of porous materials. This work represents a significant shift from traditional empirical curve-fitting, offering a more interpretable and generalizable approach to a fundamental problem in geotechnical engineering, hydrology, and materials science.

Key Takeaways

  • A new AI framework uses genetic programming to automatically discover closed-form equations for multimodal water retention curves from experimental data.
  • The method embeds physical constraints directly into the learning process, ensuring discovered models are physically consistent and mathematically robust.
  • This approach addresses the limitations of superposing multiple unimodal models, which requires separate parameter fitting for each pore size mode and lacks generalizability.
  • The full implementation has been made publicly available in an open-source repository to enable validation and extension by the scientific community.

A Physics-Constrained AI Approach to Modeling Porous Materials

Modeling how water is retained in unsaturated porous materials with complex, multimodal pore size distributions is a long-standing challenge. Standard hydraulic models often fail to capture these multi-scale characteristics. A prevalent workaround involves superposing multiple unimodal retention functions, each calibrated to a specific pore size range. However, this method necessitates separate parameter identification for each mode, which limits model interpretability and generalizability, particularly in data-sparse scenarios common in field studies.
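
To make the superposition workaround concrete, the sketch below builds a bimodal curve as a weighted sum of two van Genuchten-type terms. It is only an illustration under stated assumptions: the function names, the restriction m = 1 - 1/n, and every parameter value are hypothetical and are not taken from the paper.

```python
import numpy as np

def van_genuchten(h, alpha, n):
    """Effective saturation of one pore mode (van Genuchten form, assuming m = 1 - 1/n)."""
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * h) ** n) ** (-m)

def superposed_retention(h, weights, alphas, ns):
    """Weighted sum of unimodal curves, one term per pore-size mode."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    h = np.asarray(h, dtype=float)
    return sum(w * van_genuchten(h, a, n) for w, a, n in zip(weights, alphas, ns))

# Illustrative (made-up) parameters: a coarse mode and a fine mode.
suction = np.logspace(-2, 4, 50)
se = superposed_retention(suction, weights=[0.6, 0.4], alphas=[0.5, 0.005], ns=[3.0, 2.0])
```

Even this two-mode case carries five free shape parameters (one independent weight, two alpha values, two n values) before residual and saturated water contents are added, and each extra mode adds three more. That growth is the calibration burden described above.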

This research proposes a fundamentally different strategy: a physics-constrained machine learning framework for meta-modeling. The system is designed to automatically discover closed-form mathematical expressions for multimodal water retention curves directly from experimental data. The framework represents candidate equations as binary trees and evolves them using genetic programming, a widely used search method for symbolic regression. Crucially, physical constraints, such as requiring the retention curve to be monotonic and bounded between 0 and 1, are embedded into the loss function. This guides the symbolic regressor toward solutions that are not only accurate but also physically consistent and mathematically robust. The team's results demonstrate the framework's ability to discover equations that effectively represent water retention in materials with varying pore structures.
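
The two ingredients described above, tree-encoded equations and a loss that penalizes unphysical behavior, can be sketched roughly as follows. This is not the authors' implementation: the tuple encoding, the operator set, the penalty form, and the weights `lam_mono` and `lam_bound` are all assumptions made for illustration, and the evolutionary steps (mutation, crossover, selection) are omitted.

```python
import numpy as np

# Binary expression tree as nested tuples: (op, left, right); leaves are "h" or a number.
OPS = {
    "add": np.add,
    "mul": np.multiply,
    "div": lambda a, b: a / b,
    "pow": np.power,
}

def evaluate(tree, h):
    """Recursively evaluate an expression tree at suction values h."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, h), evaluate(right, h))
    if tree == "h":
        return h
    return np.full_like(h, float(tree))

def constrained_loss(tree, h, s_obs, h_check, lam_mono=10.0, lam_bound=10.0):
    """Data misfit plus penalties for rising or out-of-range saturation."""
    with np.errstate(all="ignore"):
        misfit = np.mean((evaluate(tree, h) - s_obs) ** 2)
        s_check = evaluate(tree, h_check)
    if not (np.isfinite(misfit) and np.all(np.isfinite(s_check))):
        return np.inf  # reject candidates that blow up numerically
    # Monotonicity: saturation should not increase as suction increases.
    increase = np.clip(np.diff(s_check), 0.0, None)
    # Boundedness: saturation should stay within [0, 1].
    out_of_range = np.clip(s_check - 1.0, 0.0, None) + np.clip(-s_check, 0.0, None)
    return misfit + lam_mono * np.mean(increase ** 2) + lam_bound * np.mean(out_of_range ** 2)

h = np.logspace(-1, 3, 20)
s_obs = 1.0 / (1.0 + (0.1 * h) ** 2)   # synthetic "measurements", not real data
h_check = np.logspace(-2, 4, 200)      # denser grid for the constraint checks

good = ("div", 1.0, ("add", 1.0, ("pow", ("mul", 0.1, "h"), 2.0)))  # 1 / (1 + (0.1 h)^2)
bad = ("add", 1.0, ("mul", 0.01, "h"))                              # 1 + 0.01 h

loss_good = constrained_loss(good, h, s_obs, h_check)
loss_bad = constrained_loss(bad, h, s_obs, h_check)
```

In this toy setup the first candidate is decreasing and stays within [0, 1], so it incurs no penalty. The second rises with suction and exceeds 1, so both penalty terms add to its misfit. A genetic programming loop would use such a score as the fitness when ranking candidate trees.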

In a commitment to open science and reproducibility, the researchers have made the complete implementation publicly available in an open-source repository, facilitating third-party validation, application, and extension of the work.

Industry Context & Analysis

This research sits at the convergence of two major trends: the application of AI to scientific discovery (AI for Science, or AI4Science) and the growing demand for interpretable machine learning in engineering. Unlike "black-box" models like deep neural networks, which can achieve high accuracy but offer little insight into underlying mechanisms, symbolic regression seeks to discover human-readable equations. This aligns with efforts by companies like Symbolica and research from entities like the MIT-IBM Watson AI Lab, which focus on making AI outputs more interpretable for scientific and engineering tasks.

The proposed method offers a distinct advantage over the current industry-standard approach of superposing unimodal models like the van Genuchten or Brooks-Corey equations. That traditional method, while practical, often results in models with 6-10 or more fitted parameters that lack physical intuition and do not generalize well beyond their calibration dataset. In contrast, this AI-driven meta-modeling framework aims to distill the complex behavior into a single, coherent mathematical expression with inherent physical constraints, potentially reducing overfitting and improving predictive capability for novel materials.

From a technical perspective, the integration of physical constraints into the loss function is a critical innovation. It moves beyond post-hoc validation, actively steering the search toward physically plausible solutions, an idea also used in physics-informed neural networks (PINNs). The choice of genetic programming is also strategic: while libraries like PySR (with over 1.2k GitHub stars) have popularized symbolic regression, custom frameworks that incorporate domain-specific knowledge, as seen here, are often necessary for cutting-edge scientific applications. The public release of the code is a significant contribution, as reproducibility remains a major hurdle in AI-driven science, where many published models lack accessible implementations.

What This Means Going Forward

The immediate beneficiaries of this research are scientists and engineers in geotechnics, hydrology, and porous materials research. By providing a tool to automatically generate interpretable, physics-consistent models, it could accelerate material characterization, improve the accuracy of subsurface hydrology models, and enhance the design of construction materials or filtration systems. The open-source nature of the work lowers the barrier to entry, allowing both academic and industrial labs to test and adapt the framework for their specific needs.

Looking ahead, the methodology's success suggests a broader application beyond water retention curves. The core concept—using physics-constrained genetic programming for meta-modeling—could be adapted to discover constitutive models for other complex material behaviors, such as stress-strain relationships in composites, chemical reaction kinetics, or thermal properties. This positions the work as a potential template for automating the discovery of mathematical laws in various engineering domains.

A key area to watch will be the framework's performance on larger, more diverse datasets and its comparison against state-of-the-art machine learning surrogates in terms of accuracy, computational cost, and, most importantly, generalizability. If it consistently produces simpler, more robust models than current empirical or pure ML approaches, it could catalyze a shift toward AI-assisted, first-principles-informed modeling in applied physics and engineering, moving the field closer to truly automated scientific discovery.
