The development of a physics-constrained machine learning framework that automatically derives closed-form equations for complex water retention curves is a notable advance for hydrogeology and materials science. The approach tackles the long-standing challenge of modeling multimodal porous materials, moving beyond empirical curve-fitting toward interpretable, physically consistent models discovered directly from data.
Key Takeaways
- A new AI framework uses symbolic regression, driven by genetic programming, to automatically discover closed-form mathematical expressions for multimodal water retention curves from experimental data.
- The method embeds physical constraints directly into the loss function that guides the search, steering it toward physically consistent and mathematically robust equations instead of relying on the standard workaround of superposing unimodal models.
- The full implementation has been made publicly available in an open-source repository to support validation and application by the broader scientific community.
A New Paradigm for Modeling Multimodal Porous Materials
Modeling the unsaturated hydraulic behavior of porous materials with multimodal pore size distributions (where pores cluster in several distinct size ranges) is notoriously difficult. Standard hydraulic models, such as the widely used van Genuchten and Brooks-Corey equations, describe a single pore-size mode and often fail to capture these complex, multi-scale characteristics. The common engineering workaround superposes multiple unimodal retention functions, each tailored to one pore-size mode. However, this requires separate parameter identification for each mode, which multiplies the number of fitted parameters and limits model interpretability and generalizability, especially in the data-sparse scenarios common in field studies.
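To make the superposition workaround concrete, here is a minimal sketch of a bimodal retention curve built as a weighted sum of unimodal van Genuchten functions. Only the van Genuchten form, S_e(h) = [1 + (αh)^n]^(-m) with m = 1 - 1/n, is standard; the function names and all parameter values are hypothetical illustrations, not taken from the study.

```python
import numpy as np


def van_genuchten_se(h, alpha, n):
    """Unimodal van Genuchten effective saturation Se(h), with m = 1 - 1/n.

    h is suction head (positive, e.g. in cm), alpha is in 1/cm, and n > 1.
    """
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * np.asarray(h, dtype=float)) ** n) ** (-m)


def superposed_se(h, modes):
    """Weighted sum of unimodal curves, one per pore-size mode.

    modes is a list of (weight, alpha, n) tuples; the weights must sum to 1
    so that Se(0) = 1 and Se stays within [0, 1].
    """
    weights = np.array([w for w, _, _ in modes])
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mode weights must sum to 1")
    return sum(w * van_genuchten_se(h, a, n) for w, a, n in modes)


def water_content(h, theta_r, theta_s, modes):
    """Volumetric water content theta(h) = theta_r + (theta_s - theta_r) * Se(h)."""
    return theta_r + (theta_s - theta_r) * superposed_se(h, modes)


# Hypothetical bimodal material: coarse pores drain at low suction,
# fine pores only at high suction. Values are illustrative, not measured.
modes = [(0.4, 0.5, 3.0), (0.6, 0.005, 1.8)]
h = np.logspace(-2, 5, 200)  # suction head from 0.01 to 100,000 cm
theta = water_content(h, theta_r=0.05, theta_s=0.45, modes=modes)
```

Even this two-mode version carries five shape parameters (two α values, two n values, and one free weight, because the weights must sum to 1) on top of θ_r and θ_s, and each has to be identified from data.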
This research introduces a fundamentally different, meta-modeling approach: a physics-constrained machine learning framework that discovers closed-form mathematical expressions for multimodal water retention curves directly from experimental data. Candidate expressions are represented as binary trees and evolved with genetic programming, a standard search strategy for symbolic regression. Crucially, physical constraints such as monotonicity and boundary conditions are embedded in the algorithm's loss function. This steers the symbolic regressor toward solutions that not only fit the data but are also physically consistent and mathematically robust, yielding interpretable equations.
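The binary-tree representation and the basic genetic operators can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical primitive set (four arithmetic operators, the suction variable `h`, and numeric constants); the study's actual operator set, depth limits, and variation operators are not described in this summary, so `Node`, `random_tree`, `subtree_mutation`, and `subtree_crossover` are illustrative names for textbook tree-based genetic programming operations.

```python
import copy
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# Hypothetical primitive set: binary operators for internal nodes.
BINARY_OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division keeps every randomly generated tree evaluable.
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,
}


@dataclass
class Node:
    value: Union[str, float]  # operator symbol, the variable "h", or a constant
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def evaluate(self, h: float) -> float:
        """Recursively evaluate the expression at suction head h."""
        if self.is_leaf():
            return h if self.value == "h" else float(self.value)
        op = BINARY_OPS[self.value]
        return op(self.left.evaluate(h), self.right.evaluate(h))

    def size(self) -> int:
        """Number of nodes, a simple complexity measure."""
        if self.is_leaf():
            return 1
        return 1 + self.left.size() + self.right.size()

    def __str__(self) -> str:
        if self.is_leaf():
            return str(self.value)
        return f"({self.left} {self.value} {self.right})"


def random_tree(rng: random.Random, depth: int) -> Node:
    """Grow a random tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return Node("h")
        return Node(round(rng.uniform(0.1, 5.0), 2))
    op = rng.choice(list(BINARY_OPS))
    return Node(op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def all_nodes(node: Node) -> List[Node]:
    """Collect every node of the tree in pre-order."""
    if node.is_leaf():
        return [node]
    return [node] + all_nodes(node.left) + all_nodes(node.right)


def _graft(target: Node, donor: Node) -> None:
    """Overwrite target in place with the contents of donor."""
    target.value, target.left, target.right = donor.value, donor.left, donor.right


def subtree_mutation(tree: Node, rng: random.Random, max_depth: int = 2) -> Node:
    """Copy the tree and replace one random node with a fresh random subtree."""
    child = copy.deepcopy(tree)
    _graft(rng.choice(all_nodes(child)), random_tree(rng, max_depth))
    return child


def subtree_crossover(a: Node, b: Node, rng: random.Random) -> Node:
    """Copy parent a and replace one random node with a copy of a random subtree of b."""
    child = copy.deepcopy(a)
    _graft(rng.choice(all_nodes(child)), copy.deepcopy(rng.choice(all_nodes(b))))
    return child


# Example: the tree for 1 / (1 + 0.5 * h), a simple decreasing candidate curve.
tree = Node("/", Node(1.0), Node("+", Node(1.0), Node("*", Node(0.5), Node("h"))))
print(tree, "at h=2:", tree.evaluate(2.0))
```

A full run would apply these operators repeatedly to a population and keep the trees that score best under a physics-constrained loss. Both operators work on deep copies, so parents survive unchanged into the next generation if they are selected again.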
The results demonstrate that the framework can successfully discover parsimonious closed-form equations that effectively represent the water retention characteristics of porous materials with varying, complex pore structures. To ensure transparency and foster collaboration, the authors have made the complete implementation publicly available in an open-source repository, facilitating third-party validation, application, and extension.
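The summary does not say how parsimony is enforced. One common approach in symbolic regression is to keep the candidates on a Pareto front of expression complexity versus fitting error, then pick the simplest one whose error is close to the best. The sketch below shows only that selection step; the tolerance rule is an assumption, and the candidate expressions and error values are made-up toy numbers, not results from the study.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, int, float]  # (expression, complexity, error)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate beats on both complexity and error.

    Sorting by complexity (ties broken by error) means a candidate survives only if
    its error is strictly lower than every simpler-or-equal candidate kept so far.
    """
    front: List[Candidate] = []
    for expr, complexity, error in sorted(candidates, key=lambda c: (c[1], c[2])):
        if not front or error < front[-1][2]:
            front.append((expr, complexity, error))
    return front


def select_parsimonious(front: List[Candidate], tolerance: float = 0.1) -> Optional[str]:
    """Return the simplest expression whose error is within (1 + tolerance) of the best."""
    if not front:
        return None
    best = min(error for _, _, error in front)
    for expr, _, error in front:  # the front is ordered by increasing complexity
        if error <= best * (1.0 + tolerance):
            return expr
    return None


# Toy candidates for illustration only; complexity could be a tree's node count.
candidates = [
    ("1/(1 + a*h)", 5, 0.020),
    ("1/(1 + (a*h)^n)", 7, 0.006),
    ("c/(1 + a*h) + d", 9, 0.010),
    ("w/(1 + (a*h)^n) + (1 - w)/(1 + (b*h)^m)", 17, 0.0012),
    ("bloated 25-node variant", 25, 0.0011),
]
front = pareto_front(candidates)
print([c for _, c, _ in front])  # [5, 7, 17, 25]
print(select_parsimonious(front))  # prints the 17-node two-term expression
```

The 9-node candidate drops out because a simpler expression already has lower error, and the 25-node variant loses to the 17-node one because its small accuracy gain falls inside the tolerance.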
Industry Context & Analysis
This research sits at the convergence of two major trends: the digitization of geoscience through AI/ML and the growing demand for interpretable AI (often contrasted with "black-box" deep learning) in scientific domains. While deep learning models like neural networks can achieve high accuracy, their lack of interpretability is a significant barrier in fields like hydrogeology, where understanding the physical process is as critical as prediction. The choice of symbolic regression via genetic programming is a direct response to this need, prioritizing the discovery of human-readable equations over opaque neural networks with large numbers of parameters.
The work directly challenges the incumbent modeling paradigm. The standard practice of superposing unimodal models (e.g., multiple van Genuchten equations) is not only cumbersome but fundamentally a curve-fitting exercise with limited physical insight. Each mode requires its own set of fitted parameters (e.g., α and n in the van Genuchten model), which often lack clear, independent physical meaning when combined. In contrast, this AI-driven meta-modeling aims to discover a single, unified constitutive law that inherently describes the multimodal system, potentially offering greater insight into the underlying porous structure.
From a technical standpoint, integrating physics constraints into the loss function is a critical innovation. It marks a move from purely data-driven machine learning to physics-informed machine learning (PIML). This is an active research frontier; NVIDIA's Modulus platform and the open-source DeepXDE library, for instance, are popular frameworks for solving PDEs with neural networks. This paper's application of PIML principles to symbolic regression for constitutive model discovery, rather than to neural PDE solvers, is a distinctive and valuable contribution. Penalizing constraint violations pushes the discovered equations to respect basic physical requirements (for example, capillary pressure must increase as saturation decreases) that purely data-trained models might violate, especially when extrapolating.
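One way to embed such constraints is a penalized loss: the usual data misfit plus soft penalty terms that are zero when a candidate respects the physics and grow with the size of a violation. The sketch below assumes effective saturation S_e as a function of suction head h and checks three hypothetical constraints on a collocation grid that extends beyond the measured range (S_e non-increasing in h, S_e(0) = 1, and 0 ≤ S_e ≤ 1). The name `physics_constrained_loss`, the penalty weights, and the two test curves are illustrative assumptions, not the paper's formulation. Soft penalties discourage violations rather than strictly forbid them.

```python
import numpy as np


def physics_constrained_loss(predict, h_obs, se_obs, h_grid,
                             w_mono=10.0, w_bound=10.0, w_range=10.0):
    """Mean squared data misfit plus soft physics penalties (hypothetical weights).

    predict: callable mapping an array of suction heads h to effective saturation.
    h_obs, se_obs: measured suction heads and effective saturations.
    h_grid: increasing collocation points, which may extend beyond the data.
    """
    misfit = np.mean((predict(h_obs) - se_obs) ** 2)

    se_grid = predict(h_grid)
    # Monotonicity: penalize any increase in Se as suction increases.
    mono = np.sum(np.clip(np.diff(se_grid), 0.0, None) ** 2)
    # Boundary condition: the material is fully saturated at zero suction.
    bound = (predict(np.array([0.0]))[0] - 1.0) ** 2
    # Physical range: 0 <= Se <= 1 everywhere on the grid.
    out_of_range = np.sum(np.clip(se_grid - 1.0, 0.0, None) ** 2
                          + np.clip(-se_grid, 0.0, None) ** 2)
    return misfit + w_mono * mono + w_bound * bound + w_range * out_of_range


def smooth(h):
    """A physically consistent candidate: decreasing, Se(0) = 1, bounded in [0, 1]."""
    return 1.0 / (1.0 + (0.05 * h) ** 1.5)


def wiggly(h):
    """Matches `smooth` at the data points but rises between h = 20 and h = 50."""
    return np.where((h > 20.0) & (h < 50.0), 0.9, smooth(h))


h_obs = np.array([1.0, 10.0, 100.0, 1000.0])
se_obs = smooth(h_obs)            # synthetic "observations", not real data
h_grid = np.logspace(-2, 4, 300)  # 0.01 to 10,000, beyond the data range
print(physics_constrained_loss(smooth, h_obs, se_obs, h_grid))
print(physics_constrained_loss(wiggly, h_obs, se_obs, h_grid))
```

Both candidates fit the four synthetic points exactly, so the data misfit alone cannot tell them apart; only the monotonicity penalty separates them. That is the kind of unphysical behavior between or beyond measurements that a data-only loss would miss.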
The public release of the code is also strategically significant. In computational hydrology, reproducibility and community adoption are major hurdles. Established simulators like HYDRUS (with over two decades of development and thousands of citations) and TOUGH3 from Lawrence Berkeley National Laboratory dominate the field. For a new modeling approach to gain traction, accessible tools are essential. By open-sourcing the implementation, the authors lower the barrier to entry, inviting testing and application across a wider range of materials and scenarios, which is crucial for validating the method's generalizability.
What This Means Going Forward
The immediate beneficiaries of this research are hydrogeologists, soil scientists, and civil engineers working on critical problems involving unsaturated flow, such as landfill design, nuclear waste repository safety, agricultural water management, and contaminant transport prediction. For them, a reliable, interpretable model for multimodal materials could substantially reduce uncertainty in simulations, leading to better-engineered and safer environmental systems. The ability to derive a model directly from site-specific data could also streamline site characterization protocols.
In the broader AI-for-science landscape, this work demonstrates a powerful template: using genetic programming as a discovery engine, rigorously guided by domain knowledge (physics constraints), to extract fundamental laws from data. This paradigm could be extended well beyond hydrogeology. Similar challenges exist in modeling the complex constitutive behavior of polymers, composite materials, and biological tissues, where multi-scale structures dictate nonlinear mechanical responses. The framework provides a blueprint for automating the discovery of mathematical models in these data-rich but theory-complex domains.
Looking ahead, key developments to watch will be the framework's application to more extensive and noisy real-world datasets, its integration into commercial and open-source simulation software (like the aforementioned HYDRUS), and its extension to dynamic hydraulic and transport models, not just static retention curves. The ultimate test will be whether the equations it discovers become standard references in the field, akin to the van Genuchten model. If successful, this approach could mark a shift from a century of empirically derived constitutive models to a new era of AI-discovered, physics-consistent fundamental relationships in porous media and beyond.