Causal-Enhanced Feature Validation for Robust Big Data-Driven Employment Market Analysis
Keywords:
Causal Discovery, Employment Market Analysis, Feature Validation
Abstract
This research proposes Causal-Enhanced Feature Validation (CEFV), a framework for employment market analysis that integrates causal discovery with explainable machine learning to address the limitations of purely correlation-driven feature selection. The method uses a hybrid architecture that combines gradient-boosted models with temporal causal discovery, so that retained predictive features are both statistically influential and causally plausible. At its core, CEFV employs a Gradient-Boosted Causal Validator (GBCV) that quantifies feature importance with SHAP values; these importances are then cross-checked against causal graphs constructed by a Temporal Causal Discovery Unit (TCDU) based on the NOTEARS algorithm. The framework also incorporates a rolling-window LSTM validator that captures time-varying causal relationships in employment time-series data, enabling adaptive feature validation across temporal contexts. By discarding features that have high predictive importance but lack causal support, the system connects conventional predictive modeling with domain knowledge and improves both interpretability and robustness. Implemented with PyTorch Geometric and distributed computing tools, CEFV replaces manual feature selection with an automated, scalable pipeline that outputs validated feature subsets for downstream predictive tasks. In addition, the user interface presents causal explanations by visualizing each feature's influence alongside its causal pathways, which supports transparent decision-making. The key contribution is the unification of causal inference with model-agnostic interpretability, which distinguishes CEFV from existing employment analytics systems that rely solely on predictive performance. Experiments on real-world datasets show that CEFV identifies stable, causally grounded features while remaining computationally efficient, making it suitable for large-scale employment market analysis.
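The core validation rule described above (keep a feature only if it is both important and causally supported) can be illustrated with a minimal sketch. It assumes that per-window mean absolute SHAP values have already been computed and that a NOTEARS-style weighted adjacency matrix W is available, where W[i][j] != 0 encodes an edge i -> j. "Causal support" is taken here to mean a directed path from the feature to the target after thresholding edges; this interpretation is an assumption. The rolling-window LSTM validator is not reproduced: a simple majority vote across windows stands in for it. All function names, thresholds, feature names, and numbers below are hypothetical and for illustration only.

```python
from collections import deque
from typing import Dict, List, Sequence, Set


def has_directed_path(W: Sequence[Sequence[float]], src: int, dst: int,
                      edge_tol: float = 0.3) -> bool:
    """BFS over edges i -> j with |W[i][j]| > edge_tol (NOTEARS-style convention, assumed)."""
    seen = {src}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        for j, w in enumerate(W[i]):
            if abs(w) > edge_tol and j not in seen:
                if j == dst:
                    return True
                seen.add(j)
                queue.append(j)
    return False


def validate_window(importance: Dict[str, float], W: Sequence[Sequence[float]],
                    names: List[str], target: str,
                    imp_tol: float = 0.05, edge_tol: float = 0.3) -> Set[str]:
    """Keep features with mean |SHAP| >= imp_tol that also have a causal path to the target."""
    idx = {name: k for k, name in enumerate(names)}
    t = idx[target]
    return {feat for feat, score in importance.items()
            if score >= imp_tol and has_directed_path(W, idx[feat], t, edge_tol)}


def stable_features(per_window: List[Set[str]], min_frac: float = 0.5) -> Set[str]:
    """Hypothetical stand-in for temporal validation: keep features that pass
    in at least min_frac of the rolling windows."""
    counts: Dict[str, int] = {}
    for kept in per_window:
        for feat in kept:
            counts[feat] = counts.get(feat, 0) + 1
    n = len(per_window)
    return {feat for feat, c in counts.items() if c / n >= min_frac}


# Toy example with invented values: search_volume is highly "important"
# but has no causal path to unemployment, so it is discarded.
names = ["wage", "vacancies", "search_volume", "unemployment"]
importance = {"wage": 0.10, "vacancies": 0.25, "search_volume": 0.30}
W1 = [[0, 0.8, 0, 0],     # wage -> vacancies
      [0, 0, 0, -0.9],    # vacancies -> unemployment
      [0, 0, 0, 0],
      [0, 0, 0, 0]]
W2 = [[0, 0.1, 0, 0],     # wage -> vacancies is weak in this window
      [0, 0, 0, -0.9],
      [0, 0, 0, 0],
      [0, 0, 0, 0]]

windows = [validate_window(importance, W, names, "unemployment") for W in (W1, W2)]
print(windows)
print(stable_features(windows, min_frac=0.75))
```

In this toy setup, search_volume is dropped in every window despite having the largest SHAP score, while wage passes only in the window where its indirect path through vacancies survives the edge threshold. Raising min_frac therefore trades recall of intermittently supported features for temporal stability.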