Introduction
The exponential growth of e-commerce has transformed the way consumers interact with digital platforms (Sharma et al., 2023), making personalized recommendation and behavior prediction vital components of online retail systems (Nesterov, 2024). Accurately understanding and forecasting user behavior, such as whether a user will make a purchase, when the transaction might occur, and what value it may involve, can significantly enhance customer experience, operational efficiency, and commercial outcomes (Manikandam, 2024). This study addresses this multifaceted prediction problem by introducing a unified and robust modeling framework that integrates diverse sources of user data.
In recent years, extensive research has been conducted on purchase prediction (Chen et al., 2024a), sentiment-based preference modeling (Lai & Hsu, 2021), and user intent mining (Kumari et al., 2024). However, most existing studies approach these tasks in isolation, relying on narrow input modalities or single-task models that fail to capture the interconnected nature of user decisions. Natural language processing models can extract sentiment from user reviews (Ounacer et al., 2023), and behavior-based models can analyze transaction sequences (Xie et al., 2025), but each often neglects the complementary insights embedded in other modalities or fails to generalize across tasks. This fragmentation hinders the development of intelligent, end-to-end e-commerce systems capable of learning from holistic user interaction data.
The core research problem is therefore how to design a multi-task, multi-modal framework that can learn from text, behavior, and structured data simultaneously while ensuring both accuracy and scalability. It is hypothesized that a carefully constructed architecture that jointly models heterogeneous inputs and shared user intent representations can outperform traditional approaches and generalize across a wide range of behavior prediction tasks.
The purpose of this study is to test this hypothesis by developing a novel deep learning framework, the multi-source deep prediction network (MSDP-Net). The model predicts three critical aspects of user consumption behavior: whether a user will convert (purchase intent classification), the value of the potential transaction (monetary regression), and the time to purchase (temporal regression). These tasks are formulated within a single learning structure to maximize knowledge sharing and minimize redundancy.
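To make the three-task formulation concrete, the following is a minimal sketch of how one classification loss and two regression losses could be combined into a single training objective. This section does not state the loss actually used by MSDP-Net, so the weighted-sum form, the choice of log loss and squared error, the task weights, and the dictionary keys (`convert`, `value`, `days`) are all assumptions made for illustration.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Log loss for a predicted purchase probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never receives 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def squared_error(pred, target):
    """Squared error for one regression target."""
    return (pred - target) ** 2

def joint_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses for a single example.

    pred and target are dicts with the hypothetical keys
    'convert' (purchase intent), 'value' (transaction value),
    and 'days' (time to purchase). The weights are assumed
    hyperparameters, not values from the paper.
    """
    w_cls, w_val, w_time = weights
    return (
        w_cls * binary_cross_entropy(pred["convert"], target["convert"])
        + w_val * squared_error(pred["value"], target["value"])
        + w_time * squared_error(pred["days"], target["days"])
    )
```

In a sketch like this, the task weights are the simplest lever for keeping one objective from dominating the others; in practice the regression targets would usually also be rescaled so that the three terms have comparable magnitudes.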
Designing such a system poses several challenges. First, user data in e-commerce is heterogeneous and sparse, consisting of noisy text reviews, irregular behavior logs, and varied metadata fields. Second, combining multiple prediction objectives risks conflict or overfitting, especially if task-specific signals dominate the learning space. Finally, the real-time demands of online commerce necessitate models that are both efficient to train and explainable in deployment.
To address these challenges, this work proposes a deep neural architecture with three components: dedicated encoders for textual, behavioral, and structured inputs; an attention-based modality fusion mechanism that learns context-aware feature importance; and a shared backbone with task-specific output heads, intended to enable multi-task learning without sacrificing performance on any individual task. The model is trained and evaluated on large-scale real-world e-commerce data to validate its effectiveness and generalizability.
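The data flow through such an architecture can be sketched in a few lines. The example below assumes the three encoders have already produced fixed-size embeddings, scores each modality against a query vector, normalizes the scores with a softmax, and passes the fused vector through a one-layer shared backbone into three heads. The dot-product scoring rule, the single `tanh` layer, and every parameter name (`query`, `backbone`, `head_convert`, `head_value`, `head_days`) are hypothetical stand-ins, not the layers described in the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Branch on the sign so exp() never overflows
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fuse(embeddings, query):
    """Attention fusion: softmax-weighted sum of the modality embeddings."""
    names = sorted(embeddings)
    weights = softmax([dot(embeddings[n], query) for n in names])
    fused = [
        sum(w * embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))

def predict(embeddings, params):
    """Shared backbone followed by three task-specific heads."""
    fused, attention = fuse(embeddings, params["query"])
    shared = [math.tanh(dot(row, fused)) for row in params["backbone"]]
    return {
        "convert": sigmoid(dot(params["head_convert"], shared)),  # purchase probability
        "value": dot(params["head_value"], shared),               # transaction value
        "days": dot(params["head_days"], shared),                 # time to purchase
        "attention": attention,                                   # per-modality weights
    }
```

Returning the attention weights alongside the predictions gives a per-example view of which modality the fusion step relied on, which is one simple way to approach the explainability requirement mentioned earlier.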
In summary, this study makes the following primary contributions:
• It formulates a comprehensive multi-task learning framework for e-commerce behavior prediction that jointly addresses classification and regression tasks.
• It introduces an attention-guided multi-modal fusion strategy that dynamically integrates textual, behavioral, and structured features.
• It provides empirical evidence through extensive experiments that the proposed method outperforms strong baseline models in both accuracy and robustness.
• It demonstrates the model’s practical applicability in various user and product segments, highlighting its value for intelligent commercial systems.