Bayesian networks for inferring the relationship between individual behavior and social influence: A case study of early 20th century British travels in China

This study investigates the applicability of Bayesian networks (BNs) to analyzing the relationship between individual behavior and social influence among early 20th-century British travelers in China. While historical studies have provided valuable details about social interactions, research built on them has been limited in its ability to quantify and analyze complex relationships. This study addresses this gap by employing BNs to construct a framework for modeling the probabilistic relationships between the factors that influenced the travel patterns of British travelers in China in the early 20th century. These factors include political climate, economic considerations, and cultural interactions, and are drawn from historical studies, travel diaries, and other contemporary sources. The performance of the proposed BN model is evaluated using established statistical methods, including confusion matrices, cross-validation, and sensitivity analysis (SA). The results indicate that the chosen model is useful for analyzing the complex relationships under study.


Introduction
The early 20th century was a significant period in global history, characterized by intensified international trade and colonial expansion [1]. This period generated a wealth of data in the form of travelogues written by British explorers in China. These data document various interactions, including economic exchanges and complex cultural and social engagements [2]. Analyzing such interactions provides valuable insight into the Historical Context (HC), which still influences contemporary global dynamics. However, existing analyses rely on descriptive models, which cannot capture subtle relations among different social factors. Addressing this limitation requires a quantitative model capable of representing the complex, multidimensional influences on an individual's behavior with respect to the HC [3].
Current models used to examine the HC depend largely on linear frameworks, which cannot handle the complexities of social networks [4]. Traditional statistical methods have shown limited ability to model the non-linear and probabilistic nature of social interactions and their impact on individual behavior [5]. These limitations arise because historical and social dynamics involve multifaceted interactions influenced by many factors. Additionally, existing models cannot represent the likelihood of social influence and individual choice preference in response to evolving social, political, and cultural environments [6]. To address these limitations, this work adopts Bayesian Networks (BN), a model that relates interdependent variables probabilistically and can handle uncertain and incomplete data, which makes it a good choice for historical social studies. Using BN, the work explores how individual behaviors were shaped by the social dynamics of the period [7].
Using the BN model, the work explores the complex interactions between British travelers and the sociocultural environment of China in the early 20th century. The work starts with data collection from various sources, including travel diaries, letters, official records, and contemporary accounts [8]. From these data, the probabilistic relationships among the identified factors are built using the BN model. The conditional probabilities for each factor are estimated, highlighting the crucial factors that influenced travel behavior during that time. The constructed model is evaluated using statistical techniques such as confusion matrices, cross-validation, and Sensitivity Analysis (SA). The results of this analysis support the BN model's applicability in historical social studies.
The paper is structured as follows: Section 2 presents the background, Section 3 presents the model, Section 4 analyzes the results, and Section 5 concludes the work.

Background
The early 20th century was a period of significant international interaction, in which travelers explored many foreign countries. During this time, many British travelers took an interest in, and traveled to, the eastern part of the globe, particularly China. This growing interest among British nationals is primarily attributed to trade and the sharing of ideas, beliefs, and social norms [9][10].
A better understanding of the individual behaviors behind this interest helps to clarify the associated HC. To this end, a probabilistic graphical model known as the Bayesian Network is chosen to model the relationships between individual behavior and social influence [11][12][13].

Bayesian Networks (BN)
BN are graphical models that represent a set of variables and their conditional dependencies via a directed acyclic graph (DAG). These networks are used to model the probabilistic relationships among variables. Each node in the graph represents a variable, and the edges represent the probabilistic dependencies between these variables [14][15][16]. The strength of these dependencies is quantified using conditional probabilities. A key strength of BN is the ability to combine prior knowledge (in the form of probability distributions) with new data to update beliefs about the state of the world using Bayes's Theorem. In the current study, BN is particularly useful for modeling complex systems with many interdependent variables [18][19][20][21].

Historical context
The early 20th century was a period of significant transformation for both Britain and China, marked by rapid industrialization, political upheaval, and evolving international relations. In Britain, the period spanned the tail end of the Victorian era, the Edwardian period, and the First World War and its aftermath. Britain was then at the peak of its imperial power, with an extensive empire often described as the empire on which the sun never sets. Furthermore, British society was undergoing considerable change driven by technologies such as the steam engine and the telegraph, and its social structures were being challenged by emerging political movements, including suffragism and labor rights. At the same time, the Qing Dynasty, which had ruled China for centuries, was in its final stages, and the Republic of China was formed following the revolution of 1911. During this period, China was engulfed in internal strife, efforts to modernize, and increased foreign influence and intervention through unequal treaties and the establishment of foreign concessions.
British travels to China during this time were motivated by different factors, including trade, diplomacy, missionary work, and tourism. Notably, trade in tea, silk, and porcelain was a primary driver of travelers' interest in China. Following the Opium Wars (1839-42 and 1856-60), many treaty ports were created, which increased travel. Cities such as Shanghai and Hong Kong saw a steady influx of expatriates from Britain. Further, the advent of steamships and the opening of the Suez Canal in 1869 reduced the travel time between Europe and Asia, which also increased the frequency of British travels to China. This accessibility boosted trade and enabled a range of people, including scholars, artists, and tourists, to visit China. Such British travelers often documented their experiences in China through writings, photographs, and collections of art and artifacts, which contributed to a fascination with Chinese culture in Britain. However, these interactions were defined by a complex relationship of imperialism, cultural exchange, and mutual curiosity that reflected the broader dynamics of East-West relations during this period.

Data sources
This study draws on several types of HC data source to construct and analyze the BN model. Together, the sources cover a broad spectrum of the socio-cultural dynamics of early 20th-century British travels in China. The primary categories of data sources include:
(1) Historical records: Official documents, governmental reports, and archives provide information about the political and social context of the era. These records contain details about diplomatic relations, trade policies, and other factors influencing travel patterns.
(2) Travel diaries: Personal diaries of British travelers provide firsthand, detailed descriptions of daily activities, experiences, perceptions, and interactions with local Chinese communities.
(3) Letters: Correspondence between travelers and their families, friends, or associates conveys the travelers' personal feelings and thoughts about the events and situations they encountered in China.
(4) Contemporary accounts: Newspaper articles, books, and journals written during the period provide context and commentary on British travels in China. Written by journalists, scholars, or other travelers, they offer external perspectives on the behaviors and social influence of British travelers, as well as on general public perception.
(5) Photographs and artifacts: Photographs and artifacts from that era offer additional dimensions that reflect the interactions, activities, and environments experienced by British travelers. Artifacts such as the souvenirs travelers collected can also indicate cultural exchanges and influences.
These varied sources enable a comprehensive and nuanced construction of the BN, capturing both the macro-level influences of social and political forces and the micro-level dynamics of individual decision-making and behavior.

Bayesian network model
In this section, we present a BN model for the chosen case study (Figure 1). The model is structured to illustrate the probabilistic relationships between the factors influencing travel behaviors, using standard notation. The critical variables are represented as nodes $X_1, \dots, X_7$ in the BN, and the edges represent dependencies between variables. For example, an edge from $X_5$ ('Political Climate in Britain') to $X_1$ ('Motivation for Travel') indicates a dependency, which can be represented as $P(X_1 \mid X_5)$, the probability of $X_1$ given $X_5$. The entire network is represented by a joint probability distribution, decomposed into a product of conditional probabilities, EQU (1):
$$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i)) \quad (1)$$
where $\mathrm{Parents}(X_i)$ represents the set of nodes having a direct influence on $X_i$. For each node, a conditional probability table (CPT) is formulated in EQU (2):
$$\mathrm{CPT}(X_i) = P(X_i \mid \mathrm{Parents}(X_i)) \quad (2)$$
For example, for the node 'Motivation for Travel' ($X_1$), the CPT contains entries such as $p_{11}$ and $p_{12}$, which are the probabilities of a particular motivation given the state of the political climate. To capture dynamic changes over time, we can introduce time-indexed variables, such as $X_{1,t}$ for 'Motivation for Travel' at time $t$. The temporal dependencies can be represented as $P(X_{1,t+1} \mid X_{1,t}, X_{2,t}, \dots)$. Table 1 reads as follows: for instance, if a traveler comes from a particular social background ($X_3 = 1$) and is in a specific political climate in Britain ($X_5 = 1$), then there is a high probability (0.9) that their motivation for travel ($X_1$) is substantial.
Table 2 indicates that, for example, if a traveler has a high motivation for travel ($X_1 = 1$) and perceives safety and risks favorably ($X_7 = 1$), then there is a high probability (0.8) that the duration and frequency of their travel ($X_2$) will be high. Conversely, if motivation and perceived safety are low ($X_1 = 0$ and $X_7 = 0$), the probability that travel duration and frequency are high is much lower (0.1). Table 3 suggests that travelers with longer and more frequent travels ($X_2 = 1$) have a higher probability (0.85) of having significant interaction with Chinese culture ($X_4 = 1$). In contrast, those with shorter and less frequent travels ($X_2 = 0$) have a much lower probability (0.15) of such interaction.
Table 4 indicates, for instance, that if both the political climate in Britain ($X_5$) and in China ($X_6$) are favorable, then there is a high probability (0.75) of perceiving high safety and low risks ($X_7$). Conversely, if both political climates are unfavorable, the probability of perceiving high safety and low risks decreases to 0.25.
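The factorization in EQU (1) and the CPT entries quoted from Tables 1 to 4 can be combined in a minimal sketch. The parent sets below are inferred from the table descriptions and are an assumption about Figure 1. Only the entries marked "text" are quoted in this paper; the root-node priors and the mixed-parent entries are hypothetical placeholders, so the resulting numbers are illustrative only.

```python
from itertools import product

# Parent sets inferred from the descriptions of Tables 1-4 (assumed structure).
PARENTS = {
    "X3": (), "X5": (), "X6": (),
    "X1": ("X3", "X5"),
    "X7": ("X5", "X6"),
    "X2": ("X1", "X7"),
    "X4": ("X2",),
}

# P(node = 1 | parent states), keyed by the tuple of parent values.
P_ONE = {
    "X3": {(): 0.5},  # hypothetical prior
    "X5": {(): 0.5},  # hypothetical prior
    "X6": {(): 0.5},  # hypothetical prior
    # (1, 1) -> 0.9 is from the text; other entries are hypothetical.
    "X1": {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.2},
    # (1, 1) -> 0.75 and (0, 0) -> 0.25 are from the text.
    "X7": {(1, 1): 0.75, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.25},
    # (1, 1) -> 0.8 and (0, 0) -> 0.1 are from the text.
    "X2": {(1, 1): 0.8, (1, 0): 0.4, (0, 1): 0.4, (0, 0): 0.1},
    # Both entries are from the text.
    "X4": {(1,): 0.85, (0,): 0.15},
}

NODES = list(PARENTS)


def joint(assignment):
    """Joint probability of a full 0/1 assignment, following EQU (1)."""
    prob = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[p] for p in parents)
        p_one = P_ONE[node][key]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def marginal(node, value=1):
    """P(node = value), obtained by summing the joint over all assignments."""
    total = 0.0
    for states in product((0, 1), repeat=len(NODES)):
        assignment = dict(zip(NODES, states))
        if assignment[node] == value:
            total += joint(assignment)
    return total


total_mass = sum(
    joint(dict(zip(NODES, states)))
    for states in product((0, 1), repeat=len(NODES))
)
print("sum of joint:", round(total_mass, 6))
print("P(X4 = 1):", round(marginal("X4"), 4))
```

Brute-force enumeration is used only because seven binary nodes give 128 assignments; a larger network would call for a dedicated inference routine.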
By altering the values of particular nodes, we can perform predictive analysis, for instance by modifying $X_5$ to reflect a change in the political climate and observing the resultant probabilities of $X_1, X_2, \dots, X_n$. The SA is approached by observing the changes in the output probabilities under marginal changes in the input probabilities, quantified as $\partial P(X_j) / \partial P(X_i)$, indicating how sensitive $X_j$ is to changes in $X_i$. Further, the probabilistic interpretation shows how varying degrees of social influence affect travel behavior. For instance, calculating conditional probabilities such as $P(X_2 \mid X_3, X_5)$ offers insights into how social background and political climate influence travel duration and frequency.
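As a worked illustration of the sensitivity measure, the sketch below estimates the derivative of $P(X_4 = 1)$ with respect to $P(X_2 = 1)$ by a central finite difference, using the two CPT entries quoted for Table 3. The operating point $P(X_2 = 1) = 0.5$ and the step size are hypothetical choices, and the finite-difference estimator is one possible way to compute the derivative rather than necessarily the procedure used in the study.

```python
# CPT entries for node X4 given X2, as quoted in the text (Table 3).
P_X4_GIVEN_X2 = {1: 0.85, 0: 0.15}


def p_x4(p_x2):
    """P(X4 = 1), marginalizing over X2, where p_x2 = P(X2 = 1)."""
    return P_X4_GIVEN_X2[1] * p_x2 + P_X4_GIVEN_X2[0] * (1.0 - p_x2)


def sensitivity(f, p, step=0.01):
    """Central finite-difference estimate of df/dp at p."""
    return (f(p + step) - f(p - step)) / (2.0 * step)


# Hypothetical operating point, not a value reported in the study.
s = sensitivity(p_x4, 0.5)
print(round(s, 3))  # 0.7, i.e. 0.85 - 0.15
```

Because $P(X_4 = 1)$ is linear in $P(X_2 = 1)$, the sensitivity equals the difference between the two CPT entries regardless of the operating point.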

Analysis
To assess the performance and accuracy of the BN model in the context of the case study, the following statistical measures were computed:

Confusion matrix and accuracy metrics
The confusion matrix in Table 5 assesses the model's classification in terms of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). In this analysis, the BN model correctly predicted 120 instances as positive (TP) and 305 instances as negative (TN). However, it labeled 30 instances as positive when they were negative (FP) and 45 as negative when they were positive (FN).
To translate the confusion matrix into a more comprehensive evaluation, three key metrics were used: accuracy, precision, and recall. Accuracy is the proportion of correct predictions (TP plus TN) among all predictions; the BN scored 85%. Precision is the proportion of TP among all positive predictions; the BN scored 80%. Recall is the proportion of actual positives that were correctly identified; the BN scored 72.7%. The accuracy of the BN model is therefore higher than its precision and recall, which indicates that there is considerable room to improve recall by reducing the number of false negatives.
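The reported metrics follow directly from the four counts in the confusion matrix, as the short check below shows. The variable names are illustrative; the counts are the ones stated above.

```python
# Counts reported for the BN model in the confusion matrix.
tp, tn, fp, fn = 120, 305, 30, 45

total = tp + tn + fp + fn        # all predictions
accuracy = (tp + tn) / total     # correct predictions / all predictions
precision = tp / (tp + fp)       # true positives / predicted positives
recall = tp / (tp + fn)          # true positives / actual positives

print(f"n = {total}")
print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
```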

Cross-validation
The results of the 5-fold cross-validation process are shown in Table 6 and Figure 2. In this cross-validation analysis, the input data are divided into five distinct folds, and the model is trained and tested five times, each time with a different fold held out for testing and the remaining folds used for training.

The mean accuracy across all folds was 0.84, which indicates consistent performance. The standard deviation of the accuracies across folds was low (0.0141), indicating that the model's predictions vary little across different data folds.
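The fold construction and the summary statistics can be sketched as follows. The sample count of 500 is an assumption taken from the total of the confusion matrix, and the folds are contiguous for simplicity (in practice the data would normally be shuffled first). The per-fold accuracies are placeholder values, not the study's fold results; they are chosen only so that the arithmetic reproduces the reported mean and standard deviation.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, k):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed sample count (total of the confusion matrix).
splits = list(train_test_splits(500, 5))
print([(len(train), len(test)) for train, test in splits])

# Placeholder per-fold accuracies (hypothetical, not the study's results).
fold_accuracies = [0.82, 0.83, 0.84, 0.85, 0.86]
mean_acc = statistics.fmean(fold_accuracies)
std_acc = statistics.pstdev(fold_accuracies)  # population standard deviation
print(round(mean_acc, 2), round(std_acc, 4))
```

With these placeholder values, the population formula gives 0.0141; the sample formula (`statistics.stdev`) would give a slightly larger figure.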

ROC curve analysis
The ROC curve is a graphical representation of a classification model's diagnostic ability as its discrimination threshold is varied.The curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

In the ROC curve shown in Figure 3, the model has an AUC of 0.65. The AUC ranges from 0 to 1, where an AUC of 1 indicates a perfect model that makes all predictions correctly. An AUC of 0.5 indicates performance no better than random chance, and an AUC below 0.5 indicates performance worse than random chance, which might mean that the model is predicting the outcomes inversely. The AUC of 0.65 suggests that the model has some ability to distinguish between the two classes (e.g., high versus low interaction with Chinese culture). However, there is significant room for improvement, as the value is not far from the 0.5 that represents a random guess.
Given this AUC, the model is considered to have moderate predictive power: it ranks a randomly chosen positive instance higher than a randomly chosen negative instance with a probability of 0.65. For our case study, this means that while the model has some predictive capability, it may not be highly reliable in all scenarios. It could be more effective in some areas of prediction, such as identifying travelers with high motivation for cultural interaction, and less effective in others.
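The ranking interpretation of the AUC can be made concrete with a small sketch that counts, over all positive-negative pairs, how often the positive instance receives the higher score (ties count as half). The scores and labels are hypothetical toy values, not outputs of the BN model, and the function name is illustrative.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true classes.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = auc_by_ranking(scores, labels)
print(round(auc, 3))  # 8 of 9 pairs are ranked correctly
```

A model that assigns the same score to every instance obtains 0.5 under this definition, which matches the random-chance baseline described above.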

Sensitivity analysis
The analysis explores the extent to which changes in input variables influence the likelihood of increased travel frequency, demonstrating the model's responsiveness to different socioeconomic and technological factors (Table 7 and Figure 4). For the political climate, a change of 0.10 (presumably indicating a 10% improvement in political stability or relations) leads to a 0.05 (5%) increase in travel frequency. This suggests a moderate sensitivity of travel frequency to political conditions, reflecting how diplomatic and political environments could affect travel decisions. For economic factors, a larger change of 0.20 (indicative of a 20% improvement in economic conditions or opportunities) results in a 0.15 (15%) increase in travel frequency. This indicates a strong association between economic conditions and travel behavior, highlighting the importance of financial incentives in influencing travel. For social norms, a minor adjustment of 0.05 leads to a 0.07 (7%) increase in travel frequency. This increase, resulting from a relatively small change, underlines the sensitivity of travel behavior to social factors, possibly reflecting changes in societal attitudes towards travel or interactions with different cultures. For technological advancement, a 0.15 (15%) advancement in technology corresponds to a 0.10 (10%) increase in travel frequency, suggesting that improvements in transportation or communication would encourage travel.
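Because the four factors were perturbed by different amounts, one way to compare them is to divide each output change by its input change. The sketch below does this with the values quoted above; the ratio is a simple normalization introduced here for illustration and is not a metric reported in the study.

```python
# (input change, resulting change in travel frequency), as quoted in the text.
changes = {
    "political climate": (0.10, 0.05),
    "economic factors": (0.20, 0.15),
    "social norms": (0.05, 0.07),
    "technological advancement": (0.15, 0.10),
}

# Output change per unit of input change for each factor.
ratios = {name: out / inp for name, (inp, out) in changes.items()}

# Factors ordered from most to least responsive per unit of input change.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for name in ranked:
    print(f"{name}: {ratios[name]:.2f}")
```

On this normalized view, social norms show the largest response per unit of input change and the political climate the smallest, although economic factors produce the largest absolute increase.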

Conclusion
The application of Bayesian Networks (BN) in this study has provided valuable insights into the complex relationship between individual behavior and social influence during early 20th-century British travels in China. By analyzing different historical data with a probabilistic graphical model, the research has quantified and analyzed the dynamics of this historical period. The findings show how various factors, such as political climate, economic conditions, and technological advancements, influenced British travel behaviors. The model was assessed using several statistical evaluations, and the results show the potential of BN for analysis in history and the social sciences. This analysis provides an understanding of the past and also points to ways of applying advanced statistical methods to the exploration of historical and cultural phenomena. The study also highlights the value of interdisciplinary approaches in understanding the complexity of human behavior and social interactions across different historical epochs.

Figure 1. BN for the case study.
Node key:
$X_1$: 'Motivation for Travel'
$X_2$: 'Duration and Frequency of Travel'
$X_3$: 'Social Background of Travelers'
$X_4$: 'Interaction with Chinese Culture'
$X_5$: 'Political Climate in Britain'
$X_6$: 'Political Climate in China'
$X_7$: 'Perceived Safety and Risks'

Table 1. The conditional probability table (CPT) for node $X_1$.

Table 2. The conditional probability table (CPT) for node $X_2$.

Table 3. The conditional probability table (CPT) for node $X_4$.

Table 4. The conditional probability table (CPT) for node $X_7$.