Spatial heterogeneity analysis and source identification of heavy metals in soil: a case study of Chongqing, Southwest China

Heavy metal pollution in urban soil is an important indicator of environmental pollution. Selecting the best interpolation method can accurately reflect the distribution characteristics of heavy metals in soil. In addition, source analysis can reveal heavy metal pollution of soil and help manage and protect the soil environment. This study used a uniform sampling method and obtained a total of 342 sampling points. After acid reduction, the concentration of As, Cu, and Mn in each sample was determined by ICP-MS (Agilent VISTA, USA). The accuracy and results of different spatial interpolation methods were compared and the CATREG model was used to identify the sources of heavy metal pollution. The average concentration of As, Cu, and Mn were 5.802 mg kg −1, 23.992 mg kg−1, and 573.316 mg kg−1, respectively, lower than the soil background value of Chongqing. Compared to other Chinese cities and countries in the world, the concentration of As and Cu was lower in Chongqing, while the concentration of only Mn was higher. The interpolation results of inverse distance weighting (IDW) and radial basis function (RBF) largely retained the maximum information of element concentration. Soil source identification found that population density mainly affected Cu (0.539), slope Mn (0.206), and water quality As (0.453). The highest hotspot value (99% confidence interval), high hotspot value (95% confidence interval), and high hotspot value (90% confidence interval) of As were adjacent to the secondary water environment. Furthermore, the highest hotspot value of Cu was mainly located in the surrounding areas with population density > 3000/km2 and 1000–3000/km2. Mn was distributed along the slope direction and diffused from center to periphery. Different spatial interpolation methods are significant for the analysis of soil properties. Heavy metals have a high degree of coincidence with environmental factors such as slope and population. The results of this research provide a reference for formulating effective control and management strategies for heavy metal pollution of soil.


Background
Due to environmental changes (soil-forming processes and spatial continuity of climate zones), soil characteristics are interrelated, not uniform or independent [1]. Generally, soils are long-lasting and defined by irreversibility, concealment, and hysteresis [2]. Urban soil is an important part of urban environmental factors. Its quality is directly related to human health and safety, which is of great significance for the sustainable development of the city. However, urban soil is also the main gathering place of urban pollutants, such as heavy metals. They are widely present in various forms and media of the urban environment, such as atmospheric dry and wet deposition, dust, soil, etc. Heavy metals enter urban soil in large quantities, resulting in heavy metal pollution and the degradation or even loss of the original function of urban soil [3]. Therefore, heavy metal pollution of urban soil is an important indicator of environmental pollution and has become the focus of research on urban environmental pollution.
Geostatistics is a mathematical and geological method based on regionalized variables, spatial correlation, and variation functions. After years of development, geostatistics has become a mature tool for studying spatial variability because it can maximize the preservation of information on spatial variability. It has also reached a mature stage in the application of mineral geology and has been used in hydrology, soil science, and other fields [4]. Optimizing the spatial interpolation method is the key to accurately predict spatial distribution characteristics and pollution risks of heavy metals in regional soil [5]. The selection of interpolation models determines how effective the evaluation of heavy metal pollution of soil will be. The spatial interpolation methods widely used in the evaluation include inverse distance weighting, ordinary kriging, spline function, multiple regression, radial basis function, etc [6,7].
With the development of industry, people had gradually realized the threat and impact of heavy metals on the human body. Scholars began to study heavy metals in the 1980s and 1990s. DeRosa [8] evaluated the impact of toxic lead on children's physical and mental development. Before 2000, most research focused on the spatial distribution of heavy metals and assessment of heavy metal pollution. With the development of geostatistics, the kriging and inverse distance weighing interpolation methods have been used in spatial distribution research. From 2000 to 2010, the research focused on the pollution index, spatial variation, biological stimulation, and biological monitoring of heavy metals in soil. After 2010, monitoring heavy metal pollution sources has become a hot topic in soil science. At present, source analysis models in soil science mainly include absolute factor analysis/multiple linear regression (APCS/MLR), positive definite matrix factorization (PMF), UNMIX, isotope ratio method, etc. Due to the differences in parent material, climatic conditions, and human disturbance, the sources of heavy metals are quite different. It is necessary to detect specific sources of heavy metals in different soils. For example, Liu [9] found that Hg in agricultural soil came from coal combustion, Cd from agricultural practice, and Cu, Pb and Zn from agricultural practice and industrial activities in the southern Shandong Peninsula of China. Aguilera [10] noted that human activities were the main sources of Pb, Zn, Cu, and Cr in street dust of Mexico City. Ivankovic [11] believed that Ni in the urban soil of Belgrade in Serbia came from peridotites and serpentinites, while Pb came from traffic activities. This study shows the applicability of OK, IDW, LPI, and RBF to spatial interpolation of three heavy metals. It also highlights the necessity of using different interpolation methods when studying the spatial distribution of soil properties. In addition, analyzing variation in the spatial distribution of heavy metal pollution and identifying its main influencing factors on the soil of Chongqing can provide a basis and reference for revealing the sources of heavy metal pollution and exploring the ecological restoration model in the central city of Chongqing.
Previous studies have shown that the sampling method and regional scale play an important role in the research on heavy metal pollution of soil [12,13]. Different interpolation methods are selected to show different emphases of heavy metals in soil. It is assumed that uniform sampling can more accurately reflect the actual concentration of heavy metals in different areas, while the interpolation method can partly retain the information on spatial variation and quantify the factors that affect spatial heterogeneity of heavy metals in soil (such as traffic, air quality, industrial activities, parent materials, etc.) to obtain the influence of various factors on heavy metals. On the basis of ecological geochemistry research in Chongqing, three representative heavy metals (As, Cu, Mn) were screened out from 11 elements (As, Cd, Cr, Pb, Cu, Zn, Ni, Hg, Sb, Mn, Mo) according to the content of soil elements and the degree of pollution to the environment. As a city with rapid urbanization, the enrichment of As, Cu and Mn can pose varying degrees of threat to human health. The enrichment of As can enter the human body through respiratory tract, digestive tract and skin contact. If the intake exceeds the excretion, it can accumulate in the liver, kidney, lung and other parts of the human body, resulting in chronic arsenic poisoning. If Cu is excess, it will cause liver cirrhosis, diarrhea, vomiting, motor disorder and sensory nerve disorder. The occurrence of Mn poisoning can cause different degrees of tears, photophobia, cough, and even pulmonary edema. Therefore, As, Cu and Mn were selected to explore their distribution, spatial variation and source analysis. Therefore, the purpose of this study is (1) to describe the characteristics of As, Cu, and Mn concentration in the soil of Chongqing city; and (2) to visualize the distribution of the three metals and reveal the spatial distribution law of their concentration in soil with four interpolation methods, namely ordinary kriging (OK), inverse distance weighting (IDW), local polynomial interpolation (LPI), and radial basis function (RBF); (3) to analyze the relationship between heavy metals and environmental factors by CATREG; (4) to study the spatial variation of heavy metal content in soil by ArcGIS hotspot analysis.

Study area
Chongqing is located in the transition zone between the Tibetan Plateau and the middle and lower reaches of the Yangtze River in southwestern China with a longitude of 105°11′-110°11′ E and a latitude of 28°10′-32°13′ N ( Fig. 1). Chongqing's terrain gradually reduces from north to south towards the Yangtze River valley. The northwest and central parts of Chongqing constitute hills and low mountains, while the southeast part stands between Daba and Wuling Mountain.
The main urban area of Chongqing includes six districts Yuzhong, Jiangbei, Dadukou, Shapingba, Nanan, and Jiulongpo in the core urban area and three districts Banan, Yubei, and Beibei in the peripheral metropolitan area. The study area has a monsoon humid subtropical climate with warm winters and hot summers. The annual average temperature is 16.0 -18.0 °C. The average temperature is 5.0 -0.9 °C in January and 28.0-4.4 °C in July [14]. The Yangtze River runs across the central part of the study area and joins with the Jialing River in the north. Due to the complex lithology of the parent rock, the soil in the study area is rich and diverse. It can be divided into 8 types, namely paddy soil, neo-accumulative soil, yellow soil, yellow brown soil, purple soil, limestone soil, red soil, mountain meadow soil, and 16 other subgroups.

Soil sampling and data collection
The soil was sampled in the main urban area of Chongqing in China. According to the industrial technical standards "Specifications for Geochemical Evaluation of Land Quality" (DZ/T0259-2016) and "Digestion of Total Metal Elements in Soil and Sediments by Microwave Digestion Method" (HJ832-2017) of the Ministry of Environmental Protection, grid sampling was carried out with the principle of uniform sampling points at the sampling depth of 0-40 cm in the whole study area. Taking the GPS-located sampling point as the center, radiating 30-50 m around to determine 3-5 sampling points, and totally collecting 342 single samples of topsoil. During sampling, special areas such as roads and ditches were avoided. After removing impurities, 1 g of soil samples was taken according to the quartering method, then bagged and numbered, and sent sealed to the laboratory. Afterwards, the samples were air-dried, homogenized, and screened, i.e., smashed and grinded in a 100 mesh nylon sieve. All of the samples were analyzed in the Chengdu Comprehensive Rock Mineral Testing Center in China. After digestion with HF-HNO3-HClO4, the concentration of As, Cu, and Mg in each sample was determined by ICP-MS (Agilent VISTA, USA). Blanks were added in the measurement process to check the accuracy of the results. In addition, three determinations were made using parallel sampling. According to the technical standard of geological survey "Technical Requirements for Analysis of Eco-geochemical Evaluation Samples (Trial)" (DD 2005-03) of China, the detection limits of As, Cu, and Mn in the samples were calculated by different determination methods. The detection limits were up to 0.01 mg kg −1 , while the relative standard deviation was below 5%. The national soil standard material GSS026 (GBW07455) was used for quality control. The recovery rates of all heavy metals were above 85%.
Based on the study of ecological geochemistry in Chongqing, three representative heavy metals (As, Cu, Mn) were selected from 11 elements (As, Cd, Cr, Pb, Cu, Zn, Ni, Hg, Sb, Mn, Mo) according to the content of soil elements and the degree of pollution to the environment, and the spatial heterogeneity and source analysis were carried out. Air and water quality and population density were derived from the China Statistical Yearbook 2000-2010. In order to more accurately describe the impact of industrial activities on heavy metals, industrial GDP was used to characterize the intensity of industrial activities. Population distribution (2000a) was the result of the fifth census in China. Land use (1:100,000), elevation (DEM) (1:250,000), and slope and traffic data (1:250,000) were taken from the Resource and Environmental Science Data Center. (http:// www. resdc. cn/ defau lt. aspx). Furthermore, soil type (1:1,000,000) and parent material (1:500,000) were taken from China Soil Database (http:// vdb3. soil. csdb. cn/) and China Geological Map, respectively. Standard statistical analysis included the maximum, average, and standard deviation, coefficient of variation, and other to illustrate the soil reserves and trends of As, Cu, and Mn. In order to meet the normality assumption of geological statistical analysis, GS + 10.0 was used for the logarithmic transformation of original data, while the inverse transformation was carried out through the weighted average. In this study, four interpolation methods were compared, namely ordinary kriging (OK), inverse distance weighting (IDW), local polynomial

Soil quality assurance (QA) and quality control (QC)
Strict quality control was ensured during the experiment and samples were analyzed in triplicate. In this study, the elements in soil were determined by inductively coupled plasma optical emission spectrometry (ICP-AES), X-ray fluorescence spectrometry (XRF), and atomic fluorescence spectrometry (AFS). The recoveries of Cu, As, and Mn were 91-104%, 97-105%, and 90-93%, respectively, indicating that the method was highly accurate and that it can meet the requirements of detection and analysis. The relative standard deviation of the elements was less than 6%, which also proves that the method had a high repetition rate and precision. The detection limits of Cu, As, and Mn were 1, 0.2, and 5 mg kg −1 , respectively, and the relative deviations were less than 10%.

Analysis of spatial structure characteristics
Geostatistics is a mathematical method based on the concepts of regionalization variables, random functions, and implication and stationary hypotheses. The semivariogram is a key function for studying soil variability in geostatistics. Usually, the grid sampling method is used to estimate the semivariance of soil properties, which has conditional negative qualitative. The calculation formula is [15]: where N (h) is the number of pairs of all observation points with h spacing (N(h) = n − 1 if there are n sampling points); γ (h) is the semivariance, usually a semivariance function graph in a characteristic direction of h with γ(h).
The semivariance function has three extremely important parameters, namely the block gold value (nugget), range (range), and base value (sill). They represent quantitative indicators of the spatial variation and correlation degree of regionalized variables at a certain scale. Nugget (C 0 ) represents a variation caused by non-sampling point spacing. This type of spacing is a random variation that reflects the spatial variation caused by random factors such as the socio-economic ones [16]. Range (a) reflects the spatial variability of soil properties and analyzes the mobility of variables, i.e., spatial dependence, which is spatially independent outside the range and correlated within the range. The base value (C 0 + C), also known as the top value, refers to the maximum value of semivariance in different sampling intervals. These intervals reflect the spatial variation composed of random and structural variation and caused by natural (such as parent material, topography, etc.) and socio-economic factors (such as fertilization, planting system, etc.) [17]. The purpose of analyzing the spatial structure characteristics of variables was to determine the best fitting model of semivariance, i.e., nugget (C 0 ), sill (C + C 0 ), nugget effect (C 0 /C + C 0 ) and range (a), combine the spatial correlation distances to evaluate the spatial correlation of each attribute, and analyze the variation law, degree, and causes [15].

Ordinary kriging
Ordinary kriging (OK) is based on the theory of regionalized variables and its main tool is the variation function. The advantage of this method is its consideration for the random distribution of sample points in a spatial structure. The accuracy of the estimated value depends on the selection of the weight coefficient, while the optimal weight coefficient depends on the selection of the mutation function model [18]. The variation function is used to calculate the integrity of spatial continuity in one or more directions [19]. To accurately reflect the spatial heterogeneity of heavy metals in soil, their theoretical models were constructed according to the determination coefficient R 2 and residual of semivariance. The linear, spherical, exponential, and Gaussian model can be used to construct the semivariance function of heavy metal concentration in surface soil. The best model out of the four was selected to analyze the spatial structure and provide the input parameters for interpolation [20].

Inverse distance weighting
Inverse distance weighting (IDW) is a simple interpolation method based on Tobler's theorem [21]. Sets that each measurement point is affected locally, which is inversely proportional to the distance. The interpolation is carried out by the weighted average of the measured values of each point near the measured point. According to the spatial autocorrelation principle [22], the weight of the point nearest to the predicted position is larger and the weight is reduced as a function of distance. The formula for inverse distance weighting is as follows: where Z is the estimated value of the interpolation point; Z i (i = 1 ~ n) is the measured sample value; n is the measured sample number used to estimate, D i is the distance between the interpolation point and the i control point; and r is the power of the distance, r = 2.

Local polynomial interpolation
This method is suitable for a polynomial ordered to interpolate all the points in the search neighborhood. The formed surface mainly depends on local variation, which is vulnerable to the influence of the distance between adjacent regions. Therefore, small-range variation in the dataset is most suitable for this method. The formula for local polynomial interpolation is as follows: where x i is the interpolation node; n is the number of samples; Z(X 0 ) is the predictive value of the first sample; R(x) is the interpolation term; and f(x) is the kernel function [23].

Radial basis function
The radial basis function is used to approximate the predicted value of the measured function F = F(x). The purpose of the radial basis function is to construct the approximation function. Compared to other interpolation methods, it is more complex and suitable for interpolation of large amounts of data. The formula of the radial basis function is as follows: where φ(d i ) is the radial basis function; d i is the distance between the interpolation point i and the sampling point x; and f j (x) is the trend function.

Data verification
Prediction accuracy of As, Cu and Mn can be achieved by cross-validation [24][25][26]. The cross-validation method assumes that the content value of each sampling point is unknown and uses the value of the surrounding sample point to estimate it. Afterwards, it calculates the error between the estimated and the actual measured value and evaluates the advantages and disadvantages of the interpolation method according to the error statistical results [27]. The most common indicators are root-meansquare error (RMSE), mean error (ME), and inaccuracy (IP), used to compare the interpolation accuracy of different methods. These indices are calculated as follows: where P i and M i represent the predicted value, measured value, and measured average value, respectively. The cross-validation method is used to evaluate the interpolation error of sample points. However, it cannot reflect the spatial distribution characteristics of interpolation errors. According to ordinary kriging (OK), inverse distance weighing (IDW), local polynomial (LPI), and radial basis function (RBF) interpolation principles and semivariance function fitting parameters, the Geostatistical Analyst module in the ArcMap software was applied to carry out spatial variation interpolation. With the module, the study was able to obtain the map of the spatial distribution map of the heavy metal concentrations in the main urban area of Chongqing.

Influencing factors of heavy metals in soil
The main influencing factors of soil heavy metals are natural factors and human factors. Nine common influencing factors were selected for correlation analysis, including Normalized Difference Vegetation Index, air quality index, slope, soil type, soil parent material, population density, water quality category, road distance and industrial activities. The specific influencing factors are as follows: Normalized Difference Vegetation Index There is an interaction between vegetation index and soil heavy metals. The elements needed for vegetation growth are mainly from soil and parent rock, and soil heavy metals largely determine their growth status.
Air quality index Air quality index comprehensively represents the degree of air pollution or air quality grade, which is an important indicator to reflect the status of soil heavy metals.
Slope The size of slope affects the distribution and flow of surface runoff. Runoff carrying soil particles to the downstream will change the original spatial distribution pattern of soil heavy metal content, and redistribution of soil heavy metals.
Soil type Soil types determine soil properties and have different effects on heavy metals. The contents of clay minerals, oxides and organic matter in soil types are also different, and the adsorption degree of heavy metals on soil is also different, thus affecting the migration behavior and distribution of heavy metals in the surface environment.
Soil parent material Soil parent material is the source of soil minerals. Its mineral composition, chemical composition and mechanical composition (particle size) affect the formation and properties of soil.
Population density Population density has a significant effect on the content of heavy metals in most environmental media. The denser the population is, the greater the intensity of human activities is, and the higher the interference to the natural environment is, which is positively correlated with the soil heavy metal content.
Water environment quality Water environment quality is directly related to soil heavy metal content. The lower the quality of water, the higher the content of heavy metals carried, which has a direct impact on soil. Circulation and irrigation of sewage may lead to heavy metal pollution in soil, especially the soil on both sides of the river.
Road distance Road distance is correlated with soil heavy metal content, and traffic activities may lead to soil heavy metal pollution on both sides of the road. Pollutants are mainly distributed on both sides of the road within 50 m.
Industrial activities The gross industrial product represents the intensity of industrial production activities in the corresponding region, and the exhaust gas, wastewater and solid waste generated by industrial activities are important factors leading to heavy metal pollution in soil. In this study, GDP is used to characterize the intensity of industrial activities.

CATREG
To identify the relative importance of variables affecting heavy metals, CATREG, a non-parametric multivariate regression analysis model, uses an optimal scaling program to scale the dependent and independent variables and analyze the classified and numerical variables. Usually, it is used to test the influence of multiple prediction factors on the dependent variables [28,29]. Classification regression is an appropriate method to identify the factors affecting the concentration of heavy metals in soil. The Austrian scholar Claudia Gundacker [30] conducted bivariate analysis through categorical regression (CATREG) to explain the relationship between Pb and Hg exposure and genetic background. Furthermore, Yang [31] used classification regression analysis to analyze heavy metals in soil for the first time. The results showed that the parent material, soil type, land use type, and industrial activities were the main factors affecting the concentration of heavy metals in the soil in Beijing.
The CATREG model is a classical linear regression model with application and transformation variables. In this study, the classification regression (CATREG) model was analyzed and the discrete variables pre-test measures were inputted in the regression. Category quantization was used to quantify the class variables by a specific nonlinear transformation and then iteratively find the optimal equation.
GATREG is an econometric method that deals with data sets containing nominal, ordinal, and interval variables. In the simple linear regression model, a response variable is predicted from prediction variables m in x, trying to find a linear combination of X*b with the highest correlation. The optimal scaling involved in the feasible nonlinear function analysis between Z and m j=1 b j ϕ j (X i ) is expressed as: The classification variable h j defines the binary indicator matrix G j of n rows and columns l j . h ij defines g ir(j) as follows: where r = 1,2……l j is the running index, indicating the class number in j. If the class quantization can be expressed by y j , then the variable of transformation can be written as G j y j . For example, the weighted sum of the predictor variables can be expressed as: This is the same as the standard linear model. Finally, CATREG is equivalent to a linear regression model, which can be expressed as follows: In the formula, X * represents the coefficient matrix; Z * is the observed value vector; b is the normalized coefficient vector; and ε is the error vector. The qualitative variables are transformed into quantitative ones through the optimal scaling process. Classification variables are quantified to reflect the characteristics of the original category. The use of nonlinear transformations allows the analysis of variables at various levels to find the best fitting model.

Results
Descriptive statistical analysis Table 1 shows the descriptive statistics of As, Cu, and Mn in the study area. The background value of soil heavy metals in Chongqing was obtained from the statistics of geochemical survey data of land quality. The survey range included the main urban area, the western region, the northeast and the southeast of Chongqing. The covered area was nearly 60,000 km 2 , accounting for 72.8% of the total land area of Chongqing, which could basically represent the background value of soil elements in Chongqing [32].The concentration of As in the soil in the main urban area of Chongqing ranged from 1.965 to 21.180 mg kg −1 . For As the difference between the maximum and minimum value was 10.8 times, while the average concentration (5.802 mg kg −1 ) was lower than the background value (6.620 mg kg −1 ) of soil in Chongqing. Among the three elements, the coefficient of variation of As was the highest (55.71%). Furthermore, the concentration of Mn ranged from 107.900 mg kg −1 to 1584.000 mg kg −1 . For Mn the difference between the maximum and minimum value was 14.70 times, while the average value (573.316 mg kg −1 ) was lower than the background value (615.000 mg kg −1 ). The coefficient of variation of Mn (32.21%) was the lowest out of the three elements. Finally, the concentration of Cu was 6.208 to 76.600 mg kg −1 . The difference between the maximum and minimum value was 12.3 times. For Cu the average value (23.992 mg kg −1 ) was not significantly different from the soil background value (24.600 mg kg −1 ) in Chongqing and the coefficient of variation was moderate (35.73%).

Theoretical model of the GS + fitting semivariance function
After the analysis and fitting of the variation function, the best one was selected based on the principle of the maximum determination coefficient (R 2 ) and the minimum residual error (RSS). Table 2 shows the optimal theoretical model and related parameters for semivariance fitting of the three heavy metals. The results show that the theoretical semivariance model of As and Mn was the Gaussian one, while that of Cu was the exponential model. Moreover, spatial correlation was the result of structural (such as parent material, soil type, the climate, etc.) and random factors (such as farming, management measures, cropping systems, pollution, etc.) [33,34].

Comparison of four interpolation methods
Check interpolation prediction accuracy by leave-oneout cross-validation (Figs. 2, 3, 4). Different scatter patterns indicate that different methods can predict different values at the same point [35]. The linear model and 1:1 line intersect with As, Cu, and Mn concentrations. On the cross line, the linear model is higher than the concentration of As, Cu, and Mn, and vice versa. This method aims to achieve unbiased estimation of the mean value [36,37]. It can be seen from Figs. 2, 3, 4 that the IDW method had the largest correlation coefficient between the predicted and measured values for As, followed by LPI, RBF, and then OK. Similarly, the IDW method had the largest correlation coefficient between the predicted and measured values for Cu, followed by LPI, RBF, and OK. Likewise, the IDW method had the largest correlation coefficient between the predicted and measured values for Mn, followed by OK, LPI, and then RBF. In general, the correlation coefficients between the predicted and measured values of the OK method were between 0.2854 and 0.3186, of the IDW were between 0.3365 and 0.4384, of the LPI were between 0.2570 and 0.3949, and of the RBF were between 0.2325 and 0.3923. Overall, the correlation coefficients of As, Cu, and Mn in the IDW were higher than those in the OK, LPI, and RBF method.
To compare the accuracy of the four interpolation methods, their mean square error (RMSE), mean error (ME), and inaccuracy (IP) were calculated. The smaller the value of RMSE, the higher the accuracy, the closer the ME to 0, the smaller the interpolation error, and the smaller the IP, the higher the interpolation accuracy. Table 3 shows the cross-validation results of the four interpolation methods for As, Cu, and Mn. The RMSE of the four interpolation methods was significantly different for the three elements, indicating that their predicted values were overestimated. For the same element, there is little difference in the degree of overestimation by different methods. Furthermore, the ME of OK for As was closer to 0, signifying a relatively non-biased predicted value. The difference between the ME of LPI and RBF for Cu were small and closer to 1, respectively, indicating that the LPI and RBF can better predict the concentration of Cu in soil. Unlike the ME of other methods, that of LPI for Mn was closer to 1, resulting in good prediction accuracy.
The maps of the distribution of the three heavy metals under the OK, IDW, LPI, and RBF methods were very  (Figs. 5, 6, 7). However, the boundary between the polluted areas in the transition zone (from high to low concentration) was uncertain. Under the four different interpolation methods, As had different distribution characteristics. The prediction range of As was 1.965-21.180 mg kg −1 and the average values predicted by the four methods were 5.901 mg kg −1 , 5.947 mg kg −1 , 5.816 mg kg −1 and 6.095 mg kg −1 , respectively.
For Cu, the predicted range is 6.208-76.600 mg kg −1 , and the average value predicted by OK method is 23.995 mg kg −1 . Furthermore, the highest values of Cu predicted by the OK method were distributed in the north and west of the study area. In particular, high values were distributed in the largest areas in the north and west, while low values mainly concentrated in the south. The average value predicted by IDW method was 24.010 mg kg −1 , and the values predicted by the IDW method better reflected the local information within a smaller area. Moreover, the average value predicted by LPI method was 23. 977 mg kg −1 , and the values predicted by the LPI method were more concise and concentrated and reflected a wide range of trends. The average value predicted by RBF method is 24.365 mg kg −1 . Although similar to the OK method, the RBF method omitted some local information within small areas and failed to reflect the transition between high and low value areas.
For Mn, the predicted range is 107.900-1584.000 mg kg −1 . The average value predicted by LPI method is 573.316 mg kg −1 and the values of Mn predicted by the LPI method reflected the same characteristics as for Cu. However, the smoothing effect was too obvious to accurately reflect point source and small-scale non-point source pollution. As a result, its interpolation results were not as detailed as for the other three methods. The average value predicted by OK method was 599.219 mg kg −1 . The similarity between the OK and the LPI method was high, but the change trend of OK was more obvious. The average value predicted by RBF method was 620.556 mg kg −1 . Next, the RBF method embodied the pollution situation in a small range in the whole research area in detail, and the performance content is more detailed. Finally, the average value predicted by IDW method was 592.473 mg kg −1 , and the IDW method exhibited the characteristics of point source pollution in areas with high concentration. The change trend of the concentration was not obvious, but the degree of pollution was more accurate.

Distribution and spatial variation
Based on the spatial distribution map of the concentration of As in the surface soil in Chongqing (Fig. 5), it can be seen that the concentration of As in the whole study area shows a decreasing trend from periphery to center, which is generally banded and directed from north to south. In previous studies, Jiang [48] investigated eight kinds of trace metals in surface water along the Yangtze River in Chongqing, China and showed that the content of As was affected by the wastewater discharged from urban activities. Wang [49] studied heavy metals in sediments from the riparian zone of the Three Gorges Reservoir (TGR) over the past 10 years. The results showed that As concentrations were significantly higher than those in soil. He pointed out that the enrichment of heavy metals may be due to formation mechanisms and

Source identification of heavy metals in soil
The R 2 values of the fitting models of the three heavy metals were between 0.449 and 0.734, so all the fitting models passed the F test (P < 0.05) with statistical significance. Through the collinearity diagnosis, the variance inflation factor of all variables was 1.160 -2.263 (VIF < 3), while the tolerance difference was 0.442 -0.862. The tolerance was high enough to ensure the elimination of multiple collinearity problems. For the same heavy metal, the regression coefficients and the visibility of different influencing factors were different (Table 4). Furthermore, beta was used to compare the absolute effect and contribution of various coefficients. The results of the regression coefficient test for the source of each element were significant. Based on the explicitness check, the relative importance measure helped in explaining the contribution of the CATREG model to the regression.
Compared to other values, numerical values indicate the importance of the influencing factors ( Table 5). The factors that significantly affected the concentration of As in soil include the water environment (0.453), soil type (0.236), and slope (0.177). Those affecting the concentration of Cu in soil were population density (0.539) and parent material (0.207), while those affecting the concentration of Mn were slope (0.206) and parent material (0.201).

Spatial distribution of hotspots
Due to the importance of different environmental factors for the distribution of heavy metals (Table 5), the mapping and analyses of hotspots was carried out according to the maximum number of factors influencing each metal. Based on the hotspot analysis of heavy metals in soil, the study revealed the corresponding relationship between the spatial distribution of the metals and the environmental factors (Figs. 8, 9, 10).
The highest hotspot value of As (99% confidence interval) was mainly distributed in the north and west of the study area. The two high hotspot values for the same element (95% and 90% confidence interval) were located in the northern and western parts of the study area. However, As was mainly concentrated in the eastern parts adjacent to the secondary water environment. Overall, As had mostly coincided with the secondary water environment. Next, Cu hotspots were mainly distributed in the northern, western, and eastern regions. The highest hotspot values were mostly located in the areas with population density > 3000/km 2 and 1000-3000/km 2 . Lastly, Mn hotspots were mostly distributed in the west, north, and east. The distribution was influenced by the slope (0.206) and the diffusion trend shifted from center to periphery.

Comparative analysis of heavy metal concentration in soil
Compared to the standard background value of heavy metals in soil in Chongqing, the concentration of heavy metals in the soil in the main urban area of Chongqing is generally high, indicating that the soil in the area poses certain ecological risks [38]. In this study, the average concentrations of As, Cu, and Mn in soil were lower than the background values of the soil in Chongqing. However, the rates over the standard of As, Cu, and Mn were 28.07%, 33.63%, and 33.92%, respectively. Among them, Cu and Mn had rates higher than the standard. Previous research shows that six heavy metals reached a mild or higher level of pollution in the five rivers in the main urban area of Chongqing, with Cu being the most serious pollutant [38], which is consistent with the pollution status of Cu in this paper. Compared to other cities in China and the world (Table 6), the concentration of the three heavy metals in the main urban area of Chongqing were lower than those in three counties in South Shanghai, in the northern part of Hangzhou Bay [39], Xiangfen County, Shanxi Province [40], and other places. In addition, Table 6 indicates that the average concentration of As in the study area was lower than that in the urban soils of East St. Louis, IL [41], the Mediterranean [42], and South Poland [43]. Therefore, the overall pollution level of As is not high. The average concentration of Cu in the study area was higher than that in Ostrava [44], Eglinton County [45], and South Poland [43]. However, the ecological risk of Cu is not high. The average concentration of Mn in the study area was at a higher level than that in other areas, which indicates that the area needs to be monitored.  . 5 The spatial distribution of As when using different spatial interpolation methods Fig. 6 The spatial distribution of Cu when using different spatial interpolation methods Fig. 7 The spatial distribution of Mn when using different spatial interpolation methods human activities, and may pose risks to the environment and human health. Song [50] studied the temporal and spatial variations of As, Cd, Co, Cr, Cu, Ni, Pb and Zn in sediments and suspended solids from 75 sites along the Yangtze River during the dry and flood seasons, indicating that As has high I-geo values in sediments and suspended solids of the Yangtze River. This indicates that the distribution and distribution of As are affected by river water sources, human activities, etc. Figure 6 shows the spatial distribution of Cu in the study area under the four interpolation methods. Cu was mainly distributed in the north and west of the study area in clusters and blocks. Large high value areas were distributed in the north and west, while the middle and low value ones were mainly concentrated in the south, but less in the southeast. According to the variogram, Cu has moderate spatial correlation and is greatly influenced by human factors, which is consistent with previous studies. Based on the regression tree analysis and a GIS-based overlay model, Liu [51] studied the cumulative risks and effects of Cd, Cu, Pb and Zn in suburban soils of Beijing. It shows that the accumulation risk of soil Cu is high in densely populated and highly urbanized areas. Acosta [52] found that the metal concentration in dust increased with the increase of population density, and dust was more likely to migrate Cu from soil than soil. Zhou [53] collected and analyzed eight heavy metals (Cd,  Furthermore, the concentration of Mn was low in the southeast of the study area, but high in the north, east, and west (Fig. 6). Overall, the concentration gradually increased from the southeast to the north and from east to west with good continuity. Dubovik, DV [54] studied the effects of slope aspect and slope gradient on the content and distribution of heavy metals (Mn, Cu, Zn, Co, Ni, Pb and Cd) in Haplic Chernozems profile in Kursk, and slope aspect was found to be a significant factor controlling the distribution of most of the bulk, mobile, and acid-soluble compounds of Mn. Similar to this study, slope factor is an important contributor to the content and distribution of Mn in this study.

The comparison of spatial prediction methods
Under the premise of using the optimal parameters and fitting model for each interpolation method, each method showed different degrees of the prediction error based on the cross-validation results from Table 3. In this study, the IP value of the OK method was smaller than that of the other three methods for As, which points to possible advantages. The RBF method had the highest value and obvious disadvantages among the four methods. For Cu, the difference between the OK, LPI, and RBF method was small and the difference in the prediction accuracy was small between the three methods, as well. It is difficult to directly judge the advantages and disadvantages of the interpolation methods based on the results of cross-validation. For Mn, the IP value of the OK method was the smallest among the four methods, showing a certain prediction advantage. The IP values of the IDW and RBF method were close and higher than those of the OK method.
The range and average value of the prediction of the same element are different. For As element, among the four prediction methods, the value predicted by LPI method is the smallest, which is 5.816 mg kg −1 , and the average value predicted by RBF method is the largest, which is 6.095 mg kg −1 , indicating that different methods have different degrees of prediction trend. Among the four prediction methods for the prediction of Cu, the maximum average value was 24.365 mg kg −1 predicted by RBF method, and the minimum value was 23.995 mg Fig. 9 The spatial distribution of Cu hotspots under the influence of population density Fig. 10 The spatial distribution of Mn hotspots under the influence of the slope kg −1 predicted by OK method. The average value of Mn in the four prediction methods was significantly different, the maximum average value was 620.566 mg kg −1 predicted by RBF method, and the minimum value was 573.316 mg kg −1 predicted by LPI method. Thus, in different element prediction, RBF method is more prominent in different prediction methods.
By analyzing the spatial distribution characteristics of element concentrations in Figs. 5, 6, 7, the OK and LPI methods obtained a smooth surface [55]. However, this result cannot reflect well the information of local point source pollution. Likewise, the smoothing effect was more obvious for Ok and LPI than for the remaining two methods. The OK method is a commonly used interpolation method that is based on the structural characteristic of elements in order to determine the influence weights of the real value on the predicted value in the process of predicting element [56]. The OK method also has a strong smoothing effect that cannot express the information within a small range in detail. This is obvious in the spatial distribution characteristics of the concentration of Mn. Some scholars believe that the OK method can exhibit a strong smoothing effect in areas with a great variation of the element content and poor spatial autocorrelation [57]. This may be because the OK method compresses the change range of data and shows a strong smoothing effect after logarithmic transformation of the element concentration with poor spatial autocorrelation. Furthermore, the LPI method uses the square method the least to fit the spatial distribution trend of element concentration. It also tends to obtain a smooth surface [55], but cannot reflect the information of local point source pollution. The IDW and the RBF methods determine weights based on distance and the local smoothing trend, respectively [56].
Moreover, the IDW and RBF methods belong to deterministic interpolation. This means that the true value at a sample point is equal to the predicted value, so the interpolation results greatly retain the maximum and minimum information of element concentration [58]. Gotway et al. [59] found that the interpolation accuracy of the IDW method was higher than that of the OK method, which was also consistent with the characteristics of the IDW method in this study. Yasrebi et al. [60] analyzed the interpolation accuracy of chemical properties in soil and found that the OK method was better than the IDW.

Analysis of influencing factors and sources
Previous studies have found that the distribution of heavy metals in soil had an obvious spatial relationship with environmental factors, especially population density, soil type, and parent materials [61]. According to Table 5, the factors that significantly affected As, Cu, and Mn in the main urban area of Chongqing were the water environment (0.453), population density (0.539), and slope (0.206).
Water is an important carrier of heavy metals. The Yangtze and Jialing Rivers crisscross the study area and represent the core of human survival and industrial development. Surface runoff and industrial and agricultural production of soil in areas with drinking water can release a large amount of heavy metals into the environment, resulting in the abundance of heavy metals in soil and water [62,63]. As in the soil of the main urban area of Chongqing is mainly distributed through the secondary water environment in three clusters in the east of the study area. Ezekwe [64] explored the relationship between heavy metal pollution and groundwater in the Shijiao mining area in southeastern Nigeria and concluded that there was a close relationship between heavy metal pollution and water quality. Some heavy metals in the river sediments mainly came from the leaching of the surrounding soil. In this study, the eastern downstream area was abundant in As. Presumably, this is a common  [42] occurrence for heavy metals because they move with the river and leach into the surrounding area. Population density is one of the most important sources of heavy metals in soil [65]. Variation in population density is closely related to the accumulation of Cu in soil. Based on the analysis of physical and chemical properties of Cu and its total and chemical forms in soils of cities in southeast of Spain with different population density, the results showed that the behavior of Cu in dust and soil was affected by physical and chemical properties and total concentration [66]. Heavy metals in road dust were investigated in Rafsanjan in Iran. The results of the evaluation of the enrichment factor (EF) showed that Cu was the most concerning pollutant. It was found that human factors, such as industrial and chemical activities and heavy traffic, caused higher levels of Cu [67]. H. Khadem [68] proposed that there was a strong correlation between the concentration of Cu and particle size. It was found that the particle size dependence and metal abundance in street dust were higher than those in urban soil, indicating that anthropogenic pollution had a high contribution to soil heavy metals. This study shows that the concentration of Cu in the soil in the main urban area of Chongqing is closely related to high population density.
In the study area, mountains and slopes are cut into a large number of urban areas. Slope is particularly important for the source analysis of heavy metals. The research of Wei [69] and others shows that the concentration of available manganese in soil varied greatly due to different terrain conditions. The hotspots of Mn in soil distributed along the slope direction and diffused from center to periphery. It shows that under the influence of long-term soil erosion, slope runoff, slope settlement, and other factors, there is a certain degree of heavy metal accumulation in soil [70].

Conclusion
The purpose of this study is to evaluate the prediction effect and spatial heterogeneity of As, Cu, and Mn in the main urban area of Chongqing using different interpolation methods and to reveal the influencing factors of heavy metal pollution in soil through source analysis of those metals. The results show that the overall interpolation error of the three interpolation models was large, while the difference between the methods for predicting the same element was small, which is expected to be caused by strong human activities and prominent spatial changes of the urban environment. Based on the crossvalidation evaluation of the accuracy of interpolation results, the accuracy of the OK method was higher than that of the other three methods for As. For Cu, the difference in accuracy between the OK, LPI, and RBF methods was small. For Mn, the OK method had a certain prediction advantage. Generally speaking, the LPI and OK methods show strong smoothing effects, but the IDW and LPI methods can better reflect the extreme value information and local pollution situation. This shows the necessity of using different methods in studying the spatial distribution of soil properties.
The hotspots of As were mainly concentrated in the area of the secondary water environment. For Mn, they were blocky, distributed along the direction of the slope, and spread from the center to the periphery. The hotspots of Cu were concentrated in areas with population density > 1000/km 2 . Therefore, the concentration of heavy metals in the soil in the main urban area of Chongqing may continue to increase in the future due to intense human activities. According to different elements and research needs, appropriate interpolation methods should be selected for prediction research. Likewise, it is suggested to further the research on the main sources of heavy metal pollution in soil so as to formulate corresponding strategies and measures.