Assessment of the genetic variations of sago palm Metroxylon sagu in three regions of Sarawak, Malaysia using amplified fragment length polymorphism (AFLP) marker

Sago palm (Metroxylon sagu) is an important staple crop in the rural areas of Sarawak, Malaysia. The palms grow well in rough, swampy peat, from low flooded areas to uplands and in acidic to neutral soils. Among the starch-producing crops, this palm is the most productive and promising, as it can store a significant amount of carbohydrate in the trunk. Because molecular studies of sago palm are lacking, the work described here aimed to develop molecular markers for identifying M. sagu and assessing the genetic variations of sago palm in different locations of Sarawak. Amplified fragment length polymorphism (AFLP) was employed to determine the genetic variations of sago palm among fifteen populations collected from three different locations: Mukah, Pusa and Samarahan. Based on 35,016 polymorphic fragments generated from 32 primer combinations, the polymorphic information content (PIC), marker index (MI) and resolving power (RP) averaged 0.269, 0.005157 and 51.52039 per primer combination, respectively, signifying the effectiveness and accuracy of the AFLP marker. Mukah showed the highest diversity (I = 0.428, H = 0.282), while Pusa showed the lowest (I = 0.367, H = 0.240). The average value of the Shannon information index was 0.404. AMOVA showed that 99% of the variation was found within populations. The Mantel test was performed with 9999 permutations. A significant positive correlation (r = 0.220, p = 0.010) was observed between the genetic divergence of the populations (Nei's genetic distance) and the geographical distance. Our study shows that AFLP is an effective and reliable method for assessing genetic variation and identifying M. sagu.


Background
Sago palm (Metroxylon sagu Rottboell) belongs to the family Arecaceae. Among the three subfamilies in the Arecaceae family, fourteen genera are known to produce starch in the trunk, and the genus Metroxylon is the most valuable and promising in producing starch. The genus Metroxylon is distributed from South East Asia to Melanesia, Micronesia, and Polynesia [1]. It is divided into two sections, Metroxylon (Eumetroxylon) and Coelococcus [2]. Only one species, M. sagu Rottb., called the true sago palm, is recognized in the section Metroxylon. Sago palm grows in the zone 10° north and south of the equator, from the Malay Peninsula across Southeast Asia, in Thailand, the Philippines, Malaysia, Brunei, Indonesia, and north-western Melanesia, including Papua New Guinea (PNG) and the Solomon Islands. In Malaysia, the sago palm can be found in Sarawak, Sabah, and Johor and is cultivated mainly in Mukah and Pusa in Sarawak. Sago palm is highly adaptable to the environment and can grow from low flooded areas to uplands and in acidic to neutral soils, where other major crops cannot grow well. However, sago palm grows better and is more productive in upland conditions with sufficient water than under submerged or flooded conditions. The wild stands of sago palm grow near lakes, on riverbanks, and in wet soil. They can be found up to an elevation of about 1000 m above sea level in PNG and about 700 m in North Sulawesi, and are primarily found growing in swamps and peat moors, which are unsuitable for any cash crop [3]. Nowadays, sago palms are found growing in lowland swamps that are not suitable for agricultural development, but this does not mean that these places are the preferred habitat for sago palms [4]. Sago palm can grow well in a humid tropical environment with an average temperature of 25 °C and above, and it tolerates other conditions such as air humidity of 70%, salinity, and incidental light [5].
Genetic diversity is one aspect of biological diversity that is extremely important for conservation strategies [6]. The genetic variations in plants affect the higher levels of biodiversity. Without genetic diversity, a population cannot adapt to and survive environmental changes. Analysis of genetic variations between or within different species, populations, and individuals is essential for germplasm evaluation and conservation [7]. Although sago palm is considered important in Sarawak, it has received little attention in scientific study, especially in molecular biology. Therefore, only limited information on the genetic diversity of sago palm is available. The increasing utilization of sago palm calls for genetic diversity information at the species and population levels. Genetic diversity studies of sago palm are essential for future breeding programs and germplasm conservation. A relationship between genetic distance and geographical distribution, together with a high level of diversity, was found for sago palms in the Malay Archipelago and Papua New Guinea using RAPD markers [8]. A study of the genetic and morphological variations of sago palm in PNG using AFLP markers showed no correlation between morphological characteristics and genetic distances [9]. In Indonesia, RAPD markers showed high variation in sago palm based on the polymorphism level and genetic diversity calculations [10]. Later, Wx gene markers were used to assess the genetic variation and genetic relationships of sago palm in Indonesia; the highest variation was observed at the individual sampling level, followed by the population and island sampling levels [11]. Wx genes in sago palm show a high polymorphism level, with fourteen genotypes, high diversity, and a high heterozygosity value.
The amplified fragment length polymorphism (AFLP) technique is based on selective amplification of restriction fragments from totally digested genomic DNA to generate and compare unique fingerprints for the genome of interest. AFLP applies to all plant species and is employed to study genetic variation within species or among closely related species [12], to infer population-level phylogenies and biogeographic patterns, to generate genetic maps [13], and to determine relatedness among cultivars. AFLP is used in the genome-wide screening of genetic diversity and can be applied to almost any organism [14,15]. In addition, AFLP is applicable for characterizing cytosine methylation levels and target sites [16]. The speed of this approach, the large amount of data it generates, and its robustness and repeatability make AFLP analysis a commonly used tool in population genetic and ecological studies [15].
AFLP markers offer several advantages over other DNA marker techniques. AFLP can quickly generate many fragments from any organism without prior knowledge of the genomic sequence and requires only a small amount of starting template [16], and hence has a relatively low start-up cost. It is highly reproducible, as it combines the reliability of RFLP with the flexibility of other PCR-based techniques. AFLP markers sometimes display dominance and the technique is labour-intensive; despite this, the markers can be easily multiplexed to amplify the same batch of hundreds of genomic fragments from different individuals.
In this study, molecular markers were developed to identify M. sagu. The AFLP markers were used to evaluate the genetic variations of M. sagu among and within 15 populations collected from three different locations. The obtained information will contribute to the conservation and maintenance of such genetic resources.

Polymorphic analysis
The 32 selective AFLP primer combinations produced a total of 35,016 DNA bands from 120 representative samples of 15 populations of M. sagu. The number of bands per primer combination ranged from 358 to 1592, with a polymorphic frequency of 100% and an average of 1094.25. The number of loci in the 15 populations ranged from 716 to 908, with the polymorphic frequency ranging from 66.54% to 84.48% and averaging 78.36%. The maximum value was found in population Mukah 5 and the minimum in population Pusa 1 (Table 1).
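The band counts and polymorphic frequencies above can be illustrated with a minimal Python sketch. It assumes that a locus counts as polymorphic in a population when both band presence (1) and absence (0) occur among its individuals; the section does not state the exact criterion used, and the toy matrix is hypothetical.

```python
def percent_polymorphic(matrix):
    """Percentage of loci showing both band states (0 and 1).

    matrix: list of rows, one per individual, each a list of 0/1 scores
    with one entry per locus. Assumed definition, for illustration only.
    """
    n_loci = len(matrix[0])
    polymorphic = 0
    for j in range(n_loci):
        states = {row[j] for row in matrix}
        if states == {0, 1}:
            polymorphic += 1
    return 100.0 * polymorphic / n_loci


# Hypothetical toy population: 3 individuals scored at 4 loci.
toy_population = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(percent_polymorphic(toy_population))

# Mean number of bands per primer combination reported in the text.
mean_bands = 35016 / 32
print(mean_bands)  # 1094.25
```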

Marker attribute

Polymorphic information content (PIC)
The strength and discriminatory power of an informative AFLP profile depend mainly on three parameters: polymorphic information content (PIC), marker index (MI), and resolving power (RP). The PIC was calculated for each primer combination, and the PIC value for the polymorphic fragments ranged from 0.096 to 0.350 with an average of 0.2678. All the primer combinations presented high discrimination power. The primer combination 56-ROXN/E-ATG/M-GTC was the most informative, with the highest PIC value of 0.350. The lowest PIC value, 0.096, was recorded for the primer combination 5HEX/E-AAT/M-GTC (Table 2).
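As an illustration of how a PIC value can be obtained from binary AFLP scores, the sketch below uses one formulation commonly applied to dominant markers, PIC = 2f(1 - f), where f is the frequency of band presence. This formula is an assumption here, since the section does not state which expression was used; it is at least consistent with a maximum of 0.50 for a dominant marker.

```python
def pic_dominant(band_scores):
    """PIC of one dominant locus, assuming PIC = 2 * f * (1 - f).

    band_scores: list of 0/1 scores across samples for a single locus.
    """
    f = sum(band_scores) / len(band_scores)
    return 2.0 * f * (1.0 - f)


def mean_pic(matrix):
    """Average PIC over all loci of a samples-by-loci 0/1 matrix."""
    n_loci = len(matrix[0])
    values = [pic_dominant([row[j] for row in matrix]) for j in range(n_loci)]
    return sum(values) / n_loci


# A band present in half of the samples reaches the maximum of 0.5.
print(pic_dominant([1, 0, 1, 0]))
```

Under this formulation a monomorphic locus contributes 0, so an average well below 0.5 is expected when many bands are rare or very common.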

Marker index (MI)
Marker index (MI) is a feature of marker diversity that represents the product of the effective multiplex ratio (E) and the arithmetic mean heterozygosity (Havp). The MI value was calculated for all the primer combinations and ranged from 0.000361 to 0.008072 with an average of 0.005157. The highest value was obtained with the primer combination /56-ROXN/E-ATG/M-CAC and the lowest with the primer combination 5HEX/E-AAT/M-GTC (Table 2). A strong, statistically significant positive correlation was observed between MI and PIC (r = 0.9488, r² = 0.9002, p < 0.00001).
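The product definition of MI and the MI-PIC correlation can be sketched as follows. The function names and the per-primer values are hypothetical and serve only to show the arithmetic; the Pearson coefficient is computed from its textbook definition.

```python
import math


def marker_index(effective_multiplex_ratio, mean_heterozygosity):
    """MI as the product E * Havp described in the text."""
    return effective_multiplex_ratio * mean_heterozygosity


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four primer combinations (not the study data).
pic_values = [0.10, 0.20, 0.30, 0.35]
mi_values = [0.001, 0.004, 0.006, 0.008]
r = pearson_r(pic_values, mi_values)
print(r, r * r)

# Consistency check of the reported coefficients: r = 0.9488 gives r^2 = 0.9002.
print(round(0.9488 ** 2, 4))  # 0.9002
```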

Resolving power (RP)
Resolving power (RP) indicates the discriminatory potential of a primer combination to distinguish between large numbers of genotypes. The resolving power ranged from 17.46667 to 99.33333 with an average of 51.52039. The highest value was scored with the primer combination /56-ROXN/E-ATG/M-CAA and the lowest with the combination 5HEX/E-80.
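A minimal sketch of a resolving power calculation is given below. It assumes the widely used definition Rp = sum of Ib over bands, with band informativeness Ib = 1 - 2|0.5 - p| and p the proportion of samples carrying the band; the section itself does not give the formula, and the toy matrix is hypothetical.

```python
def band_informativeness(band_scores):
    """Ib = 1 - 2 * |0.5 - p| for one band (assumed definition)."""
    p = sum(band_scores) / len(band_scores)
    return 1.0 - 2.0 * abs(0.5 - p)


def resolving_power(matrix):
    """Rp of one primer combination: sum of Ib over all of its bands.

    matrix: samples-by-bands list of 0/1 scores.
    """
    n_bands = len(matrix[0])
    return sum(
        band_informativeness([row[j] for row in matrix]) for j in range(n_bands)
    )


# Hypothetical toy profile: 3 samples, 4 bands.
toy_profile = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(resolving_power(toy_profile))
```

Because Rp is a sum over bands, primer combinations that produce many bands of intermediate frequency score highest, which is why Rp can exceed 1 by a wide margin.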

Genetic diversity and genetic structure
The genetic diversity parameters analyzed in this study are presented in Table 3. Non-hierarchical analysis of molecular variance (AMOVA) was performed to assess the variation among and within the populations from the three regions. The analysis indicated that the variation within populations accounted for 99% of the total variation, while variation among regions contributed only 1%. There was no variation among populations (Table 4). The 1% variation among regions was due to the Pusa populations; there was no variation between Mukah and Samarahan. The AMOVA results indicate that the differentiation mainly existed within populations. The variation among regions (PhiRT) was significant (p ≤ 0.001). The variation among populations within a region (PhiPR) and within individuals (PhiPT) was not significant. Fst, calculated by AFLP-SURV as an indication of population differentiation, was not significant (p > 0.05). The total Fst value was -0.0491, which indicates no genetic difference between the populations of M. sagu in Sarawak, except for a minimal variation in the sago palms of the Pusa region (Table 4).
The pairwise population matrix of Nei's genetic identity and distance shows a high genetic identity of 0.993 and a low genetic distance of 0.007 among all populations.
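The reported identity and distance values are linked by the standard relation D = -ln(I). The sketch below checks this and shows a simplified identity calculation for biallelic loci; the allele frequencies are hypothetical, and the estimation of allele frequencies from dominant AFLP bands is deliberately left out.

```python
import math


def nei_identity(freqs_x, freqs_y):
    """Nei's genetic identity for biallelic loci (simplified sketch).

    freqs_x, freqs_y: per-locus frequency of one allele in populations
    X and Y; the second allele has frequency 1 - p at each locus.
    """
    jxy = jx = jy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        qx, qy = 1.0 - px, 1.0 - py
        jxy += px * py + qx * qy
        jx += px * px + qx * qx
        jy += py * py + qy * qy
    return jxy / math.sqrt(jx * jy)


def nei_distance(identity):
    """Nei's standard genetic distance, D = -ln(I)."""
    return -math.log(identity)


# An identity of 0.993 corresponds to a distance of about 0.007.
print(round(nei_distance(0.993), 3))  # 0.007
```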

Cluster analysis
Genotyping data obtained from all the polymorphic fragments from the 32 primer combinations were used to estimate pairwise similarity among the fifteen populations. A genetic distance matrix was calculated with Nei's genetic distance coefficient using the GenAlEx program. The Nei genetic distance matrix was then used to construct the unweighted pair group method with arithmetic mean (UPGMA) dendrogram, which was visualized with the XLSTAT program. Nei's genetic distance coefficient between the populations was in the range of 0.006-0.014, which showed a very low genetic divergence among the different populations of M. sagu. The genetic distance-based UPGMA dendrogram (Fig. 1) showed two distinct major clusters at a distance coefficient of 0.010. Cluster I comprised four populations from Pusa. Cluster II contained all the other populations from all three regions. As a whole, most of the populations clustered together, indicating a close relationship among the populations from all three regions, although four of the five populations from Pusa clustered together in cluster I.
The genetic distance matrix based on Nei's distance coefficient was also subjected to principal coordinate analysis (PCoA). As shown in Table 5 and Fig. 2, the first principal axis represents 45.53% of the variation. The second and the third principal axes represent 11.74% and 9.09%, respectively, giving a cumulative variation of 66.36%. Populations in diverse groups were intermixed, and PCoA failed to resolve them accurately according to their geographic origin. The Mantel test (Fig. 3) was performed with 9999 permutations. A significant positive correlation (r = 0.220, p = 0.010) was observed between the genetic divergence of the populations (Nei's genetic distance) and the geographical distance.
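The UPGMA procedure described above can be sketched in plain Python. This is a minimal illustration, not the XLSTAT implementation: the population labels and distances are hypothetical, and each merge is reported at the inter-cluster distance at which it occurs.

```python
def upgma(names, dist):
    """Minimal UPGMA clustering.

    names: list of population labels.
    dist: dict mapping a pair (a, b) to the distance between a and b.
    Returns a list of merges (cluster_a, cluster_b, merge_distance).
    """
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        merge_distance = d[pair]
        new = "(" + a + "," + b + ")"
        for c in sizes:
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted average keeps all original members equally weighted.
            d[frozenset((new, c))] = (sizes[a] * dac + sizes[b] * dbc) / (
                sizes[a] + sizes[b]
            )
        del d[pair]
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, merge_distance))
    return merges


# Hypothetical distances within the reported 0.006-0.014 range.
toy_names = ["A", "B", "C"]
toy_dist = {("A", "B"): 0.006, ("A", "C"): 0.014, ("B", "C"): 0.012}
for step in upgma(toy_names, toy_dist):
    print(step)
```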
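A permutation-based Mantel test of the kind reported here can be sketched as follows. This is a simplified one-tailed version under stated assumptions (Pearson correlation of the upper-triangle entries, joint row and column permutation of one matrix); it is not the GenAlEx routine, and the example coordinates are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=9999, seed=0):
    """One-tailed Mantel test between two symmetric distance matrices.

    Returns (observed r, permutation p-value).
    """
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [dist_a[i][j] for i, j in pairs]
    observed = pearson_r(a, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the populations of the second matrix
        b = [dist_b[order[i]][order[j]] for i, j in pairs]
        if pearson_r(a, b) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: four sites on a line, compared with themselves.
sites = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(p - q) for q in sites] for p in sites]
print(mantel(geo, geo, permutations=199, seed=1))
```

With 9999 permutations the smallest attainable p-value is 1/10000, so a reported p = 0.010 corresponds to roughly 100 permuted correlations at least as large as the observed one.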

Discussion
High-throughput AFLP markers combined with a DNA-bulking approach were used to develop molecular markers and evaluate the genetic variation of M. sagu. The AFLP technique is a highly reproducible method that combines the reliability of RFLP with the power and sensitivity of PCR [18] to produce highly polymorphic markers. AFLP has been successfully used to investigate the genetic diversity and phylogenetic relationships of different species and closely related genotypes [12].
M. sagu has received relatively little attention; therefore, only a few studies have been conducted to assess its molecular diversity using different markers. Sago palm shows genetic variation, as detected with RAPD markers, in progenies derived from natural pollination [19].

AFLP profiling and marker polymorphism
The efficiency of any molecular marker technique depends on the amount of polymorphism detected [20]. In the present study, high-quality marker profiles were obtained using 32 AFLP EcoRI and MseI primer combinations on 15 M. sagu samples collected from different locations. These AFLP markers generated 100% polymorphism, which was confirmed by DNA profiling and genetic diversity analysis. All the primer combinations used were polymorphic and generated 35,016 differently sized alleles, with an average of 1094.25 per primer combination across all the populations of M. sagu. This study of M. sagu using AFLP markers showed a larger number of alleles than previous studies of M. sagu using RAPD (random amplified polymorphic DNA), in which 83 polymorphic fragments [10] and an average polymorphism of 93% [8] were reported, indicating that AFLP markers are useful in the study of genetic variation in M. sagu. The number of loci in the 15 populations ranged from 716 to 908, with polymorphic frequencies from 66.54% to 84.48%. All these results indicate that high polymorphism exists in the populations of M. sagu. The percentage of polymorphic markers becomes reliable when a large number of loci are generated [21,22]. The most significant advantage of this technique is that it generates a large number of polymorphic DNA fragments, as the accuracy of genetic distance measurements increases with the number of loci used [23]. In one study, twenty EcoRI + MseI primer combinations were used in 13 Argan tree populations and showed 100% polymorphism [24]. In a study of the genetic diversity of 127 individuals of the endangered species Glehnia littoralis, ten pairs of AFLP primers generated a total of 1929 bands with 100% polymorphism [25]. In comparison, the present study detected high polymorphism and provided higher-resolution loci.
High-resolution loci were generated using an electrophoresis system with different fluorescently labelled primers and laser detectors in an ABI genetic analyzer, which has higher detection power than the ordinary PAGE system.

Discriminatory power of AFLP primer combination
The discriminatory power of AFLP markers for assessing genetic diversity has been evaluated using several parameters, such as PIC, RP, and MI [26,27]. However, the PIC value has been used predominantly to determine the discriminatory power or informativeness of markers in the majority of diversity studies [28]. The concept of resolving power (RP) was proposed to evaluate the discriminatory power of AFLP primer combinations [29]. The marker index (MI) is used to calculate the efficiency of markers [30].
The PIC value of the fragments obtained with the different primer combinations was 0.096-0.350 (average = 0.269). The PIC value of a dominant marker such as AFLP is expected to be below 0.50 [20]. The PIC values found in our study are therefore considered informative. MI together with PIC has been used to assess the informativeness of AFLP primer combinations in various plant species, e.g., orchardgrass (PIC = 0.252, MI = 16.32) [31] and Elymus tangutorum (PIC = 0.250, MI = 23.07) [32]. In our study, the MI values were in the range of 0.000361 to 0.008072 (average = 0.005157). Resolving power was used to assess the discriminatory power of the different AFLP primer combinations. The primer combinations used in our study showed RP values in the range of 17.46-99.33 (average = 51.52039).
These indexes have been used to compare AFLPs with other molecular markers. A strong linear relationship has been found between the capability of a primer to discriminate genotypes and the resolving power (RP), but not the marker index (MI) [33,34]. In our study, a strong positive correlation was found between PIC and MI (r = 0.9488, r² = 0.9002, p < 0.00001), and there was also a positive correlation between MI and RP (r = 0.4883, r² = 0.2384, p = 0.004606). In a study, it was reported that a strong correlation was found between MI and RP (r² = 0.99, p < 0.005), which signifies that the discriminatory power of primer combinations could also be determined by [35,36]. These results indicate that the AFLP markers are highly reliable, discriminative, and powerful markers. Primer combinations labelled with the ROX fluorescent dye proved the most informative compared with the other combinations.

Genetic diversity and genetic structure
The Shannon index is a precise alternative measure of diversity because it does not require estimating allele frequencies under Hardy-Weinberg equilibrium. The Shannon index may vary from 0 to 1, and lower genetic diversity is represented by a value closer to zero [37]. In this study, the mean Shannon's index (I = 0.404) was higher than that reported in other plants using different molecular markers. Elymus sibiricus germplasm from Western China showed I = 0.2850 using SCoT markers [38] and I = 0.237 using EST-SSR markers. Genetic diversity and population structure are shaped by factors such as geographic distribution, life cycle, selection, and adaptation [39]. Considering the genetic differences among the studied regions, the populations from Mukah showed the highest diversity (I = 0.428, H = 0.282), while the Pusa populations showed the lowest (I = 0.367, H = 0.240).
The genetic diversity of a population underlies its ability to adapt to a variety of changing environmental conditions [40]. In this study, AMOVA showed that the major proportion (99%) of the total genetic variation of M. sagu was found within populations rather than among populations and geographical regions. Even though it accounted for only 1% of the variation, the variation between regions was statistically significant. In a previous study on sago palm using cpDNA markers, the highest percentage of the variation (95%) was also observed within populations rather than among populations and islands [41]. AMOVA also showed 2% variation among regions between Pusa and Mukah and 1% variation among regions between Pusa and Samarahan, while there was no variation among regions between Mukah and Samarahan. The PhiPT/Fst value is very low, effectively equal to zero. This indicates no genetic difference between the populations of M. sagu in Sarawak. This result is also supported by the low Nei's genetic distance (0.007) and the high Nei's genetic identity (0.993) among all fifteen populations.
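For a single biallelic or band-presence locus, the two diversity measures can be sketched as below. The formulas I = -(p log p + q log q) and H = 2pq are assumptions for illustration, since the section does not give the expressions used. Note that with the natural logarithm the per-locus maximum is ln 2 (about 0.693), whereas with base 2 it is 1.

```python
import math


def shannon_index(p, base=math.e):
    """Shannon information index of one two-state locus.

    p: frequency of one state (q = 1 - p is the other). Assumed formula:
    I = -(p * log(p) + q * log(q)); fixed loci contribute 0.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    q = 1.0 - p
    return -(p * math.log(p, base) + q * math.log(q, base))


def expected_heterozygosity(p):
    """Expected heterozygosity of one two-allele locus, H = 2pq."""
    return 2.0 * p * (1.0 - p)


def mean_over_loci(values):
    """Population-level estimate as the mean over all loci."""
    return sum(values) / len(values)


# Hypothetical frequencies at three loci of one population.
freqs = [0.5, 0.2, 1.0]
print(mean_over_loci([shannon_index(p) for p in freqs]))
print(mean_over_loci([expected_heterozygosity(p) for p in freqs]))
```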
The cluster analysis using the UPGMA approach grouped the fifteen M. sagu populations into three groups. Cluster I grouped three populations from all three locations, with population 7 from Pusa and population 11 from Samarahan together in one subgroup and population 2 from Mukah separately in another subgroup. Cluster II grouped eight populations from two locations, four from Mukah and four from Samarahan. Cluster III grouped four populations from Pusa together. The UPGMA dendrogram based on Nei's genetic distance did not show a clear separation of the 15 populations of sago palm from the three different locations. This result is supported by a previous study [1] in which the M. sagu species clustered into one group. The PCoA, which explained 66.36% of the total variation on three components, also supports the UPGMA analysis, which grouped the different populations together. This also indicates no genetic difference between the different populations of M. sagu in Sarawak. In a study of M. sagu in Papua, 14% of the total variation was accounted for by two principal components, and PCoA indicated that accessions from the same locality showed a weak tendency to group together [9]. The stepwise Mantel test indicates a positive correlation between the genetic and geographic distances of M. sagu in the three regions of Sarawak. Previous studies [8,9] also showed a strong connection between genetic distance and geographic distance. According to the AMOVA and UPGMA results, the Pusa populations differ slightly from those of Mukah and Samarahan, and the Pusa populations are closer to Samarahan than to Mukah. This study shows that AFLP markers can be considered a powerful and reliable method for assessing genetic variation and identifying M. sagu. The high level of diversity within populations and the low level of genetic diversity among populations probably resulted from the limited sample size.
These observations suggest that within-population variation, rather than variation among populations and regions, should be the main focus for conservation and sustainable use.

Conclusions
This study aimed to develop molecular markers and study the genetic variation of M. sagu from different locations of Sarawak (Mukah, Pusa and Samarahan). Prior to the marker study, evaluation of DNA extraction methods showed that the CTAB method is best for the isolation of pure DNA from M. sagu. The AFLP markers generated 100% polymorphism, which was confirmed by DNA profiling and genetic diversity analysis. All the statistical analyses, including AMOVA, the Mantel test, UPGMA, and PCoA, revealed genetic diversity and suggested that the sago palms in Sarawak do not differ genetically. They belong to the same M. sagu species. This study showed that amplified fragment length polymorphism (AFLP) is a powerful and suitable molecular marker technique for studying the genetic variation of M. sagu.

Sample collection
In this study, the plant material chosen was the leaf of the sago palm (M. sagu). Leaf samples were collected from five different localities in each of Pusa, Mukah and Samarahan. The GPS reference (WGS 84 datum) for each palm sampled and the phenotypic features of the leaf samples are given in Table 6. The sago plantations in Mukah, Pusa and Samarahan are shown in Fig. 4. All the samples were taken at the rosette stage of the sago palm. The leaves were collected from the third most recently developed frond of the sago palm, wiped with 70% ethanol, and kept immersed in liquid nitrogen in a 50 mL centrifuge tube. The samples were removed from liquid nitrogen in the lab and kept at −80 °C until needed for processing.

Sample preparation
The sago leaves were taken out of the −80 °C freezer, washed with distilled water, and wiped with 70% ethanol. The midrib of the leaves was removed, and the leaves were weighed and cut into small pieces using sterile scissors. A 1.6 g leaf sample was ground into a fine powder using a mortar and pestle in the presence of liquid nitrogen. The powder was transferred into a 50 mL polypropylene Falcon tube and kept on ice until used for DNA isolation.

DNA extraction
Genomic DNA was extracted and isolated using the CTAB extraction method with some modifications. Three replicates of each sample were prepared prior to genomic DNA extraction. Recovered DNA was checked for purity and integrity using a biophotometer and standard 1% agarose gel electrophoresis with ethidium bromide staining. The DNA concentration was adjusted to 500 ng per 5.5 μL for polymerase chain reactions (PCRs).

AFLP analysis
AFLP fingerprinting was performed following the method described by Vos et al. [42] with some modifications. The primers were labelled with fluorescent dyes at the 5' end. A total of 500 ng of genomic DNA was double digested with the restriction enzymes EcoRI and MseI (NEB), followed by the ligation of EcoRI and MseI adapters using T4 ligase (NEB) to generate template DNA for the PCR reactions. After a successful restriction-ligation reaction, pre-selective amplification was carried out with EcoRI + 1 (GAC TGC GTA CCA ATTC + A) and MseI + 1 (GAT GAG TCC TGA GTAA + C) primers, each having one selective nucleotide. The pre-selective amplification products were used as the template for the selective PCR amplification. Selective amplification was conducted using four EcoRI primers, each having three different extra nucleotide bases, and eight MseI primers, each with three different extra nucleotide bases. The EcoRI primers were fluorescently labelled at the 5' end with different fluorophores. A total of 81 primer combinations were tested on the DNA of each of the 15 individuals of M. sagu, and 32 primer combinations per sample were chosen for the selective PCR amplification to detect polymorphism among the fifteen populations of M. sagu (Table 7). Capillary electrophoresis of the selective amplification products was performed using an ABI genetic analyzer. The amplified AFLP fragment profiles were analyzed with the GeneMarker 3.0.1 analysis software (Softgenetics, USA). The amplified fragment profiles were assembled in binary format and visually scored for allele presence (1) and absence (0) for all the samples. The threshold for allele calling was set at 100 relative fluorescence units (rfu), so that any peak at 100 rfu or higher was scored as 1 and any peak lower than 100 rfu was scored as 0.
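The allele-calling rule and the pairing of selective primers can be illustrated with a short Python sketch. The peak heights and primer labels are hypothetical placeholders; only the 100 rfu threshold and the counts of four EcoRI and eight MseI selective primers come from the text.

```python
from itertools import product

RFU_THRESHOLD = 100  # allele-calling threshold stated in the text


def score_peaks(peak_heights, threshold=RFU_THRESHOLD):
    """Convert peak heights (rfu) to binary scores: 1 = present, 0 = absent."""
    return [1 if height >= threshold else 0 for height in peak_heights]


# Hypothetical peak heights for one sample at four fragment sizes.
print(score_peaks([99, 100, 250, 0]))  # [0, 1, 1, 0]

# Hypothetical labels for the four EcoRI and eight MseI selective primers.
eco_primers = ["E1", "E2", "E3", "E4"]
mse_primers = ["M" + str(i) for i in range(1, 9)]
combinations = list(product(eco_primers, mse_primers))
print(len(combinations))  # 32
```

Pairing every EcoRI primer with every MseI primer gives 4 x 8 = 32 combinations, which matches the number of primer combinations used for selective amplification.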