Development of Single Nucleotide Polymorphism (SNP) Markers in Tropical Crops

Understanding genetic diversity, association studies, evolution analysis, quantitative trait loci, marker-assisted selection and genome-wide association in tropical crops is important for improving plant characteristics and thereby increasing food sustainability in tropical countries. The single nucleotide polymorphism (SNP) marker is becoming the most popular molecular marker for these studies. With SNP markers, genes associated with important traits can be identified more efficiently than with other molecular markers. This review describes how SNPs can be discovered in plant genomes and how they are applied in plant breeding, especially in tropical crops such as rice, maize, peas, potato, tomato, cassava and taro.

USA) are computationally well equipped and capable of filtering out duplicated SNPs. These systems were successfully applied to discover SNPs in crops with and without reference genome sequences. Although GBS has the potential to discover several million SNPs, one of its major drawbacks is the large amount of missing data. To address this problem, computational biologists developed data imputation models, such as BEAGLE v3.0.2 and IMPUTE v2, that bring imputed data as close as possible to the real data.
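The idea of filling in missing GBS calls can be illustrated with a deliberately naive sketch. It replaces each missing call at a SNP with the most frequent observed genotype at that SNP. This is not how BEAGLE or IMPUTE work (those tools use haplotype-based models); the function names, the use of `None` for a missing call, and the example genotypes are assumptions made only for illustration.

```python
from collections import Counter

MISSING = None  # assumed placeholder for a missing genotype call


def missing_rate(calls):
    """Fraction of samples without a genotype call at one SNP."""
    return sum(g is MISSING for g in calls) / len(calls)


def impute_most_frequent(calls):
    """Fill missing calls with the most common observed genotype.

    A naive baseline only; haplotype-based imputation is far more accurate.
    """
    observed = [g for g in calls if g is not MISSING]
    if not observed:
        return list(calls)  # nothing observed, so leave the SNP untouched
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if g is MISSING else g for g in calls]


# Hypothetical calls for one SNP across six samples.
snp_calls = ["AA", None, "AG", "AA", None, "AA"]
print(missing_rate(snp_calls))
print(impute_most_frequent(snp_calls))
```

In practice, imputed genotypes would be compared against a held-out set of known calls to check how close the imputed data are to the real data.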

SNP Validation.
The discovered SNPs must be validated to identify the true SNPs and to estimate the percentage of potentially false SNPs resulting from an SNP discovery exercise. Validation can serve as an iterative and informative process for modifying and optimizing the SNP filtering criteria to improve SNP calling. For example, a subset of 144 SNPs from a total of 2,113,120 SNPs was validated using the Illumina GoldenGate assay on 160 accessions in apple.

SNP genotyping is the downstream application of SNP discovery to identify genetic variations. SNP applications include phylogenetic analysis, marker-assisted selection, genetic mapping of quantitative trait loci (QTL), bulked segregant analysis, genomic selection, and genome-wide association studies (GWAS).

Figure 2. Overview of SNP discovery in plants through genotyping by sequencing (GBS) system [6]

In addition, analytical methods to discover novel SNPs and detect known SNPs include: capillary electrophoresis; mass spectrometry; single-strand conformation polymorphism (SSCP); single-base extension; electrochemical analysis; denaturing HPLC (high-performance liquid chromatography) and gel electrophoresis; restriction fragment length polymorphism; and hybridization analysis.

SNP discovery in tropical crops
In potato (Solanum tuberosum), 575,340 SNPs were identified within three cultivars, 'Atlantic', 'Premier Russet' and 'Snowden', using Illumina sequencing. DNA was extracted from 248 potato lines using the Qiagen Qiaxtractor DX system (Qiagen Inc., Valencia, CA). Samples were loaded at 50 ng/μl on an Illumina BeadXpress Analyzer (Illumina Inc., San Diego, CA) and data were analyzed using the Illumina GenomeStudio software. Cluster positions for three marker classes (AA, AB, and BB) were manually determined for each marker within GenomeStudio. The SNPs identified in this study will enable more efficient marker-assisted breeding efforts in potato (2011) [18].
A set of 100,000 to 250,000 SNPs was discovered in sorghum (Sorghum bicolor L.) by high-throughput next-generation sequencing (NGS) with RAD-Seq [19]. In common bean (Phaseolus vulgaris L.), 827 non-genic SNPs were discovered using the Illumina GoldenGate technology [20]. Moreover, in chickpea (Cicer arietinum L.), 1022 SNPs were identified based on the Illumina GoldenGate Genotyping Technology [21]. In Camelina sativa, 533 SNPs were discovered using Illumina GoldenGate [22]. In quinoa (Chenopodium quinoa Willd.), 14,178 SNPs were identified based on KASPar genotyping chemistry and were detected using the Fluidigm dynamic array platform [23]. In rice (Oryza sativa), more than four million SNPs from around 500 rice landraces were identified by Yu et al. (2014) [24] using GBS. In the Nipponbare genome, 0.64 SNPs were found per kb, while the Dongjin genome contains 0.45 SNPs/kb. Moreover, Huang et al. (2009) [25] detected 122,791 SNPs between indica cv. "9311" and japonica cv. "Nipponbare", an average of 3.2 SNPs/kb.
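The SNP densities quoted above are simple ratios of SNP count to sequence length. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the sequence length implied by the Huang et al. figures is only back-calculated from the two quoted numbers, not a value reported in the source.

```python
def snp_density_per_kb(n_snps, region_bp):
    """SNPs per kilobase for a region of the given length in base pairs."""
    return n_snps / (region_bp / 1000)


def region_kb_from_density(n_snps, density_per_kb):
    """Sequence length (kb) implied by a SNP count and a density."""
    return n_snps / density_per_kb


# Hypothetical region: 640 SNPs in 1 Mb gives the 0.64 SNP/kb scale quoted above.
print(snp_density_per_kb(640, 1_000_000))

# Back-calculation from the quoted 122,791 SNPs at 3.2 SNPs/kb (arithmetic only).
print(round(region_kb_from_density(122_791, 3.2)))
```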

SNP Identification for a trait
SNPs are involved in genetic and genome mapping, association studies, genetic diversity analysis, and tagging of important genes. Figure 5 shows an example of the association of a SNP with a trait (plant height).

Figure 5. SNP identification for plant height
In this example, each SNP is tested for association with a trait (plant height). Here, the SNP (A/G) is associated with plant height. A significant SNP/trait association suggests either that the SNP has a direct biological function (functional polymorphism) or that the SNP is in linkage disequilibrium with a functional polymorphism. Furthermore, association studies can determine whether a genetic variant is associated with a trait (for example, disease). According to Shi et al. (2011) [26], SNPs have been discovered and verified in tomato and successfully used in selection for resistance to viruses, bacterial speck and bacterial spot. Figure 6 shows an example of four tomatoes that are resistant or susceptible to root-knot nematode. The SNP (G/C) is associated with the root-knot nematode response. Mi is the root-knot nematode resistance allele and mi is the root-knot nematode susceptibility allele.
Figure 6. SNP identification for resistance and susceptibility to root-knot nematode in tomatoes [26] (surviving sequence row: mi AAGTAGACGACGTTAGTAAAAT)

Application of SNP markers in molecular breeding
SNPs are becoming the most useful molecular markers for genome mapping, comparative mapping, framework mapping, varietal/line identification, map-based cloning, genomic selection, association studies, diversity analysis, genetic variation, population structure, evolution analysis, bulk segregant analysis, tagging genes for economically important traits, accelerated cloning of genes/QTL of interest, marker-assisted selection (MAS) and genome-wide association studies (GWAS) in crops because of their abundance and their suitability for automated high-throughput genotyping. MAS has several advantages over conventional phenotypic selection: it is simpler than phenotypic screening, which saves time, resources and effort; selection can be carried out at the seedling stage; and single plants can be accurately selected. Moreover, SNP markers can be used in population genetics applications of molecular breeding, such as disease association and pathogen detection.
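Before a SNP is used in MAS, an association such as the plant-height example of Figure 5 has to be tested. A minimal sketch of a single-SNP test is given here, assuming each plant carries one of two alleles (A or G) and comparing mean plant height between the two groups with Welch's t statistic. The heights and function names are hypothetical; a real analysis would also compute a p-value and correct for multiple testing and population structure.

```python
from math import sqrt
from statistics import mean, variance


def split_by_allele(alleles, trait_values):
    """Group trait values by the allele carried by each plant."""
    groups = {}
    for allele, value in zip(alleles, trait_values):
        groups.setdefault(allele, []).append(value)
    return groups


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two groups."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical data: allele at one SNP and plant height (cm) for eight plants.
alleles = ["A", "A", "A", "A", "G", "G", "G", "G"]
heights = [102.0, 98.0, 105.0, 99.0, 121.0, 118.0, 125.0, 120.0]

groups = split_by_allele(alleles, heights)
t_stat = welch_t(groups["G"], groups["A"])
print(f"mean difference = {mean(groups['G']) - mean(groups['A']):.1f} cm, t = {t_stat:.2f}")
```

A large absolute t value suggests that the SNP is associated with the trait, either because it is functional or because it is in linkage disequilibrium with a functional polymorphism.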
In tomato (Solanum lycopersicum L. syn. Lycopersicon esculentum Mill.) breeding, SNP markers were used in the selection of Tomato mosaic virus (ToMV) and Tomato spotted wilt virus (TSWV) resistance genes [11]. In this research, SNP markers were used in an association study to create a genetic test that screens for a disease in which the disease-causing gene had already been identified. Leaf samples are collected from a group of plants affected by the disease and their DNA is analyzed for SNP patterns. These patterns are then compared with patterns obtained by analyzing the DNA from a group of plants unaffected by the disease. This comparison can detect differences between the SNP patterns of the two groups of plants. SNP markers have therefore become important and useful in the selection of disease resistance genes. These markers provide breeders with a tool for selecting the Tm-2 and Tm-2² ToMV resistance genes in tomato breeding programs. SNPs can differentiate the resistance and susceptibility alleles at the Tm-2 locus. In addition, SNP markers can differentiate the Sw5-b and Sw5-a resistance genes from the sw5-b susceptibility gene of TSWV.
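The comparison of SNP patterns between affected and unaffected plants can be sketched as a test on a 2x2 table of allele counts. The example below uses the Pearson chi-square statistic (1 degree of freedom, no continuity correction). The allele counts and the function name are hypothetical and are not taken from the cited study.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table of allele counts.

    a, b: counts of allele 1 and allele 2 in affected plants
    c, d: counts of allele 1 and allele 2 in unaffected plants
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one count")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical allele counts at one SNP (G vs C).
affected = (30, 10)    # G, C
unaffected = (12, 28)  # G, C

chi2 = chi_square_2x2(*affected, *unaffected)
CRITICAL_005 = 3.841   # chi-square critical value, 1 df, alpha = 0.05
print(f"chi2 = {chi2:.2f}, differs between groups: {chi2 > CRITICAL_005}")
```

With small counts, an exact test would be a safer choice than the chi-square approximation.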
Furthermore, in molecular breeding, SNP markers can also be used in quantitative trait locus (QTL)/gene discovery. For example, in maize (Zea mays), SNP markers have facilitated the dissection of complex traits such as flowering time by using 5000 recombinant inbred lines (RILs) genotyped with 1,200 SNP markers. The genetic architecture of flowering time was found to be controlled by small additive QTL rather than a single large-effect QTL [27]. In addition, Poland et al. (2011) [28] used 5000 RILs and 1.6 million SNP markers to discover 29 QTL and identify candidate genes for northern leaf blight disease. Moreover, a study from Pioneer Hi-Bred International Inc. (a private breeding program), using proprietary SNP markers developed in-house, reported the identification of a high-oil QTL (qHO6) affecting maize seed oil and oleic acid contents. This QTL encodes an acyl-CoA:diacylglycerol acyltransferase (DGAT1-2), which catalyzes the final step of oil synthesis [30]. According to Kump et al. (2011) [30], the genetic structure of northern leaf blight, southern leaf blight, and leaf architecture was studied using ∼1.6 million SNPs in maize. Additionally, five SNP primers associated with the northern leaf blight resistance gene (Ht1) were identified by Junta et al. (2020) [31]. These SNP primers are MZSNP-0055106, MZSNP-0065744, MZSNP-0070164, MZSNP0063922, and MZSNP-0073150, which are located on chromosome 2.
In rice (Oryza sativa) breeding, a GWAS using ∼3.6 million SNPs from ~50,000 rice accessions identified genomic regions associated with 14 agronomic traits [32]. These traits include morphological characteristics (tiller number and leaf angle), yield components (grain width, grain length, grain weight and spikelet number), grain quality (gelatinization temperature and amylose content), coloration (apiculus color, pericarp color and hull color) and physiological features (heading date, drought tolerance and degree of seed shattering). Furthermore, according to McNally et al. (2009) [33], SNPs revealed the breeding history and relationships among 20 rice varieties; some SNPs are associated with agronomic traits that are used in rice improvement. These comprehensive SNP data provide a foundation for deep exploration of rice diversity and gene-trait relationships and for their use in future rice improvement. According to Ayres (2000) [34], SNP markers were used to identify the Waxy gene (which controls amylose synthesis by encoding a starch synthase enzyme) and sd-1 (a semi-dwarfing gene). In addition, 1536 SNP markers were used to assess genetic diversity in Malaysian rice varieties [35]; of these, only 932 SNPs had high-quality alleles and amplified across the rice varieties.
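Reducing a marker panel to its high-quality, informative SNPs is typically done with per-SNP quality filters. The sketch below applies two common ones, call rate and minor allele frequency (MAF). The source does not state which criteria reduced the 1536 markers to 932, so the thresholds, the 0/1/2 genotype coding, the SNP names and the data are all assumptions for illustration.

```python
def call_rate(genotypes):
    """Fraction of samples with a successful genotype call (None = failed)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of the alternate allele."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_qc(genotypes, min_call_rate=0.9, min_maf=0.05):
    """Keep a SNP only if it amplified in most samples and is polymorphic."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical panel of three SNPs scored on ten varieties.
panel = {
    "snp_1": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],                # polymorphic, fully called
    "snp_2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],                # monomorphic
    "snp_3": [0, None, None, 2, 1, None, 0, 1, 2, 1],       # poor amplification
}
kept = [name for name, calls in panel.items() if passes_qc(calls)]
print(kept)
```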
In wheat (Triticum aestivum), SNP markers were mapped between the known flanking markers for Fhb1 (Fusarium head blight resistance gene). These new markers would be useful for MAS and for fine mapping towards cloning of the Fhb1 gene [36]. Moreover, SNP markers have also been used to study the evolution of genes such as WAG-2 in wheat [37]. SNP markers were also used to identify the Pinb gene (puroindoline b) for grain hardness [38] and Rht1 and Rht2 (semi-dwarfing genes) [39].
In soybean (Glycine max), SNP markers were used to increase the efficiency and cost effectiveness of MAS and to enhance the resolution within the target locus. The gene Rag1 (soybean aphid resistance gene) was mapped between two SNP markers that corresponded to a physical distance of 115 kb, and several candidate genes were identified [40]. Another aphid resistance gene, Rag2, originally mapped to a 10 cM interval, was fine mapped to a 54 kb interval using SNP markers that were developed by resequencing of target intervals and sequence-tagged sites [41]. Ha et al. (2007) [42] identified SNP markers tightly linked to a QTL conferring resistance to southern root-knot nematode by developing these markers from bacterial artificial chromosome (BAC) ends and SSR-containing genomic DNA clones. According to Shi et al. (2009) [43], SNP markers were useful in the identification and selection of Soybean mosaic virus (SMV) resistance genes. SMV is the most destructive viral disease in soybean, and genetic resistance is the primary method of controlling it. The resistance genes are Rsv1, Rsv3, and Rsv4, and SNP markers were developed for each of them. Two SNP markers were identified from the 3gG2 gene (the resistance allele at the Rsv1 locus, which has been cloned and sequenced), and 4 SNPs near Rsv1, 5 near Rsv3, and 5 near Rsv4 were validated. SNP markers were also used to identify Rhg1 and Rhg4 (soybean cyst nematode resistance alleles) [44].
In sugar beet (Beta vulgaris), SNP markers were developed to map QTL for the Beet necrotic yellow vein virus resistance genes Rz4 [45] and Rz5 [46]. Additionally, in cowpea (Vigna unguiculata), a consensus genetic map was developed based on EST-derived SNPs, which is a very important resource for genomic and QTL mapping studies [47]. In Arabidopsis, SNP markers were utilized to clone the VTC2 gene using fine mapping and map-based cloning approaches. Jander et al. (2002) [48] narrowed the gene interval from a ~980 kb region to a 20 kb interval; nine candidate genes were identified in that interval, and the underlying mutation was subsequently discovered.
In oil palm (Elaeis guineensis), SNP markers were used to study genetic diversity [49]. The genetic evaluation of oil palm germplasm collections is important for gaining insight into the variability among populations. The information obtained is also useful for incorporating new genetic materials into current breeding programs. Furthermore, genetic diversity information among populations is important for identifying selected palms with economically important traits that are being incorporated into current breeding programs; these include high oil yield, low height increment, large kernel, long stalk, low levels of lipase, and high levels of carotene, vitamin E, iodine, and oleic acid. In this research, 219 oil palms from two natural Angolan populations were used, and a total of 62 SNP markers were designed from oil palm genomic sequences and converted to cleaved amplified polymorphic sequence (CAPS) markers. Based on cluster analysis using the unweighted pair group method with arithmetic mean (UPGMA), the 219 oil palms fell into two clusters or genetic groups that do not coincide with the geographic populations. Cluster I consisted of eight palms and cluster II included 211 palms. Furthermore, analysis of molecular variance revealed that only 7% of the total genetic variation was explained by variation between populations, while 93% was attributed to variation within populations. Moreover, levels of genetic variation in plants are directly associated with the breeding system. This information can be applied in selecting a sampling strategy for genetic conservation. In addition, SNP markers were also used to identify the fumarate hydratase 1 (FUM1) gene located on chromosome 4, which is associated with nitrogen uptake in oil palm [50].
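The UPGMA cluster analysis mentioned for oil palm can be sketched in a few lines. The example below repeatedly merges the two closest clusters and averages their distances to the remaining clusters, weighted by cluster size. The four sample names and their pairwise distances are hypothetical; the source does not state which genetic distance measure was used.

```python
def upgma(dist):
    """Average-linkage (UPGMA) clustering.

    dist maps each unordered pair of distinct sample names (a, b) to a distance.
    Returns the merge history as (members_1, members_2, distance) tuples.
    """
    # Each cluster is a tuple of sample names; start with singletons.
    d = {frozenset(((a,), (b,))): v for (a, b), v in dist.items()}
    clusters = {c for pair in d for c in pair}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        x, y = sorted(pair)
        merged = tuple(sorted(x + y))
        merges.append((x, y, d[pair]))
        clusters -= {x, y}
        for other in clusters:
            dx = d.pop(frozenset((x, other)))
            dy = d.pop(frozenset((y, other)))
            # Size-weighted average keeps every sample equally weighted.
            d[frozenset((merged, other))] = (len(x) * dx + len(y) * dy) / (len(x) + len(y))
        del d[pair]
        clusters.add(merged)
    return merges


# Hypothetical pairwise genetic distances between four palms.
example = {
    ("A", "B"): 0.2, ("C", "D"): 0.3,
    ("A", "C"): 0.8, ("A", "D"): 0.9,
    ("B", "C"): 0.7, ("B", "D"): 1.0,
}
merges = upgma(example)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Cutting the resulting tree at a chosen distance yields the genetic groups, which can then be compared with the geographic populations.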
A genetic diversity study in cassava (Manihot esculenta Crantz) was carried out using 5600 informative SNP markers obtained from GBS analysis [51]. The results showed three main clusters among 96 cassava genotypes at a similarity of 0.41. This genetic diversity suggests that the cassava genotypes might contain alleles with high additive genetic variance, which is very important for genetic improvement and conservation and provides a database for parental selection in order to develop cassava with hybrid vigor. Furthermore, a genetic diversity study of 70 taro (Colocasia esculenta) accessions of Hawaiian, South Pacific, Palauan, and mainland Asian origins was carried out using 2400 SNP markers [52]. A disease resistance gene in taro was revealed by this genetic diversity study.
In pea (Pisum sativum), SNP markers were applied in fine mapping within chosen QTL confidence intervals and in marker-assisted breeding for important traits in pea improvement [53]. Moreover, in eggplant (Solanum melongena), genetic diversity was revealed by SNP markers [17]. In this research, 384 of the 2,201 highest-quality SNPs (score > 0.6) were applied to genotype 23 eggplant germplasms that vary in fruit shape and skin color. About 94 SNP markers in common bean (Phaseolus vulgaris L.) were used to study genetic diversity. These markers were evaluated in 70 cultivated and wild accessions according to their gene pool, race and country of origin [20]. In chickpea (Cicer arietinum), 1022 SNPs were used to construct a high-resolution genetic linkage map [21]. Furthermore, in Camelina sativa, 533 SNPs were applied to map potential candidate genes and to assess genetic variation among a collection of 175 accessions. The SNPs will provide useful tools for future crop improvement of C. sativa as an industrial oilseed [22].
SNP markers have been used in GWAS and candidate gene association (CGA) studies. A GWAS approach was applied to understand the genetic architecture of complex traits, for example the complex diseases northern and southern corn leaf blight [30]. Because GWAS requires a large number of molecular markers, its utility in dissecting the molecular basis of traits in polyploid crops such as wheat and cotton has been fairly limited, owing to the insufficient number of polymorphic markers and the absence of a reference genome.

Conclusion
Genomic information on plants obtained from genetic diversity, association studies, evolution analysis, quantitative trait loci (QTL), marker-assisted selection (MAS) and genome-wide association studies (GWAS) is very useful for increasing food sustainability. To achieve food sustainability under changing climate conditions, plants with superior characteristics must be developed using genomic information. Genes associated with important traits can be identified efficiently with SNP markers. Genomic information has been obtained for many tropical crops in order to improve their characteristics.