Principal coordinates analysis (PCoA), also called metric multidimensional scaling, is a statistical method for exploring multivariate data through the similarities or dissimilarities between observations. It maps the distance matrix of the original high-dimensional data into a lower-dimensional space in which the distances between individual data points are preserved as closely as possible. The method is especially helpful when there are many intricate relationships between variables, as in ecological, genomic, and social science studies.
Introduction to Principal Coordinates Analysis
Principal Coordinates Analysis (PCoA) is a statistical method that converts data on distances between items into a map-based visualization of those items. Unlike Principal Component Analysis (PCA), which is based on Euclidean distances, PCoA can work with any distance or dissimilarity measure. This makes it more flexible for various types of data.
PCoA excels at analyzing data presented as dissimilarity matrices. These matrices capture the pairwise dissimilarities or distances between objects. That makes PCoA particularly valuable in fields like ecology, genetics, and the social sciences, where relationships are often expressed as distances rather than direct measurements.
The Mathematical Foundation: From Distances to Coordinates
At its core, PCoA transforms a dissimilarity matrix into a set of coordinates in a lower-dimensional space (typically 2D or 3D). It does this while preserving the original distance relationships as faithfully as possible. The transformation is achieved through the following key steps:
1. Distance Matrix: First, compute the distance between every pair of objects. This gives a matrix D of size n × n, where D_ij is the distance between points i and j.
2. Double-Centering: Next, convert the distance matrix into a centered similarity matrix B by double-centering. This involves squaring every entry of D to form D^(2) (with entries D_ij^2). The row and column means are then removed by multiplying by the centering matrix J on both sides:
[Tex] B = -\frac{1}{2} J D^{(2)} J, \qquad J = I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^T [/Tex]
Here I_n is the n × n identity matrix and 1 is a column vector of n ones.
3. Eigen Decomposition: Perform an eigendecomposition of the symmetric matrix B:
[Tex] B = Q \Lambda Q^T [/Tex]
Q is the matrix whose columns are the eigenvectors. Λ is a diagonal matrix containing the eigenvalues, sorted from largest to smallest.
4. Principal Coordinates: The principal coordinates are obtained by scaling each eigenvector by the square root of its eigenvalue:
[Tex] X = Q \Lambda^{1/2} [/Tex]
X is the matrix of coordinates in the new space, and each row holds the coordinates of one point. Taking the square root requires non-negative eigenvalues. With non-Euclidean distances some eigenvalues of B can be negative, and those axes are left out.
5. Dimensionality Reduction: To reduce the dimensionality, keep only the top k eigenvalues and their eigenvectors, which capture the most variability in the data:
[Tex] X_k = Q_k \Lambda_k^{1/2} [/Tex]
Here Q_k holds the first k eigenvectors and Λ_k the corresponding k eigenvalues.
How Does Principal Coordinates Analysis Work?
PCoA uses eigenanalysis to find the main axes of a distance matrix. The matrix is first double-centered. The result is then decomposed into a set of eigenvalues, each with its own eigenvector. The eigenvalues are ordered from largest to smallest. The first is the leading one, and its eigenvector defines the axis along which the points vary most. The eigenvectors let you explore and visualize these main axes of the original distance matrix. The procedure does not change the positions of the points relative to one another; it only changes the coordinate system in which they are described. In summary, the algorithm proceeds in four stages:
- build the distance matrix;
- double-center it;
- eigendecompose the result;
- scale the leading eigenvectors to obtain the coordinates.
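The five steps can be sketched directly in NumPy. This is a minimal illustration of the formulas above under the assumption of a symmetric distance matrix with a zero diagonal. It is not scikit-bio's implementation, and the function name `pcoa_numpy` is hypothetical.

```python
import numpy as np


def pcoa_numpy(D, k=2):
    """Classical PCoA of an n x n distance matrix D (hypothetical helper).

    Returns the n x k coordinate matrix X_k and all eigenvalues, largest first.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Step 2: double-centering, B = -1/2 * J D^(2) J with J = I - (1/n) 11^T
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Step 3: eigendecomposition of the symmetric matrix B (eigh returns ascending order)
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]          # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Steps 4-5: keep the top k axes and scale each eigenvector by sqrt(eigenvalue);
    # clip guards the square root against negative eigenvalues
    X = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0, None))
    return X, eigvals


# Usage: the four corners of a 3 x 4 rectangle (Euclidean, so two axes suffice)
pts = np.array([[0, 0], [3, 0], [3, 4], [0, 4]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X, eigvals = pcoa_numpy(D, k=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_hat))   # the 2-D map reproduces every distance
```

Because the rectangle's distances are Euclidean and two-dimensional, two axes recover them exactly. The recovered points may be rotated or reflected relative to the originals, since only relative positions are determined.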
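Implementing Principal Coordinates Analysis Using skbio
In this section we use the pcoa() function from scikit-bio (skbio) to perform principal coordinates analysis.
Syntax: skbio.stats.ordination.pcoa(distance_matrix, method='eigh', number_of_dimensions=0, inplace=False)
Parameters:
distance_matrix: the input distance matrix (a skbio DistanceMatrix, or data that can be converted to one).
method: the eigendecomposition method. 'eigh' (the default) computes all eigenvalues and eigenvectors exactly; 'fsvd' uses a faster heuristic that is less accurate.
number_of_dimensions: the number of dimensions (eigenvectors and eigenvalues) to return; 0 (the default) returns all of them.
inplace: if True, centers the distance matrix in place to reduce memory consumption.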
Returns: An OrdinationResults object that stores the PCoA results. These include the eigenvalues, the proportion of variation explained by each of them, and the transformed sample coordinates.
Example: We create a dummy city distance dataset as a pandas DataFrame and apply principal coordinates analysis to it.
1. Importing Libraries:
```python
import pandas as pd                        # for data manipulation
from skbio.stats.ordination import pcoa   # to perform PCoA
import matplotlib.pyplot as plt            # for plotting the result
```
2. Create a Dataset:
```python
data = [['Delhi', 0, 1000, 1700, 1500, 2500],
        ['Patna', 1000, 0, 1900, 1400, 2600],
        ['Goa', 1700, 1900, 0, 600, 750],
        ['Hyderabad', 1500, 1400, 600, 0, 1100],
        ['Kochi', 2500, 2600, 750, 1100, 0]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Origin', 'Delhi', 'Patna', 'Goa', 'Hyderabad', 'Kochi'])
```
The code above creates a small city distance dataset that gives the pairwise distances between five cities in India.
3. Creating a Distance Matrix: Next, convert the DataFrame into a distance matrix. The to_numpy() method converts a DataFrame into a NumPy array. We drop the Origin column and convert the remaining columns.
```python
dmatrix = df.iloc[:, 1:].to_numpy()
```
4. Performing PCoA:
```python
pcoa_result = pcoa(dmatrix, number_of_dimensions=2)
```
The city distance matrix is passed to pcoa(). It reduces the matrix to two dimensions while preserving the pairwise distances as closely as possible.
5. Extracting Coordinates:
```python
coordinates = pcoa_result.samples
```
The samples attribute holds the principal coordinates. It is a DataFrame with one row per city and columns PC1, PC2, and so on, giving each point's position in the new lower-dimensional space.
6. Plotting the Results:
```python
# Copy the two coordinate columns so adding a column does not modify a slice of `coordinates`
df_pcoa = coordinates[['PC1', 'PC2']].copy()
df_pcoa['Origin'] = df['Origin'].to_numpy()
df_pcoa = df_pcoa.set_index('Origin')
print('\n\n', df_pcoa)

fig, ax = plt.subplots()
df_pcoa.plot(x='PC1', y='PC2', kind='scatter', ax=ax)
plt.title('PCoA Plot')
# Label each point with its city name
for city, row in df_pcoa.iterrows():
    ax.annotate(city, (row['PC1'], row['PC2']))
plt.show()
```
Running the steps above produces the following output. This assumes print statements have been added for the DataFrame, the distance matrix, and the PCoA result.
Output:
```
Dataframe
      Origin  Delhi  Patna   Goa  Hyderabad  Kochi
0      Delhi      0   1000  1700       1500   2500
1      Patna   1000      0  1900       1400   2600
2        Goa   1700   1900     0        600    750
3  Hyderabad   1500   1400   600          0   1100
4      Kochi   2500   2600   750       1100      0

Distance Matrix
[[   0 1000 1700 1500 2500]
 [1000    0 1900 1400 2600]
 [1700 1900    0  600  750]
 [1500 1400  600    0 1100]
 [2500 2600  750 1100    0]]

PCoA Result::
Ordination results:
    Method: Principal Coordinate Analysis (PCoA)
    Eigvals: 2
    Proportion explained: 2
    Features: N/A
    Samples: 5x2
    Biplot Scores: N/A
    Sample constraints: N/A
    Feature IDs: N/A
    Sample IDs: '0', '1', '2', '3', '4'
```
Applying PCoA creates a set of new dimensions. Because number_of_dimensions was set to 2, the result contains two new dimensions, PC1 and PC2. Their values for each city are read from pcoa_result.samples, as in step 5.
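The "Proportion explained" entry in the result summary is derived from the eigenvalues of the double-centered matrix B. The sketch below recomputes those eigenvalues for the city matrix with NumPy. It then applies one simple convention: each positive eigenvalue is divided by the sum of the positive eigenvalues. That convention is an assumption made for illustration; scikit-bio's exact definition may differ, so check its documentation before comparing numbers.

```python
import numpy as np

# City distance matrix from step 2 (Delhi, Patna, Goa, Hyderabad, Kochi)
D = np.array([[0, 1000, 1700, 1500, 2500],
              [1000, 0, 1900, 1400, 2600],
              [1700, 1900, 0, 600, 750],
              [1500, 1400, 600, 0, 1100],
              [2500, 2600, 750, 1100, 0]], dtype=float)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
B = -0.5 * J @ (D ** 2) @ J                      # double-centered matrix
eigvals = np.sort(np.linalg.eigvalsh(B))[::-1]   # eigenvalues, largest first

# Assumed convention: share of each positive eigenvalue in the sum of positive eigenvalues
positive = eigvals[eigvals > 1e-8 * eigvals[0]]
proportion = positive / positive.sum()
print("proportion explained:", np.round(proportion, 3))
print("negative eigenvalues:", eigvals[eigvals < -1e-8 * eigvals[0]])
```

For this matrix the list of negative eigenvalues is not empty, which shows that the city distances are not exactly Euclidean. Conventions differ mainly in how they treat such values.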
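The values of each principal coordinate can also be extracted separately as arrays. For example, use coordinates['PC1'].to_numpy() for the first and coordinates['PC2'].to_numpy() for the second.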
Output:
```
First Principal Coordinates
[-1056.34515353 -1180.11914341 613.69991135 237.63596197 1385.12842362]
Second Principal Coordinates
[-536.88803625 455.97691294 -202.81915198 259.3174025 24.41287279]
```
Visualization of PCoA Coordinates
Let's plot both coordinates, with PC1 along the x-axis and PC2 along the y-axis.
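Output:
Figure: PCoA plot
The result is a scatter plot in which the five cities are placed according to their values on the first two principal coordinates computed by PCoA. Each point is labeled with its city name from the Origin column, and the plot includes axis labels and a title.
Explanation of the Plot: PC1 separates the northern cities, Delhi and Patna (negative values), from the southern cities, Goa, Hyderabad, and Kochi (positive values). Cities with small distances in the matrix, such as Delhi and Patna or Goa and Hyderabad, appear close together. Kochi lies far from Delhi and Patna.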
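Interpreting PCoA Plots: Untangling Complex Relationships
The resulting PCoA plot offers a visual representation of the original dissimilarity matrix. Points that lie close together correspond to objects with small dissimilarities, and points far apart correspond to objects with large dissimilarities. The axes are ordered by the share of variation they capture, so PC1 shows the dominant pattern and PC2 the next strongest one. The axes are not tied to any original variable, so the relative positions of the points carry the meaning.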
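This reading of the plot can be checked numerically. The sketch below takes the PC1 and PC2 values printed in the output above and recomputes the city-to-city distances on the 2-D map. It then measures the largest relative error against the original matrix and checks whether each city's nearest neighbour on the map matches its nearest neighbour in the matrix. The variable names are illustrative.

```python
import numpy as np

cities = ["Delhi", "Patna", "Goa", "Hyderabad", "Kochi"]
D = np.array([[0, 1000, 1700, 1500, 2500],
              [1000, 0, 1900, 1400, 2600],
              [1700, 1900, 0, 600, 750],
              [1500, 1400, 600, 0, 1100],
              [2500, 2600, 750, 1100, 0]], dtype=float)
# PC1 and PC2 for each city, copied from the printed output
X = np.array([[-1056.34515353, -536.88803625],
              [-1180.11914341, 455.97691294],
              [613.69991135, -202.81915198],
              [237.63596197, 259.3174025],
              [1385.12842362, 24.41287279]])

D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # distances on the 2-D map
iu = np.triu_indices(len(cities), k=1)                           # each pair counted once
rel_err = np.abs(D_hat[iu] - D[iu]) / D[iu]
print(f"largest relative distance error: {rel_err.max():.1%}")

# Nearest neighbour of each city, ignoring the zero distance to itself
Dn, Mn = D.copy(), D_hat.copy()
np.fill_diagonal(Dn, np.inf)
np.fill_diagonal(Mn, np.inf)
nearest_matrix = [cities[j] for j in Dn.argmin(axis=1)]
nearest_map = [cities[j] for j in Mn.argmin(axis=1)]
for name, a, b in zip(cities, nearest_matrix, nearest_map):
    print(f"{name}: nearest in matrix = {a}, nearest on map = {b}")
```

The two-axis map keeps every pairwise distance within about ten percent of the original. Every nearest-neighbour relation is also preserved, which is what makes the plot safe to read as a map of similarity.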
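Advantages and Limitations of PCoA
Advantages:
It works with any distance or dissimilarity measure, not only Euclidean distance.
It produces a low-dimensional map that preserves the original pairwise distances as faithfully as possible.
Limitations:
With non-Euclidean measures the double-centered matrix B can have negative eigenvalues, whose axes cannot be represented as real coordinates.
Because PCoA starts from distances only, its axes carry no direct information about which original variables drive the pattern.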
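One limitation is worth seeing concretely. When the dissimilarities are not Euclidean, the double-centered matrix B has a negative eigenvalue, and that axis cannot be turned into real coordinates. The sketch uses three hypothetical dissimilarities that violate the triangle inequality.

```python
import numpy as np

# Hypothetical dissimilarities between objects 1, 2 and 3:
# d(1,2) = 1 and d(2,3) = 1, but d(1,3) = 3 > 1 + 1 violates the triangle inequality
D = np.array([[0, 1, 3],
              [1, 0, 1],
              [3, 1, 0]], dtype=float)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
B = -0.5 * J @ (D ** 2) @ J              # double-centered matrix
eigvals = np.linalg.eigvalsh(B)          # eigenvalues in ascending order
print(np.round(eigvals, 4))
# The smallest eigenvalue is negative: no set of real points reproduces D exactly
```

Common remedies are to ignore those axes or to apply a correction to the distances, such as the Lingoes or Cailliez correction, before running PCoA.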
Applications of Principal Coordinates Analysis
PCoA has a wide range of applications in various fields, including ecology, microbiology, and genomics. It is particularly useful for handling non-Euclidean distances. Examples are Bray–Curtis dissimilarity and unweighted UniFrac distance, which these fields commonly use to describe pairwise dissimilarity between samples. By projecting the observations into a lower dimension, PCoA lets researchers visualize variation across samples and potentially identify clusters.
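The sketch below illustrates the non-Euclidean case. It builds a Bray–Curtis dissimilarity matrix from a small table of species counts and applies the same double-centering and eigendecomposition steps as for any other distance matrix. The counts and the helper `bray_curtis` are hypothetical and for illustration only. In practice, SciPy's `scipy.spatial.distance.pdist(counts, metric='braycurtis')` computes the same dissimilarities for non-negative data.

```python
import numpy as np


def bray_curtis(u, v):
    """Bray–Curtis dissimilarity, sum|u - v| / sum(u + v), for non-negative counts."""
    return np.abs(u - v).sum() / (u + v).sum()


# Hypothetical species counts: rows are samples, columns are species
counts = np.array([[10, 0, 5, 3],
                   [8, 1, 6, 2],
                   [0, 12, 1, 9],
                   [1, 10, 0, 7]], dtype=float)
n = counts.shape[0]
D = np.array([[bray_curtis(counts[i], counts[j]) for j in range(n)] for i in range(n)])

# Same PCoA steps as for any other distance matrix
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]            # indices of the two largest eigenvalues
X = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
print(np.round(X, 3))
```

Samples 1 and 2 share most of their species, as do samples 3 and 4. In the resulting map each pair sits close together and the two pairs lie far apart, which is the kind of clustering the paragraph above describes.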
Comparing PCoA with Other Multivariate Techniques: When to Use Which
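One fact is useful when choosing between PCA and PCoA. If PCoA is run on the Euclidean distances between samples, its coordinates coincide with the PCA scores of the same data, up to the sign of each axis. The sketch checks this on random data generated with NumPy, which serves only as a hypothetical example. PCoA becomes the more useful tool when a non-Euclidean measure better reflects the data.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))                 # hypothetical data: 6 samples, 3 variables

# PCA scores via SVD of the column-centered data
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# PCoA on the Euclidean distance matrix of the same samples
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
n = len(Y)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]       # two largest eigenvalues
pcoa_scores = eigvecs[:, order] * np.sqrt(eigvals[order])

# Compare axis by axis, allowing for a sign flip of each eigenvector
same = all(
    np.allclose(pca_scores[:, k], pcoa_scores[:, k])
    or np.allclose(pca_scores[:, k], -pcoa_scores[:, k])
    for k in range(2)
)
print(same)
```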
Conclusion
Principal Coordinates Analysis (PCoA) is a versatile and powerful method for visualizing the similarities and dissimilarities among a set of objects. Its ability to handle various distance measures makes it suitable for a wide range of applications, from microbial ecology to marketing research. By understanding the mathematical foundations and practical implementation of PCoA, researchers can effectively use this technique to gain insights into their data.
Referred: https://www.geeksforgeeks.org