---
title: "PC Directions in dppca"
description: >
  Principal component direction estimation used in dppca, including
  non-private sample PCA directions and differentially private g-DPPCA directions.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{PC Directions in dppca}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center"
)
```

In ordinary PCA, the principal component directions are obtained from the
eigenvectors of the sample covariance matrix. In `dppca`, these directions can be
computed in two different ways.

1. **Non-private PC directions**: eigenvectors of the sample covariance matrix.
2. **Differentially private PC directions**: private principal component directions obtained through the g-DPPCA procedure.


## Notation

Let

\[
X =
\begin{bmatrix}
X_1^\top \\
X_2^\top \\
\vdots \\
X_n^\top
\end{bmatrix}
\in \mathbb{R}^{n \times p}
\]

be the data matrix used for PCA, where $X_i \in \mathbb{R}^p$ is the $i$-th observation.
We assume that $X$ has been centered, and optionally standardized.


The principal component direction matrix is denoted by

\[
V_k = [v_1,\ldots,v_k] \in \mathbb{R}^{p \times k},
\]

where each column $v_\ell$ is a unit vector representing the $\ell$-th PC direction.

The corresponding score matrix is $Z = X V_k$.



## 1. Non-private PC directions

The classical sample covariance matrix is

\[
\hat\Sigma
=
\frac{1}{n-1}X^\top X.
\]

The non-private PCA directions are obtained from the eigenvalue decomposition

\[
\hat\Sigma
=
\hat V \hat\Lambda \hat V^\top,
\]

where

\[
\hat V = [\hat v_1,\ldots,\hat v_p],
\quad
\hat\Lambda
=
\operatorname{diag}(\hat\lambda_1,\ldots,\hat\lambda_p)
\quad \text{with} \quad
\hat\lambda_1 \geq \hat\lambda_2 \geq \cdots \geq \hat\lambda_p \geq 0.
\]


The $\ell$-th sample principal component direction is $\hat v_\ell$.


Equivalently,

\[
\hat v_\ell
=
\arg\max_{\|v\|_2 = 1}
v^\top \hat\Sigma v
\quad
\text{subject to}
\quad
v^\top \hat v_j = 0,
\qquad j = 1,\ldots,\ell-1.
\]


In the non-private option of `dppca`, the direction matrix used for projection is

\[
\hat V_k = [\hat v_1,\ldots,\hat v_k].
\]
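
The computation above can be sketched in base R. This is a minimal illustration, not the `dppca` implementation: the simulated data and the names `X_raw`, `Sigma_hat`, and `V_hat_k` are assumptions made for the example.

```{r}
# Minimal sketch in base R; all object names here are illustrative.
set.seed(1)
n <- 200; p <- 5; k <- 2
X_raw <- matrix(rnorm(n * p), n, p) %*% diag(c(3, 2, 1, 0.5, 0.25))

# Center the columns so that X'X / (n - 1) is the sample covariance matrix.
X <- scale(X_raw, center = TRUE, scale = FALSE)
Sigma_hat <- crossprod(X) / (n - 1)

# For symmetric input, eigen() returns eigenvalues in decreasing order.
eig <- eigen(Sigma_hat, symmetric = TRUE)
V_hat_k <- eig$vectors[, 1:k, drop = FALSE]

# Scores: project the centered data onto the first k directions.
Z <- X %*% V_hat_k

# The columns are orthonormal and agree with prcomp() up to sign,
# so both quantities below should be numerically zero.
max(abs(crossprod(V_hat_k) - diag(k)))
max(abs(abs(V_hat_k) - abs(prcomp(X_raw)$rotation[, 1:k])))
```

Eigenvectors are only determined up to sign, which is why the comparison with `prcomp()` is made on absolute values.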


## 2. DP PC directions

[Kim and Jung (2025)](#ref-Kim2025) proposed `g-DPPCA`, which applies a matrix Gaussian mechanism to the generalized multivariate Kendall's tau matrix. This matrix is built on a robust data transformation called the generalized spatial sign, proposed by [Raymaekers and Rousseeuw (2019)](#ref-Raymaekers2019).

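Throughout this section, $d$ denotes the data dimension (written $p$ in the Notation section). For a positive-valued scale function $\xi: (0, \infty) \to (0, \infty)$, consider the map $g_\xi: \mathbb{R}^d \to \mathbb{R}^d$ defined for $t \neq 0$ by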

\[
g_\xi(t) = \xi(\|t\|_2)\cdot \frac{t}{\|t\|_2}.
\]

The map $g_{\xi}$ is called the *generalized spatial sign* with respect to $\xi$.
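
As a small illustration, the map can be written as an R function that takes the scale function as an argument. The names `gss`, `xi_sph`, and `xi_winsor` are hypothetical, the truncation radius 2 is arbitrary, and sending the origin to itself is an assumption of this sketch.

```{r}
# Generalized spatial sign: g_xi(t) = xi(||t||) * t / ||t||.
# `gss` and the example scale functions are illustrative names.
gss <- function(t, xi) {
  r <- sqrt(sum(t^2))
  if (r == 0) return(t)  # assumption: the origin is mapped to itself
  xi(r) * t / r
}

xi_sph <- function(r) 1              # spherical sign: t / ||t||
xi_winsor <- function(r) min(r, 2)   # hypothetical radial truncation at 2

t <- c(3, 4)                         # ||t|| = 5
gss(t, xi_sph)      # 0.6 0.8
gss(t, xi_winsor)   # 1.2 1.6
```

Both choices keep the direction of `t` and only change its length, which is what bounds the influence of outlying observations.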

The *generalized multivariate Kendall's tau* matrix with respect to $g_\xi$ is defined as

\[
K_{g_\xi} = \mathbb{E}_{X, X'}\left[ g_\xi\left( \frac{X - X'}{\sqrt{2}}\right) 
            g_\xi\left( \frac{X - X'}{\sqrt{2}}\right)^\top ~ \right],
\]

where $X'$ is an independent copy of $X$. Importantly, if $X$ follows an elliptical distribution (a family that includes the Gaussian and multivariate $t$-distributions), then $K_{g_\xi}$ has the same eigenvectors as $\operatorname{cov}(X)$, in the same order. PCA can therefore be carried out by estimating $K_{g_\xi}$ and taking its eigenvectors.

For convenience, we write $g$ for the chosen generalized spatial sign. For a random sample $S = (X_1, \dots, X_n)$, the second-order *U*-statistic estimator of $K_{g}$ is

\[
\widehat{K}_g(S) = \frac{2}{n(n-1)} \sum_{i < j} g\left(\frac{X_j - X_i}{\sqrt{2}}\right)
     g\left(\frac{X_j - X_i}{\sqrt{2}}\right)^\top.
\]
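
The estimator can be computed directly from its definition with a double loop over pairs. The sketch below uses the spherical sign for $g$, and `kendall_g` is a hypothetical helper name rather than a `dppca` function. It assumes no two observations coincide, so that every pairwise difference is nonzero.

```{r}
# Illustrative sketch with g = g_sph; `kendall_g` is a hypothetical helper.
g_sph <- function(t) t / sqrt(sum(t^2))

kendall_g <- function(X, g = g_sph) {
  n <- nrow(X)
  d <- ncol(X)
  K <- matrix(0, d, d)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      u <- g((X[j, ] - X[i, ]) / sqrt(2))
      K <- K + tcrossprod(u)   # adds the rank-one term u u^T
    }
  }
  2 * K / (n * (n - 1))
}

set.seed(1)
X <- matrix(rnorm(50 * 3), 50, 3) %*% diag(c(3, 1, 0.5))
K_hat <- kendall_g(X)

isSymmetric(K_hat)
sum(diag(K_hat))   # equals 1 for g_sph, since every term has unit trace
```

The loop costs $O(n^2 d^2)$ operations, which is fine for a demonstration but would usually be vectorized for larger samples.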

The sensitivity of $\widehat{K}_g$ with respect to the Frobenius norm, taken over neighboring datasets $S \sim S'$ that differ in a single observation, is bounded above by

\[
\Delta_F(\widehat{K}_g) 
= \sup_{S \sim S'} \|\widehat{K}_g(S) - \widehat{K}_g(S')\|_F
\le \frac{4\|g\|_\infty^2}{n}.
\]
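
The bound follows from a short counting argument, sketched here under the assumptions that $S \sim S'$ means the two datasets differ in exactly one observation and that $\|g\|_\infty := \sup_t \|g(t)\|_2$. Replacing one observation changes only the $n-1$ summands that involve it. Each summand has the form $uu^\top$ with $\|uu^\top\|_F = \|u\|_2^2 \le \|g\|_\infty^2$, so each changed summand moves by at most $2\|g\|_\infty^2$ in Frobenius norm, and

\[
\|\widehat{K}_g(S) - \widehat{K}_g(S')\|_F
\le \frac{2}{n(n-1)} \cdot (n-1) \cdot 2\|g\|_\infty^2
= \frac{4\|g\|_\infty^2}{n}.
\]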

Consequently, for a dataset $S = (x_1, \dots, x_n)$, the randomized mechanism $\bar{K}_g$ defined by

\[
\bar{K}_g(S) :=
\frac{2}{n(n-1)} \sum_{i < j} g\left(\frac{x_j-x_i}{\sqrt{2}}\right)g\left(\frac{x_j-x_i}{\sqrt{2}}\right)^\top + \mbox{vecd}^{-1}(\eta),
\]
where $\eta \sim N_{d(d+1)/2}(0, \sigma_{\varepsilon, \delta}^2 I_{d(d+1)/2})$ and
$\sigma_{\varepsilon, \delta} = \frac{4\|g\|_{\infty}^2 \sqrt{2 \ln(1.25/\delta)}}{n\varepsilon}$, satisfies $(\varepsilon, \delta)$-DP. Here $\eta$ is the Gaussian noise vector (not to be confused with the scale function $\xi$), and $\mbox{vecd}^{-1}$ maps a vector of length $d(d+1)/2$ to a symmetric $d \times d$ matrix.
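
The noise step can be sketched as follows. The helper names `vecd_inv`, `gaussian_sd`, and `privatize` are hypothetical, and the convention used for the inverse half-vectorization (fill the upper triangle column by column, then mirror it, with no rescaling of off-diagonal entries) is an assumption of this example that may differ from the convention in the paper.

```{r}
# Assumed convention: fill the upper triangle (including the diagonal)
# column by column and mirror it; the paper's ordering or scaling may differ.
vecd_inv <- function(v, d) {
  stopifnot(length(v) == d * (d + 1) / 2)
  M <- matrix(0, d, d)
  M[upper.tri(M, diag = TRUE)] <- v
  M[lower.tri(M)] <- t(M)[lower.tri(M)]
  M
}

# Noise standard deviation; g_sup stands for the sup-norm of g.
gaussian_sd <- function(n, eps, delta, g_sup = 1) {
  4 * g_sup^2 * sqrt(2 * log(1.25 / delta)) / (n * eps)
}

# Add symmetric Gaussian noise to a d x d estimate K_hat.
privatize <- function(K_hat, n, eps, delta, g_sup = 1) {
  d <- ncol(K_hat)
  sigma <- gaussian_sd(n, eps, delta, g_sup)
  noise <- rnorm(d * (d + 1) / 2, mean = 0, sd = sigma)
  K_hat + vecd_inv(noise, d)
}

set.seed(1)
K_hat <- diag(c(0.6, 0.3, 0.1))   # toy stand-in for a Kendall's tau estimate
K_bar <- privatize(K_hat, n = 500, eps = 1, delta = 1e-5)
isSymmetric(K_bar)
```

Drawing only $d(d+1)/2$ noise values and mirroring them keeps the noisy matrix symmetric, so its eigenvectors are real and orthogonal. The noisy matrix need not be positive semidefinite.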

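Define $\bar{V}_{g, m}(S) \in \mathcal{O}(d, m)$ as the matrix whose columns are the eigenvectors of $\bar{K}_g(S)$ associated with its $m$ largest eigenvalues, where $\mathcal{O}(d, m)$ denotes the set of $d \times m$ matrices with orthonormal columns and $m$ plays the role of $k$ in the Notation section. By the post-processing property, $\bar{V}_{g, m}(S)$ also satisfies $(\varepsilon, \delta)$-DP, so its columns can serve as differentially private principal component directions. [Kim and Jung (2025)](#ref-Kim2025) call this procedure `g-DPPCA`.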

In the implementation of the function `dp_pc_dir` with option `g_dppca=TRUE`, we use the spherical transformation $g_{sph}(t) = t/\|t\|_2$, which corresponds to $\xi \equiv 1$, to output differentially private PC directions $\bar{V}_{sph,m}$. In this case $\|g_{sph}\|_{\infty} = 1$, so the standard deviation of the additive Gaussian noise is $\sigma_{\varepsilon, \delta} = \frac{4\sqrt{2 \ln(1.25/\delta)}}{n\varepsilon}$.
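
For intuition, the whole procedure with the spherical sign fits in a few lines of base R. The function `g_dppca_sph` below is a hypothetical stand-in written for this vignette and is not the code behind `dp_pc_dir`. It assumes that no two rows of the input coincide and it mirrors the noise into a symmetric matrix using an assumed upper-triangle convention.

```{r}
# Hypothetical stand-in, NOT the dppca implementation of dp_pc_dir().
g_dppca_sph <- function(X, m, eps, delta) {
  n <- nrow(X); d <- ncol(X)
  # All pairwise differences X_j - X_i for i < j, one pair per row.
  idx <- utils::combn(n, 2)
  D <- X[idx[2, ], , drop = FALSE] - X[idx[1, ], , drop = FALSE]
  U <- D / sqrt(rowSums(D^2))        # spherical signs; the sqrt(2) cancels
  K_hat <- crossprod(U) / nrow(U)    # 2 / (n (n - 1)) * sum of u u^T
  # Gaussian noise on the d (d + 1) / 2 upper-triangular entries, mirrored.
  sigma <- 4 * sqrt(2 * log(1.25 / delta)) / (n * eps)
  E <- matrix(0, d, d)
  E[upper.tri(E, diag = TRUE)] <- rnorm(d * (d + 1) / 2, sd = sigma)
  E[lower.tri(E)] <- t(E)[lower.tri(E)]
  # Post-processing: eigenvectors for the m largest eigenvalues.
  eigen(K_hat + E, symmetric = TRUE)$vectors[, seq_len(m), drop = FALSE]
}

set.seed(1)
n <- 300
X <- matrix(rnorm(n * 4), n, 4) %*% diag(c(4, 2, 1, 0.5))
X <- scale(X, center = TRUE, scale = FALSE)
V_bar <- g_dppca_sph(X, m = 2, eps = 1, delta = 1e-5)
V_hat <- eigen(cov(X), symmetric = TRUE)$vectors[, 1:2]

# Squared cosines of the principal angles between the two subspaces;
# values near 1 mean the private and non-private subspaces are close.
svd(crossprod(V_hat, V_bar))$d^2
```

Comparing subspaces rather than individual columns avoids the sign ambiguity of eigenvectors. Because the noise standard deviation scales as $1/(n\varepsilon)$, the agreement is expected to improve as $n$ or $\varepsilon$ grows.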




## Summary

The principal component direction step in `dppca` can be summarized as follows.

1. Start with a preprocessed data matrix $X$.
2. Choose a direction estimation method.
3. Obtain a direction matrix $V_k$.
4. Compute projected scores $Z = X V_k$.
5. Use the scores for private scree estimation or private score visualization.

The main distinction is whether $V_k$ is obtained from the ordinary sample covariance matrix or from a differentially private robust PC direction estimator.


## References

<span id="ref-Kim2025"></span>
Minwoo Kim and Sungkyu Jung (2025), "Robust and differentially private principal component analysis," *Statistical Analysis and Data Mining*, 18(6), https://doi.org/10.1002/sam.70053

<span id="ref-Raymaekers2019"></span>
Jakob Raymaekers and Peter Rousseeuw (2019), "A generalized spatial sign covariance matrix," *Journal of Multivariate Analysis*, 171:94–111, https://doi.org/10.1016/j.jmva.2018.11.010

