# Chromosome structure via Euclidean Distance Matrices

### From Wikimization

## Revision as of 18:50, 3 February 2009

The data represents the autocorrelation coefficients (6MB video) for gene expression of 3827 genes from the circular chromosome of E.coli across 49 different experimental conditions. In the data, each axis is ordered according to the order in which genes appear on the E.coli circular chromosome, with an arbitrary start and end point. The expression was smoothed at various resolutions to highlight spatial patterns at different scales; in this way the correlation matrices complement each other. Bright green indicates a correlation coefficient of +1 and bright red indicates anticorrelation, -1. It is assumed that the E.coli chromosome is structured such that genes which positively correlate are close in distance within the cell, whereas genes which anticorrelate are far apart. The exact relation is unknown, but it would be interesting to try the alternate hypothesis to see what effect this has on the structure of the molecule.

In that data video, the left frame represents autocorrelation of a subset of genes at successively increasing levels of resolution. The right frame is the same autocorrelation except that, prior to decomposition of the signal into different resolutions, the positions of the genes along the chromosome are randomly permuted. **(**On the left is experimental data, on the right is control data; *i.e.*, what one would get from white noise.**)** The permuted frame represents the autocorrelation one would expect if there were no information in the relative positions of the genes along the chromosome. As such, the right frame is a null hypothesis. -Ronan Fleming
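The construction of the control can be sketched in a few lines. This is only an illustration under stated assumptions: the expression matrix `E` is synthetic, the sizes `nGenes` and `w` are made up, and a circular moving average stands in for the multiresolution decomposition actually used for the video, which is not specified here.

```matlab
% Hypothetical stand-in data (NOT the E.coli set): a smooth positional
% pattern along the gene axis plus noise, 200 genes x 49 conditions.
nGenes = 200;  nCond = 49;
w = 5;                                   % half-width of smoothing window (assumed)
s = sin(2*pi*(1:nGenes)'/nGenes);        % positional signal along the chromosome
E = s*randn(1,nCond) + 0.5*randn(nGenes,nCond);

% Control: randomly permute gene positions BEFORE any smoothing.
Eperm = E(randperm(nGenes), :);

% Circular moving average along the gene axis (the chromosome is circular).
% This filter is an assumption standing in for the real decomposition.
S  = zeros(nGenes, nCond);
Sp = zeros(nGenes, nCond);
for k = -w:w
    S  = S  + circshift(E,     k);       % shifts rows, wrapping around
    Sp = Sp + circshift(Eperm, k);
end
S  = S /(2*w+1);
Sp = Sp/(2*w+1);

% Gene-by-gene correlation coefficients across the conditions.
C  = corrcoef(S');                       % ordered data   (left frame analogue)
Cp = corrcoef(Sp');                      % permuted data  (right frame analogue)
```

Displaying `C` and `Cp` side by side, for example with `imagesc`, gives a rough analogue of the two video frames: any structure in `C` that is absent from `Cp` is attributable to gene position.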

## Realization of Control data

    %%% Ronan Fleming, E.coli molecule data
    %%% -Jon Dattorro, August 9 2008
    clear all
    load ecoli
    frame = 4;                                  % 1 through 12
    G = her49imfs12movfull(frame).cdata;        % uint8
    G = (double(G)-128)/128;                    % Gram matrix, entries rescaled to about [-1,1)
    N = size(G,1);
    Vn = [-ones(1,N-1); speye(N-1)];            % auxiliary matrix; its columns are orthogonal to the ones vector
    % 20 largest (algebraic) eigenvalues and their eigenvectors
    [evec, evals, flag] = eigs(Vn'*G*Vn, [], 20, 'LA');
    if flag, disp('convergence problem'), return, end;
    close all
    % Projection of -Vn'D Vn on PSD cone rank 3; first point placed at the origin
    Xs = [zeros(3,1) sqrt(real(evals(1:3,1:3)))*real(evec(:,1:3))'];
    plot3(Xs(1,:), Xs(2,:), Xs(3,:), '.')

## E.coli realization

I regard the autocorrelation data you provided as a Gram matrix.

Conversion to a Euclidean distance matrix (EDM) is then straightforward: see Chapter 5.4.2 of *Convex Optimization & Euclidean Distance Geometry*.
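For orientation, here is a sketch of the standard relation between a Gram matrix and an EDM. The notation is assumed here rather than quoted from the book: $\delta(G)$ is the column vector of diagonal entries of $G$, $\mathbf{1}$ is the vector of ones, and the entries of $D$ are squared distances.

$$
D \;=\; \delta(G)\,\mathbf{1}^{\mathsf T} + \mathbf{1}\,\delta(G)^{\mathsf T} - 2G,
\qquad
d_{ij} \;=\; g_{ii} + g_{jj} - 2g_{ij} \;=\; \|x_i - x_j\|^2 .
$$

If the diagonal of $G$ were exactly 1, as for correlation coefficients, this would reduce to $d_{ij} = 2(1 - g_{ij})$: a correlation of +1 maps to distance zero and a correlation of -1 to the largest squared distance, 4. Moreover, with $V = [\,-\mathbf{1}^{\mathsf T};\, I\,]$ (the matrix `Vn` in the program) we have $V^{\mathsf T}\mathbf{1} = 0$, so the first two terms vanish and

$$
-V^{\mathsf T} D\, V \;=\; 2\,V^{\mathsf T} G\, V ,
$$

which would explain why the program can operate on `Vn'*G*Vn` without forming $D$ explicitly; the factor of 2 only rescales the picture.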

The program calculates only the 20 largest eigenvalues of an oblique projection of the EDM on a positive semidefinite (PSD) cone: see Chapters 7.0.4 to 7.1, *ibidem*.

You can see at runtime that there are many significant eigenvalues, which means that the Euclidean body (the molecule) lives in a space of dimension higher than 3, assuming I have interpreted the E.coli data correctly.
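One way to make "many significant eigenvalues" concrete is to count the eigenvalues above a relative threshold. The sketch below is an illustration only: it uses a synthetic point set in 5 dimensions instead of the E.coli data, dense `eig` instead of `eigs` because the matrix is small, and a threshold `tol` chosen arbitrarily.

```matlab
% Hypothetical test body: N points in d = 5 dimensions (NOT the E.coli data).
N = 50;  d = 5;
X = randn(d, N);
G = X'*X;                              % Gram matrix of the point list
Vn = [-ones(1,N-1); eye(N-1)];         % same construction as Vn in the program
A = Vn'*G*Vn;
A = (A + A')/2;                        % symmetrize against round-off
lam = sort(eig(A), 'descend');         % all eigenvalues, largest first
tol = 1e-8;                            % illustrative relative threshold (assumed)
nSignificant = sum(lam > tol*lam(1));
fprintf('significant eigenvalues: %d\n', nSignificant);
```

For exact data such as this, the count equals the dimension of the space the points span, so a count well above 3 signals that a three-dimensional picture discards real structure. With noisy data the choice of threshold matters, and plotting `lam` on a logarithmic scale is usually more informative than any single cutoff.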

To get a picture corresponding to physical reality, we obliquely project the EDM on the closest rank-3 subset of the boundary of that PSD cone; in practice, this means we keep the three largest eigenvalues and discard the rest.

It is unlikely that this picture is an accurate representation unless the number of significant eigenvalues of that projection is close to 3 prior to truncation.
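The truncation step, and a rough measure of how much it discards, can be sketched as follows. This is a minimal illustration under assumptions: the points are synthetic and truly three-dimensional, so the truncation is exact here, and the quantity `retained` is just one plausible diagnostic, not something the original program computes.

```matlab
% Hypothetical example: points that truly lie in 3-D (NOT the E.coli data).
N = 30;
X = randn(3, N);
G = X'*X;                              % Gram matrix
Vn = [-ones(1,N-1); eye(N-1)];         % same construction as Vn in the program
A = Vn'*G*Vn;
A = (A + A')/2;                        % symmetrize against round-off
[Q, L] = eig(A);
[lam, idx] = sort(diag(L), 'descend'); % eigenvalues, largest first
Q = Q(:, idx);

% Keep the three largest eigenvalues and discard the rest.
% The first point is placed at the origin, as in the program.
Xs = [zeros(3,1), diag(sqrt(max(lam(1:3),0)))*Q(:,1:3)'];

% Fraction of the nonnegative spectrum kept by the rank-3 truncation.
retained = sum(lam(1:3)) / sum(max(lam,0));

% Interpoint distances survive; the body is recovered only up to
% rotation, reflection and translation.
dTrue = norm(X(:,2)  - X(:,5));
dRec  = norm(Xs(:,2) - Xs(:,5));
```

When `retained` is close to 1, as it is for this synthetic body, the three-dimensional picture loses almost nothing. A value well below 1 would indicate the situation described above, where the plotted shape should not be trusted.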

Matlab Figures allow 3D rotation in real time, so you can get a good idea of the body's shape.

I include a low-resolution figure here (frame 4) for reference.