PCAsuite: Tool for trajectory compression

One of the main shortcomings to popularize the use of Molecular Dynamics are its potentially large trajectory files. State-of-the-art simulation in the high nanosecond time scale can span easily several Gb, especially for large systems. Traditional general compression algorithms like LZW have been used in order to reduce the required space, but they usually do not perform well with this kind of data. However, trajectory data is not random. It follows patterns with well defined meaning that can be exploted for data compression. In particular, higher frequency movements can be discarded without affecting the overall dynamics of the system. Principal Component Analysis is one of the most used techniques to capture the essential movements of a macromolecular system. It implies a change in the coordinate space where reference eigenvectors are chosen according to the amount of system variance explained. The aim is to select the minimum number of reference coordinates that explain a given amount of system variance. The technique allows to select the degree of fidelity to the original trajectory. Chosing all eigenvectors there is no change in the accurancy of the trajectory. However, removing eigenvectors with the lowest amount of explained variance, has little effect on the overall behavior of the system, but has a remarkable effect on the size of the data.