Once upon a time in the history of proteomics and liquid chromatography-mass spectrometry (LC-MS), instrument manufacturers encoded data in vendor-specific formats and offered proprietary software tools for data analysis, with limited or no data-export capabilities. For a long time, there was no common data file format, which made it very difficult to share, compare, and analyze MS data obtained on different platforms. And so, to perform custom data analysis, scientists often had to reinvent the wheel, spending precious time writing programs for common tasks such as peak picking, deconvolution, peak area calculation, and data visualization.
Initial efforts from the Seattle Proteome Center (mzXML format) and later from the HUPO Proteomics Standards Initiative (mzML format) were key contributions to the field because they provided standard, vendor-neutral formats for MS data. Most vendors joined the parade and provided support for, or even tools to convert proprietary data into, these vendor-neutral formats. A second major contribution to the field was the ProteoWizard project, which provided the community with a robust, validated, modular set of free, open-source tools and libraries for the analysis of MS and proteomics data. Scientists could finally focus their time and energy on developing novel algorithms and tools that significantly advance the field.

So, what is happening in the field of MSI? More than 15 years after its introduction, there is still no widely accepted tool for viewing MSI data and performing basic data processing (peak picking, feature recognition, data extraction, normalization). Lessons learned from the history of software development in proteomics and LC-MS suggest that a common, shareable data file format is one prerequisite for such a tool to exist. The imzML format has now been around for several years and has been accepted by the MSI community as the common data file format for MSI. Free, robust converters from vendor formats to imzML are also widely available.
But what about MSI software? After a quick census, we found 20 different MSI tools, eight of which are commercial products. The disconcerting part is that most of the free software was released in the last two years or is currently "in development". It is safe to assume that most of the time spent coding these 12 free interfaces went not into developing novel data-processing algorithms but into building the user interface and reimplementing the same basic but necessary MSI tools. Meanwhile, brilliant research is focusing on new algorithms for the analysis of MSI data, such as peak picking, automatic feature extraction, data normalization, spatial segmentation, clustering, and resampling, but until there is a common, open-source MSI platform, these algorithms cannot easily be put into the hands of users.
Earlier this year, our group introduced MSiReader, an open-source interface for viewing and analyzing MSI data. The fully customizable interface can load most common MS/MSI data file formats, process high mass resolving power data, and comes with a plethora of great data visualization and analysis tools. Currently used by hundreds of scientists in more than 50 academic and industry laboratories around the world, it already incorporates several important features, such as data export tools, peak extraction, batch processing, spectra viewing, normalization, image co-registration, and baseline correction. Do we think MSiReader is the ultimate solution? Well, judging from the positive response we've received from the community so far, we firmly believe that it is at least a very honest attempt. However, there is still a lot of work to do before MSiReader becomes the definitive tool. In fact, we can already think of a short list of improvements: a comprehensive statistical analysis toolset, optional data resampling, 3D MSI, and co-registration with more data formats. So, let's all work together and make it happen! We see all feedback and user input as essential to the future development and success of MSiReader.