
FTIR Research Papers


IR spectroscopy is an excellent method for biological analyses. It enables the nonperturbative, label-free extraction of biochemical information and images toward diagnosis and the assessment of cell functionality. Although not strictly microscopy in the conventional sense, it allows the construction of images of tissue or cell architecture by the passing of spectral data through a variety of computational algorithms. Because such images are constructed from fingerprint spectra, the notion is that they can be an objective reflection of the underlying health status of the analyzed sample. One of the major difficulties in the field has been determining a consensus on spectral pre-processing and data analysis. This manuscript brings together as coauthors some of the leaders in this field to allow the standardization of methods and procedures for adapting a multistage approach to a methodology that can be applied to a variety of cell biological questions or used within a clinical setting for disease screening or diagnosis. We describe a protocol for collecting IR spectra and images from biological samples (e.g., fixed cytology and tissue sections, live cells or biofluids) that assesses the instrumental options available, appropriate sample preparation, different sampling modes as well as important advances in spectral data acquisition. After acquisition, data processing consists of a sequence of steps including quality control, spectral pre-processing, feature extraction and classification of the supervised or unsupervised type. A typical experiment can be completed and analyzed within hours. Example results are presented on the use of IR spectra combined with multivariate data processing.


The use of Fourier transform IR (FTIR) spectroscopic techniques for the nondestructive analysis of biological specimens is a rapidly expanding research area, with much focus on its utility in cytological and histological diagnosis through the generation of spectral images1,2. Molecular bonds with an electric dipole moment that can change by atomic displacement owing to natural vibrations are IR active. These vibrational modes are quantitatively measurable by IR spectroscopy3, providing a unique, label-free tool for studying molecular composition and dynamics without perturbing the sample. For interrogating biological materials, the most important spectral regions measured are typically the fingerprint region (600–1,450 cm−1) and the amide I and amide II (amide I/II) region (1,500–1,700 cm−1). The higher-wavenumber region (2,550–3,500 cm−1) is associated with stretching vibrations such as S-H, C-H, N-H and O-H, whereas the lower-wavenumber regions typically correspond to bending and carbon skeleton fingerprint vibrations4. Together, these regions comprise a biochemical fingerprint of the structure and function of interrogated cellular specimens. A typical biological IR spectrum with molecular assignments is shown in Figure 1.

Figure 1

Typical biological spectrum showing biomolecular peak assignments from 3,000–800 cm−1, where ν = stretching vibrations, δ = bending vibrations, s = symmetric vibrations and as = asymmetric vibrations. The spectrum is a...

IR microspectroscopy

Although the spectral domain allows chemical identification, the combination with microscopy (microspectroscopy) permits the examination of complex tissues and heterogeneous samples5. Detection by microscopy (see schematic of instrumentation in Fig. 2) may be accomplished by raster-scanning a point illuminated on the sample or by using wide-field illumination and focal plane array (FPA) or linear array detectors6. At present, wide-field scanning of a sample is possible in seconds, providing tens of thousands of spectra. A variety of choices are available for the IR source, including globar7, synchrotron8-12 and quantum-cascade lasers (QCLs)13, as well as for the detector (2D FPA, linear array or single element)14. The three major IR-spectroscopic sampling modes (Fig. 2b) are transmission, transflection and attenuated total reflection (ATR). Each mode offers convenience for some samples and challenges for others. In transflection mode, for illustration, the sample is placed on an inexpensive IR-reflecting surface (such as that found on low-emissivity (Low-E) slides) and measurements are generated by a beam passing through the sample and reflecting back from the substrate (i.e., the reflective surface) through the sample. As is clear from both theoretical and experimental studies15,16, the recorded spectral intensities depend on both sample morphology and chemistry. Hence, care should be taken on substrate choice17,18. Recently, topographical features of the sample and its effects have been shown to be minimized by inputting second derivative spectra in the classification model; better segregation of normal versus various disease categories facilitates potential spectral histopathological diagnosis19. Research by Cao et al.20 has demonstrated that if this pre-processing data analysis approach is performed (e.g., after both transflection and transmission measurements on dried cellular monolayers), the resulting classification is the same. 
This example suggests that irrespective of sampling geometry, mathematical tools can be applied to minimize confounding effects and to interpret their influence. As such, spectral processing may determine the diagnostic efficacy of the technique, not only from a biological perspective but also through the ability to control optical or distorting influences.

Figure 2

The instrumentation underlying the main forms of IR spectroscopic sampling. (a) Schematic of modern FTIR-imaging spectrometer. Reproduced with permission from ref. 6. (b) Schematic representation of the three main sampling modes for FTIR spectroscopy....

FTIR imaging provides spatially resolved information based on chemically specific IR spectra in the form of an information-rich image of the tissue or cell type being interrogated21-23. Further multivariate data analysis allows potential diagnostic markers to be elucidated, thus providing a fast and label-free technology to be used alongside conventional techniques such as histology2,22. At present24, rapid imaging permits a whole-organ cross-section, such as that from the prostate, to be imaged in hours; this not only allows one to objectively visualize pathology in situ, but the aforementioned classification models could also allow one to grade disease on the basis of the categories into which spectra might be aligned. One excellent application of IR imaging data is as a metabolomic tool that allows the in situ, nondestructive analysis of biological specimens (e.g., determining the glycogen levels in cervical cytology)25.

Data can be recorded from a variety of samples, ranging from live cells to formalin-fixed, paraffin-embedded (FFPE) archival tissue typical of a pathology specimen. IR spectra representing distinguishing fingerprints of specific cell types (e.g., stem cells versus transit-amplifying cells versus terminally differentiated cells) within a defined tissue architecture (e.g., crypts of the gastrointestinal tract and cornea)9,26 are now easily recorded. Consequently, spectral analyses delineate cellular hierarchy on the basis of protein, lipid and carbohydrate composition and/or DNA conformational changes27. For biomedical analyses, the major goal today is to derive an image of tissue architecture expressing the underlying biochemistry in a label-free fashion28, a development that can considerably extend our diagnostic potential beyond present capabilities. One example is distinguishing cells committed toward a pathological process (e.g., transformation) that conventional methods (e.g., visual scoring) might identify as normal. Screening cervical cytology specimens to distinguish normal versus low-grade versus high-grade cells4,29, grading primary neoplasia30 and determining whether tissue margins and potential metastatic sites are tumor free31,32 are examples of this concept across many types of tissues. It is this bridge from the technology and potential of IR spectroscopy and imaging to biological, mainly clinical, applications that is the subject of this protocol (Fig. 3).

Figure 3

FTIR spectroscopy work flow for imaging and diagnosis. The three major steps are sample preparation, FTIR spectral acquisition and data analysis. Sample preparation may differ depending on the sample format, requiring different materials and procedures....

IR spectroscopy in cancer classification and imaging

By using IR spectroscopy either as an imaging tool or by classifying spectral categories, it has been possible to distinguish between benign and malignant tumors in tissue samples of breast32-35, colon22,23,36, lung37 and prostate8,30,38,39 along with cervical cytology or biopsies4,28,40. IR spectroscopic analysis is also an ideal tool for the study of biofluids such as urine, saliva, serum or whole blood; the use of biofluids is desirable in a clinical setting as samples are obtained rapidly and relatively noninvasively, and minimal sample preparation is required. By using such methods, a spectral fingerprint of the biofluid can be obtained, which allows the subsequent classification of spectra from different categories with computational methods and possibly the identification of biomarkers41-44.

FTIR imaging of tissue and cells

Imaging of live cells is possible using both globar and synchrotron-based light sources, with the latter permitting greater lateral spatial resolution and data quality owing to higher flux21,45-47. Diffraction-limited resolution with ATR-FTIR imaging can also be advantageous as it allows analysis of live cells in aqueous systems21,48. In addition, the spatial resolution of the image can be increased by incorporating optics with a high refractive index21,34.

We describe a protocol that has three components: (i) specimen preparation and removal of possible sample contaminants; (ii) acquisition of spectra with a sufficiently high signal-to-noise ratio (SNR); and (iii) data processing for classification and imaging. As the precise steps in acquisition of spectra and data processing are, respectively, dependent on the instrument and software available, this protocol covers (ii) and (iii) to deliver a general understanding of the steps involved. Supplementary Methods 1–4 correspond to four different examples of standard operating procedures (with troubleshooting) specific to common instruments and acquisition/analysis software. Together, this protocol and the material contained in Supplementary Methods 1–4 are designed to build researchers’ confidence in conducting their studies using their own instrumentation and computational settings.

Application of this protocol to other research areas

The application of this protocol is not limited to the biomedical field. IR spectroscopy has previously been used in the fields of environmental toxicology49-52, consumer safety53,54, taxonomy55-57, and in the food industry58; a non-instrument– and non-software–specific protocol for imaging and classification could be of considerable use to these areas of research.

Experimental design: instrumental options

The main steps required to analyze a sample of interest are sample preparation, instrumental setting, acquisition of spectra and data processing (Fig. 4). Before instrumental options are chosen, it is important for the user to understand the expectations from the intended experiment. These include the desired spectral and spatial resolution and type of study (e.g., diagnostic versus exploratory). In addition, proper consideration must be given to potential sample restrictions such as acquiring appropriate sample thickness for respective modes.

Figure 4

Visual effect of different pre-processing steps on a set of FTIR spectra. Two common pre-processing sequences are rubber band baseline correction followed by normalization to the amide I/II peak and first or second differentiation followed by vector normalization....

Sampling modes

Figure 2b shows a schematic representation of each sampling mode and details of each can be seen in Table 1; however, it is important to note that different manufacturer systems may vary slightly in some parameters, such as sampling apertures. Transmission and transflection sampling modes have been applied to a variety of biological specimens that can be sectioned into a thin layer allowing for accurate spectral data acquisition59. ATR-FTIR mode differs in that the IR beam is directed through an internal reflection element (IRE) with a high refractive index (e.g., diamond, zinc selenide, germanium or silicon)60. The evanescent wave extends beyond the IRE surface penetrating the sample, which must be in direct contact with the IRE. The penetration depth of this wave typically ranges from 1 to 2 μm within the 1,800–900 cm−1 region, but it should be remembered that there is still ~5% intensity at a depth of 3 μm (refs. 18,61,62). It has been shown that samples with thicknesses of <2 μm may give rise to spectral artifacts with IR-reflective substrates such as MirrIR Low-E slides (Kevley Technologies); therefore, when these substrates are used with ATR-FTIR spectroscopy, a thicker sample is recommended18.
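The 1–2 μm penetration depth quoted above can be checked with the standard expression for the 1/e decay depth of the evanescent field, d_p = λ / (2π n_IRE √(sin²θ − (n_sample/n_IRE)²)). The following is a minimal sketch only: the refractive indices (diamond ≈ 2.4, biological sample ≈ 1.5) and the 45° angle of incidence are assumed illustrative values, not settings taken from this protocol, and the function name is hypothetical.

```python
import math

def penetration_depth_um(wavenumber_cm1, n_ire, n_sample, angle_deg):
    """Depth (in micrometres) at which the evanescent field amplitude falls to 1/e."""
    wavelength_um = 1e4 / wavenumber_cm1  # 1 cm = 10^4 micrometres
    theta = math.radians(angle_deg)
    root = math.sin(theta) ** 2 - (n_sample / n_ire) ** 2
    if root <= 0:
        raise ValueError("below the critical angle: no total internal reflection")
    return wavelength_um / (2 * math.pi * n_ire * math.sqrt(root))

# Assumed values: diamond IRE (n ~ 2.4), sample n ~ 1.5, 45 degree incidence.
for wavenumber in (1800, 1000, 900):
    depth = penetration_depth_um(wavenumber, n_ire=2.4, n_sample=1.5, angle_deg=45.0)
    print(f"{wavenumber} cm-1: d_p = {depth:.2f} um")
```

Under these assumptions the depth grows from roughly 1.1 μm at 1,800 cm−1 to roughly 2.2 μm at 900 cm−1, because d_p scales with wavelength. A higher-index IRE such as germanium, or a steeper angle, would give a shallower depth.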

Table 1

FTIR spectroscopy modes used for the interrogation of cellular materials.

A magnification-limited digital camera may be used for visualization in order to guide manual navigation across a given sample so as to locate a region of interest and help identify basic microscopic features such as separation between cancer cells and stromal elements. An alternative setup for ATR involves placing the sample directly onto the IRE aperture of the ATR accessory. This is particularly useful for biofluid analysis as it bypasses any potential contributions from any slide substrate that the sample could be placed on (Supplementary Method 1). This methodology may also help to reduce experimentation time owing to reduced sample preparation.

Light sources

In IR microspectroscopy, the user has the option of several light sources: a conventional thermal (globar) or synchrotron radiation source for FTIR interferometric measurements or alternative sources such as QCLs63 and filters64, which obviate the use of interferometers. The majority of benchtop instruments use conventional thermal light sources, often in conjunction with single-element detectors. A globar source is composed of a silicon carbide rod that generates IR radiation, and can typically generate a collimated beam of ~1,000 μm in diameter, providing a uniformly illuminated aperture of 20–100 μm in diameter at the sample65. It has been shown that single-cell investigations can be conducted using standard globar IR sources to derive subcellular information66.

A synchrotron radiation light source is ~100–1,000-fold brighter than current benchtop thermal ones, but it illuminates a much smaller area. Thus, a synchrotron source has a natural sampling aperture of 10–20 μm in diameter with a high SNR67. It is therefore possible to achieve single-cell and large organelle (e.g., nucleus) lateral spatial resolution with these modern sources, allowing subcellular molecular distribution analysis68,69. There are ~50 synchrotron facilities worldwide, all easily accessible for routine use as they operate on a call-for-projects basis70. Alternatively, other available sources that may be advantageous to individual studies include optical parametric oscillator (OPO) lasers, QCLs and free-electron lasers (FELs); traditionally, such lasers have been used primarily for gas sensing because of their intrinsically narrow linewidths71,72, but modern QCLs can cover much broader wavelength regions (hundreds of cm−1).

Mapping versus imaging

Broadly speaking, detectors can be separated into single-element, linear array and FPA detectors; the detector choice will be influenced by the requirement being imaging (i.e., FPA) or point spectra with high SNR (i.e., single element). The use of a single-element detector allows for individual point spectra to be obtained across a whole sample (for instance, useful when analyzing biofluids); a particular application has been to derive single-cell–specific fingerprint spectra across a heterogeneous tissue section. Acquiring large data sets containing point spectra is a method regularly used in biomedical and environmental studies coupled with multivariate data analysis40,73. Although time consuming, point spectra often have a high SNR, resulting in high-quality spectra, as spatial resolution is limited by IR apertures74. Maps can be generated when point spectra are collected in a stepwise manner in a grid from a target area, which is useful for comparing the different cell types from that particular area, e.g., gastrointestinal crypt23. Spectral maps take a much longer time than individual point spectra and, thus, in order to make large maps feasible to run, the acquisition time for each point can be reduced leading to a lower SNR. The absorbance intensity at each spectral point within the map becomes an individual pixel in the resultant pseudocolor images, which can give details of how different biomolecules vary across the target area.
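As a minimal sketch of how a mapped data set becomes a pseudocolor image, the Python below reads the absorbance at one wavenumber from every point spectrum in a grid and rescales the result to the range 0–1 for display. The function names, the toy 2 × 2 grid and the choice of 1,650 cm−1 are illustrative assumptions, not part of this protocol; analysis software commonly integrates a band area rather than reading a single point.

```python
def band_intensity_map(spectra_grid, wavenumbers, target_cm1):
    """Return a 2D list of absorbance values at the wavenumber closest to target_cm1.

    spectra_grid[row][col] is one absorbance spectrum sampled at `wavenumbers`.
    """
    index = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - target_cm1))
    return [[spectrum[index] for spectrum in row] for row in spectra_grid]

def scale_to_unit(image):
    """Linearly rescale pixel values to 0..1 so they can be fed to a colour map."""
    flat = [value for row in image for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid dividing by zero for a flat image
    return [[(value - low) / span for value in row] for row in image]

# Toy data (hypothetical): three spectral points per pixel, 2 x 2 pixels.
wavenumbers = [1500.0, 1650.0, 1740.0]
grid = [
    [[0.10, 0.80, 0.05], [0.12, 0.40, 0.07]],
    [[0.09, 0.20, 0.30], [0.11, 0.60, 0.02]],
]
amide_map = scale_to_unit(band_intensity_map(grid, wavenumbers, 1650.0))
for row in amide_map:
    print([round(value, 2) for value in row])
```

Each value in the scaled map corresponds to one pixel; repeating the extraction at a different wavenumber gives a second image showing how another biomolecular band varies across the same area.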

In contrast to aperture-based systems, non-aperture-based instruments such as FPA and linear array detectors provide imaging using spatially arranged detectors. Multielement detectors allow for simultaneous spectral acquisition, which, combined with suitable optics, produce spectral images with good SNR and lateral spatial resolution close to the diffraction limit75. Measurements using an FPA detector (typically 32 × 32, 64 × 64 or 128 × 128) are rapid as such detectors allow for the acquisition of thousands of spectra simultaneously76; for a typical methodology see Supplementary Method 2. The acquired spectral data can be used to generate pseudocolor images of the target area such as shown in the characterization of prostate tissue77 and cervical biopsy samples28. The benefits of using a synchrotron radiation light source with FPAs also mean that much smaller pixel sizes can be used (e.g., 0.54 μm × 0.54 μm at some synchrotron facilities) resulting in higher spatial-resolution images of the target area76.

ATR-FTIR spectroscopy coupled to an array detector can allow for sample imaging down to diffraction-limited resolution for the spectral range of interest78. The spatial resolution of non-aperture-based techniques is determined by the optics chosen, and it has been shown that a germanium optic is preferential, although ZnSe and diamond crystals can also be used34. Although transmission and transflection imaging have been widely implemented in biological tissues, imaging in ATR mode is a versatile option, because little sample preparation is required owing to minimal sample-thickness restrictions, which thus means that it has been implemented in biological fields such as pharmacology and subcellular interrogation59,78,79.

Experimental design: sample preparation

Sample formats

The main sample formats for clinical IR spectroscopy are fixed cell and tissue samples, biofluids and live cells. Spectroscopic approaches can be used to examine tissues of human extraction (all require the appropriate ethical approval before their use). The type of sample used greatly determines which type of IR spectroscopy is appropriate and how it should be prepared for analysis. Table 2 shows the main types of samples and how they should be prepared for analysis.

Table 2

Sample types and preparation.

Sample thickness

Sufficient thickness of material needs to be placed onto the support matrix to allow a sufficiently large absorbance intensity to be recorded. In transmission and transflection modes, the specimen thickness needs to be adjusted appropriately: if it is too thick, the detector response function will be nonlinear so that Beer-Lambert’s law cannot be applied anymore. This has serious consequences for subsequent quantitative and classification analyses. In contrast, to achieve an adequate SNR and to avoid interactions of the evanescent wave with the underlying substrate, samples must also not be too thin. For example, when using ATR-FTIR spectroscopy, it is ideal if the specimen is three- or fourfold thicker than the penetration depth (that said, there is no maximum thickness for ATR-FTIR, and samples that are even a millimeter thick can be analyzed). This is pertinent for internal reflection measurements, which are commonly used for the disease diagnosis of biofluids; such samples can be naturally thinner in composition (especially with regard to cerebrospinal fluid (CSF), although this is not so much the case with blood or serum/plasma; serum, for example, is a solution containing a high protein concentration, ~80 mg ml−1). The effect of substrate interference on spectra, especially in reference to transflection measurements, has now been shown independently in the last year by several groups17,18,80. Given this, we would urge extreme caution regarding the use of Low-E slides with transflection measurements; with ATR-FTIR, it is unlikely that there will be optical effects associated with substrate.

Substrate choice

Proper consideration of the substrate (the slide or matrix) upon which the sample will be placed and any preparation steps associated with this are essential in order to acquire the best and most-reproducible spectra. For transmission measurements, this needs to be an IR-transparent material such as BaF2 or CaF2 (the latter, in particular, for live-cell IR spectroscopy), whereas for reflection or transflection measurements an IR-reflective substrate (e.g., Low-E slides) is required because glass alone absorbs the radiation and has a spectral signature in the mid-IR region81,82. Previously, it had been recommended that biological materials be placed on IR-reflective substrates. However, there now appears to be a shift in the general consensus that suggests that transmission or ATR spectroscopy measurements are more applicable to interrogation of biological material.

Microfluidic devices

Traditionally, aqueous sampling environments were unsuitable for IR spectroscopy because of the contribution of water. Development of microfluidic devices and processing to remove the water contribution has made it possible to achieve real-time, live-cell monitoring with IR spectroscopy. The approach is nondestructive to cells and better replicates physiological conditions; no labeling is required and the resolution is such that single cells can be studied83. The nondestructive nature of these methods has allowed studies to look at samples over time (e.g., stem cells in situ as they differentiate), and chemical reactions in flow systems have also been monitored84,85.

The key challenge of IR spectroscopy using microfluidics is associated with the materials’ transparency over the spectral range to be studied, and especially when live-cell monitoring is desirable. Many potential window materials are unsuitable on the basis of their water solubility (e.g., KBr and NaCl), toxicity toward the cells under observation (e.g., CdTe) or spectral dispersion (e.g., ZnS and BaF2)86. A flow chamber is used that combines IR transparency and robustness of diamond as window material. Although manufacture is complicated, the windows must be sufficiently thin (0.4–0.8 μm) to avoid multiple internal reflections86. CaF2 is extensively used as a window material, and a simple flow cell with inlet and outlet flow is constructed by clamping two CaF2 plates together. One of the plates is etched to form a 10-μm well, designed for the IR observation of live cells in aqueous media85. A similar device has been used for synchrotron IR spectroscopy of living cells using a surface micro-etched silicon substrate87. Further advances in the field have led to the development of sandwich devices and entirely polymeric devices.

Experimental design: spectral acquisition

Instrumental and operational settings to maximize spectral quality

When acquiring spectra, it is important to maximize as best as possible the SNR in order to produce high-quality spectral data (Table 3). There are a number of noise-related and signal-related parameters, with an effect on SNR, which can be altered depending on the instrument mode being used (e.g., point mode versus imaging)88-91. The instrumental and operational settings will be specific to the user experimental setup; Table 1 compares properties of different sampling modes for optimized spectral acquisition. An initial noise-related parameter that can be altered is the sampling aperture in point or mapping mode; this will reduce the SNR when the aperture size is reduced92. However, in imaging mode there is no aperture. The interferometer mirror velocity may also have an effect on SNR3. Weighting the interferogram with an apodization function will also contribute to a reduction in SNR, as this smoothing effect can incorporate spectral artifacts while one is attempting to optimize the information contained93. In general, the square root of the number of co-additions is proportional to the SNR, and therefore an increased number will enhance the SNR94.
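The square-root relationship between co-additions and SNR can be illustrated numerically. The sketch below assumes purely random, uncorrelated (Gaussian) noise, which is an idealization; the function names and the scan counts are illustrative only.

```python
import math
import random
import statistics

def snr_gain(n_scans_old, n_scans_new):
    """Expected SNR improvement factor when changing the number of co-added scans."""
    return math.sqrt(n_scans_new / n_scans_old)

def scans_for_gain(n_scans_old, gain):
    """Number of co-added scans needed to improve SNR by the factor `gain`."""
    return math.ceil(n_scans_old * gain ** 2)

def averaged_noise(n_scans, n_points=2000, sigma=1.0, seed=0):
    """Standard deviation of simulated noise after averaging n_scans scans."""
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_points):
        scans = [rng.gauss(0.0, sigma) for _ in range(n_scans)]
        averaged.append(sum(scans) / n_scans)
    return statistics.pstdev(averaged)

print(snr_gain(32, 128))       # quadrupling the scans doubles the SNR
print(scans_for_gain(32, 3))   # tripling the SNR needs nine times the scans
print(averaged_noise(1) / averaged_noise(16))  # close to sqrt(16) = 4
```

The practical consequence is diminishing returns: each doubling of SNR costs four times the acquisition time, which is why co-additions are usually balanced against aperture size and total mapping time.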

Table 3

Typical conditions of the main variables affecting SNR in spectroscopy instruments.

IR spectroscopy has a spatial resolution that is limited by the diffraction limit; hence, as the resolution approaches this value, the SNR is reduced to a point where there is no further gain in image quality95. A synchrotron radiation source (e.g., at the IR Environmental Imaging Facility (IRENI) at the Synchrotron Radiation Centre (SRC)) in the mid-IR region is 1,000 times brighter than a thermal globar source and thus may generate enhanced SNR spectra when using apertures approaching the diffraction limit; however, when using an FPA detector, this cannot be exploited as the brightness is applied over a larger area. By using multiple beams, such as at IRENI, the single-beam disadvantage when using an FPA may be overcome.

It is important to consider that an optimized and well-aligned benchtop instrument is not considered to be inferior with regard to SNR or image quality to a general synchrotron-based machine63. A number of options regarding the detector can also have an effect on the SNR, such as the choice between a thermal detector and a quantum detector. A mercury cadmium telluride (MCT) quantum detector usually provides a better SNR than, for example, a thermal detector such as a deuterated triglycine sulfate detector96. An optimized cooling system in the detector, such as thermoelectrical cooling, will also reduce the dark current produced by the detector, which has been shown to have a detrimental effect on SNR97,98. In addition, signal-related parameters can affect the SNR; for instance, an increase in the optical path length can reduce spectral quality, which has been particularly important in the analysis of aqueous samples such as biofluids33. When producing spectral images with the help of multielement detectors, such as an FPA, one must consider optimizing the SNR. The authors point readers to the authoritative reference on FTIR spectroscopy by Griffiths and De Haseth3 for theoretical and instrumental discussions; this book has supported the authors since their undergraduate studies and continues to support them today.

Water vapor and instrument purging

The presence of water vapor in the instrumentation and sample area can result in reduced transmission of IR light, potentially obscuring important spectral details even at low spectral resolutions often used in biomedical IR spectroscopy. Water vapor interference can be minimized by computational subtraction of a pure water vapor spectrum from the sample spectrum99. The efficacy of this compensation is limited and it is therefore considered crucial before spectral acquisition to purge the instrumentation with dry air or nitrogen and/or desiccants to remove any water vapor that may contaminate spectra between 1,350 and 1,950 cm−1, and between 3,600 and 3,900 cm−1 (ref. 100). By doing so, ambient CO2 is also purged, thereby reducing its contribution to the spectra.

Acquisition of sample and background

Measurements of an FTIR absorption spectrum involve collecting a ‘single-beam’ spectrum. A background single-beam spectrum provides the source intensity, as modified by the instrument; placing a sample in the beam path and measuring the single beam again, theoretically, provides just the additional effect of the sample absorbance. A logarithm (to the base 10) of the ratio of these quantities provides the absorbance, which is directly related to concentration by Beer’s law. With point spectra, a background spectrum is typically retained for recording 5–10 sample spectra and with each different sample to reduce the effects of constantly changing atmospheric conditions. As spectral maps are composed of a large number of point spectra acquired in a stepwise manner, it is necessary to set up background scans to be taken at set intervals (e.g., at the end of every row) to account for the atmospheric variation over the extended acquisition time66. When acquiring spectral images, background spectra should be acquired over a defined time period, depending on the sample acquisition time.
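The ratio described above can be written as A = log10(I_background / I_sample), evaluated at each wavenumber. The following is a minimal sketch of that calculation; the function name and the toy intensity values are hypothetical, and real single-beam spectra would come from the instrument software.

```python
import math

def absorbance(sample_single_beam, background_single_beam):
    """Convert paired single-beam intensities to absorbance, point by point."""
    if len(sample_single_beam) != len(background_single_beam):
        raise ValueError("sample and background must share the same wavenumber axis")
    return [
        math.log10(background / sample)
        for sample, background in zip(sample_single_beam, background_single_beam)
    ]

# Toy intensities (hypothetical): the sample transmits 100%, 50% and 10%
# of the background intensity at three wavenumbers.
background = [100.0, 100.0, 100.0]
sample = [100.0, 50.0, 10.0]
print([round(a, 3) for a in absorbance(sample, background)])
```

Because the background is divided out at each point, any drift in the background between its acquisition and that of the sample (for example, changing water vapor) appears directly in the absorbance, which is why backgrounds are refreshed at regular intervals.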

Experimental design: data processing

Data processing is carried out in a sequence of steps (Fig. 3) and the most important factor determining its workflow is the analysis goal; typical spectroscopy software programs used are shown in Table 4. Here we describe two analysis goals: imaging and diagnosis. Other goals not covered here include pattern finding and biomarker identification101,102.

Table 4

Some existing FTIR spectroscopy data analysis software.

Imaging is defined as data analysis that uses an unsupervised data processing method to reveal tissue structure on a ‘spectral cube’ acquired by a mapping or imaging technique. Imaging allows for the study of shape and penetration of important histopathological features on the basis of the underlying chemistry28.

In contrast, a diagnosis using IR spectroscopy requires a more complex framework that uses supervised classification methods. A supervised data processing method is one that uses classes assigned a priori to each IR spectrum as teaching information to build models that are used later to predict the classes of a data set that does not have classes associated with its spectra103,104. The modeling process for diagnosis requires separate training and testing stages and respective training and test data sets. The optimal size of a training data set (i.e., one that will maximize classification accuracy at a reasonable cost of data set generation) has been underinvestigated to date, but it has been suggested that it may be problem dependent105. For example, in a study, one could start with ten samples (acquiring 5–10 spectra from each sample), creating a trained model with eight samples and testing the model using the remaining two samples; one could then repeat this procedure four more times, each time using two different samples for testing and the remaining eight samples for training (this is called five-fold cross-validation). The number of times that the classifier correctly guessed the class of the testing sample would be counted to calculate a classification rate (i.e., the number of correct guesses divided by the total number of guesses). Next, one could acquire spectra from an additional five samples and repeat the cross-validation process, comparing the new classification rate with the old one (it is expected to improve). The process of adding samples and repeating cross-validation could continue until the classification rate stops improving.
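The ten-sample, five-fold scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the classifier used in this protocol: the nearest-class-mean classifier is a deliberately simple stand-in, the spectra are synthetic three-point vectors, each spectrum counts as one guess, and all function names are hypothetical. The folds are formed from whole samples so that spectra from one sample never appear in both training and test sets.

```python
import random

def make_folds(sample_ids, n_folds):
    """Split sample IDs into n_folds disjoint test groups."""
    return [sample_ids[i::n_folds] for i in range(n_folds)]

def train_nearest_mean(training_spectra):
    """Build a predict() function from (spectrum, label) pairs (stand-in classifier)."""
    grouped = {}
    for spectrum, label in training_spectra:
        grouped.setdefault(label, []).append(spectrum)
    means = {
        label: [sum(column) / len(column) for column in zip(*spectra)]
        for label, spectra in grouped.items()
    }

    def predict(spectrum):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(spectrum, means[label]))
        return min(means, key=squared_distance)

    return predict

def cross_validate(data, n_folds=5):
    """data maps sample_id -> (label, list of spectra); returns the classification rate."""
    sample_ids = sorted(data)
    correct = total = 0
    for test_ids in make_folds(sample_ids, n_folds):
        training = [
            (spectrum, data[i][0])
            for i in sample_ids if i not in test_ids
            for spectrum in data[i][1]
        ]
        predict = train_nearest_mean(training)
        for i in test_ids:
            label, spectra = data[i]
            for spectrum in spectra:
                correct += int(predict(spectrum) == label)
                total += 1
    return correct / total  # correct guesses / total guesses

def make_toy_data(n_samples=10, spectra_per_sample=6, seed=1):
    """Synthetic data: two classes differing in the middle spectral point."""
    rng = random.Random(seed)
    data = {}
    for i in range(n_samples):
        label = "normal" if i < n_samples // 2 else "disease"
        centre = [0.2, 0.5, 0.3] if label == "normal" else [0.2, 1.0, 0.3]
        spectra = [[c + rng.gauss(0.0, 0.05) for c in centre]
                   for _ in range(spectra_per_sample)]
        data[i] = (label, spectra)
    return data

print(f"classification rate: {cross_validate(make_toy_data()):.2f}")
```

To follow the procedure of adding samples until the rate stops improving, one would call `cross_validate` again on the enlarged data set and compare the two rates.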

It is important to note that a diagnostic framework may be set to use either point spectra or image maps; in the latter case, the trained classification system can be used to predict tissue structure.

We describe the following data analysis steps: pre-processing, feature extraction (FE), clustering (unsupervised classification) and supervised classification, and we exemplify some visualization options in the ANTICIPATED RESULTS section. Quality control is another step that is not covered in this protocol, but there are guidelines on this available in the literature105,106.


Pre-processing

Pre-processing essentially aims to improve the robustness and accuracy of subsequent multivariate analyses and to increase the interpretability of the data by correcting issues associated with spectral data acquisition107. Pre-processing methods may be divided into de-noising, spectral correction, normalization and other manipulations; two or three methods are often combined (e.g., de-noising followed by spectral correction and normalization). The choices of pre-processing methods may depend on the analysis goal, the physical state of the sample, and the time and computing power available.

De-noising of IR spectra may be carried out with Savitzky-Golay (SG) smoothing, minimum noise fraction108 or wavelet de-noising (WDN)101. WDN is reported to be the best method for eliminating high-frequency noise while keeping tall, sharp peaks intact (this is essential in Raman spectra processing, but WDN works well on IR spectra too). Another option is to decompose the spectra by principal component analysis (PCA), and then reconstruct them from only a few of their principal components (PCs), thus discarding those PCs that represent mostly noise85,109.
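As an illustration of SG smoothing, the sketch below applies the standard 5-point quadratic/cubic SG convolution weights (−3, 12, 17, 12, −3)/35 to a spectrum. It assumes equally spaced wavenumbers, and the function name and the choice to leave the two points at each end unchanged are illustrative simplifications; in practice the window width and polynomial order would be tuned to the data.

```python
def savgol_smooth_5pt(absorbance):
    """5-point quadratic/cubic Savitzky-Golay smoothing.

    Assumes equally spaced wavenumbers. The first and last two points are
    returned unchanged because a full window is not available there.
    """
    coeffs = (-3.0, 12.0, 17.0, 12.0, -3.0)
    norm = 35.0
    smoothed = list(absorbance)
    for i in range(2, len(absorbance) - 2):
        window = absorbance[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(coeffs, window)) / norm
    return smoothed
```

Because the filter fits a low-order polynomial within each window, smooth band shapes pass through nearly unchanged, whereas isolated single-point spikes are attenuated.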

Measurement characteristics that may require spectral correction include:

  • Sloped or oscillatory baselines that result from scattering, with resonant Mie scattering in biological materials being the most pronounced effect. The effects of the sample (scattering centers, edges and substrates) on spectra have often been lumped together and termed ‘artifacts’. Although this terminology was initially acceptable, it is now clear that there is a rational explanation for these effects: they arise from the coupling of morphology and optics. Hence, we will refer to them as morphological effects on spectra. There are two major efforts in understanding and resolving these effects to recover absorption spectra free from the effects of morphology. The first group of methods is termed ‘physics based’. In this approach, explicit optical image–formation modeling from first principles is used to predict and correct data. Here each sample effect (boundary scattering, scattering centers in the sample and substrate) needs to be explicitly accounted for. The theory has been shown to be generally valid, and there are now methods for correcting these effects for films, spheres and fibers16,110,111. Extension to more complex samples is still the subject of ongoing research. A second group of methods may be termed ‘model based’. In these methods, a model (typically Mie scattering) is assumed to explain all sample effects, and rigorous theory is then used to recover spectra; examples include extended multiplicative scattering correction (EMSC)112, resonant Mie scattering correction (RMieSC)113-115 and rubber band baseline correction116. An indirect way to deal with baseline slope is to apply a first or second derivative to spectra using the SG algorithm. This alters the shape of the spectra, but may also resolve overlapped bands. Model-based methods will generally be faster than explicit modeling methods and may prove to be broadly useful, but they need to be validated in each case. A third approach, which was traditionally used but is now recognized to be of limited value, is to simply correct baselines with a piecewise linear approach. This method is the fastest, as it requires the least effort to apply and no modeling. It is as yet unclear which of these methods works best.

  • Spectral contributions may arise from atmospheric water vapor, carbon dioxide, paraffin or other interfering compounds. Although these artifacts may be compensated mathematically through EMSC117 or other least-squares–based techniques118, the most common actions are to remove contaminated spectral bands from the data set, improve the control of atmospheric conditions or take background spectra more often. In this respect, before pre-processing, it is often useful to implement quality tests to verify SNR and minimize water vapor contribution. By following this approach, ‘bad quality’ spectra are discarded, as they can influence subsequent analysis. The threshold values for defining ‘bad’ and ‘good’ spectra can be adjusted according to the biological application.

  • It is vital to normalize IR spectra to account for confounding factors such as varying thickness of sample. Common normalization methods are amide I/II peak normalization and vector normalization. Amide I/II normalization is often used after baseline correction, whereas vector normalization is often used after differentiation of spectra (after correction by differentiation, there is no longer a consistent amide I/II peak in the spectra to allow for amide I/II peak normalization). For imaging, leaving spectra non-normalized for chemical imaging or unsupervised clustering will reveal tissue structures primarily based on absorbance intensity, whereas normalization will highlight differences in biochemical structure. For diagnosis, some form of spectral normalization is conducted.
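Two of the corrections listed above, rubber band baseline correction followed by amide I peak normalization, can be sketched as below. This is an illustrative sketch under stated assumptions: wavenumbers are strictly ascending, the rubber band is taken as the lower convex hull of the spectrum, and the 1,600–1,700 cm−1 window used to locate the amide I peak is an assumed default rather than a value given in this protocol. Function names are hypothetical.

```python
def rubber_band_correct(wavenumbers, absorbance):
    """Subtract the lower convex hull ('rubber band') from a spectrum.

    Assumes wavenumbers are strictly ascending and at least two points.
    """
    x, y = wavenumbers, absorbance
    hull = []  # indices of the lower convex hull vertices, left to right
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # cross <= 0 means point b lies on or above the chord a -> i,
            # so b cannot be part of the lower hull.
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    corrected = []
    seg = 0
    for i in range(len(x)):
        # Advance to the hull segment that contains point i.
        while hull[seg + 1] < i:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        t = (x[i] - x[a]) / (x[b] - x[a])
        baseline = y[a] + t * (y[b] - y[a])
        corrected.append(y[i] - baseline)
    return corrected

def peak_normalize(wavenumbers, absorbance, lo=1600.0, hi=1700.0):
    """Scale a spectrum so the maximum within [lo, hi] cm^-1 equals 1.

    The default window is an assumed location for the amide I band.
    """
    region = [a for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi]
    if not region or max(region) <= 0:
        raise ValueError("no positive absorbance found in the chosen window")
    peak = max(region)
    return [a / peak for a in absorbance]
```

For spectra corrected by differentiation instead, vector normalization (dividing each spectrum by its Euclidean norm) would replace the peak normalization step, as noted above.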

The optimal pre-processing method or sequence to apply is a subject of discussion and no universal best approach exists for all samples. Often the choices are based on the problems visually spotted in the spectra; a more objective criterion is to optimize the pre-processing method (e.g., through a genetic algorithm)119. In this protocol, we offer several alternatives based on cues identified by visual inspection of raw (non-pre-processed) spectra, although objective validation will probably become more common in the future.


Feature extraction

FE methods process the IR spectra to form new variables based on the original variables (which are absorbance intensities). FE has an important or even essential role in both imaging and diagnosis. For imaging, FE is responsible for generating a single value based on the whole of an input IR spectrum. This value can subsequently be used to set the color of a pixel in the image; FE is repeated for all spectra, thus forming the pseudocolor image. Popular FE methods for imaging include calculating the ratios between wavenumber absorbance intensities, area under a subregion of the spectrum, selecting a single wavenumber or an ensemble of wavenumbers, or performing PCA. PCA may be applied to the spectral data set, followed by selection of a single PCA factor for the color gradient.
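As a sketch of FE for imaging, the code below reduces every spectrum in a spectral cube to a single value, the ratio of the areas under two spectral subregions, which could then be mapped to a pixel color. The function names and the nested-list layout of the cube (`cube[row][col]` is one spectrum) are assumptions made for illustration; the band limits are supplied by the user and are not prescribed here.

```python
def band_area(wavenumbers, absorbance, lo, hi):
    """Trapezoidal area under the spectrum between lo and hi (cm^-1).

    Only sampled points that fall inside [lo, hi] are used.
    """
    points = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi)
    return sum(
        0.5 * (a0 + a1) * (w1 - w0)
        for (w0, a0), (w1, a1) in zip(points, points[1:])
    )

def ratio_image(cube, wavenumbers, band_a, band_b):
    """Return a 2D list in which each pixel is area(band_a) / area(band_b).

    cube[row][col] is the spectrum at that pixel; band_a and band_b are
    (lo, hi) tuples in cm^-1.
    """
    image = []
    for row in cube:
        image_row = []
        for spectrum in row:
            numerator = band_area(wavenumbers, spectrum, *band_a)
            denominator = band_area(wavenumbers, spectrum, *band_b)
            image_row.append(numerator / denominator if denominator else float("nan"))
        image.append(image_row)
    return image
```

The resulting 2D array can be passed to any plotting routine with a color map to produce the pseudocolor image.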

For diagnosis, FE constitutes an important data reduction step in order to match the complexity of the subsequent supervised classifier with the amount of data available so as to avoid over-fitting or undertraining. PCA is a particularly popular form of unsupervised FE that is used for this purpose103. The number of PCA factors to retain may be subject to optimization. One approach is to order the PCA factors from the most to the least discriminant on the basis of their P values as determined by a statistical test. The percentage of explained variance can also be taken into account. Within FE, the subgroup of feature selection (FS) methods is particularly interesting because it can confer biological interpretability (i.e., identify the wavenumbers most important for classification) to the classification system. Popular FS methods include forward FS120 and COVAR121. Variance analyses may also be used to select spectral variables for elimination122. Another approach to FS is to use spectral features that are obtained from a biochemical understanding of the problem123. Such features, for which direct spectral interpretation is possible, are termed metrics, as they are measures of biochemical activity in the samples. It is important to note that not all metrics may be useful biomarkers. Thus, even FE may be a multistep process (i.e., one in which metrics are converted to statistically relevant biomarkers).

Clustering (unsupervised classification)

Clustering aims at sorting different objects (i.e., spectra) into categories or clusters on the basis of a so-called distance measure124. Clustering methods such as hierarchical cluster analysis (HCA) and k-means clustering (KMC) are frequently used in IR-imaging studies to identify tissue morphology23,125. HCA groups spectra into mutually exclusive clusters; in IR-imaging studies, HCA-based segmentation is achieved by assigning a distinct color to the spectra in one cluster. Because each spectrum of an IR-imaging experiment has a unique spatial (x,y) position, pseudocolor segmentation maps can be easily generated by plotting specifically colored pixels as a function of the spatial coordinates.
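The construction of a segmentation map by KMC can be sketched as follows. This is a minimal illustration using squared Euclidean distance as the distance measure; the deterministic farthest-point initialization is a simplification chosen here to keep the example reproducible, and the function names and nested-list cube layout are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(spectra, k, n_iter=50):
    """Assign each spectrum to one of k clusters; returns a list of labels."""
    # Farthest-point initialization: start from the first spectrum, then
    # repeatedly add the spectrum farthest from all centers chosen so far.
    centers = [list(spectra[0])]
    while len(centers) < k:
        farthest = max(spectra, key=lambda s: min(sq_dist(s, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(s, centers[c])) for s in spectra
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def segmentation_map(cube, k):
    """cube[row][col] is a spectrum; returns a 2D list of cluster labels."""
    n_cols = len(cube[0])
    flat = [spectrum for row in cube for spectrum in row]
    labels = kmeans(flat, k)
    return [labels[r * n_cols:(r + 1) * n_cols] for r in range(len(cube))]
```

Assigning one color to each label and plotting the labels at their (x,y) positions yields the pseudocolor segmentation map.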

Supervised classification

Supervised or concept-driven classification techniques are machine-learning techniques for creating a classification function from training data. These methods involve a supervised learning procedure in which models are created that map input objects (spectra) to desired outputs (class assignments). Popular supervised techniques are artificial neural networks, support vector machines (Supplementary Method 3), linear discriminant classifier11,103,126 and Bayesian inference-based methods77. Among the many criteria guiding the choice of classifier, the most important is probably the accuracy (related to sensitivity and specificity) when tested on an independent test data set. Other criteria include ease of training, computational time, spatial resolution considerations127 and software availability. Classifiers such as artificial neural networks and support vector machines may require a two-stage training, where the first stage is dedicated to finding optimal tuning parameters or architecture and the second stage fits the classifier model to the training data. Linear discriminant classifier (LDC) is a parameterless classifier that requires only the fitting stage. A general rule of thumb is that if two different classifiers perform equally well on an independent test data set, the simpler one should be preferred over the more complex one, as simpler classifiers are more likely to be better generalizers103.
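The relationship between accuracy, sensitivity and specificity on an independent test data set can be made concrete with a short sketch. It assumes a two-class problem in which one class is designated as 'positive' (e.g., diseased) and every other label is treated as negative; the function name and the example labels are hypothetical.

```python
def diagnostic_metrics(true_labels, predicted_labels, positive):
    """Sensitivity, specificity and accuracy for a two-class problem.

    Any label other than `positive` is treated as the negative class.
    """
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive and guess == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

When two candidate classifiers give similar values on the same independent test set, the rule of thumb above favors the simpler one.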



CRITICAL For sample preparation and analysis, please refer to Tables 1 and 2 and the INTRODUCTION for further information.

  • FFPE blocks: see Reagent Setup for further information

  • Sample preparation: advice regarding collection of biofluids, cryosectioned tissue samples, fixed cells and live cells can be found in the Reagent Setup section. ! CAUTION Human tissues (including biofluids, cytology or FFPE blocks) should be obtained with appropriate local institutional review board (e.g., in the UK, this is a Local Research Ethics Committee (LREC)) approval; generally, ethical permission will be granted for a carefully designed study in which patient participants sign a consent form. Worldwide, studies using human tissues should adhere to the principles of the Declaration of Helsinki. Similarly, for research using animals, appropriate approvals are required; the Animals (Scientific Procedures) Act of 1986 is the legislation that regulates the use of animals in scientific procedures in the United Kingdom, and it is enforced by the Home Office, which issues the licenses required.

Other reagents

  • ThinPrep (PreservCyt Solution, Cytyc)

  • SurePath (TriPath Care Technologies)

  • Formalin, 10% (vol/vol), neutral buffered (Sigma-Aldrich, cat. no. HT501128)

    ! CAUTION It is a potential carcinogen, an irritant and an allergen. Always work in a fume hood while handling it.

  • Acetone (Fisher Scientific, cat. no. A/0600/17) ! CAUTION Its vapors may cause dizziness. Always work in a fume hood while handling it.

  • Ethanol, 2.5 liters (Fisher Scientific, cat. no. E/0600DF/17)

  • Virkon (Antec, DuPont, cat. no. A00960632) ! CAUTION It is an irritant.

  • Paraplast Plus paraffin wax (Thermo Fisher Scientific, cat. no. SKU502004)

  • Xylene (Sigma-Aldrich, cat. no. 534056) ! CAUTION It is a potential carcinogen, an irritant and an allergen. Always work in a fume hood while handling it.

  • Histoclear (Fisher Scientific, cat. no. HIS-010-010S) ! CAUTION It is an irritant.

  • Isopentane (Fisher Scientific, cat. no. P/1030/08) ! CAUTION It is extremely flammable, an irritant, an aspiration hazard and toxic. Always work in a fume hood while handling it.

  • Optimal cutting temperature (OCT) compound (Agar Scientific, cat. no. AGR1180)

  • Liquid nitrogen (BOC, CAS no. 7727-37-9) ! CAUTION It may cause asphyxiation, and contact with skin will cause burns. Wear cryoprotective clothing and use it in a fume hood.


Electronic equipment

For a list of commercial instruments available, please refer to Table 5.

Table 5. Instruments and corresponding data acquisition software.


  • Low-E slides (Kevley Technologies, CFR)

  • BaF2 slides (Photox Optical Systems)

  • Silicon multi-well plate (Bruker Optics)

  • Superfrost slides: these can be obtained from various manufacturers, e.g., Menzel-Glaser Superfrost slides (Menzel-Glaser, cat. no. AA00008132E); Thermo Scientific SuperFrost slides (Thermo Fisher Scientific); or Fisherbrand Superfrost slides (Fisher Scientific)


  • Coverslips (Thermo Fisher Scientific, cat. no. 102440)

  • Specac Golden Gate single-reflection diamond ATR accessory (Specac)

  • Microtomes: these can be obtained from various manufacturers, e.g., Microtome (Surgipath Medical Industries); Leica rotary microtomes (Leica Microsystems, Davy Avenue Knowlhill); or Bright Cryostat (Bright Instruments)

  • Microtome blades: these can be obtained from various manufacturers, e.g., Feather disposable microtome blades S35 (VWR, cat. no. SURG08315E); Edge-Rite disposable microtome blades (Thermo Fisher Scientific); or Leica Surgipath DB80 blade (Leica Microsystems) ! CAUTION Blades are extremely sharp; handle and dispose of them with care.

  • Paraffin section mounting bath (40–75 °C; Electrothermal, cat. no. MH8515)

  • Desiccator: these can be obtained from various manufacturers, e.g., desiccator (Duran Group) or WHEATON Dry-Seal vacuum desiccators (Wheaton Industries)

  • Labofuge 400e (Heraeus Instruments)


FFPE blocks

These are prepared according to the standard methods used routinely in all pathology laboratories; the overall steps are: immerse fresh tissue in formalin solution that acts as a chemical fixative; dehydrate the tissue in sequential washes of xylene and ethanol; and embed the tissue in paraffin wax, which creates an airtight barrier. Tissue blocks can then be stored indefinitely at room temperature (20–22 °C).


Biofluids

These are primarily blood plasma or serum, but can also potentially include cerebrospinal fluid, saliva or urine. Typically, after acquisition, such samples should be stored in appropriate tubes at −85 °C until they are thawed before analysis.

Cryosectioned tissue samples

Tissue samples can be snap-frozen and stored at −80 °C before use. Tissue should be coated with optimal cutting temperature (OCT) compound before freezing, and it should be frozen with isopentane cooled with liquid nitrogen.

Fixed cells

Typically, these would originate from cytology specimens placed in a fixative buffer; an ideal example of this is cervical cytology. However, it could be extended to any cell type isolated in the form of a suspension in a preservative buffer solution.

Live cells

This is an emerging area within the field whereby viable cells can be spectrochemically analyzed, primarily in a constructed microfluidic platform (for a typical method, see Supplementary Method 4).



Two types of software are required: spectral acquisition and data analysis. Spectral acquisition software is normally provided by the instrument manufacturer. Most instrumentation software also provides a number of pre-processing and sometimes more advanced data analysis options. Various data analysis software programs and packages exist, ranging from those for general-purpose use to those targeting specific data analysis tasks (e.g., multivariate curve resolution–alternating least squares (MCR-ALS)). A popular development environment and programming language is MATLAB, in which customized software can be written for specific tasks. Python is another programming language that is becoming increasingly popular in the FTIR spectroscopy field, and it has the advantage of being open source. For a list of commonly used software and packages, please refer to Table 4.


Sample preparation

1| Prepare the samples by following the steps listed in one of the options given below. Perform the steps in option A for FFPE tissue samples; option B for cryosectioned tissue samples; option C for cytological specimens; and option D for biofluids.

Live cells may be prepared in three main ways for IR-transmission studies: grown directly onto IR substrates; grown in a 3D culture matrix (and then processed as described in options A and B); or fixed in suspension, e.g., as cervical cytological specimens in fixative obtained from hospital pathology laboratories. Cells that are fixed in suspension should be processed by following the steps in option C.


Cells grown onto IR substrates: Sterilize the IR substrate for 1 h in 70% (vol/vol) ethanol before growing cells directly onto the chosen IR substrate. Generally, cellular materials are then fixed in order to preserve their architectural integrity, and the samples are stored in a desiccator prior to spectral acquisition (Step 2).

Cells grown in 3D culture matrix: Cells may be grown on 3D culture matrices (a tissue culture environment or device in which live cells can grow or interact with their surroundings in three dimensions), and subsequently fixed or snap-frozen and sectioned as described for tissue samples in Step 1A and Step 1B.


(A) FFPE tissue • TIMING 50 min

  • (i) Obtain FFPE tissue blocks of interest from a pathology laboratory.
  • (ii) Place an FFPE block onto an ice block for 10 min. Use a microtome to trim into the block to expose the entire tissue sample to the face of the block. This will ensure that a full tissue section is cut for analysis. Place trimmed blocks back on ice for 10 min.

    CRITICAL STEP Make sure that the blocks are cold before cutting sections. This hardens the wax, reducing the friction between the block surface and blade allowing a much smoother cut.

  • (iii) Cut a ribbon of 10-μm sections and float it onto a heated water bath (40–44 °C). Separate the individual sections with forceps.

    CRITICAL STEP Optimal tissue thickness for the maximum SNR should be determined in-house by applying serial sections of varying thickness (depending on the tissue type) to BaF2, CaF2 or Low-E slides for IR interrogation, e.g., ~3 μm (e.g., for bone), 5 or 10 μm (e.g., for prostate tissue) and 15 μm. SNR is judged on the quality of the raw spectra; in particular, the presence of many narrow, sharp peaks indicates high noise. If using tissue for imaging and extraction of tissue cell type, sample thickness is not just an SNR issue. The thicker the tissue, the greater the chance of probing heterogeneous layers and perhaps multiple cell types, rendering the cell type signal less pure.

    CRITICAL STEP Depending on the melting point of the paraffin wax used for embedding tissue samples, the temperature of the water bath will need to be adjusted to prevent melting of the wax.

  • (iv) Prepare tissue slides by re-floating a single 10-μm-thick tissue section onto a BaF2, CaF2 or Low-E slide for FTIR microspectroscopy or ATR-FTIR spectroscopy. In our experience, a 5–10-μm section is the optimal thickness for maximum SNR.

    CRITICAL STEP As BaF2 slides can be 1 cm × 1 cm in size to fit common slide holders, an H&E-stained parallel section may be required to identify an area of interest for analysis. Once a section is floated onto the water bath, sections can be picked up on normal microscope slides, dissected using a scalpel for the area of interest and floated back onto water for application to the BaF2 slide.

  • (v) Place the tissue slide in a 60 °C oven for 10 min.
  • (vi) De-wax the tissue slide by immersing it in xylene for 5 min at room temperature. Repeat this step twice with fresh xylene. For small, round slides that are difficult to handle during solvent immersion, slides can be encased into plastic histology cassettes that can be threaded round a large metal clip. The same procedure can be conducted using hexane.

    CRITICAL STEP For IR analysis, it is necessary to de-wax the tissue in order to probe the full wavenumber range unhindered. This is paramount, as paraffin is known to have significant peaks at ~2,954, 2,920, 2,846, 1,462 and 1,373 cm−1. If there is uncertainty about paraffin removal, these regions of the spectrum can be removed from subsequent analysis. However, this comes at the cost of no longer probing many solvent-resistant methylene components of the native tissue128,129.

  • (vii) Sequentially, wash and clear the tissue slide by immersing it in acetone or 100% ethanol for 5 min at room temperature.
  • (viii) Allow the tissue slide to air-dry before placing it into an adequate-sized Petri dish for storage in a desiccator.

    PAUSE POINT Slides can be stored in a desiccator before IR interrogation; in our experience, storage should be <1 year.
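Where paraffin removal is uncertain, the exclusion of the paraffin regions mentioned in Step 1A(vi) could be carried out at the data analysis stage along the following lines. The peak positions are those listed above; the half-width of the excluded window around each peak (15 cm−1 here) and the function name are assumptions made for illustration and would need to be chosen by inspecting the spectra.

```python
# Paraffin peak positions (cm^-1) as listed in the CRITICAL STEP of Step 1A(vi).
PARAFFIN_PEAKS = (2954.0, 2920.0, 2846.0, 1462.0, 1373.0)

def remove_paraffin_bands(wavenumbers, absorbance, half_width=15.0):
    """Drop data points within half_width cm^-1 of any paraffin peak.

    half_width is an assumed, illustrative value. Returns the retained
    wavenumbers and absorbances as two lists.
    """
    kept = [
        (w, a)
        for w, a in zip(wavenumbers, absorbance)
        if all(abs(w - peak) > half_width for peak in PARAFFIN_PEAKS)
    ]
    return [w for w, _ in kept], [a for _, a in kept]
```

The same wavenumber mask should be applied to every spectrum in a data set so that all spectra retain identical variables for multivariate analysis.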

(B) Snap-frozen and cryosectioned tissue samples • TIMING 120 min + drying time (3 h)

! CAUTION Snap-freezing should be carried out in a fume hood while you are wearing cryoprotective gloves, clothing and a facemask.

  • (i) The fresh tissue should be no more than 2 cm in any one dimension; gently blot away any fluids from the surface, place the tissue in a cryomold and fill the mold with OCT compound.
  • (ii) Fill a plastic cryobucket with 3–4 cm of liquid nitrogen. Pour isopentane into the stainless steel beaker until it is about 1–2 cm deep. Place the stainless steel beaker into the liquid nitrogen and allow temperatures to equilibrate (3–5 min).
  • (iii) Take the cryomold containing the tissue sample in OCT compound and use long forceps to lower it into the isopentane; hold until the OCT compound freezes (60–90 s).
