Data processing steps applied to the .xye data series. One or more XYEDataset data series are read into a datasets collection, referred to as dataset_0..dataset_n-1 here. dataset_0 is treated as the reference set for zero correction and normalisation operations - that is, it is left unchanged. The following order of operations is followed:
Step 1. “Zero correction” is applied. The detector angle has an uncertainty of the order of 0.001 degree between acquisition runs so that I vs 2theta curves will be misaligned with each other. This correction will try to shift the datasets dataset_1.._n-1 along the 2theta axis to ensure all peaks are aligned. A peak detection algorithm will try to identify the largest peak in each series by fitting a linear combination of a Gaussian+Lorentzian to the peaks. Alternatively we can ask the user to select an angle range using the GUI within which a peak lies that can be used for alignment purposes. Once the peak is fitted the series will be interpolated using spline interpolation and the centre location used to resample the series at the offset sample locations. Not all data sets contain peaks that can be used for alignment purposes. In such cases, no alignment should be attempted. A GUI checkbox will be provided to allow enabling/disabling of this step.
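The resampling part of Step 1 can be sketched as follows, assuming the peak centres have already been fitted. The function name `apply_zero_correction` and its parameters are hypothetical, and linear interpolation (`np.interp`) stands in for the spline interpolation described above, so treat this as an illustration rather than the application's routine.

```python
import numpy as np

def apply_zero_correction(data, peak_centre, ref_centre):
    """Shift an Nx3 xye array so its peak lines up with the reference peak.

    Hypothetical helper: the intensities are resampled at the offset sample
    locations, so the returned array keeps the original 2theta grid.
    Linear interpolation stands in for the spline described in the text.
    """
    data = np.asarray(data, dtype=float)
    offset = peak_centre - ref_centre  # misalignment in degrees
    x = data[:, 0]
    # The value that belongs at x on the corrected axis sits at x + offset
    # on the measured axis. np.interp holds the end values outside the range.
    y = np.interp(x + offset, x, data[:, 1])
    e = np.interp(x + offset, x, data[:, 2])
    return np.column_stack([x, y, e])
```

Keeping the original 2theta grid means the corrected series can be compared with the reference sample by sample.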
Steps 2-4. All the data series have approximately 0.5deg gaps in the data for angles not covered by the detector array. To overcome this, multiple data series are captured and merged, with the detector offset by different angles.
Step 2. “Identify gaps.” To merge the data from multiple offsets of the detector array, first the 5 samples on each side of the gap are discarded. The gap is identified by taking the difference in 2theta value between adjacent samples. Samples are usually spaced by about 0.00375deg, except at gaps where the spacing is about 0.5 deg.
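A minimal sketch of the gap identification in Step 2. The names `find_gaps` and `discard_near_gaps` are hypothetical, and the 0.1 degree threshold is an assumed value chosen to sit between the normal spacing (about 0.00375 degrees) and the gap spacing (about 0.5 degrees).

```python
import numpy as np

def find_gaps(x, gap_threshold=0.1):
    """Return each index i where the step from x[i] to x[i+1] exceeds the threshold."""
    return np.nonzero(np.diff(x) > gap_threshold)[0]

def discard_near_gaps(data, n_discard=5, gap_threshold=0.1):
    """Drop n_discard samples on each side of every gap in an NxM array.

    Column 0 is assumed to hold the 2theta values in increasing order.
    """
    data = np.asarray(data, dtype=float)
    keep = np.ones(len(data), dtype=bool)
    for i in find_gaps(data[:, 0], gap_threshold):
        lo = max(i - n_discard + 1, 0)          # last n_discard samples before the gap
        hi = min(i + 1 + n_discard, len(data))  # first n_discard samples after it
        keep[lo:hi] = False
    return data[keep]
```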
Step 3. “Normalise data.” The scaler1.S8 value in the .parab file contains the incoming beam intensity. The intensity values of dataset_1.._n-1 are rescaled to match the incoming beam intensity of a selected reference dataset_r, i.e. all values in dataset n are multiplied by .parab_r_scaler1.S8/.parab_n_scaler1.S8. Problems in capturing data can result in datasets containing all zeros. This condition should be identified and an error message displayed to the user.
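The rescaling in Step 3 can be sketched as below. `normalise_to_reference` is a hypothetical name, the beam intensities are assumed to have been read from the scaler1.S8 entries already, and scaling the uncertainty column by the same factor is an assumption that the text does not state.

```python
import numpy as np

def normalise_to_reference(data, beam_intensity, ref_beam_intensity):
    """Rescale an Nx3 xye array to the reference beam intensity.

    Raises ValueError for an all-zero dataset so the caller can show an
    error message to the user.
    """
    data = np.asarray(data, dtype=float)
    if not np.any(data[:, 1]):
        raise ValueError("dataset contains only zero intensities")
    if beam_intensity <= 0:
        raise ValueError("beam intensity must be positive")
    scale = ref_beam_intensity / beam_intensity
    out = data.copy()
    out[:, 1:] *= scale  # assumption: y and its uncertainty scale together
    return out
```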
Step 4. “Merge data” or “Splice data”. To combine the overlapping data, the data points are combined and sorted in order of increasing angle. An interpolation spline is then fitted to the data and the data is resampled every 0.00375deg.
Step 5. The data is output as an .xye file along with a .parab file. A GUI checkbox option enables the .parab file contents to be prepended to the .xye file.
data is an Nx2 array of xy data. Gaps in the detector array lead to gaps in the data and corruption of the sample values adjacent to the gaps. This routine is therefore called before merging the data for the two detector positions, to remove (currently) 5 samples from each side of each identified gap.
d1 and d2 are Nx3 arrays of xye data. Merge the datasets by combining all data points in x-sorted order. Assumes that the ‘gaps’ in d1 have been cleaned by a call to clean_gaps(). If any points have exactly duplicated x values the d2 point is not merged.
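A minimal sketch of this merge, assuming both inputs are NumPy-compatible Nx3 arrays. The name `merge_xye` is hypothetical, and this is an illustration of the described behaviour rather than the routine itself.

```python
import numpy as np

def merge_xye(d1, d2):
    """Combine two Nx3 xye arrays in x-sorted order.

    d2 points whose x value exactly duplicates a d1 x value are dropped.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    d2 = d2[~np.isin(d2[:, 0], d1[:, 0])]  # keep only d2 points with new x values
    merged = np.vstack([d1, d2])
    order = np.argsort(merged[:, 0], kind="stable")
    return merged[order]
```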
d1 and d2 are Nx3 arrays of xye data. Replace the ‘gaps’ in d1 by valid data in the corresponding parts of d2. Assumes that the ‘gaps’ in d1 have been cleaned by a call to clean_gaps().
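One way to picture the gap replacement is sketched below. The name `fill_gaps` and the 0.1 degree gap threshold are assumptions, and the gaps are located here by the spacing between adjacent d1 samples, which may differ from how the routine itself finds them.

```python
import numpy as np

def fill_gaps(d1, d2, gap_threshold=0.1):
    """Fill the gaps of d1 with the d2 samples that fall inside them.

    d1 and d2 are Nx3 xye arrays sorted by x. A gap is assumed to be any
    step in d1's x values larger than gap_threshold.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    x1 = d1[:, 0]
    pieces = [d1]
    for i in np.nonzero(np.diff(x1) > gap_threshold)[0]:
        # Take only d2 samples strictly between the two edges of this gap.
        inside = (d2[:, 0] > x1[i]) & (d2[:, 0] < x1[i + 1])
        pieces.append(d2[inside])
    filled = np.vstack(pieces)
    return filled[np.argsort(filled[:, 0], kind="stable")]
```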
This 2D interpolation routine is used by the 2D surface plot to generate values between actual plot series data values.
Fits a linear combination of a Gaussian+Lorentzian to the specified data, allowing only the x-location to change. The data is assumed to contain a single peak similar in shape to that from which the fit parameters were obtained. <data_x> is a 1D array containing the 2theta values. <data_y> is a 1D array containing the intensity values. <plot> is a debug hook for generating a matplotlib plot window showing the fit. p0 is the 7-element parameter list:
p0[0] and p0[3] are the amplitudes of the respective components
p0[1] and p0[4] are the centres of the respective functions
p0[2] and p0[5] are the fit parameters of the respective functions
p0[6] is a flat background
Returns a tuple containing two items. The second item is the success of the peak fitting (True=success, False=failure). If the fit is successful, the first item is the 2theta value of the higher of the Gaussian or Lorentzian peak. Since there is only one free parameter here, I should really use a different fitter which would be faster, but this is fast enough.
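To illustrate a fit with a single free parameter, the sketch below scans candidate x-shifts and keeps the one with the smallest squared error. A brute-force scan is swapped in for the least-squares fitter, the names `peak_model` and `fit_shift_only` are hypothetical, and the reading of p0[2] as the Gaussian sigma and p0[5] as the Lorentzian half-width is an assumption, as are `max_shift` and `n_steps`.

```python
import numpy as np

def peak_model(x, p):
    """Gaussian + Lorentzian + flat background, using the p0 layout above.

    Assumption: p[2] is the Gaussian sigma, p[5] the Lorentzian half-width.
    """
    gauss = p[0] * np.exp(-0.5 * ((x - p[1]) / p[2]) ** 2)
    lorentz = p[3] * p[5] ** 2 / ((x - p[4]) ** 2 + p[5] ** 2)
    return gauss + lorentz + p[6]

def fit_shift_only(data_x, data_y, p0, max_shift=0.05, n_steps=1001):
    """Return (peak_position, success), varying only the x-location.

    Both centres are moved together by each trial shift; the shape
    parameters, amplitudes and background stay fixed at their p0 values.
    """
    data_x = np.asarray(data_x, dtype=float)
    data_y = np.asarray(data_y, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    shifts = np.linspace(-max_shift, max_shift, n_steps)
    errors = np.empty(n_steps)
    for k, s in enumerate(shifts):
        p = p0.copy()
        p[[1, 4]] += s  # move both centres together
        errors[k] = np.sum((peak_model(data_x, p) - data_y) ** 2)
    best = int(np.argmin(errors))
    # A minimum on the edge of the scan means the peak was not bracketed.
    if best == 0 or best == n_steps - 1:
        return None, False
    centre_index = 1 if p0[0] >= p0[3] else 4  # centre of the taller component
    return p0[centre_index] + shifts[best], True
```

The resolution of the result is limited by the scan step (`2 * max_shift / (n_steps - 1)`), which is the price of avoiding an iterative fitter in this sketch.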
Fits a linear combination of a Gaussian+Lorentzian+Constant to the specified data. The data is assumed to contain just a single peak to be fitted. <data_x> is a 1D array containing the 2theta values. <data_y> is a 1D array containing the intensity values. <plot> is a debug hook for generating a matplotlib plot window showing the fit. Returns a tuple containing two items: the first item is the 2theta value of the higher of the Gaussian or Lorentzian peak, and the second item is the list of all parameters found by the fitting algorithm.
Note, scipy v0.11 has a wavelet-based peak finding function scipy.signal.find_peaks_cwt which should do a much better job than the approach taken here according to http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631518/.
Perform peak detection within the defined range <range_low> to <range_high> for a dataset pair. <dataset_pair> is a tuple taken from the MainApp.dataset_pairs set container that contains keys into <datasets>.
The lowest (odd numbered) position is fitted and this fit is used to obtain the offset in the pair. If successful, the fit result is appended to the XYEDataset item metadata. The fit result can be a float or None in the case of a series that could not be fitted. Returns True if the peaks were fitted successfully in both positions, else returns False.
Perform peak detection for every dataset within the defined range. If successful, the fit result is appended to the XYEDataset item metadata. The fit result can be a float or None in the case of a series that could not be fitted.
Makes a new filename based on the existing one. A filename is of the form foo_[nnnn].xye, foo_[nnnn].xy, foo_[descriptor]_[nnnn].xy, or foo_[descriptor]_[nnnn].xye where [nnnn] is a 4-digit sequence id, and [descriptor] is a code string that describes the processing that has been performed, where the code may only contain characters from the ordered list ['n','s','m','g','b','t'] which, if included, will be in the order shown. The insertion string may be a character or character combination from the list, which is inserted into the descriptor. For filenames without a descriptor, one is created. For filenames of form foo.ext returns foo_[descriptor].ext
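The naming rule can be sketched with a regular expression. `insert_descriptor` and `DESCRIPTOR_ORDER` are hypothetical names, and the sketch cannot tell a real descriptor apart from a base name that happens to end in `_` followed by the same letters, so it is only an illustration of the rule.

```python
import re

# Allowed descriptor characters, in the order they must appear.
DESCRIPTOR_ORDER = ["n", "s", "m", "g", "b", "t"]

_FILENAME_RE = re.compile(
    r"(?P<base>.+?)"               # foo
    r"(?:_(?P<desc>[nsmgbt]+))?"   # optional _[descriptor]
    r"(?:_(?P<seq>\d{4}))?"        # optional _[nnnn]
    r"(?P<ext>\.[^.]+)"            # .xy, .xye, ...
)

def insert_descriptor(filename, insertion):
    """Return filename with the insertion codes added to its descriptor."""
    if not insertion or any(c not in DESCRIPTOR_ORDER for c in insertion):
        raise ValueError("insertion must use characters from DESCRIPTOR_ORDER")
    match = _FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError("unrecognised filename: %r" % filename)
    codes = set(match.group("desc") or "") | set(insertion)
    # Rebuild the descriptor so the codes follow the required order.
    descriptor = "".join(c for c in DESCRIPTOR_ORDER if c in codes)
    seq = match.group("seq")
    seq_part = "_" + seq if seq else ""
    return match.group("base") + "_" + descriptor + seq_part + match.group("ext")
```

For example, `insert_descriptor("foo_m_0001.xy", "n")` gives `foo_nm_0001.xy`: the new code is placed ahead of `m` because `n` comes first in the ordered list.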
Regrid data onto a regular grid of n=points x-values.
For a pair of datasets, normalise the y-values of the second dataset with respect to the first, based on the measured beam intensity.
A min- and max-preserving rebinning function, used by the plotting routines to represent peaks faithfully in visualisations. Reduces the data along the “long” axis to a length equal to twice the closest multiple of the number of samples. The reduced data contains alternating values representing the minimum and maximum value in each interval over which the data was measured. <a> is a 2D array with each row containing a data series. <samples> is the desired final number of each of the min and max values in each row of the returned 2D array. i.e. each row will contain 2x samples values. Also returns <truncate_at> which is where the array must be truncated so it divides into the desired number of intervals, also allowing for an even number of intervals.
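A simplified sketch of min- and max-preserving rebinning. `rebin_min_max` is a hypothetical name, and the sketch truncates each row to a whole number of equal intervals without the extra adjustment for an even number of intervals mentioned above.

```python
import numpy as np

def rebin_min_max(a, samples):
    """Reduce each row of a 2D array to alternating (min, max) pairs.

    Returns (reduced, truncate_at): reduced has 2 * samples columns and
    truncate_at is the number of input columns actually used.
    """
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    interval = n_cols // samples  # input samples per output interval
    if interval < 1:
        raise ValueError("samples exceeds the row length")
    truncate_at = interval * samples
    blocks = a[:, :truncate_at].reshape(n_rows, samples, interval)
    out = np.empty((n_rows, 2 * samples), dtype=a.dtype)
    out[:, 0::2] = blocks.min(axis=2)  # even columns hold the minima
    out[:, 1::2] = blocks.max(axis=2)  # odd columns hold the maxima
    return out, truncate_at
```

Because every interval contributes both its extreme values, a narrow peak survives the reduction even when it occupies a single input sample.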
data is an Nx3 array of xye data. Returns an Nx3 array of the data resampled at x values starting at start and spaced by interval. If start==None the start x value is the first x value in the input data. If end==None the end x value is the last value less than or equal to the last x value in the input data.
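The start and end rules can be sketched as follows. `resample_xye` is a hypothetical name, and linear interpolation (`np.interp`) is used here purely for illustration; the text does not say which interpolation the routine uses.

```python
import numpy as np

def resample_xye(data, interval, start=None, end=None):
    """Resample an Nx3 xye array on a regular x grid.

    The grid starts at start (default: first x) and stops at the last
    grid point that is less than or equal to end (default: last x).
    """
    data = np.asarray(data, dtype=float)
    x = data[:, 0]
    if start is None:
        start = x[0]
    if end is None:
        end = x[-1]
    # Count the whole intervals that fit. The small tolerance guards
    # against floating-point round-off dropping the final grid point.
    n_points = int(np.floor((end - start) / interval + 1e-9)) + 1
    new_x = start + interval * np.arange(n_points)
    new_y = np.interp(new_x, x, data[:, 1])
    new_e = np.interp(new_x, x, data[:, 2])
    return np.column_stack([new_x, new_y, new_e])
```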
d1 and d2 are Nx3 arrays of xye data. d1 is assumed to cover an angular range starting below that of d2 so we trim shave_number samples off the RHS of d1 before concatenating d2 to it starting at the next highest value of 2theta.
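A minimal sketch of this splice, assuming both arrays are sorted by 2theta. The name `splice_xye` and the default of 5 for `shave_number` are assumptions.

```python
import numpy as np

def splice_xye(d1, d2, shave_number=5):
    """Trim the right-hand end of d1, then append d2 from the next 2theta up."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    if shave_number >= len(d1):
        raise ValueError("shave_number would remove all of d1")
    if shave_number > 0:
        d1 = d1[:-shave_number]
    # Continue with the first d2 sample above the last remaining d1 angle.
    d2 = d2[d2[:, 0] > d1[-1, 0]]
    return np.vstack([d1, d2])
```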
This is called by the plotting routines, so only those data series that are set to active in the dataset editor are collected.