Tutorial: Development of new analysis types

From relax wiki
Revision as of 12:33, 11 December 2018 by Bugman (talk | contribs) (→‎Target functions: Formatting.)

The ordering of the steps here is very important:

System / unit test development

  • Assemble an input data set.
  • A real data set can be used but, in that case, a second synthetic data set calculated directly from the equations would be good for the sanity of the developer.
  • Write a 'hypothetical' relax script to perform a basic analysis (this will of course not work until the steps below have been completed).
  • Calculate by hand what some of the results should be.
  • Add the data to the test_suite directories and create a system/unit test which runs the script and checks the hand-calculated values to machine precision (~1e-15).
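
As a minimal sketch of the last step, the test below checks a back-calculated value against a hand-calculated one to a relative precision of 1e-15. It uses only the standard Python unittest module. The exp_decay() function and the TestExpDecay class are hypothetical stand-ins for the new analysis, not part of relax, and a real system test would run the relax script rather than call a function directly.

```python
import math
import unittest


def exp_decay(i0, rate, time):
    """Hypothetical model equation standing in for the new analysis."""
    return i0 * math.exp(-rate * time)


class TestExpDecay(unittest.TestCase):
    """Check back-calculated values against hand-calculated ones."""

    def assert_rel_equal(self, calc, expected, rel_tol=1e-15):
        # Relative comparison, so the tolerance scales with the value.
        self.assertLessEqual(abs(calc - expected), rel_tol * abs(expected))

    def test_zero_time(self):
        # At time zero the exponential is exactly 1, so the result is i0.
        self.assertEqual(exp_decay(1000.0, 2.0, 0.0), 1000.0)

    def test_hand_calculated_point(self):
        # By hand: 1000 * exp(-2 * 0.5) = 1000 / e.
        self.assert_rel_equal(exp_decay(1000.0, 2.0, 0.5), 367.87944117144233)
```

Saved to a file, this can be run with "python -m unittest". Written before the analysis exists, such a test fails until the implementation is complete, which is the point of doing this step first.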

Data input

  • If the data is completely new and is complex, this is the hardest part.
  • "User functions" need to be created to read new input data types (file or direct value input).
  • The backend of the user function needs to pack the data into the richly structured relax data store.
  • New data structures might need to be developed to handle new data concepts (e.g. for holding a full chemical shift tensor, which is currently not supported).
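
The idea of a user function backend packing input data into a structured store can be sketched as follows. Everything here is an assumption made for illustration: the three-column "spin ID, value, error" file layout, the SpinData container, and the read_values() function are hypothetical and are not the relax data store or a relax user function.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class SpinData:
    """Hypothetical per-spin container, keyed by a data set ID."""
    values: Dict[str, float] = field(default_factory=dict)
    errors: Dict[str, float] = field(default_factory=dict)


def read_values(lines: Iterable[str], data_id: str,
                store: Dict[str, SpinData]) -> Dict[str, SpinData]:
    """Pack 'spin_id value error' lines into the store under data_id."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        spin_id, value, error = line.split()
        # Create the spin container on first use, then add the new data.
        spin = store.setdefault(spin_id, SpinData())
        spin.values[data_id] = float(value)
        spin.errors[data_id] = float(error)
    return store
```

Calling read_values() a second time with a different data_id adds a second data set to the same spins, which is the kind of structure that later code can loop over when assembling data for optimisation.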

Specific analysis

  • A new analysis type needs to be created (an easy task).
  • This simply uses the "specific analysis API" to assemble the data for target functions, disassembles the optimised results, and performs a number of other functions.
  • This is easy to develop as you go. If something is missing, a RelaxError appears telling the developer what specific analysis API function needs to be added.
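
The "develop as you go" pattern described above can be illustrated with a generic sketch. The class and method names (AnalysisAPIBase, assemble_data, unpack_results) are hypothetical and are not the real specific analysis API, and the built-in NotImplementedError stands in for the RelaxError mentioned above.

```python
class AnalysisAPIBase:
    """Hypothetical base class: each hook fails loudly until overridden."""

    def _missing(self, name):
        raise NotImplementedError(
            f"The API method '{name}' is not implemented for "
            f"{type(self).__name__}."
        )

    def assemble_data(self, store):
        self._missing("assemble_data")

    def unpack_results(self, store, params):
        self._missing("unpack_results")


class ExpDecayAnalysis(AnalysisAPIBase):
    """A new analysis type, filled in one method at a time."""

    def assemble_data(self, store):
        # Flatten {spin_id: (value, error)} into ordered lists for a
        # target function.
        spin_ids = sorted(store)
        values = [store[s][0] for s in spin_ids]
        errors = [store[s][1] for s in spin_ids]
        return spin_ids, values, errors
```

Here ExpDecayAnalysis().unpack_results(...) raises an error naming the missing method, so the developer always knows which method to write next.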

Target functions

  • This takes the input data and implements functions, gradients and Hessians for the chi-squared optimisation function for the analysis.
  • The first and second partial derivatives of the equations should be calculated by hand (computer algebra system (CAS) software such as Maxima, Mathematica, etc. can do this easily). The availability of gradients and Hessians opens up the possibility of using Newton optimisation which, speed- and precision-wise, is well worth the effort! It also allows the use of the Method of Multipliers (or Augmented Lagrangian) constraint algorithm, which is an incredible algorithm when parameter constraints such as 0 ≤ S2 ≤ 1 are required.
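
To make this concrete, here is a minimal sketch of a chi-squared function with its analytic gradient and Hessian. It assumes a two-parameter exponential decay, I(t) = I0 * exp(-R * t), chosen purely for illustration. The function names and the parameter ordering [R, I0] are hypothetical and do not reflect the layout of the relax target function code.

```python
import numpy as np


def model(params, times):
    """Back-calculate I(t) = I0 * exp(-R * t) for params = [R, I0]."""
    rate, i0 = params
    return i0 * np.exp(-rate * times)


def chi2(params, times, values, errors):
    """Chi-squared value: sum of squared, error-weighted residuals."""
    resid = (values - model(params, times)) / errors
    return np.sum(resid**2)


def dchi2(params, times, values, errors):
    """Analytic chi-squared gradient with respect to [R, I0]."""
    rate, i0 = params
    expo = np.exp(-rate * times)
    # Model partial derivatives: dI/dR = -t*I0*exp(-R*t), dI/dI0 = exp(-R*t).
    jac = np.array([-times * i0 * expo, expo])
    weighted = (values - i0 * expo) / errors**2
    return -2.0 * jac @ weighted


def d2chi2(params, times, values, errors):
    """Analytic chi-squared Hessian with respect to [R, I0]."""
    rate, i0 = params
    expo = np.exp(-rate * times)
    jac = np.array([-times * i0 * expo, expo])
    weighted = (values - i0 * expo) / errors**2
    # Model second partial derivatives (d2I/dI0^2 is zero).
    d2f = np.zeros((2, 2, len(times)))
    d2f[0, 0] = times**2 * i0 * expo
    d2f[0, 1] = d2f[1, 0] = -times * expo
    return 2.0 * ((jac / errors**2) @ jac.T - d2f @ weighted)
```

A useful sanity check is to compare the analytic gradient and Hessian against central finite differences of chi2() and dchi2() at a few parameter values, as sign errors in hand-derived derivatives are otherwise easy to miss.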

Result visualisation

  • relax has many user functions for visualising results, but these functions can be refined for the output data (adding units for Grace graph plots, for example).

GUI implementation

  • This is not at all essential, but many students see this as the "icing on the cake" ;)

Documentation

  • A new chapter should be added to the relax manual.
  • This can simply be a documentation of the system test script, as a minimal introduction.

Additional test development (system/unit/GUI tests)

  • For any new developments.
  • For handling corner cases (difficult data sets, etc.).
  • For different models.
  • For new results visualisations.

Demo data sets

  • New relax users often ask for a demo data set to test out the software with. This is totally unnecessary and is only needed to soothe the uncertainty of the software user.
  • Simply bundle a data set and a functional relax script.
  • I have recently created a git repository for this (e.g. at https://sourceforge.net/p/nmr-relax/relax-demo/ci/master/tree/)

It is essential that the first step, system / unit test development, is completed first. Then the developer knows that their task is complete once the test passes. Most of this development would be based on mimicking what is already in place in relax. There are many different analysis types contributed by many different people, and development by copying/mimicking is highly recommended. It is also good to have complete test coverage added at the eighth step, additional test development, as this ensures that the analysis will be robust and never break as relax is further developed. That future-proofs the code. I have uploaded the relax git repositories to SourceForge, GitHub, GitLab, and Bitbucket (and hopefully soon to Savannah) so that the software should survive for decades to come, independent of my presence.

Speed

Note that a lot of time can be spent on improving the isolated target function code after implementation. Troels did this for relaxation dispersion in relax, likely making relax the fastest implementation in the field for this analysis. This requires a lot of numerical tricks to take advantage of the BLAS and LAPACK routines found in the Python numpy package, converting the target functions to operate on matrices as much as possible. Speed comparisons are difficult, as the optimisation in relax is far more precise than that of any of the other software (precision and quality are often dropped to speed up the software). But I have the source code for many of the dispersion analysis software packages, or I directly asked the developers, and I think that only relax uses these routines. Note that the use of BLAS and LAPACK routines has an orders of magnitude greater impact on software optimisation speed than the choice of programming language (Python vs. C), but that the target functions in relax can later be converted from Python to C for additional, though less impressive, speed-ups.
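
The conversion to matrix operations can be illustrated with a small sketch. This is not the relaxation dispersion code. It assumes a simple exponential decay for several spins at once, and the function names are hypothetical. The first version loops over spins and time points in Python, while the second back-calculates all values as one matrix and finishes with a dot product, the kind of operation numpy can hand to BLAS when it is built against it.

```python
import numpy as np


def chi2_loops(rates, i0s, times, values, errors):
    """Chi-squared over all spins and times using nested Python loops."""
    total = 0.0
    for i in range(len(rates)):
        for j in range(len(times)):
            back_calc = i0s[i] * np.exp(-rates[i] * times[j])
            total += ((values[i, j] - back_calc) / errors[i, j]) ** 2
    return total


def chi2_matrix(rates, i0s, times, values, errors):
    """The same chi-squared value using whole-matrix operations."""
    # Back-calculated data for all spins (rows) and times (columns).
    back_calc = i0s[:, None] * np.exp(-np.outer(rates, times))
    resid = ((values - back_calc) / errors).ravel()
    # The sum of squares expressed as a dot product.
    return float(resid @ resid)
```

Both functions should return the same value to within floating point rounding, so the loop version can be kept as a reference in the test suite while the matrix version is tuned for speed.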

See also