.. -*- mode: rst -*-

=================================
 Images, headers and code design
=================================

In which we set out how we are thinking of medical image formats and
their commonalities.

Headers
=======

Headers contain two types of information:

#. *howto* data: stuff to tell you how to read the image array data from
   file. This must include the shape of the image array and the numeric
   representation (float32, int16), as well as implicit or explicit
   position of the data relative to the beginning of the data file
   (offset). It can be complicated; ECAT - for example - can contain
   more than one frame, and the datatype can be different for each
   frame.
#. *whatis* data: metadata about the meaning of the image array on file.
   We are interested in the relationship of the voxel positions in the
   data array to space in the real world.  In practice (SPM Analyze,
   NIfTI, MINC) this can always be represented as an affine relating
   voxel coordinates to real world coordinates.  We may also be
   interested in what the 'real world' is.  Neither MINC (1.x) nor
   Analyze stores this, but NIfTI can.

Howto data
----------

In order to get data out of files, any image reader will need either -
the header itself, or selected fields from the header.

Different images can make use of different parts of the header, because
the images will work with only a specified set of headers - as dictated
by the image itself.

* in-file data numeric type - ``io_dtype``.  This has no necessary
  relation to the dtype of the data in memory, because scaling factors
  may be applied.  For reading, we may not need this as part of the
  public interface, we can just use it internally to cast the read
  memory to an array.  Setting this attribute will change the output
  dtype on writing.  ECAT file format can have different dtypes per
  frame; for reading, we just cast up to a dtype that can hold all the
  frame dtypes; for writing, we may just write as one type, or disallow
  writing altogether.
* array shape - ``shape``.  
* byte offset - ``offset`` at which data starts.  This is not relevant
  for the way we currently read MINC files for example - and may not be
  relevant for ECAT files, in the sense that it may be the offset to
  only one of the frames in the image, and the frames are of different
  length.


Images
======

We think of an image as being the association of:

#. A data array, of at least three dimensions, where the first three
   dimensions of the array are spatial.
#. A transformation mapping the spatial array (voxel) coordinates to some real
   continuous space (real-world transform).
#. A definition of what this space *is* ('scanner', 'mni', etc).

.. note::

   Why are the first three dimensions spatial?  

   For simplicity, we want the transformation (above) to be spatial.
   Because the images are always at least 3D, and the transform is
   spatial, this means that the tranformation is always exactly 3D.  We
   have to know which of the N image dimensions are spatial. For
   example, if we have a 4D (space and time) image, we need to know
   which of the 4 dimensions are spatial.  We could ask the image to
   tell us, but the simplest thing is to assert which dimensions are
   spatial by convention, and obey that convention with our image
   readers.

   Right, but why the *first* three dimensions?

   Of course, it could be the last three dimensions.  We chose to use
   the first three dimensions because that is the convention encoded in
   the NIfTI standard, at least implicitly, and it will be familiar to
   users of packages like SPM.  Users of Numpy will have a slight
   preference for the first dimension of an array being the slowest
   changing on disk, and the instinct that time, rather than space, will
   usually be the slowest changing dimension on disk, but we didn't want
   to break the NIfTI and SPM conventions, on the basis of this
   instinct, because the instinct is difficult to explain to people who
   don't have it.

So, our image likely has::

   img.data
   img.affine
   img.output_space
   img.meta
   img.format

where meta is a dictionary and format is an object that implements the
image format API - see :ref:`image-formats`

This immediately suggests the following interface::

   img = Image(data, affine=None, output_space=None, 
         meta=None, format=None, filename=None)

The output space is a string

   img.output_space == 'mni'

When there is no known output space, it is ``None``.

The ``format`` attribute is part of the bridge pattern.  That is, the
``format`` object provides the implementation for things that an image
might want to do, or have done to it.  The format will differ depending
on the input or output type.  What might we want to do to an image? We
might imagine these methods::

   img.load(filename, format=None) # class method
   img.save(filename=None, format=None)
   img.as_file(filemaker=None, format=None)
   
and some things that formats generally support like::

   img.write_header(filename=None)
   img.write_data(data=None, filename=None, slicedef=None)
   
``img.as_file`` returns the image as saved to disk; the image might
completely correspond to something on disk, in which case it may return
its own filename, or it might not correspond to something on disk, in
which case it saves itself to disk with the given format, and returns
the filename it has used.

Data proxies - and lightweight images
-------------------------------------

A particular use-case is where we want to part-load the image, but we do
not yet want all the data, as the data can be very large and slow to
load, or take up a lot of memory.

For that case, the ``data`` attribute is a proxy object, subclassing
ndarray, that knows to load itself when the data is accessed.

The proxy object implements at least ``.shape``, but otherwise defers to
the on-disk version of the array.

The ``format`` object deals with this action.  That is, the ``data``
object will have a pointer to the ``format`` attribute in some form -
perhaps in the form of a ``format.get_data`` method.

Of course, this is an optimization, and does not affect the interface
for the ``Image`` (although it might affect the interface for
``Format``.


Empty image
-----------

This is a reminder of Souheil Inati's use-case - the iterative write.
Perhaps something like::

    empty_image = Image.empty(shape=(64,64,30,150), affine=np.eye(4))
    empty_image.set_filespec('some_image.nii.gz')
    empty_image.write_header()
    for i in range(150):
        slicer = (slice(None),)*3 + (i,)
        data = np.random.normal(size=(64,64,30))
        empty_image.write_data(data, slice=slicer)


Images and files and filenames
------------------------------

Various image formats can have more than one filename per image.  NIfTI
is the obvious example because it can be either a single file::

  some_image.nii

or a pair of files (like Analyze)::

  some_image.img
  some_image.hdr

SPM Analyze adds an optional extra data file in Matlab ``.mat`` format::

  some_image.img
  some_image.hdr
  some_image.mat

Of course there are rules / rules-of-thumb as to what extensions these
various filenames can be. 

We may want to associate an image with a filename or set of filenames.
But we may also want to be able to associate images with file-like
objects, such as open files, or anything else that implements a file
protocol.  

The image ``format`` will know what the ``image`` needs in terms of
files.  For example, a single file NIfTI image will need a single
filename or single file-like object, whereas a NIfTI pair will need two
files and two file-like objects.

Let's call a full specification of what the format needs a *filedef*.
For the moment, let's imagine that is a dictionary with keys ``image``,
``header``, and optional ``mat``.  The values can be filenames or
file-like objects.  A *filespec* is some argument or set of arguments
that allow us to fully specify a *filedef*.  

The simple case of a single-file NIfTI image::

   img = Image(data, filespec='some_image.nii')
   img.filedef == {'image': 'some_image.nii',
                   'header': 'some_image.nii'}

In this case, we haven't specified the format, and the Image constructor
tries to work out the format from the filespec.

Consider::

   img = Image(data, filespec='some_image.nii', 
               format=Nifti1SingleFormat)

also OK.  But::

   img = Image(data, filespec='some_image.nii', format=AnalyzeFormat)

might raise an error.

For SPM analyze format:

   img = Image(data, filespec='some_image.img', format=AnalyzeFormat)
   img.filedef == {'image': 'some_image.img',
                   'header': 'some_image.hdr'}

Now, for file-like objects::

   fobj = open('some_image.nii')
   img = Image(data, filespec=fobj)
   img.filedef == {'image': fobj,
                   'header': fobj}

might work - although the Image constructor would have to be smart
enough to work out that this was ``Nifti1SingleFormat``.  Or it might be
the default.

   img = Image(data, filespec=fobj, format=AnalyzeFormat)

might raise an error, on the lines of::

   FormatError('Need image and header file-like objects for Analyze')

- or it might just assume that you mean for the image and the header to
  be the same file.  Perhaps that is too implicit.




