.. -*- rst -*-

Change Log
==========

Release 0.5.2 (November 6, 2018)
--------------------------------
* Prevent exceptions when CULA Dense free is present (#146).
* Fix Python 3 issues with CUSOLVER wrapper functions (#145)
* Add support for using either CUSOLVER or CULA for computing SVD.
* Add support for using either CUSOLVER or CULA for computing determinant.
* Compressed Dynamic Mode Decomposition (enh. by N. Benjamin Erichson).
* Support for CUFFT extensible plan API (enh. by Bruce Merry).
* Wrappers for CUFFT size estimation (enh. by Luke Pfister).
* Wrappers for CUBLAS-XT functions.
* More wrappers for MAGMA functions (enh. by Nikul H. Ukani).
* Python 3 compatibility improvements (enh. by Joseph Martinot-Lagarde).
* Allow specification of order in misc.zeros and misc.ones.
* Preserve strides in misc.zeros_like and misc.ones_like.
* Add support for Cholesky factorization/solving using CUSOLVER (#198).
* Add cholesky() function that zeros out non-factor entries in result (#199).
* Add support for CUDA 8.0 libraries (#171).
* Workaround for libgomp + CUDA 8.0 weirdness (fix by Kevin Flansburg).
* Fix broken matrix-vector dot product (#156).
* Initialize MAGMA before CUSOLVER to prevent internal errors in certain
  CUSOLVER functions.
* Skip CULA-dependent unit tests when CULA isn't present.
* CUSOLVER support for symmetric eigenvalue decomposition (enh. by Bryant Menn).
* CUSOLVER support for matrix inversion, QR decomposition (#198).
* Prevent objdump output from changing due to environment language (fix by 
  Arnaud Bergeron).
* Fix diag() support for column-major 2D array inputs (#219).
* Use absolute path for skcuda header includes (enh. by S. Clarkson).
* Fix QR issues by reverting fix for #131 and raising PyCUDA version requirement 
  (fix by S. Clarkson).
* More batch CUBLAS wrappers (enh. by Li Yong Liu)
* Numerical integration with Simpson's Rule (enh. by Alexander Weyman)
* Make CUSOLVER default backend for functions that can use either CULA or
  CUSOLVER.
* Fix CUDA errors that only occur when unit tests are run en masse with nose or
  setuptools (#257).
* Fix MAGMA eigenvalue decomposition wrappers (#265, fix by Wing-Kit Lee).

Release 0.5.1 - (October 30, 2015)
----------------------------------
* More CUSOLVER wrappers.
* Eigenvalue/eigenvector computation (eng. by N. Benjamin Erichson).
* QR decomposition (enh. by N. Benjamin Erichson).
* Improved Windows 10 compatibility (enh. by N. Benjamin Erichson).
* Function for constructing Vandermonde matrix in GPU memory (enh. by N. Benjamin Erichson).
* Standard and randomized Dynamic Mode Decomposition (enh. by N. Benjamin Erichson).
* Randomized linear algebra routines (enh. by N. Benjamin Erichson).
* Add triu function (enh. by N. Benjamin Erichson).
* Support Bessel correction in computation of variance and standard 
  deviation (#143).
* Fix pip installation issues.

Release 0.5.0 - (July 14, 2015)
-------------------------------
* Rename package to scikit-cuda.
* Reductions sum, mean, var, std, max, min, argmax, argmin accept keepdims option.
* The same reductions now return a GPUArray instead of ndarray if axis=None.
* Switch to PEP 440 version numbering.
* Replace distribute_setup.py with ez_setup.py.
* Improve support for latest NVIDIA GPUs.
* Direct links to online NVIDIA documentation in CUBLAS, CUFFT wrapper
  docstrings.
* Add wrappers for CUSOLVER in CUDA 7.0.
* Add skcuda namespace package that contains all modules in scikits.cuda namespace.
* Add more wrappers for CUBLAS 5 functions (enh. by Teodor Moldovan, Sander
  Dieleman).
* Add support for CULA Dense Free R17 (enh. by Alex Rubinsteyn).
* Memoize elementwise kernel used by ifft scaling (#37).
* Speed up misc.maxabs using reduction and kernel memoization.
* Speed up misc.cumsum using scan and kernel memoization.
* Speed up linalg.conj and misc.diff using elementwise kernel and memoization.
* Speed up special.{sici,exp1,expi} using elementwise kernel and memoization.
* Add wrappers for experimental multi-GPU CULA routines in CULA Dense R14+.
* Use ldconfig to find library paths rather than libdl (#39).
* Fix win32 platform detection.
* Add Cholesky factorization/solve routines (enh. by Steve Taylor).
* Fix Cholesky factorization/solve routines (fix by Thomas Unterthiner).
* Enable dot() function to operate inplace (enh. by Thomas Unterthiner).
* Python 3 compatibility improvements (enh. by Thomas Unterthiner).
* Support for Fortran-order arrays in dot() and cho_solve() (enh. by Thomas Unterthiner)
* CULA-based matrix inversion (enh. by Thomas Unterthiner).
* Add add_diag() function (enh. by Thomas Unterthiner).
* Use cublas*copy in diag() function (enh. by Thomas Unterthiner).
* Improved MacOSX compatibility (enh. by Michael M. Forbes).
* Find CUBLAS version even when it is only accessible via LD_LIBRARY_PATH (enh. by Frédéric Bastien).
* Get both major and minor version numbers from CUBLAS library when determining
  version.
* Handle unset LD_LIBRARY_PATH variable (fix by Jan Schlüter).
* Fix library search on MacOS X (fix by capdevc).
* Fix library search on Windows.
* Add Windows support to CULA wrappers.
* Enable specification of memory pool allocator to linalg functions (enh.  by
  Thomas Unterthiner).
* Improve misc.select_block_grid_sizes() logic to handle different GPU hardware.
* Compute transpose using CUDA 5.0 CUBLAS functions rather than with inefficient naive kernel.
* Use ReadTheDocs theme when building HTML docs locally.
* Support additional cufftPlanMany() parameters when creating FFT plans (enh. by
  Gregory R. Lee).
* Improved Python 3.4 compatibility (enh. by Eric Larson).
* Avoid unnecessary import of cublas when importing fft module (enh. by Eric
  Larson).
* Matrix trace function (enh. by Thomas Unterthiner).
* Functions for computing simple axis-wise stats over matrices (enh. by Thomas
  Unterthiner).
* Matrix add_dot, add_matvec, div_matvec, mult_matvec functions (enh. by Thomas
  Unterthiner).
* Faster dot_diag implementation using CUBLAS matrix-matrix multiplication (enh.
  by Thomas Unterthiner).
* Memoize SourceModule calls to speed up various high-level functions (enh. by
  Thomas Unterthiner).
* Function for computing matrix determinant (enh. by Thomas Unterthiner).
* Function for computing min/max and argmin/argmax along a matrix axis
  (enh. by Thomas Unterthiner).
* Set default value of the parameter 'overwrite' to False in all linalg
  functions.
* Elementwise arithmetic operations with broadcasting up to 2 dimensions
  (enh. David Wei Chiang)

Release 0.042 - (March 10, 2013)
--------------------------------
* Add complex exponential integral.
* Fix typo in cublasCgbmv.
* Use CUBLAS v2 API, add preliminary support for CUBLAS 5 functions.
* Detect CUBLAS version without initializing the GPU.
* Work around numpy bug #1898.
* Fix issues with pycuda installations done via easy_install/pip.
* Add support for specifying streams when creating FFT plans.
* Successfully find CULA R13a libraries.
* Raise exceptions when functions in the full release of CULA Dense are invoked
  without the library installed.
* Perform post-fft scaling in-place.
* Fix broken Python 2.6 compatibility (#19).
* Download distribute for package installation if it isn't available.
* Prevent absence of CULA from causing import errors (enh. by Jacob Frelinger)
* FFT batch tests and FFTW mode configuration (enh. by Lars Pastewka)

Release 0.041 - (May 22, 2011)
------------------------------
* Fix bug preventing installation with pip.

Release 0.04 - (May 11, 2011)
-----------------------------
* Fix bug in cutoff_invert kernel.
* Add get_compute_capability function and other goodies to misc module.
* Use pycuda-complex.hpp to improve kernel readability.
* Add integrate module.
* Add unit tests for high-level functions.
* Automatically determine device used by current context.
* Support batched and multidimensional FFT operations.
* Extended dot() function to support implicit transpose/Hermitian.
* Support for in-place computation of singular vectors in svd() function.
* Simplify kernel launch setup.
* More CULA routine wrappers.
* Wrappers for CULA R11 auxiliary routines.

Release 0.03 - (November 22, 2010)
----------------------------------
* Add support for some functions in the premium version of CULA toolkit.
* Add wrappers for all lapack functions in basic CULA toolkit.
* Fix pinv() to properly invert complex matrices.
* Add Hermitian transpose.
* Add tril function.
* Fix missing library detection.
* Include missing CUDA headers in package.

Release 0.02 - (September 21, 2010)
-----------------------------------
* Add documentation.
* Update copyright information.

Release 0.01 - (September 17, 2010)
-----------------------------------
* First public release.
